Inconsistent behavior between different states with ListState.get().iterator().remove()

Ken Krugler
Hi devs,

If you use the FsStateBackend (or MemoryStateBackend) and you have a ListState, then you can get an iterator, remove() an entry, and it all works as expected.

If you use the RocksDBStateBackend, the remove() call doesn’t throw an exception, but the ListState isn’t updated.
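To make the difference concrete, here is a small self-contained model of the two behaviors described above. It assumes, as a simplification rather than a description of Flink's actual classes, that the heap-based backends hand out the stored list itself from get(), while the RocksDB backend deserializes a fresh copy on every call. SimpleListState, HeapLikeListState, SerializedLikeListState and RemoveDemo are hypothetical names; none of this is Flink code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model, NOT Flink's classes: it only mimics two ways a backend
// could hand out the contents of a list state from get().
interface SimpleListState<T> {
    Iterable<T> get();
    void add(T value);
}

// Heap-style backend (assumed behavior): get() returns the live list the
// backend stores, so Iterator#remove() writes through to the state.
class HeapLikeListState<T> implements SimpleListState<T> {
    private final List<T> stored = new ArrayList<>();

    public Iterable<T> get() { return stored; }
    public void add(T value) { stored.add(value); }
}

// RocksDB-style backend (assumed behavior): the list lives as serialized
// bytes, so get() builds a fresh copy. Iterator#remove() edits only that
// copy; the stored state is unchanged and no exception is thrown.
class SerializedLikeListState<T> implements SimpleListState<T> {
    private final List<T> stored = new ArrayList<>(); // stands in for the bytes

    public Iterable<T> get() { return new ArrayList<>(stored); } // "deserialize"
    public void add(T value) { stored.add(value); }
}

class RemoveDemo {
    // Removes every element equal to target via Iterator#remove(), then
    // returns how many elements a fresh get() still sees.
    static <T> int removeAndRecount(SimpleListState<T> state, T target) {
        Iterator<T> it = state.get().iterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                it.remove();
            }
        }
        int count = 0;
        for (T ignored : state.get()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        SimpleListState<String> heap = new HeapLikeListState<>();
        SimpleListState<String> serialized = new SerializedLikeListState<>();
        for (String s : new String[] {"a", "b", "c"}) {
            heap.add(s);
            serialized.add(s);
        }
        System.out.println(removeAndRecount(heap, "b"));       // 2
        System.out.println(removeAndRecount(serialized, "b")); // 3
    }
}
```

With the heap-style model the removal sticks (two elements remain); with the copy-on-read model it is silently lost (three remain), which matches the symptom reported for RocksDB.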

It seems like either the remove() call should throw an exception, or the operation should work as expected.

I see https://issues.apache.org/jira/browse/FLINK-5651, though that seems only to be talking about FsStateBackend/MemoryStateBackend.

And I don’t understand the comment on that issue: “Actually, it can be fine to use Iterator#remove() as long as the user does not rely on these changes in the backing store.”
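One plausible reading of that comment is that Iterator#remove() is harmless when it only filters what the current loop sees, but code should not count on the removal being written back to the backend. A backend-independent way to persist a removal is to read the list, filter it, and write the result back in one call. The sketch below assumes a hypothetical UpdatableListState interface with get() and update(List) (Flink's ListState has methods with those names, but this interface is a stand-in so the example compiles without Flink), plus a hypothetical CopyOnReadListState backend.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a list state offering get() and update(List).
// It is not Flink's ListState, only shaped like the relevant part of it.
interface UpdatableListState<T> {
    Iterable<T> get();
    void update(List<T> values);
}

// Hypothetical backend that, like the RocksDB case, returns a copy from get().
class CopyOnReadListState<T> implements UpdatableListState<T> {
    private List<T> stored = new ArrayList<>();

    public Iterable<T> get() { return new ArrayList<>(stored); }
    public void update(List<T> values) { stored = new ArrayList<>(values); }
}

class PortableRemove {
    // Reads all elements, keeps the ones that should stay, and writes the
    // result back explicitly. This works whether get() returns a live view
    // or a copy, at the price of rewriting the whole list.
    static <T> void removeIf(UpdatableListState<T> state, Predicate<? super T> shouldRemove) {
        List<T> kept = new ArrayList<>();
        for (T value : state.get()) {
            if (!shouldRemove.test(value)) {
                kept.add(value);
            }
        }
        state.update(kept);
    }

    public static void main(String[] args) {
        UpdatableListState<Integer> state = new CopyOnReadListState<>();
        state.update(List.of(1, 2, 3, 4));
        removeIf(state, v -> v % 2 == 0);
        System.out.println(state.get()); // [1, 3]
    }
}
```

The trade-off is that every removal rewrites the full list, which on RocksDB means re-serializing all of it.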

Thanks,

— Ken

PS - I understand there are many reasons not to remove arbitrary elements from a ListState when using RocksDB (the serialization/deserialization cost for the entire list), so I’d be in favor of the remove() call throwing an exception, at least with RocksDB.

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr

Re: Inconsistent behavior between different states with ListState.get().iterator().remove()

Aljoscha Krettek-2
That is a very good observation!

In an ideal world, I would say we disallow #remove() because we cannot
efficiently implement it for RocksDB and we should keep the behaviour
consistent between the backends. Now that we already have the
functionality for the heap-based backends, I think we cannot remove it
because some users might have come to rely on it.

The next best thing would be throwing an exception for RocksDB to at
least not silently ignore ineffective #remove() calls.
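A minimal sketch of that "fail loudly" option, assuming the backend's get() builds its Iterable over a deserialized copy: the returned iterator rejects remove() with UnsupportedOperationException instead of quietly editing the copy. NoRemoveIterable is a hypothetical name, and this is not taken from Flink's code base. Note that java.util.Iterator's default remove() already throws UnsupportedOperationException; overriding it here only makes the intent and the error message explicit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical wrapper: iteration works normally over a deserialized copy,
// but remove() fails loudly because it could never reach the stored state.
final class NoRemoveIterable<T> implements Iterable<T> {
    private final List<T> deserializedCopy;

    NoRemoveIterable(List<T> deserializedCopy) {
        this.deserializedCopy = deserializedCopy;
    }

    @Override
    public Iterator<T> iterator() {
        Iterator<T> delegate = deserializedCopy.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() { return delegate.hasNext(); }

            @Override
            public T next() { return delegate.next(); }

            @Override
            public void remove() {
                // Removing from the copy would not change the stored list,
                // so report that instead of pretending the call worked.
                throw new UnsupportedOperationException(
                        "remove() would not update the stored list state");
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> it =
                new NoRemoveIterable<>(new ArrayList<>(List.of("a", "b"))).iterator();
        it.next();
        try {
            it.remove();
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```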

Best,
Aljoscha

On 23.07.20 20:40, Ken Krugler wrote:

Re: Inconsistent behavior between different states with ListState.get().iterator().remove()

Ken Krugler
Thanks for the response. I filed https://issues.apache.org/jira/browse/FLINK-18707 as a minor bug.

— Ken

> On Jul 24, 2020, at 2:03 AM, Aljoscha Krettek wrote: