Unregistering Managed State in Operator Backend

Unregistering Managed State in Operator Backend

Paris Carbone
Hi folks,

I have a small question regarding the managed state operator backend, in case someone can help.

Is there a convenient way (planned or under development) to completely unregister a state entry (e.g. a ListState) with a given id from the backend?
It is fairly easy to register new states dynamically (i.e. with getOperatorState(…)), so why not allow discarding them as well?

I would find this feature extremely convenient for a fault-tolerance-related PR I am working on, and I can think of many other use cases that might need it.


Paris

Re: Unregistering Managed State in Operator Backend

Ufuk Celebi-2
Hey Paris!

As far as I know it's not possible at the moment and not planned. It
doesn't sound too hard to add, though. @Stefan: correct?

You can currently only clear the state via #clear, in the scope of the
key for keyed state or for the whole operator when used with operator
state. In the case of keyed state it's indeed hard to clear all state;
for operator state it's slightly better. I'm curious what your use case
is?

– Ufuk
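
To make the distinction above concrete, here is a minimal sketch in plain Java. It is a toy model and not Flink's backend: the class ToyOperatorStateStore and its methods getListState, unregister, and registeredNames are hypothetical names chosen for illustration. It shows that clearing a list state empties its contents but leaves the entry registered, whereas the requested unregister operation would drop the entry itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of an operator state store. Hypothetical; not Flink's actual backend.
final class ToyOperatorStateStore {
    private final Map<String, List<String>> states = new LinkedHashMap<>();

    // Returns the list state registered under the name, registering it on first access.
    List<String> getListState(String name) {
        return states.computeIfAbsent(name, key -> new ArrayList<>());
    }

    // Hypothetical operation discussed in the thread: removes the entry itself.
    // Returns true if an entry with that name was registered.
    boolean unregister(String name) {
        return states.remove(name) != null;
    }

    // Names of all currently registered states (a live view of the map's keys).
    Set<String> registeredNames() {
        return states.keySet();
    }
}

final class ToyOperatorStateStoreDemo {
    public static void main(String[] args) {
        ToyOperatorStateStore store = new ToyOperatorStateStore();
        List<String> log = store.getListState("log-1");
        log.add("record");

        log.clear(); // contents are gone, but the entry is still registered
        System.out.println(store.registeredNames()); // prints [log-1]

        store.unregister("log-1"); // the entry itself is gone
        System.out.println(store.registeredNames()); // prints []
    }
}
```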



Re: Unregistering Managed State in Operator Backend

Paris Carbone
Thank you for the answer Ufuk!

To elaborate a bit more: I am not using keyed state; it would indeed be tricky to discard everything in that case.

I need this for operator state, in my loop fault tolerance PR [1]. The idea is to tag a ListState (upstream log) per snapshot id.
When a concurrent snapshot is committed, I want to simply remove everything related to that ListState (not just clear it). This would also eliminate a memory leak in case many empty logs (and thus state entries) accumulate over time.
Hope that makes it a bit clearer. Thanks again :)

Paris

[1] https://github.com/apache/flink/pull/1668
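
The leak described in this message can be illustrated with a small plain-Java sketch. It is an assumption-laden toy and not code from the PR: the class PerSnapshotLogs and the methods append, clearOnCommit, discardOnCommit, and registeredEntries are hypothetical. One log is kept per snapshot id; committing a snapshot either clears its log (what is possible today, per the thread) or discards the entry (what is being asked for).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of upstream logs tagged by snapshot id. Hypothetical names throughout.
final class PerSnapshotLogs {
    private final Map<Long, List<String>> logsBySnapshot = new HashMap<>();

    // Appends a record to the log of the given snapshot, registering the log on first use.
    void append(long snapshotId, String record) {
        logsBySnapshot.computeIfAbsent(snapshotId, id -> new ArrayList<>()).add(record);
    }

    // Clear-only behaviour: the log is emptied but its entry stays registered.
    void clearOnCommit(long snapshotId) {
        List<String> log = logsBySnapshot.get(snapshotId);
        if (log != null) {
            log.clear();
        }
    }

    // Requested behaviour: the entry is removed together with its contents.
    void discardOnCommit(long snapshotId) {
        logsBySnapshot.remove(snapshotId);
    }

    // Number of state entries that are still registered.
    int registeredEntries() {
        return logsBySnapshot.size();
    }
}
```

In this model, committing N snapshots with clearOnCommit leaves N empty entries behind, while discardOnCommit keeps the number of entries bounded by the snapshots still in flight.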



Re: Unregistering Managed State in Operator Backend

Paris Carbone
Any thoughts/plans?
So should I open a Jira and add this?

Paris


Re: Unregistering Managed State in Operator Backend

Till Rohrmann
Hi Paris,

if there is no such issue open, then please open one so that we can track
the issue. If you have time to work on that, even better :-)

Cheers,
Till


Re: Unregistering Managed State in Operator Backend

Paris Carbone-3
Sure Till,

I would love to also make the patch but need to prioritize some other
things these days.
At least I will dig and see how complex this is regarding the different
backends.

I also have some follow-up questions, in case anybody has thought about
these things already (or is simply interested):

- Do you think it would make sense to automatically garbage collect empty
states in general?
- Shouldn't this already happen during snapshot compaction (in RocksDB), and
would that violate any user assumptions in your view?


Re: Unregistering Managed State in Operator Backend

Aljoscha Krettek-2
Just a bit of clarification: the OperatorState stuff is independent of
keyed state backends, i.e. even if you use RocksDB, the operator state will
not be stored in RocksDB; only keyed state is stored there.

Right now, when an operator state (ListState) is empty we will still write
some metadata about that state. I think it should be easy to
change DefaultOperatorStateBackend to not write anything in case of an
empty state. What do you think, Stefan?
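
The idea of not writing anything for an empty state can be sketched in plain Java as follows. This is only an illustration of the proposal, not the code of DefaultOperatorStateBackend: the class EmptyStateSkippingSnapshot and its write method are hypothetical, and the "snapshot" here is just an in-memory copy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy snapshot routine that omits empty list states. Hypothetical; not Flink code.
final class EmptyStateSkippingSnapshot {
    private EmptyStateSkippingSnapshot() {}

    // Copies every non-empty registered state into the snapshot and skips empty ones,
    // so neither data nor metadata is recorded for a state without elements.
    static Map<String, List<String>> write(Map<String, List<String>> registered) {
        Map<String, List<String>> written = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : registered.entrySet()) {
            if (!entry.getValue().isEmpty()) {
                written.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        return written;
    }
}
```

One consequence of this design in the toy model is that a state which was empty at snapshot time is simply absent after a restore, so it has to be registered again on first access.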

Re: Unregistering Managed State in Operator Backend

Paris Carbone
Indeed, I noticed that now. Then it should be fairly simple, if you find it reasonable too.
