Hi folks,
I have a small question regarding the managed store operator backend, in case someone can help.

Is there some convenient way (planned or under development) to completely unregister a state entry (e.g. a ListState) with a given id from the backend? It is fairly easy to register new states dynamically (i.e. with getOperatorState(…)), so why not be able to discard them as well?

I would find this feature extremely convenient for a fault tolerance related PR I am working on, but I can think of many use cases that might need it.

Paris
Hey Paris!
As far as I know it's not possible at the moment and not planned. It does not sound too hard to add though. @Stefan: correct?

You can currently only clear the state via #clear, either in the scope of the key for keyed state or for the whole operator when used with operator state. In the case of keyed state it's indeed hard to clear all state; for operator state it's slightly better. I'm curious what your use case is?

- Ufuk

On Fri, Jan 20, 2017 at 5:59 PM, Paris Carbone <[hidden email]> wrote:
> Hi folks,
>
> I have a small question regarding the managed store operator backend, in case someone can help.
>
> Is there some convenient way (planned or under development) to completely unregister a state entry (e.g. a ListState) with a given id from the backend?
> It is fairly easy to register new states dynamically (i.e. with getOperatorState(…)), so why not be able to discard them as well?
>
> I would find this feature extremely convenient for a fault tolerance related PR I am working on, but I can think of many use cases that might need it.
>
> Paris
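To make the distinction between clearing a state and unregistering it concrete, here is a minimal sketch. It is deliberately not Flink code: `ToyStateStore`, `getListState(String)`, `unregister` and `registeredNames` are hypothetical names invented for this illustration, and a plain `Map` stands in for the backend's registry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy stand-in for an operator state store; NOT Flink's OperatorStateStore.
class ToyStateStore {
    private final Map<String, List<String>> states = new HashMap<>();

    // Registers the state on first access, mimicking dynamic registration.
    List<String> getListState(String name) {
        return states.computeIfAbsent(name, k -> new ArrayList<>());
    }

    // Hypothetical operation asked for in this thread: drop the entry itself.
    boolean unregister(String name) {
        return states.remove(name) != null;
    }

    // Names of all entries that are currently registered.
    Set<String> registeredNames() {
        return states.keySet();
    }
}

class ClearVersusUnregister {
    public static void main(String[] args) {
        ToyStateStore store = new ToyStateStore();
        List<String> log = store.getListState("log-1");
        log.add("record");

        // Clearing removes the contents, but the entry stays registered.
        log.clear();
        System.out.println("after clear: " + store.registeredNames().contains("log-1"));

        // Unregistering removes the entry as well.
        store.unregister("log-1");
        System.out.println("after unregister: " + store.registeredNames().contains("log-1"));
    }
}
```

In this toy model the cleared state still occupies a slot in the registry, which is the behaviour the original question wants a way around.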
Thank you for the answer Ufuk!
To elaborate a bit more: I am not using keyed state; it would indeed be tricky in that case to discard everything.

I need this for operator state, in my loop fault tolerance PR [1]. The idea is to tag a ListState (upstream log) per snapshot id. When a concurrent snapshot is committed I want to simply remove everything related to that ListState (not just clear it). This would also eliminate a memory leak in case many empty logs (and thus state entries) accumulate over time.

Hope that makes it a bit clearer. Thanks again :)

Paris

[1] https://github.com/apache/flink/pull/1668

On 21 Jan 2017, at 17:10, Ufuk Celebi <[hidden email]> wrote:
> Hey Paris!
>
> As far as I know it's not possible at the moment and not planned. It does not sound too hard to add though. @Stefan: correct?
>
> You can currently only clear the state via #clear, either in the scope of the key for keyed state or for the whole operator when used with operator state. In the case of keyed state it's indeed hard to clear all state; for operator state it's slightly better. I'm curious what your use case is?
>
> - Ufuk
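The leak described above can be sketched as follows. This is an illustration only, not code from the PR: `SnapshotLogs` and its methods are hypothetical, and a `TreeMap` keyed by snapshot id stands in for the per-snapshot ListState entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of one upstream log per in-flight snapshot; not code from the PR.
class SnapshotLogs {
    private final Map<Long, List<String>> logs = new TreeMap<>();

    // Appends a record to the log tagged with the given snapshot id.
    void append(long snapshotId, String record) {
        logs.computeIfAbsent(snapshotId, id -> new ArrayList<>()).add(record);
    }

    // Clearing only: the log is emptied but its entry stays around.
    void clearOnCommit(long snapshotId) {
        List<String> log = logs.get(snapshotId);
        if (log != null) {
            log.clear();
        }
    }

    // Discarding: the entry itself is removed once the snapshot is committed.
    void discardOnCommit(long snapshotId) {
        logs.remove(snapshotId);
    }

    // Number of entries still held, empty or not.
    int entryCount() {
        return logs.size();
    }
}

class SnapshotLogsDemo {
    public static void main(String[] args) {
        SnapshotLogs cleared = new SnapshotLogs();
        SnapshotLogs discarded = new SnapshotLogs();
        for (long id = 1; id <= 1000; id++) {
            cleared.append(id, "in-flight record");
            cleared.clearOnCommit(id);
            discarded.append(id, "in-flight record");
            discarded.discardOnCommit(id);
        }
        System.out.println("entries after clear only: " + cleared.entryCount());
        System.out.println("entries after discard: " + discarded.entryCount());
    }
}
```

With clearing alone, the number of (empty) entries grows with the number of committed snapshots; with discarding, it stays bounded by the number of snapshots still in flight.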
Any thoughts/plans?
So should I open a Jira and add this?

Paris

On Jan 21, 2017, at 5:17 PM, Paris Carbone <[hidden email]> wrote:
> Thank you for the answer Ufuk!
>
> To elaborate a bit more: I am not using keyed state; it would indeed be tricky in that case to discard everything.
>
> I need this for operator state, in my loop fault tolerance PR [1]. The idea is to tag a ListState (upstream log) per snapshot id. When a concurrent snapshot is committed I want to simply remove everything related to that ListState (not just clear it). This would also eliminate a memory leak in case many empty logs (and thus state entries) accumulate over time.
>
> Hope that makes it a bit clearer. Thanks again :)
>
> Paris
>
> [1] https://github.com/apache/flink/pull/1668
Hi Paris,
if there is no such issue open, then please open one so that we can track it. If you have time to work on it, even better :-)

Cheers,
Till

On Tue, Jan 24, 2017 at 10:25 AM, Paris Carbone <[hidden email]> wrote:
> Any thoughts/plans?
> So should I open a Jira and add this?
>
> Paris
Sure Till,
I would love to also make the patch, but I need to prioritize some other things these days. At least I will dig in and see how complex this is with regard to the different backends.

I also have some follow-up questions, in case anybody has thought about these things already (or is simply interested):

- Do you think it would make sense to automatically garbage collect empty states in general?
- Shouldn't this happen already during snapshot compaction (in RocksDB), and would that violate any user assumptions in your view?

On Tue, Jan 24, 2017 at 11:44 AM, Till Rohrmann <[hidden email]> wrote:
> Hi Paris,
>
> if there is no such issue open, then please open one so that we can track it. If you have time to work on it, even better :-)
>
> Cheers,
> Till
Just a bit of clarification: the OperatorState stuff is independent of keyed state backends, i.e. even if you use RocksDB the operator state will not be stored in RocksDB; only keyed state is stored there.

Right now, when an operator state (ListState) is empty we will still write some metadata about that state. I think it should be easy to change DefaultOperatorStateBackend to not write anything in case of an empty state. What do you think, Stefan?

On Tue, 24 Jan 2017 at 12:12 Paris Carbone <[hidden email]> wrote:
> Sure Till,
>
> I would love to also make the patch, but I need to prioritize some other things these days. At least I will dig in and see how complex this is with regard to the different backends.
>
> I also have some follow-up questions, in case anybody has thought about these things already (or is simply interested):
>
> - Do you think it would make sense to automatically garbage collect empty states in general?
> - Shouldn't this happen already during snapshot compaction (in RocksDB), and would that violate any user assumptions in your view?
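The proposed change, not writing anything for an empty operator state, could look roughly like the sketch below. It is a toy model under stated assumptions: `ToySnapshotWriter` and `namesToWrite` are hypothetical, and the code says nothing about how DefaultOperatorStateBackend actually lays out its snapshots.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy snapshot writer; it does not reflect DefaultOperatorStateBackend's real format.
class ToySnapshotWriter {
    // Returns the state names that would get a metadata entry in the snapshot.
    static List<String> namesToWrite(Map<String, List<String>> states, boolean skipEmpty) {
        List<String> written = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : states.entrySet()) {
            if (skipEmpty && entry.getValue().isEmpty()) {
                continue; // proposed behaviour: write nothing for an empty state
            }
            written.add(entry.getKey());
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, List<String>> states = new LinkedHashMap<>();
        states.put("log-1", new ArrayList<>());                        // empty state
        states.put("log-2", new ArrayList<>(Arrays.asList("record"))); // non-empty state

        System.out.println("current behaviour: " + namesToWrite(states, false));
        System.out.println("skipping empty states: " + namesToWrite(states, true));
    }
}
```

Note that skipping empty states at snapshot time only keeps them out of the checkpoint; in this sketch the in-memory map still holds the empty entries, so it complements rather than replaces an explicit way to unregister a state.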
Indeed, I noticed that now. Then it should be fairly simple, if you find it reasonable too.
> On 24 Jan 2017, at 14:20, Aljoscha Krettek <[hidden email]> wrote:
>
> Just a bit of clarification: the OperatorState stuff is independent of keyed state backends, i.e. even if you use RocksDB the operator state will not be stored in RocksDB; only keyed state is stored there.
>
> Right now, when an operator state (ListState) is empty we will still write some metadata about that state. I think it should be easy to change DefaultOperatorStateBackend to not write anything in case of an empty state. What do you think, Stefan?