Hi everyone,

I would like to start a discussion on supporting customized state in a customized KeyedStateBackend.

In Flink, users can implement a custom KeyedStateBackend to support different types of data stores. Although we can implement customized StateDescriptors for different kinds of data structures, we have no way to access such customized state in RichFunctions.

I propose adding a getOtherState method to RuntimeContext and DefaultKeyedStateStore that takes a StateDescriptor directly as its parameter, so that users can obtain customized state.

What do you think?

Thanks.

Best,
Shimin
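To make the proposal concrete, here is a minimal sketch of what a generic accessor such as getOtherState could look like. It does not use Flink's real classes: State, StateDescriptor, KeyedStateStore, CounterState and CounterStateDescriptor below are simplified, hypothetical stand-ins, and letting the descriptor create the state is an assumption made only to keep the example short (in Flink the state backend creates the state objects).

```java
import java.util.HashMap;
import java.util.Map;

class GetOtherStateSketch {
    /** Simplified stand-in for a state marker interface (not Flink's real type). */
    interface State {
        void clear();
    }

    /** Simplified stand-in descriptor: a name plus a way to build the state (assumption). */
    interface StateDescriptor<S extends State> {
        String getName();
        S createState();
    }

    /** A custom state type that value/list/map style accessors cannot return. */
    static final class CounterState implements State {
        private long count;
        void increment() { count++; }
        long get() { return count; }
        @Override public void clear() { count = 0; }
    }

    static final class CounterStateDescriptor implements StateDescriptor<CounterState> {
        private final String name;
        CounterStateDescriptor(String name) { this.name = name; }
        @Override public String getName() { return name; }
        @Override public CounterState createState() { return new CounterState(); }
    }

    /** Stand-in for a keyed state store exposing the proposed generic accessor. */
    static final class KeyedStateStore {
        private final Map<String, State> states = new HashMap<>();

        /** Returns the state registered under the descriptor's name, creating it on first use. */
        @SuppressWarnings("unchecked")
        <S extends State> S getOtherState(StateDescriptor<S> descriptor) {
            return (S) states.computeIfAbsent(descriptor.getName(), n -> descriptor.createState());
        }
    }

    /** Fetches the same custom state three times and increments it twice. */
    public static long demo() {
        KeyedStateStore store = new KeyedStateStore();
        CounterStateDescriptor descriptor = new CounterStateDescriptor("visits");
        store.getOtherState(descriptor).increment();
        store.getOtherState(descriptor).increment();
        return store.getOtherState(descriptor).get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is only that the accessor is generic in the descriptor, so the store needs no per-type method for each new kind of state. Keying by the current record key is left out entirely.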
Hi Shimin,
Thanks for bringing this discussion up.

First of all, I'd like to confirm that this discussion is mainly about managed state with a customized state descriptor rather than raw state, right? I'm asking because raw state was the very first thing that came to my mind when I saw the title.

This is actually the first question we need to discuss: whether we should support user-defined state descriptors and still ask the framework to manage the state life cycle. Personally I'm +1 on this, since the "official" state (data-structure) types (currently mainly value, list and map) may not be optimized for the user's case, but we'd better ask for others' opinions.

Secondly, if the answer to the first question is "yes", then it is truly a problem that "although we can implement customized StateDescriptors for different kinds of data structures, we do not really have access to such customized state in RichFunctions", and how to resolve it is the second question to discuss.

I've noticed your proposal in JIRA (FLINK-14003) to expose the "getPartitionedState" method in "RuntimeContext" and "KeyedStateStore", but IMO adding a specific interface like the one below is better than exposing the internal one:

    <S extends State, V> State getCustomizedState(StateDescriptor<S, V> stateProperties);

Finally, I think this is a user-facing and definitely worthwhile discussion, and it requires a FLIP to write down the conclusion and the design/implementation (if any). What's your opinion?

Thanks.

Best Regards,
Yu

On Fri, 6 Sep 2019 at 13:27, shimin yang <[hidden email]> wrote:
> [...]
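As an illustration of how the getCustomizedState signature suggested above might be used from a function, here is a rough sketch. Only the method signature comes from the thread; State, StateDescriptor (including its bind() hook), SimpleStore, MaxState and MaxFunction are hypothetical stand-ins rather than Flink classes, and per-key scoping is ignored.

```java
import java.util.HashMap;
import java.util.Map;

class GetCustomizedStateSketch {
    /** Simplified stand-in for a state marker interface. */
    interface State {
        void clear();
    }

    /** Simplified descriptor: S is the state type, V the type of value it holds. */
    static abstract class StateDescriptor<S extends State, V> {
        private final String name;
        StateDescriptor(String name) { this.name = name; }
        String getName() { return name; }
        /** Hypothetical hook that builds the state; not part of the proposal. */
        abstract S bind();
    }

    interface KeyedStateStore {
        // Signature as written in the thread: the declared return type is plain State.
        <S extends State, V> State getCustomizedState(StateDescriptor<S, V> stateProperties);
    }

    /** Custom state that keeps the largest value seen so far. */
    static final class MaxState implements State {
        private Long max;
        void add(long value) { if (max == null || value > max) { max = value; } }
        Long get() { return max; }
        @Override public void clear() { max = null; }
    }

    static final class MaxStateDescriptor extends StateDescriptor<MaxState, Long> {
        MaxStateDescriptor(String name) { super(name); }
        @Override MaxState bind() { return new MaxState(); }
    }

    /** Toy store that caches one state object per descriptor name. */
    static final class SimpleStore implements KeyedStateStore {
        private final Map<String, State> byName = new HashMap<>();
        @Override
        public <S extends State, V> State getCustomizedState(StateDescriptor<S, V> stateProperties) {
            return byName.computeIfAbsent(stateProperties.getName(), n -> stateProperties.bind());
        }
    }

    /** Mimics a rich function: obtain the state once in open(), use it per record. */
    static final class MaxFunction {
        private MaxState max;
        void open(KeyedStateStore store) {
            // Because the method returns State, the caller has to cast.
            max = (MaxState) store.getCustomizedState(new MaxStateDescriptor("max"));
        }
        long map(long value) {
            max.add(value);
            return max.get();
        }
    }

    /** Feeds 3, 7 and 5 through the function and returns the last result. */
    public static long demo() {
        MaxFunction function = new MaxFunction();
        function.open(new SimpleStore());
        function.map(3);
        function.map(7);
        return function.map(5);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

One observation from the sketch: with State as the declared return type, every call site needs a cast. A variant that returns S would avoid that, but that variant is not something proposed in this thread.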
Hi all
First of all, I agree with Yu that we should make state types pluggable.

Take a look at the current Flink implementation: users can already implement their own pluggable state backend to meet their own needs. However, even though users can define their own state descriptors, they cannot store the customized state within their state backend. The root cause is that the current StateBackendFactory accepts user-defined state backend factories, while the StateFactory (within HeapKeyedStateBackend [1] and RocksDBKeyedStateBackend [2]) does not accept user-defined ones.

If we agree that users should have the right to implement customized state backends, it is natural to agree that they should also have the right to implement customized states.

[1] https://github.com/apache/flink/blob/576228651382db040aaa006cf9142f6568930cb1/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java#L79
[2] https://github.com/apache/flink/blob/576228651382db040aaa006cf9142f6568930cb1/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L114

Best
Yun Tang

________________________________
From: Yu Li <[hidden email]>
Sent: Monday, September 9, 2019 2:24
To: dev <[hidden email]>
Subject: Re: [DISCUSS] Support customize state in customized KeyedStateBackend

[...]
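The root cause described above can be sketched as a backend that looks up a state factory by descriptor class in a fixed table. The code below is a simplified model under that reading, not the code behind links [1] and [2]: Backend, its factories map and in particular registerStateFactory are hypothetical names, the latter standing for the kind of extension point that is currently missing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class PluggableStateFactorySketch {
    interface State {
        void clear();
    }

    /** Simplified, non-generic descriptor base class (stand-in). */
    static class StateDescriptor {
        final String name;
        StateDescriptor(String name) { this.name = name; }
    }

    /** Built-in descriptor and state, known to the backend from the start. */
    static final class ValueStateDescriptor extends StateDescriptor {
        ValueStateDescriptor(String name) { super(name); }
    }

    static final class ValueState implements State {
        Object value;
        @Override public void clear() { value = null; }
    }

    /** User-defined descriptor and state, unknown to the backend by default. */
    static final class SetStateDescriptor extends StateDescriptor {
        SetStateDescriptor(String name) { super(name); }
    }

    static final class SetState implements State {
        final java.util.Set<Object> elements = new java.util.HashSet<>();
        @Override public void clear() { elements.clear(); }
    }

    static final class Backend {
        // Factory table keyed by descriptor class; only built-in types are present initially.
        private final Map<Class<? extends StateDescriptor>, Function<StateDescriptor, State>> factories =
                new HashMap<>();

        Backend() {
            factories.put(ValueStateDescriptor.class, d -> new ValueState());
        }

        /** Hypothetical extension point that lets users plug in their own state types. */
        void registerStateFactory(Class<? extends StateDescriptor> type,
                                  Function<StateDescriptor, State> factory) {
            factories.put(type, factory);
        }

        State createState(StateDescriptor descriptor) {
            Function<StateDescriptor, State> factory = factories.get(descriptor.getClass());
            if (factory == null) {
                throw new UnsupportedOperationException(
                        "State type not supported: " + descriptor.getClass().getSimpleName());
            }
            return factory.apply(descriptor);
        }
    }

    /** Without registration, the custom descriptor is rejected. */
    public static boolean rejectedWithoutRegistration() {
        try {
            new Backend().createState(new SetStateDescriptor("seen"));
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    /** After registering a factory, the same descriptor yields the custom state. */
    public static boolean acceptedAfterRegistration() {
        Backend backend = new Backend();
        backend.registerStateFactory(SetStateDescriptor.class, d -> new SetState());
        return backend.createState(new SetStateDescriptor("seen")) instanceof SetState;
    }

    public static void main(String[] args) {
        System.out.println(rejectedWithoutRegistration());
        System.out.println(acceptedAfterRegistration());
    }
}
```

The first helper shows the situation described in the message (a user-defined descriptor has no factory and is rejected); the second shows what a pluggable factory table would allow.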
Hi Yu,
For the first question, I would say yes. I was talking about managed state; to be more specific, managed keyed state. The reason we need the framework to manage the life cycle is that we need checkpoints to guarantee exactly-once semantics in our customized keyed state backend.

For the second question, I quite agree with your proposal.

Finally, I would be glad to provide documentation if needed.

Best,
Shimin

On Mon, 9 Sep 2019 at 2:46 AM, Yun Tang <[hidden email]> wrote:
> [...]
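To illustrate why checkpointing matters for a customized keyed state, here is a toy sketch of the snapshot and restore cycle. SnapshotableState and PerKeyCounter are hypothetical names, the snapshot is just an in-memory copy, and the "failure" is simulated by hand; real checkpointing in Flink is considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

class ManagedCustomStateSketch {
    /** Hypothetical contract a framework-managed custom state would have to fulfil. */
    interface SnapshotableState {
        Map<String, Long> snapshot();
        void restore(Map<String, Long> snapshot);
    }

    /** Custom keyed state: one counter per key. */
    static final class PerKeyCounter implements SnapshotableState {
        private Map<String, Long> counts = new HashMap<>();

        void increment(String key) { counts.merge(key, 1L, Long::sum); }
        long get(String key) { return counts.getOrDefault(key, 0L); }

        @Override public Map<String, Long> snapshot() { return new HashMap<>(counts); }
        @Override public void restore(Map<String, Long> snapshot) { counts = new HashMap<>(snapshot); }
    }

    /** Processes three records for key "a" with one simulated failure in between. */
    public static long demo() {
        PerKeyCounter state = new PerKeyCounter();
        state.increment("a");
        state.increment("a");

        // The framework takes a checkpoint after two records.
        Map<String, Long> checkpoint = state.snapshot();

        // A third record is processed, then the job fails before the next checkpoint.
        state.increment("a");

        // Recovery rolls the state back to the checkpoint and the source replays the third record.
        state.restore(checkpoint);
        state.increment("a");

        return state.get("a");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the state is rolled back before the record is replayed, the third record is reflected in the count once rather than twice, which is the effect on state that exactly-once semantics is meant to provide.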
Hi Tang,
Actually, in my case we implement a totally different KeyedStateBackend and its factory, based on a data store other than Heap or RocksDB. As for the state factories of Heap and RocksDB, you've made a good point and I agree with your opinion.

Best,
Shimin

On Mon, 9 Sep 2019 at 2:31 PM, shimin yang <[hidden email]> wrote:
> [...]