[DISCUSS] Support customize state in customized KeyedStateBackend

[DISCUSS] Support customize state in customized KeyedStateBackend

shimin yang
Hi everyone,

I would like to start a discussion on supporting customized state
in a customized KeyedStateBackend.

In Flink, users can customize the KeyedStateBackend to support different
types of data stores. Although we can implement customized StateDescriptors
for different kinds of data structures, we do not really have access to such
customized state in RichFunctions.

I propose adding a getOtherState method to RuntimeContext and
DefaultKeyedStateStore that directly takes a StateDescriptor as its parameter,
so that users can get customized state.
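
To make the gap concrete, here is a minimal, self-contained sketch of the idea. It does not use the real Flink classes: State, StateDescriptor, SetState, SetStateDescriptor and FunctionContext below are simplified, hypothetical stand-ins, and only the method name getOtherState is taken from the proposal. Key scoping and checkpointing are left out on purpose.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration only, NOT the real Flink types.
interface State {}

// A descriptor carries the state's name and knows how to create the state.
abstract class StateDescriptor<S extends State> {
    final String name;

    StateDescriptor(String name) {
        this.name = name;
    }

    abstract S create();
}

// A user-defined state type that no built-in getter would return.
class SetState<T> implements State {
    private final Set<T> elements = new HashSet<>();

    boolean add(T value) {
        return elements.add(value);
    }

    int size() {
        return elements.size();
    }
}

class SetStateDescriptor<T> extends StateDescriptor<SetState<T>> {
    SetStateDescriptor(String name) {
        super(name);
    }

    @Override
    SetState<T> create() {
        return new SetState<>();
    }
}

// Stand-in for the context a rich function sees.
class FunctionContext {
    private final Map<String, State> states = new HashMap<>();

    // The proposed generic entry point: any descriptor in, matching state out.
    <S extends State> S getOtherState(StateDescriptor<S> descriptor) {
        @SuppressWarnings("unchecked")
        S state = (S) states.computeIfAbsent(descriptor.name, n -> descriptor.create());
        return state;
    }
}

class GetOtherStateSketch {
    // Counts distinct words using the user-defined SetState.
    static int countDistinct(String... words) {
        FunctionContext ctx = new FunctionContext();
        SetStateDescriptor<String> descriptor = new SetStateDescriptor<>("seen-words");
        for (String word : words) {
            // The same descriptor name always resolves to the same state object.
            SetState<String> seen = ctx.getOtherState(descriptor);
            seen.add(word);
        }
        return ctx.getOtherState(descriptor).size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct("a", "b", "a")); // prints 2
    }
}
```

The point of the sketch is only that the function code never needs a getter dedicated to SetState: the descriptor alone decides which state type comes back.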

What do you think?

Thanks.

Best,
Shimin

Re: [DISCUSS] Support customize state in customized KeyedStateBackend

Yu Li
Hi Shimin,

Thanks for bringing this discussion up.

First of all, I'd like to confirm/clarify that this discussion is mainly
about managed state with a customized state descriptor rather than raw state,
right? I'm asking because raw state was the very first thing that came to my
mind when I saw the title.

And this is actually the first question we need to discuss: whether we
should support user-defined state descriptors and still ask the framework to
manage the state life cycle. Personally I'm +1 on this, since the "official"
state (data-structure) types (currently mainly value, list and map) may not
be optimized for a particular user's use case, but we'd better ask for
others' opinions.

Secondly, if the answer to the first question is "Yes", then it's truly a
problem that "although we can implement customized StateDescriptors for
different kinds of data structures, we do not really have access to such
customized state in RichFunctions", and how to resolve it is the second
question to discuss.

I've noticed your proposal in JIRA (FLINK-14003) to expose the
"getPartitionedState" method in "RuntimeContext" and "KeyedStateStore", but
IMO adding a specific interface like the one below is better than exposing
the internal one:

    <S extends State, V> State getCustomizedState(StateDescriptor<S, V> stateProperties);
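
To show how a method of this shape might be called, here is a small self-contained sketch. The types State, StateDescriptor, CounterState and CustomStateStore are hypothetical stand-ins, not the real Flink classes. The second method, getCustomizedStateTyped, is an assumption of this sketch rather than part of the proposal: it returns S instead of State, which would save the caller a cast.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only, NOT the real Flink types.
interface State {}

// S is the state type, V the value type held by the state (unused in this sketch).
class StateDescriptor<S extends State, V> {
    final String name;
    final Supplier<S> factory;

    StateDescriptor(String name, Supplier<S> factory) {
        this.name = name;
        this.factory = factory;
    }
}

// A user-defined state type.
class CounterState implements State {
    long count;

    void increment() {
        count++;
    }
}

class CustomStateStore {
    private final Map<String, State> states = new HashMap<>();

    // Shape taken from the proposal: the result is the plain State type.
    <S extends State, V> State getCustomizedState(StateDescriptor<S, V> stateProperties) {
        return states.computeIfAbsent(stateProperties.name, n -> stateProperties.factory.get());
    }

    // Variation (an assumption of this sketch): return S so callers need no cast.
    <S extends State, V> S getCustomizedStateTyped(StateDescriptor<S, V> stateProperties) {
        @SuppressWarnings("unchecked")
        S state = (S) getCustomizedState(stateProperties);
        return state;
    }
}

class GetCustomizedStateSketch {
    static long demo() {
        CustomStateStore store = new CustomStateStore();
        StateDescriptor<CounterState, Long> descriptor =
                new StateDescriptor<>("clicks", CounterState::new);

        // With the State return type the caller has to cast.
        CounterState viaCast = (CounterState) store.getCustomizedState(descriptor);
        viaCast.increment();

        // With the S return type no cast is needed; it is the same underlying object.
        CounterState typed = store.getCustomizedStateTyped(descriptor);
        typed.increment();

        return typed.count;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```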

Finally, I think this is a user-facing and definitely worthwhile
discussion, and it requires a FLIP to write down the conclusion and the
design/implementation (if any). What's your opinion?

Thanks.

Best Regards,
Yu


On Fri, 6 Sep 2019 at 13:27, shimin yang <[hidden email]> wrote:


Re: [DISCUSS] Support customize state in customized KeyedStateBackend

Yun Tang
Hi all,

First of all, I agree with Yu that we should make state types pluggable.

If we take a look at the current Flink implementation, users can already implement their own pluggable state backend to meet their own needs. However, even though users can define their own state descriptor, they cannot store the customized state within their state backend. The root cause is that the current StateBackendFactory mechanism accepts a user-defined state backend factory, while the StateFactory (within HeapKeyedStateBackend [1] and RocksDBKeyedStateBackend [2]) does not accept user-defined state factories.

If we agree that users should have the right to implement customized state backends, it is natural to agree that they should also have the right to implement customized states.

[1] https://github.com/apache/flink/blob/576228651382db040aaa006cf9142f6568930cb1/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java#L79
[2] https://github.com/apache/flink/blob/576228651382db040aaa006cf9142f6568930cb1/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L114
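
The limitation can be pictured with a simplified model. The sketch below is not the code from [1] or [2]: SketchBackend, Descriptor, BloomFilterState and registerStateFactory are hypothetical names, used only to contrast a descriptor-to-factory table that is fixed up front with one that users could extend.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only, NOT the real Flink types.
interface State {}

class ValueState implements State {}

// A user-defined state type the backend does not know about.
class BloomFilterState implements State {}

class Descriptor {
    final String name;

    Descriptor(String name) {
        this.name = name;
    }
}

class ValueStateDescriptor extends Descriptor {
    ValueStateDescriptor(String name) {
        super(name);
    }
}

class BloomFilterStateDescriptor extends Descriptor {
    BloomFilterStateDescriptor(String name) {
        super(name);
    }
}

class SketchBackend {
    // Table from descriptor class to state factory.
    private final Map<Class<? extends Descriptor>, Supplier<State>> factories = new HashMap<>();

    SketchBackend() {
        // Built-in state types are registered up front.
        factories.put(ValueStateDescriptor.class, ValueState::new);
    }

    // Hypothetical extension point: lets users plug in their own state type.
    void registerStateFactory(Class<? extends Descriptor> type, Supplier<State> factory) {
        factories.put(type, factory);
    }

    State createState(Descriptor descriptor) {
        Supplier<State> factory = factories.get(descriptor.getClass());
        if (factory == null) {
            // Without an extension point, unknown descriptors always end up here.
            throw new UnsupportedOperationException(
                    "State " + descriptor.name + " is not supported");
        }
        return factory.get();
    }
}

class StateFactorySketch {
    static String demo() {
        SketchBackend backend = new SketchBackend();
        Descriptor custom = new BloomFilterStateDescriptor("seen-ids");

        String before;
        try {
            backend.createState(custom);
            before = "created";
        } catch (UnsupportedOperationException e) {
            before = "rejected";
        }

        backend.registerStateFactory(BloomFilterStateDescriptor.class, BloomFilterState::new);
        String after = backend.createState(custom).getClass().getSimpleName();

        return before + " -> " + after;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints rejected -> BloomFilterState
    }
}
```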


Best
Yun Tang


________________________________
From: Yu Li <[hidden email]>
Sent: Monday, September 9, 2019 2:24
To: dev <[hidden email]>
Subject: Re: [DISCUSS] Support customize state in customized KeyedStateBackend


Re: [DISCUSS] Support customize state in customized KeyedStateBackend

shimin yang
Hi Yu,

For the first question, I would say yes. I was talking about managed
state, or more specifically managed keyed state. The reason we need the
framework to manage the life cycle is that we need checkpointing to
guarantee exactly-once semantics in our customized keyed state backend.
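
As a toy illustration of why the state has to be snapshotted and restored together with the input position, consider the sketch below. It is a deliberately simplified model with hypothetical names (CheckpointSketch, process, checkpoint, restore), not Flink's checkpointing API: the operator counts events per key, takes a snapshot, "fails" after processing one more event, and then restores both the counts and the read offset before replaying.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model for illustration only, NOT Flink's checkpointing API.
class CheckpointSketch {
    // Live keyed state: number of events seen per key.
    private Map<String, Integer> counts = new HashMap<>();
    // Position of the next input record to read.
    private int offset = 0;

    // Last completed snapshot of state and input position.
    private Map<String, Integer> snapshotCounts = new HashMap<>();
    private int snapshotOffset = 0;

    // Processes input records from the current offset up to (not including) upTo.
    void process(String[] input, int upTo) {
        while (offset < upTo) {
            counts.merge(input[offset], 1, Integer::sum);
            offset++;
        }
    }

    // Captures the state and the input position at the same point in the stream.
    void checkpoint() {
        snapshotCounts = new HashMap<>(counts);
        snapshotOffset = offset;
    }

    // Rolls state and input position back to the last snapshot.
    void restore() {
        counts = new HashMap<>(snapshotCounts);
        offset = snapshotOffset;
    }

    static int demo() {
        String[] input = {"a", "b", "a", "a"};
        CheckpointSketch operator = new CheckpointSketch();

        operator.process(input, 2); // sees a, b
        operator.checkpoint();      // snapshot: a=1, b=1, offset=2
        operator.process(input, 3); // sees a, then the job "fails"
        operator.restore();         // back to a=1, b=1, offset=2
        operator.process(input, 4); // replays the third record, then reads the fourth

        return operator.counts.get("a");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3
    }
}
```

If only the offset were rewound and the counts were kept, the replayed record would be counted twice and "a" would end up at 4 instead of 3, which is why the backend's state needs to take part in the checkpoint.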

For the second question, I quite agree with your proposal.

Finally, I would be glad to provide documentation if needed.

Best,
Shimin

On Mon, 9 Sep 2019 at 2:46 AM, Yun Tang <[hidden email]> wrote:


Re: [DISCUSS] Support customize state in customized KeyedStateBackend

shimin yang
Hi Tang,

Actually, in my case we implement a completely different KeyedStateBackend
and its factory, based on a data store other than heap or RocksDB.

Also, regarding the state factories of the heap and RocksDB backends, you've
made quite a good point and I agree with your opinion.

Best,
Shimin

On Mon, 9 Sep 2019 at 2:31 PM, shimin yang <[hidden email]> wrote:
