[DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal


curcur
Hey all,

We would like to start a discussion on how to enable and configure the
Changelog state backend.

As part of FLIP-158 [1], the Changelog state backend wraps an existing
state backend (HashMapStateBackend, EmbeddedRocksDBStateBackend, and possibly
more in the future) and delegates state changes to the underlying state backend.
This thread discusses how the Changelog state backend should
be enabled and configured.
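
To make the wrap-and-delegate idea concrete, here is a minimal Java sketch. It assumes a toy key-value interface; `KeyValueBackend`, `InMemoryBackend`, and `ChangelogWrapper` are hypothetical names for illustration only and are not Flink classes or the FLIP-158 implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for a state backend (not Flink's API).
interface KeyValueBackend {
    void put(String key, String value);
    String get(String key);
}

// Plays the role of an existing backend such as a heap-based one.
final class InMemoryBackend implements KeyValueBackend {
    private final Map<String, String> data = new HashMap<>();

    @Override public void put(String key, String value) { data.put(key, value); }
    @Override public String get(String key) { return data.get(key); }
}

// Wrapper: records every state change, then delegates to the wrapped backend.
final class ChangelogWrapper implements KeyValueBackend {
    private final KeyValueBackend inner;
    private final List<String> changelog = new ArrayList<>();

    ChangelogWrapper(KeyValueBackend inner) { this.inner = inner; }

    @Override public void put(String key, String value) {
        changelog.add("put " + key + "=" + value); // append the change to the log
        inner.put(key, value);                      // delegate the write
    }

    @Override public String get(String key) { return inner.get(key); } // reads pass through

    List<String> changelog() { return changelog; }
}

final class ChangelogWrapperDemo {
    public static void main(String[] args) {
        ChangelogWrapper backend = new ChangelogWrapper(new InMemoryBackend());
        backend.put("count", "1");
        backend.put("count", "2");
        System.out.println(backend.get("count"));   // value served by the inner backend
        System.out.println(backend.changelog());    // the two recorded changes
    }
}
```

The wrapper does not care which backend it wraps, which is why the configuration question (how to name the wrapper and the inner backend) is separate from the mechanism itself.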

The proposed options to enable and configure the state changelog are listed below:

Option 1: Enable Changelog Statebackend through a Boolean Flag

Option 2: Enable Changelog Statebackend through a Boolean Flag + a Special
Case

Option 3: Enable Changelog Statebackend through a Boolean Flag + W/O
ChangelogStateBackend Exposed

Option 4: Explicit Nested Configuration + “changelog.inner” prefix for
inner backend

Option 5: Explicit Nested Configuration + inner state backend configuration
unchanged

Option 6: Config Changelog and Inner Statebackend All-Together

Details of each option can be found here:
https://docs.google.com/document/d/13AaCf5fczYTDHZ4G1mgYL685FqbnoEhgo0cdwuJlZmw/edit?usp=sharing

When evaluating these options, please consider these four dimensions:
1. Consistency
API/config should follow a consistent model and should not have
contradictory logic underneath.
2. Simplicity
API should be easy to use and should not put too much burden on users.
3. Explicitness
API/config should not contain implicit assumptions and should be intuitive
to users.
4. Extensibility
The current setting should be easy to extend for foreseeable future needs.

Please let us know what you think, and please keep the discussion in this
mailing thread.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints

Best
Yuan

Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

roman
Hey Yuan, thanks for the proposal

I think Option 3 is the simplest to use and exposes fewer details than any other.
It's also consistent with the current way of configuring state
backends, as long as we treat change logging as a common feature
applicable to any state backend, such as
state.backend.local-recovery.

Option 6 seems slightly less preferable as it exposes more details, but
I think it is the most viable alternative.

Regards,
Roman


On Mon, May 31, 2021 at 8:39 AM Yuan Mei <[hidden email]> wrote:


Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

Yun Tang
Hi Yuan, thanks for launching this discussion.

I prefer option-3 as this is the easiest to understand for users.


Best
Yun Tang
________________________________
From: Roman Khachatryan <[hidden email]>
Sent: Monday, May 31, 2021 16:53
To: dev <[hidden email]>
Subject: Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal


Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

Piotr Nowojski-5
Hi,

I would actually prefer option 6 (or 5/4), for the sake of the configuration
being explicit and self-explanatory. But at the same time I don't have a very
strong preference, and of the remaining options, option 3 seems the most
reasonable.

The question is whether we want to expose to users that
ChangelogStateBackend is wrapping an inner state backend. If not,
option 3 is the best. If we do, and we want to teach users and help them
build an understanding of how things work underneath, option 5 or 6
is better.

Best,
Piotrek

On Wed, Jun 2, 2021 at 04:36, Yun Tang <[hidden email]> wrote:


Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

Yu Li
+1 for option 3.

IMHO persisting (operator) state data through a change log is an
independent mechanism that can work together with all kinds of local state
stores (heap and RocksDB). This mechanism is similar to the WAL
(write-ahead log) mechanism in database systems. Although implementation-wise
we're using the wrapper (decorator) pattern and naming it
`ChangelogStateBackend`, it's not really another type of state backend. For
the same reason, ChangelogStateBackend should be an internal class and not
exposed to the end user. Users only need to know and control whether
the change log is enabled, just like whether WAL is enabled in a
traditional database system.
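
A minimal Java sketch of this idea follows: the user picks a local state store and a boolean flag, and the decorator stays internal. All names here (`Store`, `HeapStore`, `RocksStore`, `StoreLoader`) are hypothetical and only illustrate the shape of the argument; they are not Flink classes or the actual loading logic.

```java
// Hypothetical, simplified local state store abstraction (not Flink's API).
interface Store {
    String describe();
}

final class HeapStore implements Store {
    @Override public String describe() { return "heap"; }
}

final class RocksStore implements Store {
    @Override public String describe() { return "rocksdb"; }
}

final class StoreLoader {
    // Internal decorator: private, so user code cannot name or construct it.
    private static final class ChangeLogged implements Store {
        private final Store inner;

        ChangeLogged(Store inner) { this.inner = inner; }

        @Override public String describe() { return inner.describe() + " + changelog"; }
    }

    // Users only choose the store and whether the change log is enabled.
    static Store load(String name, boolean changelogEnabled) {
        Store base;
        switch (name) {
            case "heap":
                base = new HeapStore();
                break;
            case "rocksdb":
                base = new RocksStore();
                break;
            default:
                throw new IllegalArgumentException("unknown store: " + name);
        }
        return changelogEnabled ? new ChangeLogged(base) : base;
    }

    public static void main(String[] args) {
        System.out.println(load("heap", false).describe());
        System.out.println(load("rocksdb", true).describe());
    }
}
```

With this shape, the change log behaves like a feature toggle on any store rather than a third store type, which is the analogy to enabling WAL in a database.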

Thanks.

Best Regards,
Yu


On Thu, 3 Jun 2021 at 22:50, Piotr Nowojski <[hidden email]> wrote:


Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

curcur
Thank you everyone for replying!

Option 3 wins with a dominant number of votes, plus mine.

This option works as a refined version of the original proposal in
FLIP-158: Generalized incremental checkpoints [1]:
  - Define a consistent override and combination policy (flag + state
backend) across the different config levels
  - Explicitly define the meaning of "enable flag" = true/false/unset
  - Hide ChangelogStateBackend from users

According to the discussion in this thread, we will go with
Option 3: Enable Changelog Statebackend through a Boolean Flag + W/O
ChangelogStateBackend Exposed
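
To illustrate what a true/false/unset flag across config levels could look like, here is a small Java sketch. The precedence used (an explicit job-level value wins, otherwise the cluster-level value, otherwise a default) is an assumption made for this example; the thread does not spell out the exact policy, and `ChangelogFlagResolver` is a hypothetical name.

```java
import java.util.Optional;

// Hypothetical resolver for a tri-state "enable changelog" flag.
// Optional.empty() models "unset" at a given config level.
final class ChangelogFlagResolver {

    // Assumed policy: job level overrides cluster level; if both are unset, use the default.
    static boolean resolve(Optional<Boolean> jobLevel,
                           Optional<Boolean> clusterLevel,
                           boolean defaultValue) {
        return jobLevel.orElseGet(() -> clusterLevel.orElse(defaultValue));
    }

    public static void main(String[] args) {
        // Job level unset, cluster level enabled: the cluster value applies.
        System.out.println(resolve(Optional.empty(), Optional.of(true), false));
        // Job level explicitly disabled: it overrides the cluster value.
        System.out.println(resolve(Optional.of(false), Optional.of(true), false));
        // Nothing set anywhere: fall back to the default.
        System.out.println(resolve(Optional.empty(), Optional.empty(), false));
    }
}
```

Keeping "unset" distinct from "false" is what lets a job explicitly turn the feature off even when the cluster-wide configuration turns it on.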

 [1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints

Best
Yuan

On Tue, Jun 8, 2021 at 6:40 PM Yu Li <[hidden email]> wrote:
