Hey all,
We would like to start a discussion on how to enable and configure the Changelog state backend.

As part of FLIP-158 [1], the Changelog state backend wraps an existing state backend (HashMapStateBackend, EmbeddedRocksDBStateBackend, and possibly more in the future) and delegates state changes to the underlying state backend. This thread is to discuss how the Changelog state backend should be enabled and configured.

The proposed options for enabling and configuring the state changelog are listed below:

Option 1: Enable Changelog Statebackend through a Boolean Flag
Option 2: Enable Changelog Statebackend through a Boolean Flag + a Special Case
Option 3: Enable Changelog Statebackend through a Boolean Flag + W/O ChangelogStateBackend Exposed
Option 4: Explicit Nested Configuration + “changelog.inner” prefix for inner backend
Option 5: Explicit Nested Configuration + inner state backend configuration unchanged
Option 6: Config Changelog and Inner Statebackend All-Together

Details of each option can be found here:
https://docs.google.com/document/d/13AaCf5fczYTDHZ4G1mgYL685FqbnoEhgo0cdwuJlZmw/edit?usp=sharing

When evaluating these options, please consider these four dimensions:

1. Consistency
API/config should follow a consistent model and should not have contradictory logic underneath.
2. Simplicity
API should be easy to use and should not put too much burden on users.
3. Explicitness
API/config should not contain implicit assumptions and should be intuitive to users.
4. Extensibility
Whether the current setting can be easily extended in the foreseeable future.

Please let us know what you think, and please keep the discussion in this mailing thread.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints

Best
Yuan
Hey Yuan, thanks for the proposal
I think Option 3 is the simplest to use and exposes fewer details than any other option. It's also consistent with the current way of configuring state backends, as long as we treat change logging as a common feature applicable to any state backend, like e.g. state.backend.local-recovery.

Option 6 seems slightly less preferable as it exposes more details, but I think it is the most viable alternative.

Regards,
Roman

On Mon, May 31, 2021 at 8:39 AM Yuan Mei <[hidden email]> wrote:
> Hey all,
>
> We would like to start a discussion on how to enable/config Changelog
> Statebackend.
> [...]
Hi Yuan, thanks for launching this discussion.
I prefer Option 3, as it is the easiest for users to understand.

Best
Yun Tang
________________________________
From: Roman Khachatryan <[hidden email]>
Sent: Monday, May 31, 2021 16:53
To: dev <[hidden email]>
Subject: Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

> Hey Yuan, thanks for the proposal
> [...]
Hi,
I would actually prefer Option 6 (or 5/4), for the sake of the configuration being explicit and self-explanatory. But at the same time I don't have a very strong preference, and of the remaining options, Option 3 seems the most reasonable.

The question is: do we want to expose to users that ChangeLogStateBackend wraps an inner state backend or not? If not, Option 3 is the best. If we do, that is, if we want to teach users and help them build an understanding of how things work underneath, Option 5 or 6 is better.

Best,
Piotrek

On Wed, 2 Jun 2021 at 04:36, Yun Tang <[hidden email]> wrote:
> Hi Yuan, thanks for launching this discussion.
>
> I prefer option-3 as this is the easiest to understand for users.
> [...]
+1 for option 3.
IMHO persisting (operator) state data through a change log is an independent mechanism which can work together with all kinds of local state stores (heap and RocksDB). This mechanism is similar to the WAL (write-ahead log) mechanism in database systems. Although implementation-wise we're using the wrapper (decorator) pattern and naming it `ChangeLogStateBackend`, it's not really another type of state backend. For the same reason, ChangeLogStateBackend should be an internal class and not exposed to end users. Users only need to know and control whether the change log is enabled, just like whether WAL is enabled in a traditional database system.

Thanks.

Best Regards,
Yu

On Thu, 3 Jun 2021 at 22:50, Piotr Nowojski <[hidden email]> wrote:
> Hi,
>
> I would actually prefer option 6 (or 5/4), for the sake of configuration
> being explicit and self explanatory.
> [...]
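To make the wrapper (decorator) point concrete, here is a minimal sketch of the idea, not Flink code. All names in it (`KeyValueStore`, `HeapStore`, `ChangeLoggingStore`, `StoreFactory`) are hypothetical and exist only for illustration; the log-before-write ordering is borrowed from the WAL analogy and is an assumption, not something FLIP-158 is confirmed to do in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal key-value state store; NOT a Flink interface. */
interface KeyValueStore {
    void put(String key, String value);

    String get(String key);
}

/** Stand-in for a heap-based local state store. */
final class HeapStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public String get(String key) {
        return data.get(key);
    }
}

/** Decorator: appends every write to a change log, then delegates to the wrapped store. */
final class ChangeLoggingStore implements KeyValueStore {
    private final KeyValueStore delegate;
    private final List<String> changeLog = new ArrayList<>();

    ChangeLoggingStore(KeyValueStore delegate) {
        this.delegate = delegate;
    }

    @Override
    public void put(String key, String value) {
        changeLog.add(key + "=" + value); // record the change first (assumed WAL-like ordering)
        delegate.put(key, value);         // then apply it to the underlying store
    }

    @Override
    public String get(String key) {
        return delegate.get(key); // reads go straight to the underlying store
    }

    List<String> changeLog() {
        return changeLog;
    }
}

/** Users choose a store and a flag; the wrapper type never appears in their code. */
final class StoreFactory {
    static KeyValueStore create(KeyValueStore store, boolean changelogEnabled) {
        return changelogEnabled ? new ChangeLoggingStore(store) : store;
    }
}
```

With this shape, a caller writes `StoreFactory.create(new HeapStore(), true)` and only ever sees `KeyValueStore`, which mirrors the argument that the wrapper can stay an internal class.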
Thank you everyone for replying!
Option 3 wins with a clear majority of the votes, including mine.

This option works as a refined version of the original proposal in FLIP-158: Generalized incremental checkpoints [1]:
- Define a consistent override and combination policy (flag + state backend) across the different config levels
- Explicitly define the meaning of the "enable flag" being true/false/unset
- Hide ChangelogStateBackend from users

According to the discussion in this thread, we will go with Option 3: Enable Changelog Statebackend through a Boolean Flag + W/O ChangelogStateBackend Exposed

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints

Best
Yuan

On Tue, Jun 8, 2021 at 6:40 PM Yu Li <[hidden email]> wrote:
> +1 for option 3.
> [...]
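As a rough illustration of what a true/false/unset flag across config levels could look like, here is a small sketch. The thread does not spell out the actual policy, so the rules below are assumptions: a job-level value, when set, overrides the cluster-level value, and the feature is off when both are unset. `ChangelogFlagResolver` and its parameters are hypothetical names, not Flink API.

```java
import java.util.Optional;

/** Hypothetical resolver for a tri-state "enable changelog" flag; not Flink API. */
final class ChangelogFlagResolver {

    /**
     * Resolves the effective flag from two config levels.
     * An empty Optional models "unset".
     * Assumed policy: job level wins if set, else cluster level, else disabled.
     */
    static boolean resolve(Optional<Boolean> jobLevel, Optional<Boolean> clusterLevel) {
        return jobLevel.orElseGet(() -> clusterLevel.orElse(false));
    }
}
```

Under these assumed rules, `resolve(Optional.of(false), Optional.of(true))` yields `false`, because an explicit job-level `false` is different from "unset" and is not overridden by the cluster default. That distinction is the reason for modelling three states rather than a plain boolean.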