Hello fellow squirrels!
We just made a PR [1] of a prototype targeting flexible state management for streaming tasks with the prospect of further implementing on top different strategies such as lazy state updates, incremental snapshots and state partitioning. You can read more regarding the motivation behind this design in the doc Gyula created [2].

As described in the doc, "managed" operator state and updates are explicitly specified by the OperatorState abstraction. Furthermore, OperatorState can be partitioned by a key and retrieved by the getState method. If the state is partitioned the getState method implicitly returns the state of the respective key that is currently processed by the operator. It will still be possible to add functionality in order to access keys arbitrarily if needed but for now this looks hopefully clear enough.

Let's discuss here how we can use/modify this api to fit our general needs and vision for state management.

Paris

[1] https://github.com/apache/flink/pull/747
[2] https://docs.google.com/document/d/1nTn4Tpafsnt-TCT6L1vlHtGGgRevU90yRsUQEmkRMjk

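For readers following the thread, here is a rough Java sketch of the idea described above: state that is partitioned by key, where getState implicitly resolves to the key currently being processed. It is only an illustration under stated assumptions, not the code from the PR. The names OperatorState and getState come from the message; updateState, PartitionedState, setCurrentKey, snapshot, countPerKey and KeyedStateSketch are hypothetical and were made up for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types are illustrative and are NOT Flink's actual API.
final class KeyedStateSketch {

    /** Handle to one piece of managed state (method names are assumptions). */
    interface OperatorState<T> {
        T getState();
        void updateState(T newState);
    }

    /** Keeps one value per key and exposes only the value of the current key. */
    static final class PartitionedState<K, T> implements OperatorState<T> {
        private final Map<K, T> statePerKey = new HashMap<>();
        private final T defaultState;
        private K currentKey;

        PartitionedState(T defaultState) {
            this.defaultState = defaultState;
        }

        // In a real system the runtime would call this before each record is processed.
        void setCurrentKey(K key) {
            this.currentKey = key;
        }

        @Override
        public T getState() {
            // Implicitly scoped to the current key; falls back to the default value.
            return statePerKey.getOrDefault(currentKey, defaultState);
        }

        @Override
        public void updateState(T newState) {
            statePerKey.put(currentKey, newState);
        }

        // Copy of all per-key values, e.g. as input for a checkpoint.
        Map<K, T> snapshot() {
            return new HashMap<>(statePerKey);
        }
    }

    /** Counts records per key; the operator logic never names the key itself. */
    static Map<String, Integer> countPerKey(List<String> records) {
        PartitionedState<String, Integer> counter = new PartitionedState<>(0);
        for (String record : records) {
            counter.setCurrentKey(record);               // the "runtime" selects the partition
            counter.updateState(counter.getState() + 1); // the "operator" updates its state
        }
        return counter.snapshot();
    }

    public static void main(String[] args) {
        System.out.println(countPerKey(List.of("a", "b", "a")));
    }
}
```

Keeping the key selection outside the operator logic is what would let a runtime move or snapshot the per-key entries independently, which is the kind of repartitioning and incremental snapshotting the thread discusses.
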
Hi!
Sorry for not responding earlier. Thanks for taking the initiative there and drafting this.

I would like to have a close look after the 0.9 release is out. I am in favor of making this a major item for 0.10, to get the state with all its characteristics defined in a principled fashion: repartitionable, synchronous vs. asynchronous checkpointing, full vs. incremental checkpoints.

Greetings,
Stephan

On Fri, May 29, 2015 at 2:13 PM, Paris Carbone <[hidden email]> wrote:

> Hello fellow squirrels!
>
> We just made a PR [1] of a prototype targeting flexible state management
> for streaming tasks with the prospect of further implementing on top
> different strategies such as lazy state updates, incremental snapshots and
> state partitioning. You can read more regarding the motivation behind this
> design in the doc Gyula created [2].
>
> As described in the doc, "managed" operator state and updates are
> explicitly specified by the OperatorState abstraction. Furthermore,
> OperatorState can be partitioned by a key and retrieved by the getState
> method. If the state is partitioned the getState method implicitly returns
> the state of the respective key that is currently processed by the
> operator. It will still be possible to add functionality in order to access
> keys arbitrarily if needed but for now this looks hopefully clear enough.
>
> Let's discuss here how we can use/modify this api to fit our general needs
> and vision for state management.
>
> Paris
>
> [1] https://github.com/apache/flink/pull/747
> [2] https://docs.google.com/document/d/1nTn4Tpafsnt-TCT6L1vlHtGGgRevU90yRsUQEmkRMjk
