Hi there,
What is the progress on incremental checkpointing? Do the Flink developers plan to work on this, or is there a JIRA issue to track it? I'm very interested to know.

I have also been researching the use of RocksDBStateBackend without running an HDFS cluster or talking to S3. A rough idea is to use ZooKeeper to store and announce state propagation progress, and to propagate the state by implementing chain replication on top of YARN-provisioned storage nodes.

Thanks,
Chen

You might also consider support for a Bigtable backend: HBase/Accumulo/Cassandra. The data model should be similar (identical?) to RocksDB, and you get HA, recoverability, and support for really large state "for free".

On Thursday, June 9, 2016, Chen Qin <[hidden email]> wrote:
> Hi there,
>
> What is the progress on incremental checkpointing? Do the Flink
> developers plan to work on this, or is there a JIRA issue to track it?
> I'm very interested to know.
>
> I have also been researching the use of RocksDBStateBackend without
> running an HDFS cluster or talking to S3. A rough idea is to use
> ZooKeeper to store and announce state propagation progress, and to
> propagate the state by implementing chain replication on top of
> YARN-provisioned storage nodes.
>
> Thanks,
> Chen

A Cassandra backend would be interesting, especially if Flink could benefit from Cassandra's data locality. The Cassandra/Spark integration uses this information to schedule Spark tasks.

On 9 June 2016 at 19:55, Nick Dimiduk <[hidden email]> wrote:
> You might also consider support for a Bigtable backend:
> HBase/Accumulo/Cassandra. The data model should be similar
> (identical?) to RocksDB, and you get HA, recoverability, and support
> for really large state "for free".

IIRC, all of the above have supported data locality since back in the MR days. I'm not sure how much data you're planning to checkpoint, though -- is locality really that important for transient processor state?

On Thu, Jun 9, 2016 at 11:06 AM, CPC <[hidden email]> wrote:
> A Cassandra backend would be interesting, especially if Flink could
> benefit from Cassandra's data locality. The Cassandra/Spark
> integration uses this information to schedule Spark tasks.

Hi!
Incremental checkpointing is still being worked on. Aljoscha, Till, and I have thought this through a lot and now have a pretty good understanding of how we want to do it with respect to coordination, savepoints, restore, garbage collection of unneeded checkpoints, etc.

We want to put this into a design doc as soon as possible, and we'd be happy to take input and discussion on the design. Please stay tuned for a little bit...

Greetings,
Stephan

On Thu, Jun 9, 2016 at 8:42 PM, Nick Dimiduk <[hidden email]> wrote:
> IIRC, all of the above have supported data locality since back in the
> MR days. I'm not sure how much data you're planning to checkpoint,
> though -- is locality really that important for transient processor
> state?

Thanks everyone. We had been reasoning about the expense of drawing snapshots of large state, and we see it as a major benefit of using RocksDB compared to the JDBC backend.

Our use case is money-related event processing. It requires keeping large windows that span weeks. The major data source is ingested at a QPS in the hundreds. Another source, which feeds data validation, arrives out of order and is often delayed substantially, until the payout window cutoff forces the eviction of a relatively small number of related messages.

Again, we are very excited to hear about the design and glad to provide some feedback.

Thanks,
Chen

On Fri, Jun 10, 2016 at 7:24 AM, Stephan Ewen <[hidden email]> wrote:
> Hi!
>
> Incremental checkpointing is still being worked on. Aljoscha, Till,
> and I have thought this through a lot and now have a pretty good
> understanding of how we want to do it with respect to coordination,
> savepoints, restore, garbage collection of unneeded checkpoints, etc.
>
> We want to put this into a design doc as soon as possible, and we'd be
> happy to take input and discussion on the design. Please stay tuned
> for a little bit...
>
> Greetings,
> Stephan