Hi,
I am exploring whether we can plug in HBase as a state backend in Flink. We need streaming jobs with large window state, high throughput, and reliability.

I wanted to know if implementing a Flink state backend on HBase or another distributed KV store is possible. Any documentation or pointers would be helpful.

Thanks,
Naveen
Hi!
While it is certainly possible, I think it is a bad idea in general. State size itself shouldn't be a problem with the RocksDB backend, as you can always increase parallelism to shard the state further while keeping far better performance than a remote KV store would give you. We and other users have successfully run the RocksDB state backend with incremental snapshots and several terabytes of state in production for years.

The only major advantage I see for HBase and similar KV stores as a state backend is the instant recovery you get, but even in that case you probably want an implementation that combines an embedded and a remote KV store. Also, the RocksDB backend, having no external dependency, will be far more reliable in practice.

Cheers,
Gyula

On Thu, 27 Dec 2018 at 17:17, Naveen Kumar <[hidden email]> wrote:
> Hi,
>
> I am exploring if we can plugin hbase as state backend in Flink. We have
> need for streaming jobs with large window states, high throughput and
> reliability.
>
> I wanted to know if implementing Flink backend in Hbase or other
> distributed KV store is possible. Any documentation or pointers will be
> helpful.
>
> Thanks,
> Naveen
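To illustrate the point about increasing parallelism to shard state, here is a minimal sketch of how keyed state can be split into key groups and spread over subtasks. It is not Flink code: the class and method names are hypothetical, and the plain modulo hash is a stand-in for the murmur hash that Flink, as far as I know, applies to the key's hash code. The key-group-to-subtask formula follows what I understand Flink's key group assignment to be.

```java
// Simplified sketch of sharding keyed state across parallel subtasks.
// Assumption: a plain non-negative modulo replaces Flink's real hash function.
class KeyGroupShardingSketch {

    // Map a key to one of maxParallelism key groups.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Map a key group to the index of the subtask that owns it.
    static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Count how many key groups each subtask owns at a given parallelism.
    static int[] keyGroupsPerSubtask(int maxParallelism, int parallelism) {
        int[] counts = new int[parallelism];
        for (int kg = 0; kg < maxParallelism; kg++) {
            counts[subtaskFor(kg, maxParallelism, parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        for (int parallelism : new int[] {4, 8, 16}) {
            int[] counts = keyGroupsPerSubtask(maxParallelism, parallelism);
            System.out.println("parallelism=" + parallelism
                    + " -> key groups on subtask 0: " + counts[0]);
        }
    }
}
```

With 128 key groups, going from 4 to 16 subtasks shrinks each subtask's share from 32 key groups to 8, so each local RocksDB instance holds a proportionally smaller slice of the state.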
In reply to this post by Naveen Kumar
Did you try using RocksDB [1] as the state backend?

1. https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend
Hi Naveen,
AFAIK, there are two levels of storage in a typical state backend (local and remote). I think it is similar to the relationship between main memory and disk in a PC.

Take the RocksDB state backend as an example: window state (typically a very large ListState) is persisted in partitioned local RocksDB files, so adding an element to a window is local and cheap. When a checkpoint starts, each of those RocksDB instances uploads its files to its corresponding HDFS directory separately. This works well because any intermediate state between two successful checkpoints can be overwritten, and local snapshots can be taken cheaply and asynchronously.

I have heard of people trying to build a MySQL backend (deprecated), a remote RocksDB-as-a-service backend (hard to scale, and a performance bottleneck), and a Cassandra backend (hard to snapshot). All of them share the same trait: they lack local, parallelizable snapshot semantics.

Hope this helps!
Chen

On Thu, Dec 27, 2018 at 8:27 AM miki haiat <[hidden email]> wrote:
> Did try to use rocksdb[1] as state backend?
>
> 1. https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend
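The local/remote split described above can be sketched roughly as follows. This is a toy model rather than the actual RocksDB backend: the class name, the file names, and the idea of tracking uploads as a set of file names are all assumptions made for illustration. It shows why an incremental checkpoint can be cheap: writes only touch local files, and a checkpoint uploads only the files the remote store does not have yet.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of a two-level state backend: local immutable files plus a remote
// store (e.g. a DFS directory) that receives only new files at each checkpoint.
// All names are hypothetical; this is not how Flink's classes are structured.
class IncrementalSnapshotSketch {
    private final Set<String> localFiles = new TreeSet<>();   // files on local disk
    private final Set<String> remoteFiles = new TreeSet<>();  // files already uploaded

    // A local write path eventually produces a new immutable file.
    void flushLocalFile(String name) {
        localFiles.add(name);
    }

    // Checkpoint: upload only files that are not yet in the remote store
    // and return the names that were uploaded this time.
    Set<String> checkpoint() {
        Set<String> toUpload = new TreeSet<>(localFiles);
        toUpload.removeAll(remoteFiles);
        remoteFiles.addAll(toUpload);
        return toUpload;
    }

    public static void main(String[] args) {
        IncrementalSnapshotSketch backend = new IncrementalSnapshotSketch();
        backend.flushLocalFile("000001.sst");
        backend.flushLocalFile("000002.sst");
        System.out.println("checkpoint 1 uploads " + backend.checkpoint());
        backend.flushLocalFile("000003.sst");
        System.out.println("checkpoint 2 uploads " + backend.checkpoint());
    }
}
```

In this model the second checkpoint uploads a single file even though three exist locally. A remote KV store used directly as the backend has no such local file set to diff against, which is one way to read the "lack of local parallelizable snapshot" remark.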
FWIW, one major advantage of adopting HBase as a Flink state backend is support for direct read/write on the DFS, which disaggregates storage and compute (DisAgg). DisAgg has several benefits, such as supporting elastic computing in the cloud and much faster (by an order of magnitude) recovery when rescaling up or down (as Gyula also mentioned). We could also eliminate the performance regression compared to local reads and writes through techniques like adding a local L2 cache. For more information, please refer to our talk <https://files.alicdn.com/tpsservice/1df9ccb8a7b6b2782a558d3c32d40c19.pdf> at this year's Flink Forward China; we could discuss more in another thread if you are interested.

Back to @Naveen's question: we need to make HBase support an embedded mode first, before adopting it as a Flink state backend. We have done some initial work; please refer to HBASE-17743 <https://issues.apache.org/jira/browse/HBASE-17743> and the design doc there for more details. And for sure we will upstream our work when it is ready (smile).

Best Regards,
Yu

On Fri, 28 Dec 2018 at 13:12, Chen Qin <[hidden email]> wrote:
> Hi Naveen,
>
> AFAIK, there are two level of storage in typical statebackend
> (local/remote). I think it kinda similar to what PC main memory and disk
> analogy.
>
> Take RocksDB Statebackend as example, window state (typical very large
> ListState) persisted in partitioned local rocksdb files, adding element to
> window is localized and cheap.When checkpoint starts, each of those rocksdb
> do upload to corresponding HDFS directories separately.This is good in a
> sense when any intermediate states between two successful checkpoints can
> be overwritten and local snapshots can be done cheaply and asynchronously.
>
> I heard folks tried to build mysqlbackend(deprecated), remote rocksdb as
> service backend(hard to scale and performance bottleneck) , Cassandra(hard
> to snapshot). All of which shares same trait on lack of local
> parallelizable snapshot semantic.
>
> Hope this helps!
> Chen
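As a rough illustration of the "local L2 cache" idea mentioned above, the sketch below puts a small LRU read-through cache in front of a remote read function. It is only an assumption about what such a cache might look like: the class name, the capacity, and the `remoteRead` function are hypothetical and are not taken from the talk or from HBASE-17743.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through LRU cache in front of a remote (DFS-backed) store.
// Hits are served locally; misses fall through to the remote read function.
class ReadThroughCacheSketch<K, V> {
    private final Function<K, V> remoteRead;  // stand-in for a remote lookup
    private final Map<K, V> cache;
    private int remoteReads = 0;

    ReadThroughCacheSketch(int capacity, Function<K, V> remoteRead) {
        this.remoteRead = remoteRead;
        // accessOrder=true makes the LinkedHashMap evict the least recently used entry.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            // Cache miss: pay the remote round trip once, then keep the value locally.
            value = remoteRead.apply(key);
            remoteReads++;
            cache.put(key, value);
        }
        return value;
    }

    int remoteReads() {
        return remoteReads;
    }

    public static void main(String[] args) {
        ReadThroughCacheSketch<String, String> cache =
                new ReadThroughCacheSketch<>(2, key -> "value-of-" + key);
        cache.get("window-1");
        cache.get("window-1");
        System.out.println("remote reads so far: " + cache.remoteReads());
    }
}
```

Repeated reads of a hot key cost one remote round trip in this sketch; whether that is enough to close the gap to a fully local RocksDB instance depends on the hit rate, which the sketch says nothing about.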
Hi Yu,
Very cool! I might be out of date on what's new in Flink already... I just wonder whether there are any efforts to support seconds-level barrier alignment?

Chen

> On Dec 27, 2018, at 23:26, Yu Li <[hidden email]> wrote:
>
> FWIW, one major advantage of adopting HBase as Flink statebackend is to
> support direct read/write on DFS, so as to disaggregate storage and compute
> (DisAgg). DisAgg has several benefits, such as supporting elastic
> computing in cloud, much better (order of magnitude) recovery speed when
> rescaling up/down (as Gyula also mentioned), etc. and we could eliminate
> the performance regression compared to local RW through techniques like
> adding a local L2 cache. More information please refer to our talk
> <https://files.alicdn.com/tpsservice/1df9ccb8a7b6b2782a558d3c32d40c19.pdf>
> at this year's Flink Forward China, and we could discuss more in another
> thread if interested.
>
> Back to @Naveen's question here, we need to make HBase supporting embedded
> mode first before adopting it as Flink statebackend. We have done some
> initial work and please refer to HBASE-17743
> <https://issues.apache.org/jira/browse/HBASE-17743> and the design doc
> there for more details. And for sure we will upstream our work when ready
> to (smile).
>
> Best Regards,
> Yu