[Discuss] State Backend use external HBase storage

shijinkui
Hi all,

At present Flink has three state backends: memory, file system, and RocksDB.
MemoryStateBackend transfers the snapshot to the JobManager and is limited to 5 MB by default. Even with a larger limit, it is not suitable for very large state.
HDFS can meet the reliability guarantee, but it is slow. The file system and RocksDB backends are fast, but they have no reliability guarantee.
So none of the three state backends offers a reliability guarantee.

Can we have an HBase state backend that provides a reliability guarantee for state snapshots?
The user would only need to create a new HbaseStateBackend object and provide the HBase parameters and tuning configuration.
HBase or another distributed key-value store may be heavyweight storage, but we would only use the HBase client to read and write asynchronously.

-Jinkui Shi
Re: [Discuss] State Backend use external HBase storage

Till Rohrmann
Hi Jinkui,

The file system state backend and the RocksDB state backend can be
configured (and usually should be) such that they store their checkpoint
data on a reliable storage system such as HDFS. Then you also have the
reliability guarantees.
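
As a minimal sketch of such a setup, assuming the configuration keys used by Flink 1.x releases of that period (newer releases have renamed them) and a placeholder NameNode address and path, the file system backend could be pointed at HDFS in flink-conf.yaml roughly like this:

```yaml
# flink-conf.yaml (sketch; host, port, and path are placeholders)
state.backend: filesystem
# Checkpoint data is written to HDFS, which supplies the durability.
state.backend.fs.checkpointdir: hdfs://namenode:40010/flink/checkpoints
```

The RocksDB backend accepts a checkpoint URI in a similar way, so working state can stay on local disk while snapshots land on the reliable file system.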

Of course, one can start adding more state backends to Flink. At some point
in time there was the idea to write a Cassandra-backed state backend [1],
for example. Similarly, one could think about an HBase-backed state backend.

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Cassandra-statebackend-td12690.html


Cheers,
Till

(In reply to shijinkui's message of Wed, Nov 16, 2016, 3:10 AM.)
Re: [Discuss] State Backend use external HBase storage

Jinkui Shi
Hi Chen Qin,
I found this issue. Has it been kicked off? What is the current progress?
https://issues.apache.org/jira/browse/FLINK-4266

(In reply to Till Rohrmann's message of Nov 16, 2016, 19:35.)

Re: [Discuss] State Backend use external HBase storage

chenqin
Hi Jinkui,
The remote state backend is still in the discussion phase; we sent out a
design a while ago.
Because it will be affected by dynamic scaling and the expected
non-partitioned state changes, we decided to revisit it after the dust settles.

Thanks,
Chen

(In reply to sjk's message of Thu, Nov 17, 2016, 2:23 AM.)