[DISCUSS] Proposal to support disk spilling in HeapKeyedStateBackend


[DISCUSS] Proposal to support disk spilling in HeapKeyedStateBackend

Yu Li
Hi All,

As mentioned in our talk[1] given at Flink Forward China 2018, we have
improved HeapKeyedStateBackend to support disk spilling and put it into
production here at Alibaba for last year's Singles' Day. Now we're ready to
upstream our work, and the design doc is up for review[2]. Please let us
know your thoughts on the feature; any comments are welcome and appreciated.

We plan to keep the discussion open for at least 72 hours, and will create
an umbrella JIRA and subtasks if there are no objections. Thanks.

Below is a brief description of the motivation for this work, FYI:


HeapKeyedStateBackend is one of the two KeyedStateBackends in Flink. Since
state lives as Java objects on the heap in HeapKeyedStateBackend and
de/serialization only happens during state snapshot and restore, it
outperforms RocksDBKeyedStateBackend when all data can reside in memory.
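
For reference, here is a minimal sketch of how a job selects a heap-based
keyed state backend today; the checkpoint path below is just a placeholder:

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // FsStateBackend keeps working state as objects on the JVM heap
    // (via HeapKeyedStateBackend) and only serializes the state when
    // writing snapshots to the configured path.
    env.setStateBackend(new FsStateBackend("file:///tmp/checkpoints"));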
However, along with this advantage, HeapKeyedStateBackend also has its
shortcomings, and the most painful one is the difficulty of estimating the
maximum heap size (Xmx) to set: once the heap memory is not enough to hold
all the state data, we suffer from GC impact. There are several (inevitable)
causes for such a scenario, including (but not limited to):



* Memory overhead of the Java object representation (tens of times the
serialized data size).
* Data floods caused by burst traffic.
* Data accumulation caused by source malfunction.

To resolve this problem, we propose a solution that supports spilling state
data to disk before heap memory is exhausted. We will monitor the heap
usage, choose the coldest data to spill, and automatically reload it once
heap memory is regained after data removal or TTL expiration.
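
To make the mechanism concrete, below is an illustrative sketch; all class
and method names here are hypothetical, and the real interfaces are in the
design doc[2]:

    /**
     * Illustrative sketch only -- names are hypothetical. A background task
     * watches heap usage, spills the coldest key groups to disk above a high
     * watermark, and loads spilled groups back once usage falls below a low
     * watermark (e.g. after state removal or TTL expiration).
     */
    public class HeapStatusMonitor implements Runnable {

        private final long spillWatermarkBytes; // start spilling above this usage
        private final long loadWatermarkBytes;  // start loading back below this usage

        public HeapStatusMonitor(long spillWatermarkBytes, long loadWatermarkBytes) {
            this.spillWatermarkBytes = spillWatermarkBytes;
            this.loadWatermarkBytes = loadWatermarkBytes;
        }

        @Override
        public void run() {
            Runtime rt = Runtime.getRuntime();
            long usedBytes = rt.totalMemory() - rt.freeMemory();
            if (usedBytes > spillWatermarkBytes) {
                // serialize the least recently accessed key group to disk
                spillColdestKeyGroup();
            } else if (usedBytes < loadWatermarkBytes) {
                // deserialize a previously spilled key group back onto the heap
                loadSpilledKeyGroup();
            }
        }

        private void spillColdestKeyGroup() { /* chosen by access statistics */ }

        private void loadSpilledKeyGroup() { /* reverse of spilling */ }
    }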

[1] https://files.alicdn.com/tpsservice/1df9ccb8a7b6b2782a558d3c32d40c19.pdf
[2] https://docs.google.com/document/d/1rtWQjIQ-tYWt0lTkZYdqTM6gQUleV8AXrfTOyWUZMf4/edit?usp=sharing

Best Regards,
Yu

Re: [DISCUSS] Proposal to support disk spilling in HeapKeyedStateBackend

Stefan Richter
Hi Yu,

Sorry for the late reaction. As already discussed internally, I think this is a very good proposal and design that can help to address a major limitation of the current state backend. I think most of the discussion is happening in the design doc, and I left my comments there. Looking forward to seeing this integrated with Flink soon!

Best,
Stefan

> On 24. May 2019, at 14:50, Yu Li <[hidden email]> wrote:
>
> [...]


Re: [DISCUSS] Proposal to support disk spilling in HeapKeyedStateBackend

Danny Chan
+1, thanks for your nice work, Yu Li!

Best,
Danny Chan
On 24 May 2019 at 8:51 PM +0800, Yu Li <[hidden email]> wrote:

> [...]

Re: [DISCUSS] Proposal to support disk spilling in HeapKeyedStateBackend

Tzu-Li (Gordon) Tai
Hi Yu,

+1 to move forward with efforts to add this feature.

As mentioned in the document as well as in some offline discussions, the
only comments I have from my side are related to how we snapshot the
off-heap key groups.
I think a recent discussion I posted about unifying the savepoint binary
format for keyed state, as well as reworking the abstractions for snapshot
strategies [1], will be relevant here.

Cheers,
Gordon

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-41-Unify-Keyed-State-Snapshot-Binary-Format-for-Savepoints-td29197.html

On Wed, May 29, 2019 at 5:08 PM Yuzhao Chen <[hidden email]> wrote:

> [...]

Re: [DISCUSS] Proposal to support disk spilling in HeapKeyedStateBackend

Yu Li
Thanks all for the feedback here. And thanks for the reminder about the
plan to rework the abstractions for snapshot strategies @Gordon; I will
definitely watch it and make sure the two efforts are well coordinated.

Since this discussion thread has been open for some time and we have
reached a consensus, I will close it now and start creating JIRAs. On the
other hand, discussion in the design doc will of course continue, so just
let me know if you have any suggestions there or in the coming JIRAs.

Thanks again to all participants in the discussion!

Best Regards,
Yu


On Thu, 30 May 2019 at 15:45, Tzu-Li (Gordon) Tai <[hidden email]>
wrote:

> [...]
>