(DEPRECATED) Apache Flink Mailing List archive.

Is Flink's recovery speed still slow?

jiaxl

Is Flink's recovery speed still slow?

According to the conclusion of this paper,
https://dl.acm.org/citation.cfm?id=3132750, Flink's recovery
is slower than Spark Streaming's, which would be a problem in
large-scale deployments where faults happen frequently.
I'd like to know whether this is still a problem. Any advice is
appreciated.

--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/

vino yang

Re: is Flink's recovery speed still slow?

Hi jiaxl,

The paper you mentioned was published in 2017, so I don't think it has much
reference value now.
Both frameworks have kept evolving since then.
At the end of May this year (2018), Flink released version 1.5, which added
local recovery as a major feature.
Local recovery lets a task restore from a copy of its state kept on its own
machine instead of reading it back from remote storage, which can greatly
improve recovery speed.
Work on state recovery and fault tolerance in Flink is still ongoing.
I suggest you verify it yourself.

Thanks, vino.

chenqin

Re: is Flink's recovery speed still slow?

As far as I have learned from people who understand this better than I do, barrier alignment may be the only way to get deterministic output.

Any state or output produced between barrier alignments needs a second thought (like UDP packets on a network). Currently, alignment is used only for heavyweight checkpointing. Whether the algorithm will be improved and used in other ways, such as auto scaling or secondary task shadowing, is still TBD.
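
To make the alignment idea concrete, here is a toy sketch in Python. It is not Flink's implementation; the class name AligningSum, the BARRIER marker, and the event list are all made up for illustration. It assumes a two-input operator that keeps a running sum: once a barrier arrives on one channel, later records on that channel are held back until the barrier has arrived on every channel, so the snapshot contains exactly the pre-barrier records no matter how the channels interleave.

```python
from collections import deque

BARRIER = object()  # hypothetical marker standing in for a checkpoint barrier


class AligningSum:
    """Toy multi-input operator: keeps a running sum and aligns barriers."""

    def __init__(self, num_channels):
        self.num_channels = num_channels
        self.total = 0            # operator state
        self.blocked = set()      # channels whose barrier has already arrived
        self.buffered = deque()   # (channel, element) pairs held back
        self.snapshots = []       # state captured at each completed alignment

    def on_element(self, channel, element):
        if channel in self.blocked:
            # Anything behind a barrier belongs to the next epoch: hold it.
            self.buffered.append((channel, element))
        elif element is BARRIER:
            self.blocked.add(channel)
            if len(self.blocked) == self.num_channels:
                self._complete_checkpoint()
        else:
            self.total += element

    def _complete_checkpoint(self):
        # All barriers are in: the state reflects only pre-barrier records.
        self.snapshots.append(self.total)
        self.blocked.clear()
        # Replay held-back input through the normal path, in arrival order.
        pending, self.buffered = self.buffered, deque()
        for channel, element in pending:
            self.on_element(channel, element)


# One possible interleaving of two input channels (illustrative data).
events = [
    (0, 1), (1, 3), (0, 2),
    (0, BARRIER),   # channel 0 is now blocked
    (0, 10),        # held back until channel 1's barrier arrives
    (1, BARRIER),   # alignment completes, snapshot is taken
    (1, 20),
]

op = AligningSum(num_channels=2)
for channel, element in events:
    op.on_element(channel, element)

print(op.snapshots, op.total)  # prints: [6] 36
```

Without the buffering step, the record 10 would already be in the sum when the snapshot is taken, and the snapshot would depend on arrival order.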

Chen

jiaxl

Re: is Flink's recovery speed still slow?

In reply to this post by vino yang

Hi vino,

Thanks for your quick reply.

Since 2017, the Flink developers have done a great job improving
performance, but I couldn't find any papers or blog posts that respond to
that paper, so I asked the question here.
Before asking, I had been running some experiments with Flink 1.5.1.
But as you know, it takes time to tune the system to its best state before
experiments can be run, so I was hoping that some experienced developers
have done related research they could share.

Thanks again, jiaxl

vino yang

Re: is Flink's recovery speed still slow?

Hi jiaxl,

Thanks for checking this yourself!

Yes, Flink is evolving very fast. There are not many benchmarks or blog
posts on this topic yet. After all, local recovery was only released in
version 1.5, which was not long ago, and this part is still
being improved and is not yet very mature.

Thanks, vino.
