(DEPRECATED) Apache Flink Mailing List archive.

[DISCUSS] Task speculative execution for Flink batch

Tao Yangyu

[DISCUSS] Task speculative execution for Flink batch

Hi everyone,

In this message we propose task speculative execution for Flink batch
jobs.

In batch mode, a job is usually divided into multiple parallel tasks
executed across many nodes in the cluster. Performance often degrades on
some nodes because of hardware problems, occasional I/O contention, or
high CPU load. Such degradation can make the tasks running on the
affected node very slow; these are the so-called long-tail tasks.
Although long-tail tasks do not fail, they can severely increase the
total job running time. The Flink task scheduler currently does not take
this long-tail problem into account.

We propose a speculative execution strategy to handle the problem. The
basic idea is to run a copy of a task on another node when the original
task is identified as a long-tail task. In more detail, a speculative
task is triggered when the scheduler detects that the data-processing
throughput of a task is much lower than that of the others. The
speculative task runs in parallel with the original one and shares the
same failure-retry mechanism. Once either task completes, the scheduler
accepts its output as the final result and cancels the other running
one. Preliminary experiments have demonstrated the effectiveness of this
approach.
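
As a rough illustration of the strategy described above, here is a minimal
sketch in Python (chosen for brevity; it is not Flink code). All names
(find_stragglers, SpeculativeTask, slow_ratio, min_peers) are hypothetical,
and the "below half the median throughput" rule is only an assumed
heuristic, not the detection rule from the design doc.

```python
from statistics import median


def find_stragglers(throughputs, slow_ratio=0.5, min_peers=3):
    """Return ids of tasks whose throughput is far below that of their peers.

    throughputs maps task id -> records per second. A task counts as
    long-tail when it runs below slow_ratio times the median throughput
    (an assumed heuristic for illustration only).
    """
    if len(throughputs) < min_peers:
        return []  # too few peers for a meaningful comparison
    baseline = median(throughputs.values())
    return sorted(t for t, rate in throughputs.items()
                  if rate < slow_ratio * baseline)


class SpeculativeTask:
    """Tracks the attempts of one task; the first attempt to finish wins."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.attempts = {"original"}
        self.winner = None
        self.cancelled = set()

    def launch_speculative(self):
        # Start a copy of the task (on another node in a real scheduler).
        self.attempts.add("speculative")

    def on_attempt_finished(self, attempt):
        # Accept the output of the first finisher and cancel the rest.
        if self.winner is not None:
            return False  # a result was already accepted; ignore late finishers
        self.winner = attempt
        self.cancelled = self.attempts - {attempt}
        return True


# Example: t3 processes far fewer records per second than its peers.
rates = {"t0": 100.0, "t1": 95.0, "t2": 110.0, "t3": 20.0}
slow = find_stragglers(rates)

task = SpeculativeTask(slow[0])
task.launch_speculative()
accepted = task.on_attempt_finished("speculative")  # first finisher is accepted
late = task.on_attempt_finished("original")         # later finisher is ignored
```

In this example only t3 is flagged, the speculative attempt is accepted
because it finishes first, and the original attempt ends up in the
cancelled set.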

The detailed design doc will be ready soon. Your reviews and comments will
be much appreciated.

Thanks!

Ryan

Zhijiang(wangzhijiang999)

Re: [DISCUSS] Task speculative execution for Flink batch

Thanks, Yangyu, for launching this discussion.

I really like this proposal. In production we have frequently seen a few long-tail tasks delay the total execution time of a batch job.
We also have some thoughts on introducing this mechanism. I look forward to your detailed design doc; then we can discuss further.

Best,
Zhijiang
------------------------------------------------------------------
From: Tao Yangyu <[hidden email]>
Sent: Tuesday, November 6, 2018, 11:01
To: dev <[hidden email]>
Subject: [DISCUSS] Task speculative execution for Flink batch

Till Rohrmann

Re: [DISCUSS] Task speculative execution for Flink batch

Thanks for starting this discussion, Ryan. I'm looking forward to your
design document for this feature. Quick question: will it be a batch-only
feature? If not, it needs to take checkpointing into account as well.

Cheers,
Till

On Tue, Nov 6, 2018 at 4:29 AM zhijiang <[hidden email]>
wrote:

isunjin

Re: [DISCUSS] Task speculative execution for Flink batch

I think this targets batch at the very beginning; the idea should also work for both cases, with different algorithms/strategies.

Ryan, since you are working on this, I will assign FLINK-10644 <https://issues.apache.org/jira/browse/FLINK-10644> to you.

Jin

> On Nov 6, 2018, at 4:45 AM, Till Rohrmann <[hidden email]> wrote:

Jeff Zhang

Re: [DISCUSS] Task speculative execution for Flink batch

+1 for speculative execution for Flink batch. Speculative execution is
used in many batch execution engines, such as MapReduce, Tez, and Spark.
This would be a great improvement for Flink in batch scenarios.

On Wed, Nov 7, 2018 at 8:38 AM Jin Sun <[hidden email]> wrote:

SHI Xiaogang

Re: [DISCUSS] Task speculative execution for Flink batch

Hi,

+1 for the speculative execution.

It would be even better if it worked well with the existing checkpointing
and pipelined execution. That way, we could move a step further towards
the unification of batch and stream processing.

Regards,
Xiaogang

On Wed, Nov 7, 2018 at 9:40 AM Jeff Zhang <[hidden email]> wrote:

Becket Qin

Re: [DISCUSS] Task speculative execution for Flink batch

+1. Thanks, Yangyu, for proposing this very useful feature. Looking forward
to the design doc.

On Wed, Nov 7, 2018 at 10:15 AM SHI Xiaogang <[hidden email]> wrote:

Tao Yangyu

Re: [DISCUSS] Task speculative execution for Flink batch

In reply to this post by isunjin

Thanks so much for all your feedback!

Yes, as Jin Sun mentioned above, the design currently targets batch in
order to explore the general framework and basic modules. The strategy
could also be applied to streaming with some extended code, for example
for result commitment.

On Wed, Nov 7, 2018 at 8:38 AM Jin Sun <[hidden email]> wrote:

Tao Yangyu

Re: [DISCUSS] Task speculative execution for Flink batch

Hi all,

After some refinement, the detailed design doc is available here:
https://docs.google.com/document/d/1X_Pfo4WcO-TEZmmVTTYNn44LQg5gnFeeaeqM7ZNLQ7M/edit?usp=sharing

Your reviews and comments are very much appreciated and will greatly help
in completing the feature.

Best,
Ryan

On Wed, Nov 7, 2018 at 4:49 PM Tao Yangyu <[hidden email]> wrote:

Xiaowei Jiang

Re: [DISCUSS] Task speculative execution for Flink batch

Thanks, Yangyu, for the nice design doc! One thing to consider is the
granularity of speculation. Multiple tasks may exchange data in pipelined
mode. In that case, re-running a single task may not be enough. You might
be able to solve this problem by increasing the granularity of
speculation; the traditional case of a single speculative task can then be
considered a special case of this.

Xiaowei

On Sat, Nov 17, 2018 at 10:27 PM Tao Yangyu <[hidden email]> wrote:

Tao Yangyu

Re: [DISCUSS] Task speculative execution for Flink batch

Thanks, Xiaowei, for the inspiring comments!
Yes, we could increase the granularity of speculation from a single task to
a bundle of successive tasks, especially for pipelined channels.
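
One way to picture such a bundle, as a minimal Python sketch under stated
assumptions: tasks connected by pipelined edges are grouped together and
each group is speculated as a unit, while blocking edges separate groups.
The function name speculation_units, the edge labels "pipelined" and
"blocking", and the example job are all hypothetical and are not taken from
Flink or the design doc.

```python
from collections import defaultdict


def speculation_units(tasks, edges):
    """Group tasks into units that would be speculated together.

    edges is a list of (producer, consumer, kind) tuples, where kind is
    "pipelined" or "blocking". Tasks linked by pipelined edges end up in
    the same unit; a task without pipelined edges forms a unit of its own.
    """
    neighbours = defaultdict(set)
    for producer, consumer, kind in edges:
        if kind == "pipelined":
            neighbours[producer].add(consumer)
            neighbours[consumer].add(producer)

    seen = set()
    units = []
    for task in tasks:
        if task in seen:
            continue
        # Depth-first walk over pipelined connections only.
        seen.add(task)
        stack = [task]
        unit = []
        while stack:
            current = stack.pop()
            unit.append(current)
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        units.append(sorted(unit))
    return units


# Example job: source -> map is pipelined, map -> reduce is blocking,
# reduce -> sink is pipelined.
tasks = ["source", "map", "reduce", "sink"]
edges = [
    ("source", "map", "pipelined"),
    ("map", "reduce", "blocking"),
    ("reduce", "sink", "pipelined"),
]
units = speculation_units(tasks, edges)
```

Here the job splits into two units, one holding source and map and one
holding reduce and sink. A job with only blocking edges degenerates to one
task per unit, which matches the single-task special case mentioned in the
discussion.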

On Sun, Nov 18, 2018 at 2:24 PM Xiaowei Jiang <[hidden email]> wrote:
