Hi everyone,
We propose task speculative execution for Flink batch, as follows.

In batch mode, a job is usually divided into multiple parallel tasks executed across many nodes in the cluster. It is common for some nodes to suffer performance degradation due to hardware problems, sporadic I/O contention, or high CPU load. Such degradation can make the tasks running on those nodes much slower than their peers; these are the so-called long-tail tasks. Although long-tail tasks do not fail, they can severely prolong the total job running time. Currently, the Flink task scheduler does not take this long-tail problem into account.

Here we propose a speculative execution strategy to handle the problem. The basic idea is to run a copy of a task on another node when the original task is identified as long-tail. In more detail, a speculative task is triggered when the scheduler detects that a task's data processing throughput is much lower than that of the others. The speculative task runs in parallel with the original one and shares the same failure retry mechanism. Once either task completes, the scheduler admits its output as the final result and cancels the other, still-running one. Preliminary experiments have demonstrated the effectiveness of this approach.

The detailed design doc will be ready soon. Your reviews and comments will be much appreciated.

Thanks!
Ryan
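To make the trigger condition ("throughput much lower than the others") concrete, here is a minimal sketch. It assumes per-task throughput is reported as records per second and compares each task against the median of all running tasks; the function name, the `slow_ratio` threshold of 0.5, and `min_tasks` are hypothetical choices for illustration, not part of the proposal or of any Flink API.

```python
from statistics import median


def find_long_tail_tasks(throughputs, slow_ratio=0.5, min_tasks=3):
    """Return the IDs of tasks whose throughput is far below that of their peers.

    throughputs: dict mapping a (hypothetical) task ID to records processed per second.
    slow_ratio:  illustrative threshold, not from the proposal; a task is flagged
                 when its throughput is below slow_ratio * the median throughput.
    min_tasks:   skip detection when there are too few tasks for a meaningful median.
    """
    if len(throughputs) < min_tasks:
        return []
    baseline = median(throughputs.values())
    return sorted(task for task, rate in throughputs.items() if rate < slow_ratio * baseline)


# Example: map-3 runs on a degraded node and processes far fewer records per second.
rates = {"map-0": 1000.0, "map-1": 950.0, "map-2": 1020.0, "map-3": 180.0}
print(find_long_tail_tasks(rates))  # ['map-3']
```

The median is used as the baseline because, unlike the mean, it is not dragged down by the slow task itself.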
Thanks, Yangyu, for launching this discussion.
I really like this proposal. In production, we have frequently seen a few long-tail tasks delay the total execution time of a batch job. We also have some thoughts on building this mechanism. Looking forward to your detailed design doc; then we can discuss further.
Best,
Zhijiang
------------------------------------------------------------------
From: Tao Yangyu <[hidden email]>
Sent: Tuesday, November 6, 2018, 11:01
To: dev <[hidden email]>
Subject: [DISCUSS] Task speculative execution for Flink batch
[...]
Thanks for starting this discussion, Ryan. I'm looking forward to your
design document for this feature. Quick question: will it be a batch-only feature? If not, then it needs to take checkpointing into account as well.

Cheers,
Till

On Tue, Nov 6, 2018 at 4:29 AM zhijiang <[hidden email]> wrote:
> [...]
I think this targets batch at the very beginning, but the idea should also work for both cases, with different algorithms/strategies.

Ryan, since you are working on this, I will assign FLINK-10644 <https://issues.apache.org/jira/browse/FLINK-10644> to you.

Jin

On Nov 6, 2018, at 4:45 AM, Till Rohrmann <[hidden email]> wrote:
> [...]
+1 for speculative execution for Flink batch. Speculative execution is
used in many batch execution engines, such as MapReduce, Tez, and Spark. This would be a great improvement for Flink in batch scenarios.

Jin Sun <[hidden email]> wrote on Wed, Nov 7, 2018 at 8:38 AM:
> [...]
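For comparison with the throughput-based trigger in the proposal, Spark (one of the engines mentioned above) documents an elapsed-time rule through its `spark.speculation.quantile` setting (the fraction of tasks that must finish before speculation starts) and `spark.speculation.multiplier` setting (how many times slower than the median a task must be). The sketch below is a loose, simplified re-creation of that idea, not Spark's actual code; the function name is hypothetical and the default values are only illustrative.

```python
from statistics import median


def speculatable_tasks(finished_durations, running_elapsed, total_tasks,
                       quantile=0.75, multiplier=1.5):
    """Elapsed-time straggler rule, loosely modeled on Spark's speculation settings.

    finished_durations: runtimes (seconds) of tasks in the stage that already succeeded.
    running_elapsed:    dict mapping a running task ID to seconds since it started.
    total_tasks:        number of tasks in the stage.
    quantile:           fraction of tasks that must have finished before speculating.
    multiplier:         how many times slower than the median a task must be.
    The default values here are only illustrative.
    """
    min_finished = max(1, int(quantile * total_tasks))
    if len(finished_durations) < min_finished:
        return []  # too early: not enough finished tasks for a reliable median
    threshold = multiplier * median(finished_durations)
    return sorted(task for task, elapsed in running_elapsed.items() if elapsed > threshold)


print(speculatable_tasks([10.0, 12.0, 11.0], {"t3": 30.0}, total_tasks=4))  # ['t3']
print(speculatable_tasks([10.0], {"t1": 30.0, "t2": 30.0, "t3": 30.0}, total_tasks=4))  # []
```

An elapsed-time rule needs no progress metrics from the task, while a throughput rule can flag a straggler before any task has finished; which trade-off fits Flink is a design question for the doc.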
Hi,
+1 for speculative execution.

It would be even better if it could work well with the existing checkpointing and pipelined execution. That way, we can take a further step toward unifying batch and stream processing.

Regards,
Xiaogang

Jeff Zhang <[hidden email]> wrote on Wed, Nov 7, 2018 at 9:40 AM:
> [...]
+1, and thanks, Yangyu, for proposing this very useful feature. Looking forward
to the design doc.

On Wed, Nov 7, 2018 at 10:15 AM SHI Xiaogang <[hidden email]> wrote:
> [...]
Thanks so much for all your feedback!
Yes, as Jin Sun mentioned above, the design currently targets batch, in order to explore the general framework and the basic modules. The strategy could also be applied to streaming with some extensions, for example, for result commitment.
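The "first attempt to complete wins, the others are cancelled" rule from the original proposal is essentially a commit-once guard, which is also the piece that would need extending for streaming result commitment. The sketch below covers only the batch case, assuming a task's output is admitted exactly once when an attempt finishes; `SpeculativeTask` and its methods are hypothetical names, not Flink classes.

```python
import threading


class SpeculativeTask:
    """Tracks the running attempts of one logical task; the first attempt to finish wins.

    Hypothetical helper for illustration; it is not a Flink class.
    """

    def __init__(self, task_id):
        self.task_id = task_id
        self._lock = threading.Lock()  # completions may arrive concurrently
        self._running = set()
        self.winner = None
        self.result = None

    def launch(self, attempt_id):
        with self._lock:
            if self.winner is not None:
                raise RuntimeError(f"{self.task_id} has already finished")
            self._running.add(attempt_id)

    def complete(self, attempt_id, result):
        """Admit the output of the first completed attempt.

        Returns the attempts the scheduler should cancel. A late or unknown
        attempt gets an empty list and its output is discarded.
        """
        with self._lock:
            if self.winner is not None or attempt_id not in self._running:
                return []
            self.winner, self.result = attempt_id, result
            self._running.discard(attempt_id)
            losers = sorted(self._running)
            self._running.clear()
            return losers


task = SpeculativeTask("reduce-7")
task.launch("attempt-0")  # original attempt on the slow node
task.launch("attempt-1")  # speculative copy on another node
print(task.complete("attempt-1", "output-from-1"))  # ['attempt-0']
print(task.complete("attempt-0", "output-from-0"))  # []
print(task.winner, task.result)  # attempt-1 output-from-1
```

In a streaming job there is no single "finish" moment, so a guard like this would presumably have to operate per checkpoint rather than once per task; that extension is not shown here.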
Hi all,
After some refinement, the detailed design doc is here:
https://docs.google.com/document/d/1X_Pfo4WcO-TEZmmVTTYNn44LQg5gnFeeaeqM7ZNLQ7M/edit?usp=sharing

Your reviews and comments are much appreciated and will help a lot in getting the feature completed.

Best,
Ryan

Tao Yangyu <[hidden email]> wrote on Wed, Nov 7, 2018 at 4:49 PM:
> [...]
Thanks, Yangyu, for the nice design doc! One thing to consider is the
granularity of speculation. Multiple tasks may propagate data in pipelined mode. In such a case, speculatively re-running a single task may not be enough. But you might be able to fix this problem by increasing the granularity of speculation. The traditional case of a single speculative task can then be considered a special case of this.

Xiaowei

On Sat, Nov 17, 2018 at 10:27 PM Tao Yangyu <[hidden email]> wrote:
> [...]
Thanks, Xiaowei, for the inspiring comments!
Yes, we could increase the granularity of speculation from a single task to a bundle of successive tasks, especially for pipelined channels.

Xiaowei Jiang <[hidden email]> wrote on Sun, Nov 18, 2018 at 2:24 PM:
> [...]
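One way to read "a bundle of successive tasks over pipelined channels" is to group tasks with a union-find over pipelined edges, so that tasks exchanging data in pipelined mode form one speculation unit (close in spirit to the pipelined regions used by Flink's region failover), while blocking edges separate units. The sketch below assumes each edge is labeled `"pipelined"` or `"blocking"`; those labels and both function names are hypothetical. A unit containing any flagged task is speculated as a whole, and a single-task unit reduces to the traditional case.

```python
def speculation_units(tasks, edges):
    """Group tasks so that tasks linked by pipelined channels are speculated together.

    tasks: iterable of task IDs.
    edges: iterable of (upstream, downstream, mode) tuples; mode is "pipelined" or
           "blocking" (labels chosen for this sketch). A blocking edge separates units
           because the upstream result is fully produced before it is consumed.
    """
    parent = {t: t for t in tasks}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for upstream, downstream, mode in edges:
        if mode == "pipelined":
            parent[find(upstream)] = find(downstream)

    groups = {}
    for t in parent:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


def units_to_speculate(units, slow_tasks):
    """Speculate a whole unit if any of its tasks was flagged as long-tail."""
    slow = set(slow_tasks)
    return [unit for unit in units if slow.intersection(unit)]


units = speculation_units(
    ["source-0", "map-0", "reduce-0"],
    [("source-0", "map-0", "pipelined"), ("map-0", "reduce-0", "blocking")],
)
print(units)  # [['map-0', 'source-0'], ['reduce-0']]
print(units_to_speculate(units, ["map-0"]))  # [['map-0', 'source-0']]
```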