Hi everyone,
This post proposes a blacklist mechanism as an enhancement to the Flink scheduler. The motivation is as follows.

In our clusters, jobs occasionally encounter hardware and software environment problems, including missing software libraries, bad hardware, and resource shortages such as running out of disk space. These problems lead to task failures; the failover strategy takes care of that and redeploys the affected tasks. But because of factors such as location preference and limited total resources, the failed task is often scheduled onto the same host, where it fails again and again, many times over. The root cause of this problem is a mismatch between task and resource: the current resource allocation algorithm does not take such failures into consideration.

We introduce the blacklist mechanism to solve this problem. The basic idea is that when a task fails too many times on some resource, the scheduler will no longer assign that resource to the task. We have implemented this feature in our internal version of Flink, and it currently works fine.

The following is the design draft; we would really appreciate your review and comments.

https://docs.google.com/document/d/1Qfb_QPd7CLcGT-kJjWSCdO8xFeobSCHF0vNcfiO4Bkw

Best,
Yingjie

--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
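As a minimal sketch of the counting rule described above (all names here are hypothetical, not the code from the internal Flink fork or the design doc): a task failure is recorded against the host it ran on, and once a host accumulates too many failures it is excluded from further offers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-host failure tracking behind a blacklist:
// once failures on the same host reach `maxFailures`, the scheduler
// should stop offering that host.
public class HostBlacklist {
    private final int maxFailures;
    private final Map<String, Integer> failuresPerHost = new HashMap<>();

    public HostBlacklist(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    // Record one task failure attributed to the given host.
    public void recordFailure(String host) {
        failuresPerHost.merge(host, 1, Integer::sum);
    }

    // A host is blacklisted once its failure count reaches the threshold.
    public boolean isBlacklisted(String host) {
        return failuresPerHost.getOrDefault(host, 0) >= maxFailures;
    }
}
```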
Thanks for sharing this design document with the community, Yingjie.

I like the design to pass the job-specific blacklisted TMs as a scheduling constraint. This makes a lot of sense to me.

Cheers,
Till
Thanks yingjie for bringing this discussion.
I encountered this issue during failover and also noticed other users complaining about related issues in the community before. So it is necessary to have this mechanism to enhance the scheduling process first, and then enrich the internal rules step by step. I hope this feature makes it into the next major release. :)

Best,
Zhijiang
Hi yingjie,
Thanks for proposing the blacklist! I agree that a blacklist is important for job maintenance, since some jobs may not be able to fail over automatically if their tasks are always scheduled to the problematic hosts or TMs. This increases the burden on operators, since they need to pay more attention to the status of the jobs.

I have read the proposal and left some comments. I think one open problem is how we cooperate with external resource managers (like YARN or Mesos) so that they apply for resources according to our blacklist. If they cannot fully obey the blacklist, then we may need to deal with the inappropriate resources ourselves.

Looking forward to the future advances of this feature! Thanks again for the exciting proposal.

Best,
Yun Gao
Thanks yingjie for sharing this doc; I think this is a very important feature for production.

As you mentioned in your document, an unhealthy node can cause a TM startup failure, yet the cluster manager may offer the same node again for some reason. (I have encountered such a scenario in our production environment.) As you propose, the RM can blacklist this unhealthy node because of the launch failure.

I have some questions: Do you want every ResourceManager (MesosResourceManager, YarnResourceManager) to implement this policy? If not, and you want Flink itself to implement this mechanism, I think the interface of the current RM may not be enough.

Thanks.
This is a quite useful feature for production use. I once encountered such a case in a production cluster, and the Storm jobs took 2 hours to stabilize. After that, we implemented a similar blacklist solution for Storm.

The design doc looks good to me. Some minor suggestions about blacklist removal: in some cases, when the whole cluster is problematic, the worst case is that all nodes end up in the blacklist if the blacklist size is improperly configured. Then the whole cluster is unavailable for allocation and has to wait for the removal timeout. This happens much more easily on a small cluster.

The solution I once used was: we do not allocate nodes in the blacklist while other resources are available; but if no resource is available, we remove nodes from the blacklist via an LRU-like algorithm and allocate them.

Hope this helps.

Thanks
Weihua
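The LRU-style fallback described in this thread can be sketched as follows (all names are hypothetical; this is not the implementation used for Storm or proposed for Flink): clean hosts are always preferred, and only when every candidate is blacklisted is the oldest blacklist entry evicted, so a small cluster never deadlocks waiting for the removal timeout.

```java
import java.util.LinkedHashMap;
import java.util.List;

// Hypothetical LRU blacklist: insertion order == blacklisting order, so
// iteration starts at the least recently blacklisted host.
public class LruBlacklist {
    private final LinkedHashMap<String, Boolean> blacklisted = new LinkedHashMap<>();

    public void add(String host) {
        blacklisted.remove(host); // refresh position if blacklisted again
        blacklisted.put(host, Boolean.TRUE);
    }

    // Returns a host to allocate, or null if there is no usable candidate.
    public String pickHost(List<String> candidates) {
        for (String host : candidates) {
            if (!blacklisted.containsKey(host)) {
                return host; // clean host available: never touch the blacklist
            }
        }
        // All candidates blacklisted: evict the one blacklisted longest ago.
        String victim = null;
        for (String host : blacklisted.keySet()) {
            if (candidates.contains(host)) {
                victim = host;
                break;
            }
        }
        if (victim != null) {
            blacklisted.remove(victim);
        }
        return victim;
    }
}
```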
Thanks, Weihua.
Your suggestions make a lot of sense to me. Currently, all blacklisted resources are released from the blacklist if there is no available resource. Releasing only a portion of the blacklisted resources, based on the number of slots needed and an LRU-like algorithm, may be a better choice.

Best,
Yingjie
In reply to this post by Guowei Ma
You are right. I think, at the very least, we need a new interface to be implemented to collect the failure information.

Best,
Yingjie
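The failure-collection interface mentioned above might take roughly this shape (purely illustrative; none of these names exist in Flink): failure observers report each failure together with the resource it is attributed to, and the blacklist logic later counts failures per resource.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collector for failure records, fed by the scheduler, RM,
// or TM launcher, and queried by the blacklist's threshold checks.
public class FailureCollector {
    public static final class FailureRecord {
        public final String resourceId; // host or TaskManager id
        public final String cause;

        public FailureRecord(String resourceId, String cause) {
            this.resourceId = resourceId;
            this.cause = cause;
        }
    }

    private final List<FailureRecord> records = new ArrayList<>();

    public void reportFailure(String resourceId, String cause) {
        records.add(new FailureRecord(resourceId, cause));
    }

    // Number of failures attributed to one resource.
    public long failureCount(String resourceId) {
        return records.stream().filter(r -> r.resourceId.equals(resourceId)).count();
    }
}
```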
In reply to this post by Yun Gao
Thanks, Yun.
If the external resource manager cannot fully obey the blacklist, we have two choices. The first is to do nothing and use the returned resources directly. The other is to check the returned resources, release those that do not satisfy the blacklist, and reallocate until success or timeout. The second one may be better.

Best,
Yingjie
:) > > Best, > Zhijiang > ------------------------------------------------------------------ > 发件人:Till Rohrmann <[hidden email]> > 发送时间:2018年11月5日(星期一) 18:43 > 收件人:dev <[hidden email]> > 主 题:Re: [DISCUSS]Enhancing flink scheduler by implementing blacklist > mechanism > > Thanks for sharing this design document with the community Yingjie. > > I like the design to pass the job specific blacklisted TMs as a scheduling > constraint. This makes a lot of sense to me. > > Cheers, > Till > > On Fri, Nov 2, 2018 at 4:51 PM yingjie <[hidden email]> wrote: > > > Hi everyone, > > > > This post proposes the blacklist mechanism as an enhancement of flink > > scheduler. The motivation is as follows. > > > > In our clusters, jobs encounter Hardware and software environment > problems > > occasionally, including software library missing,bad hardware,resource > > shortage like out of disk space,these problems will lead to task > > failure,the > > failover strategy will take care of that and redeploy the relevant tasks. > > But because of reasons like location preference and limited total > > resources,the failed task will be scheduled to be deployed on the same > > host, > > then the task will fail again and again, many times. The primary cause of > > this problem is the mismatching of task and resource. Currently, the > > resource allocation algorithm does not take these into consideration. > > > > We introduce the blacklist mechanism to solve this problem. The basic > idea > > is that when a task fails too many times on some resource, the Scheduler > > will not assign the resource to that task. We have implemented this > feature > > in our inner version of flink, and currently, it works fine. > > > > The following is the design draft, we would really appreciate it if you > can > > review and comment. 
> > > > > https://docs.google.com/document/d/1Qfb_QPd7CLcGT-kJjWSCdO8xFeobSCHF0vNcfiO4Bkw > > > > Best, > > Yingjie > > > > > > > > -- > > Sent from: > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ > > > > > |
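The check-and-reallocate loop described above can be sketched as follows. This is only a minimal illustration of the idea, not Flink code; `ResourceManagerClient`, `Slot`, and `allocate_off_blacklist` are all hypothetical names.

```python
import time

class Slot:
    """A resource slot pinned to a particular host."""
    def __init__(self, host):
        self.host = host

class ResourceManagerClient:
    """Stand-in for an external resource manager that may ignore the blacklist.
    Here it simply hands out slots from a fixed host list, round-robin."""
    def __init__(self, hosts):
        self._hosts = hosts
        self._next = 0
    def allocate(self, n):
        slots = [Slot(self._hosts[(self._next + k) % len(self._hosts)])
                 for k in range(n)]
        self._next += n
        return slots
    def release(self, slots):
        pass  # a real client would return these slots to the cluster

def allocate_off_blacklist(client, n, blacklist, timeout_s=10.0):
    """Keep only slots on non-blacklisted hosts; release the rest and
    re-request until we have n satisfying slots or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    accepted = []
    while len(accepted) < n:
        if time.monotonic() > deadline:
            raise TimeoutError("could not satisfy blacklist within timeout")
        slots = client.allocate(n - len(accepted))
        good = [s for s in slots if s.host not in blacklist]
        bad = [s for s in slots if s.host in blacklist]
        client.release(bad)  # give the unsatisfied resources back
        accepted.extend(good)
    return accepted
```

For example, with hosts `["h1", "h2", "h3"]` and `"h2"` blacklisted, requesting four slots keeps re-requesting until four slots on `h1`/`h3` are collected.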
Thank you for the detailed design proposal. I think in general, some
blacklisting is a good idea to have. The proposal goes very far immediately
and adds a lot of additional complexity, so let me play devil's advocate
here and challenge the proposal a bit.

*Concerns / Doubts*

- The proposal tries to solve all possible problems in one super big
solution. It adds a lot of complexity and special-case code.

- The complexity is designed into a component which is already very
complex: the scheduler. Work on locality-optimized scheduling, better
batch scheduling, speculative execution, etc. already makes the scheduler
very complex. I think we should try to keep additional complexity out if
possible.

- Especially in a quickly evolving project with an open community like
Flink, simplicity is a great good. It allows us to move faster in the
future and makes outside contributions easier to write, easier to review,
and easier to ensure quality for.

- As an example: I think that aspects like "missing python library" are
artifacts of previous-generation resource managers. The (Docker) image
based systems (Mesos, K8s, ...) should simply not have these problems any
more. Adding code to handle such cases seems to me like adding "instant
legacy" code.

*Suggestions*

(these are not fleshed-out proposals, but thoughts for directions to go
into)

- We should try and delegate as much of this work as possible to the
resource management framework, like Yarn, Mesos, K8s, etc.

- Within Flink, we should see if we can have blacklisting code only in the
Resource Manager, plus by defining certain exception types.

- For each type of blacklisting, let's look at where we would need to
implement it, and whether it is a good tradeoff between breadth of
improvement versus code complexity and maintenance effort.

- Example: We can have an exception type that indicates a failure due to
an environment issue (full disk, missing parts in the environment, network
connectivity issue). That should trigger an exit of the TaskManager with a
non-zero code. Yarn/Mesos/K8s should have blacklist mechanisms of their
own, which will not schedule an application's process on a certain host
again if it failed there too often. This might be an easy way to delegate
complexity and keep Flink's own code and architecture simple.

- Having no blacklisting code in the scheduler, but only in the resource
manager, most likely means that blacklisting applies to all jobs of a
session together. I think we should look at whether this is acceptable,
because it greatly simplifies the problem. In which cases is that not
sufficient?

- If more fine-grained blacklisting is needed in certain unique
deployments, would it be fair to say that this should require the
implementation of a specialized scheduler (rather than adding the code to
the core Flink scheduler)? With some of the current refactorings, the next
Flink releases should make it possible to add custom schedulers.

Looking forward to your thoughts...

Best,
Stephan
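The exception-type suggestion above could look roughly like the sketch below: environment failures (full disk, missing library, network trouble) are distinguished from ordinary task failures, and only the former would make the TaskManager exit with a non-zero code so the cluster framework can restart it elsewhere. All class names here are hypothetical, not an actual Flink API.

```python
class TaskFailure(Exception):
    """An ordinary task failure; handled by the regular failover strategy."""

class EnvironmentFailure(TaskFailure):
    """A failure caused by the host environment rather than the job itself."""

class DiskFullFailure(EnvironmentFailure):
    pass

class MissingLibraryFailure(EnvironmentFailure):
    pass

class NetworkFailure(EnvironmentFailure):
    pass

def exit_code_for(failure):
    """A non-zero exit tells YARN/Mesos/K8s the container died abnormally,
    letting the framework's own blacklisting avoid the host next time."""
    if isinstance(failure, EnvironmentFailure):
        return 42  # arbitrary non-zero code, chosen for this sketch
    return 0       # ordinary failures do not kill the TaskManager
```

The point of the sketch is that the classification lives in a small exception hierarchy rather than in the scheduler, which matches the suggestion to keep blacklisting logic out of the core scheduling code.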