Re: Dataset and eager scheduling

wangzhijiang
Hi,
    From my understanding, if you do not mind wasting resources and are sure the cluster has enough of them, you can set the EAGER schedule mode for a batch job.
    From the optimizer's side, if the PIPELINED_FORCED hint is not set for ExecutionMode, the optimizer will choose the BATCH DataExchangeMode for some special topologies to avoid the risk of deadlock. That means the producer tasks are deployed first and write their output. Only after the producer tasks finish are the consumer tasks scheduled, and then they start consuming the data. This is exactly how the LAZY_FROM_SOURCES schedule mode behaves. If you use EAGER mode instead in this case, the consumer tasks may do nothing after startup until the producer tasks finish, so they waste resources. For the PIPELINED DataExchangeMode, however, EAGER scheduling makes sense, because a consumer task can request data as soon as the producer task outputs its first record.
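
To make the trade-off above concrete, here is a toy model in plain Java. It is not Flink code: the enums are local stand-ins that only borrow the names used in this thread, and the class name, method names, and time values are all made up for illustration. It assumes one producer and one consumer, that EAGER deploys the consumer at time 0, and that lazy scheduling deploys it only once its input is consumable. Under those assumptions it computes how long the consumer occupies a slot without any data to read.

```java
public class ScheduleModeSketch {

    // Local stand-ins borrowing the names from the thread; NOT imported from Flink.
    enum ScheduleMode { EAGER, LAZY_FROM_SOURCES }
    enum DataExchangeMode { PIPELINED, BATCH }

    /** Time at which the producer's output becomes readable by the consumer. */
    static long consumableAt(DataExchangeMode exchange, long firstRecordAt, long producerFinishAt) {
        // PIPELINED: readable from the first record. BATCH: readable only after the producer finishes.
        return exchange == DataExchangeMode.PIPELINED ? firstRecordAt : producerFinishAt;
    }

    /** Time at which the consumer task is deployed and starts occupying a slot. */
    static long consumerDeployedAt(ScheduleMode schedule, long consumableAt) {
        // Assumption: EAGER deploys everything at time 0, lazy scheduling waits for consumable input.
        return schedule == ScheduleMode.EAGER ? 0L : consumableAt;
    }

    /** Time the consumer holds a slot while it has nothing to consume. */
    static long consumerIdleTime(ScheduleMode schedule, DataExchangeMode exchange,
                                 long firstRecordAt, long producerFinishAt) {
        long ready = consumableAt(exchange, firstRecordAt, producerFinishAt);
        return ready - consumerDeployedAt(schedule, ready);
    }

    public static void main(String[] args) {
        long firstRecordAt = 1;     // hypothetical: producer emits its first record at t = 1
        long producerFinishAt = 60; // hypothetical: producer finishes at t = 60
        for (ScheduleMode schedule : ScheduleMode.values()) {
            for (DataExchangeMode exchange : DataExchangeMode.values()) {
                long idle = consumerIdleTime(schedule, exchange, firstRecordAt, producerFinishAt);
                System.out.printf("%-17s + %-9s -> consumer idle for %d time units%n",
                        schedule, exchange, idle);
            }
        }
    }
}
```

In this model only the EAGER plus BATCH combination leaves the consumer idle for the producer's whole runtime. With a PIPELINED exchange the eager consumer waits only until the first record arrives, and with lazy scheduling it is never deployed before it has something to read.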
    Maybe my understanding is not entirely accurate, so any discussion is welcome!

Cheers,
zhijiang
------------------------------------------------------------------
From: CPC <[hidden email]>
Sent: Thursday, March 2, 2017, 18:52
To: dev <[hidden email]>
Subject: Dataset and eager scheduling

Hi all,

Our team is currently trying to implement a runtime operator and is also
experimenting with the scheduler. We are trying to understand the batch
optimizer, but that will take some time. What we want to know is whether
changing the batch scheduling mode from LAZY_FROM_SOURCES to EAGER could
affect the optimizer. In other words, does the optimizer make any strong
assumptions that the scheduling mode of batch jobs is always
LAZY_FROM_SOURCES?

Thanks...