Hi dev,
While exploring FLINK RunningJobRegistry usage, I find it has three statuses and two components(Dispatcher and JobManagerRunner) have access to it. It is a bit wired and below shared my thoughts. 1. Three statuses(PENDING/RUNNING/DONE). The original version of RunningJobRegistry represents status of a job only RUNNING or not, and in FLINK-5501 <https://issues.apache.org/jira/browse/FLINK-5501> we extend it to DONE. The proposal in FLINK-5501 said "If the mini cluster starts multi JobManagerRunner, and the leader JobManagerRunner already finishes the job to set the job status FINISHED, other JobManagerRunner will exit after grants the leadership again." However, we only start one JobManagerRunner for a job. And if the previous runner failed, dispatcher will re-launch a new one. All of job scheduling should be taken care of by Dispatcher. There is no standby JobManagerRunner. Besides, JobSchedulingStatue.DONE would be cleaned up once Dispatcher got notified that the job reached globally terminal state(FLINK-9421 <https://issues.apache.org/jira/browse/FLINK-9421>). So even we add a DONE status, it could be cleaned up and even in the case FLINK-5501 described, it is possible that a following JobManagerRunner starts a "finished" job. Thus, whether a job is running is valuable(indicate if the JobManagerRunner should reconcile), but the DONE status, has lost its meaning. I'd like to propose using only PENDING/RUNNING statuses in RunningJobRegistry. 2. Both JobManagerRunner and Dispatcher get access to RunningJobRegistry. RunningJobRegistry is a concept of job scheduling and is ideally used in Dispatcher. We currently read from RunningJobRegistry in Dispatcher and read from/write to RunningJobRegistry in JobManagerRunner. To unify the usage in one component, I list two possible options. (2.1). Only access RunningJobRegistry in Dispatcher This is the way following that Dispatcher maintains job scheduling. When a job submitted and the Dispatcher found that there was not other runner of this job, the Dispatcher get JobSchedulingStatus from RunningJobRegistry, and passed whether the job was RUNNING or not to the JobManagerRunner. And then Dispatcher set the JobSchedulingStatus to RUNNING. When the job reached globally terminal state, Dispatcher cleaned up RunningJobRegistry data. Pros: Concept follows what it means, i.e., Dispatcher schedules jobs. Cons: We get JobSchedulingStatue earlier than it actually needed(before JobManagerRunner granted leadership and be able to actually start the job). Set the job status to RUNNING before the JobMaster started(this is, however, based on how we regard a status as RUNNING). (2.2) Only access RunningJobRegistry in JobManagerRunner In this way we checked JobSchedulingStatus when JobManagerRunner granted leadership. Set RUNNING when a JobMaster started and cleanup in JobManagerRunner#jobReachedGloballyTerminalState. Pros: JobManagerRunner reads/writes RunningJobRegistry closer to when the status required and updated (than (2.1)). Cons: A JobManager should take care of running a job instead of scheduling a job. With both of options above, we possibly run into a false-positive case that we set the job status to RUNNING but the JM failed before any TM launched. Do you guys think the issues mentioned above valid? If so, what do you think to resolve it? Best, tison. |
Free forum by Nabble | Edit this page |