What component should take care of RunningJobRegistry?

tison
Hi dev,

While exploring how Flink uses RunningJobRegistry, I found that it has three
statuses and that two components (Dispatcher and JobManagerRunner) have
access to it. This seems a bit weird to me, so I share my thoughts below.

1. Three statuses (PENDING/RUNNING/DONE). The original version of
RunningJobRegistry only represented whether a job was RUNNING or not, and in
FLINK-5501 <https://issues.apache.org/jira/browse/FLINK-5501> we extended it
with DONE.

The proposal in FLINK-5501 said "If the mini cluster starts multi
JobManagerRunner, and the leader JobManagerRunner already finishes the job
to set the job status FINISHED, other JobManagerRunner will exit after
grants the leadership again."

However, we only start one JobManagerRunner per job. If the previous runner
fails, the Dispatcher re-launches a new one. All job scheduling should be
taken care of by the Dispatcher. There is no standby JobManagerRunner.

Besides, JobSchedulingStatus.DONE is cleaned up once the Dispatcher is
notified that the job reached a globally terminal state (FLINK-9421
<https://issues.apache.org/jira/browse/FLINK-9421>). So even with a DONE
status, the entry can be cleaned up, and even in the case FLINK-5501
describes, it is possible that a subsequent JobManagerRunner starts a
"finished" job.

Thus, knowing whether a job is running is valuable (it indicates whether the
JobManagerRunner should reconcile), but the DONE status has lost its meaning.
I'd like to propose using only the PENDING/RUNNING statuses in
RunningJobRegistry.
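
To make the proposal concrete, here is a minimal sketch of what a two-status
registry could look like. It is an in-memory stand-in only: the names
SchedulingStatus and TwoStateJobRegistry and the String job ids are
assumptions for illustration, not Flink's actual interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical two-value status; the DONE value is simply gone.
enum SchedulingStatus { PENDING, RUNNING }

// Hypothetical in-memory stand-in for the registry, not Flink's interface.
final class TwoStateJobRegistry {
    private final Map<String, SchedulingStatus> statuses = new ConcurrentHashMap<>();

    // Mark the job as running so that a later runner knows it has to reconcile.
    void setJobRunning(String jobId) {
        statuses.put(jobId, SchedulingStatus.RUNNING);
    }

    // A job without an entry is treated as PENDING.
    SchedulingStatus getStatus(String jobId) {
        return statuses.getOrDefault(jobId, SchedulingStatus.PENDING);
    }

    // Called once the job reached a globally terminal state.
    void clearJob(String jobId) {
        statuses.remove(jobId);
    }
}
```

In this sketch, clearing the entry takes the place of a DONE marker, so a
cleaned-up job and a never-started job look the same to a reader of the
registry.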

2. Both JobManagerRunner and Dispatcher have access to RunningJobRegistry.

RunningJobRegistry is a job scheduling concept and would ideally be used only
in the Dispatcher. We currently read from RunningJobRegistry in the
Dispatcher and read from/write to RunningJobRegistry in the JobManagerRunner.
To unify the usage in one component, I list two possible options.

(2.1) Only access RunningJobRegistry in Dispatcher

This option follows the idea that the Dispatcher maintains job scheduling.
When a job is submitted and the Dispatcher finds that there is no other
runner for this job, the Dispatcher gets the JobSchedulingStatus from
RunningJobRegistry and passes whether the job was RUNNING or not to the
JobManagerRunner. The Dispatcher then sets the JobSchedulingStatus to
RUNNING. When the job reaches a globally terminal state, the Dispatcher
cleans up the RunningJobRegistry data.

Pros: The usage matches the concept, i.e., the Dispatcher schedules jobs.

Cons: We get the JobSchedulingStatus earlier than it is actually needed
(before the JobManagerRunner is granted leadership and is able to actually
start the job). We also set the job status to RUNNING before the JobMaster
has started (this, however, depends on what we regard as RUNNING).
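
As a rough illustration of (2.1), the flow could be sketched as below. This
is a simplified in-memory model under my own assumptions: DispatcherSketch,
RunnerSketch, runnerFailed and the String job ids are hypothetical and do not
correspond to the real Flink classes or signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of option (2.1); none of these types are Flink classes.
final class DispatcherSketch {
    enum Status { PENDING, RUNNING }

    // Stand-in for a JobManagerRunner that is only told whether to reconcile.
    static final class RunnerSketch {
        final String jobId;
        final boolean wasRunning;

        RunnerSketch(String jobId, boolean wasRunning) {
            this.jobId = jobId;
            this.wasRunning = wasRunning;
        }
    }

    private final Map<String, Status> registry = new HashMap<>();
    private final Map<String, RunnerSketch> runners = new HashMap<>();

    RunnerSketch submitJob(String jobId) {
        if (runners.containsKey(jobId)) {
            throw new IllegalStateException("Job " + jobId + " already has a runner");
        }
        // The status is read before the runner holds leadership (first con).
        boolean wasRunning = statusOf(jobId) == Status.RUNNING;
        RunnerSketch runner = new RunnerSketch(jobId, wasRunning);
        // The job is marked RUNNING before any JobMaster started (second con).
        registry.put(jobId, Status.RUNNING);
        runners.put(jobId, runner);
        return runner;
    }

    // A failed runner is dropped; the registry entry stays for the re-launch.
    void runnerFailed(String jobId) {
        runners.remove(jobId);
    }

    void jobReachedGloballyTerminalState(String jobId) {
        runners.remove(jobId);
        registry.remove(jobId);
    }

    Status statusOf(String jobId) {
        return registry.getOrDefault(jobId, Status.PENDING);
    }
}
```

Here the runner never touches the registry; it only receives the wasRunning
flag, which is what would tell it to reconcile after a re-launch.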

(2.2) Only access RunningJobRegistry in JobManagerRunner

With this option we check the JobSchedulingStatus when the JobManagerRunner
is granted leadership, set RUNNING when the JobMaster has started, and clean
up in JobManagerRunner#jobReachedGloballyTerminalState.

Pros: JobManagerRunner reads/writes RunningJobRegistry closer to the time the
status is required and updated (compared to (2.1)).

Cons: A JobManager should take care of running a job, not of scheduling
it.
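
For comparison, (2.2) could be sketched roughly as follows. Again this is
only a model under assumptions: RunnerOwnedRegistrySketch, grantLeadership
and the shared Map standing in for the registry are hypothetical; only the
name jobReachedGloballyTerminalState is taken from the existing code.

```java
import java.util.Map;

// Hypothetical sketch of option (2.2); not the real JobManagerRunner.
final class RunnerOwnedRegistrySketch {
    enum Status { PENDING, RUNNING }

    // Stand-in for the registry, shared by successive runners of the same job.
    private final Map<String, Status> registry;
    private final String jobId;
    private boolean reconcile;

    RunnerOwnedRegistrySketch(Map<String, Status> registry, String jobId) {
        this.registry = registry;
        this.jobId = jobId;
    }

    void grantLeadership() {
        // Read the status only at the point where it is actually needed.
        reconcile = registry.getOrDefault(jobId, Status.PENDING) == Status.RUNNING;
        // ... the JobMaster would be started here ...
        registry.put(jobId, Status.RUNNING);
    }

    boolean shouldReconcile() {
        return reconcile;
    }

    void jobReachedGloballyTerminalState() {
        registry.remove(jobId);
    }
}
```

In this variant the Dispatcher would not need a reference to the registry at
all; every read, write and cleanup happens inside the runner.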

With both options above, we can run into a false-positive case where we set
the job status to RUNNING but the JM fails before any TM is launched.

Do you think the issues mentioned above are valid? If so, how do you think
we should resolve them?

Best,
tison.