YARN parallelism vs Config

Dominik Wosiński
Hey,

I was wondering about the relation between the parallelism written to the
YARN properties file and the one in the Flink config. Currently, as far as
I know, there is only one call to the `writeYarnPropertiesFile` method, and
it sets the parallelism in the YARN properties to the number of workers *
number of slots per worker. But doesn't `flink-conf.yaml` take precedence
when resolving the configuration? I am trying to understand the reasoning
behind always setting the YARN properties to the max available slots, and
whether this value will be used at all, since there is a default value in
the Flink config for parallelism.

Thanks in advance,
Best Regards,
Dom.

Re: YARN parallelism vs Config

Jeff Zhang
Hi Dom.

I believe this is for the per-job scenario. That means you create a Flink
cluster in the YARN cluster when you submit your Flink job. Since this is
the per-job scenario (there's only one job in this Flink cluster), it makes
more sense to set the parallelism to workers * number of slots per worker.
Otherwise, either you cannot get enough slots if the parallelism in
flink-conf.yaml is too large, or some idle containers will be left over if
the parallelism in flink-conf.yaml is too small.
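
The trade-off described above can be sketched with a little arithmetic. This
is only an illustration, not Flink code: `SlotCheck` and its method names are
hypothetical, and it assumes a fixed-size cluster in which a job needs one
slot per unit of parallelism.

```java
// Illustrative sketch only: SlotCheck is a hypothetical helper, not Flink API.
final class SlotCheck {

    /** Total slots in a fixed-size cluster: workers times slots per worker. */
    static int totalSlots(int workers, int slotsPerWorker) {
        return workers * slotsPerWorker;
    }

    /** Slots the job asks for but the cluster cannot offer (0 if the job fits). */
    static int missingSlots(int parallelism, int totalSlots) {
        return Math.max(0, parallelism - totalSlots);
    }

    /** Slots left unused while the job runs (0 if the job uses all of them). */
    static int idleSlots(int parallelism, int totalSlots) {
        return Math.max(0, totalSlots - parallelism);
    }

    public static void main(String[] args) {
        int total = totalSlots(4, 2); // 4 workers with 2 slots each
        // Too large, too small, and exactly workers * slots per worker.
        for (int parallelism : new int[] {16, 2, total}) {
            System.out.printf("parallelism=%d missing=%d idle=%d%n",
                    parallelism,
                    missingSlots(parallelism, total),
                    idleSlots(parallelism, total));
        }
    }
}
```

Only a parallelism equal to workers * slots per worker leaves both numbers at
zero, which is the value the properties file is described as carrying.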


On Sat, Jul 20, 2019 at 5:36 AM, Dominik Wosiński <[hidden email]> wrote:



--
Best Regards

Jeff Zhang

Re: YARN parallelism vs Config

Till Rohrmann
Hi Dominik,

The yarn-properties-USER file is mainly a legacy artifact which served the
following purpose: Back in the day, when starting a Yarn session cluster,
one needed to specify the number of task managers with which to start the
cluster. This number could not be changed. If a user then wanted to submit
a job to this cluster via the client, we wanted to submit it with the
maximum parallelism possible if no parallelism was specified. To make this
parallelism available to the client, the number of task managers times the
number of slots is written out to the yarn properties file, from which it
can be read.
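
As a rough sketch of that write-then-read round trip, the following uses plain
`java.util.Properties`. It is not the Flink implementation: the class name
`YarnPropertiesSketch` is hypothetical, the key name `parallelism` is an
assumption made for illustration, and the "file" is kept in a string so the
example stays self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative sketch only: not the Flink implementation.
final class YarnPropertiesSketch {

    // Assumed key name, chosen for this example.
    static final String PARALLELISM_KEY = "parallelism";

    /** Session start: store task managers * slots as the parallelism entry. */
    static String write(int taskManagers, int slotsPerTaskManager) {
        Properties props = new Properties();
        props.setProperty(PARALLELISM_KEY,
                Integer.toString(taskManagers * slotsPerTaskManager));
        StringWriter out = new StringWriter();
        try {
            props.store(out, "sketch of a yarn properties file");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Client side: read the stored parallelism back when submitting a job. */
    static int readParallelism(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Integer.parseInt(props.getProperty(PARALLELISM_KEY));
    }

    public static void main(String[] args) {
        String contents = write(3, 4); // 3 task managers with 4 slots each
        System.out.println(readParallelism(contents));
    }
}
```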

Nowadays, this no longer makes much sense, since the Yarn session cluster
can dynamically allocate and deallocate containers. However, the value from
the yarn properties file still overrides the value specified in the Flink
configuration. I think that, with the refactoring of the client, we should
change this behaviour and remove the yarn properties file.
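
The resulting precedence, as it reads from the description above, could be
sketched like this. The class `ParallelismResolution` and its `resolve` method
are hypothetical and only mirror the order described here (explicitly
specified parallelism first, then the yarn properties file, then the
`parallelism.default` entry of flink-conf.yaml); the actual client code may
be structured differently.

```java
import java.util.OptionalInt;

// Illustrative sketch only: mirrors the described precedence, not Flink's code.
final class ParallelismResolution {

    /**
     * @param userSpecified    parallelism given explicitly at submission, if any
     * @param yarnProperties   parallelism read from the yarn properties file, if any
     * @param flinkConfDefault value of parallelism.default from flink-conf.yaml
     */
    static int resolve(OptionalInt userSpecified,
                       OptionalInt yarnProperties,
                       int flinkConfDefault) {
        if (userSpecified.isPresent()) {
            return userSpecified.getAsInt();
        }
        if (yarnProperties.isPresent()) {
            // The properties file wins over the Flink configuration.
            return yarnProperties.getAsInt();
        }
        return flinkConfDefault;
    }

    public static void main(String[] args) {
        // No explicit parallelism: the properties file value is used, not the default.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.of(12), 1));
    }
}
```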

Cheers,
Till

On Tue, Jul 23, 2019 at 4:06 AM Jeff Zhang <[hidden email]> wrote:
