Hey,
I was wondering about the relation between the parallelism set by YARN in the Yarn properties file and the default parallelism in `flink-conf.yaml`. As far as I know, the `writeYarnPropertiesFile` method is executed only once, and it sets the parallelism in the YARN properties to the number of workers * number of slots per worker. But doesn't `flink-conf.yaml` take precedence when the configuration is resolved? I am trying to understand the reasoning behind always setting the Yarn properties to the maximum number of available slots, and whether this value is used at all, since the Flink config already has a default value for parallelism.

Thanks in advance,
Best Regards,
Dom.
Hi Dom.
I believe this is for the per-job scenario, meaning that you create the Flink cluster in the YARN cluster when you submit your Flink job. Since there is only one job in this Flink cluster, it makes more sense to set the parallelism to number of workers * number of slots per worker. Otherwise, either you cannot get enough slots if the parallelism in flink-conf.yaml is too large, or some idle containers will be left over if the parallelism in flink-conf.yaml is too small.

--
Best Regards
Jeff Zhang

In reply to Dominik Wosiński <[hidden email]>, Sat, Jul 20, 2019, 5:36 AM.
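To make the two failure cases above concrete, here is a minimal sketch of the arithmetic. It is not Flink code: the class and method names are hypothetical, and it assumes default slot sharing, where a job needs as many slots as its highest operator parallelism.

```java
// Hypothetical helper, not part of Flink: illustrates workers * slots per worker
// and what happens when a configured parallelism does not match that product.
class SlotMath {

    /** Total slots of a fixed-size cluster: task managers times slots per task manager. */
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    /** Compares a job's parallelism with the slots the cluster offers. */
    static String describe(int jobParallelism, int totalSlots) {
        if (jobParallelism > totalSlots) {
            // Assumption: with default slot sharing the job needs jobParallelism slots.
            return "short by " + (jobParallelism - totalSlots) + " slots";
        } else if (jobParallelism < totalSlots) {
            return (totalSlots - jobParallelism) + " slots idle";
        }
        return "all slots used";
    }

    public static void main(String[] args) {
        int slots = totalSlots(4, 2);            // 4 workers with 2 slots each
        System.out.println(slots);               // prints 8
        System.out.println(describe(16, slots)); // prints "short by 8 slots"
        System.out.println(describe(2, slots));  // prints "6 slots idle"
        System.out.println(describe(8, slots));  // prints "all slots used"
    }
}
```

Only the last case, where the parallelism equals workers * slots per worker, uses the cluster fully without asking for more than it has.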
Hi Dominik,
the yarn-properties-USER file is mainly a legacy artifact which served the following purpose: Back in the day, when starting a Yarn session cluster, one had to specify the number of task managers with which to start the cluster. This number could not be changed. If a user then wanted to submit a job to this cluster via the client, we wanted to submit it with the maximum possible parallelism if no parallelism was specified. To make this parallelism available, the number of task managers times the number of slots is written to the yarn properties file, from which it can be read.

Nowadays, this does not make much sense anymore, since the Yarn session cluster can dynamically allocate and deallocate containers. However, the value from the yarn properties file still overwrites the value specified in the Flink configuration. I think that, as part of the client refactoring, we should change this behaviour and remove the yarn properties file.

Cheers,
Till

In reply to Jeff Zhang <[hidden email]>, Tue, Jul 23, 2019, 4:06 AM.
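The precedence described above can be sketched as a small lookup chain. This is an illustration only, not the actual client code: the class and method names are hypothetical, the `parallelism` key for the yarn properties file is an assumption, and `parallelism.default` is the key used in `flink-conf.yaml`.

```java
import java.util.OptionalInt;
import java.util.Properties;

// Hypothetical sketch of the resolution order discussed in this thread.
class ParallelismResolution {

    /**
     * Returns the parallelism a job would be submitted with:
     * 1. an explicitly specified value (for example via the -p option), if present;
     * 2. otherwise the value from the yarn properties file (assumed key "parallelism");
     * 3. otherwise parallelism.default from the Flink configuration, falling back to 1.
     */
    static int resolve(OptionalInt explicit, Properties yarnProps, Properties flinkConf) {
        if (explicit.isPresent()) {
            return explicit.getAsInt();
        }
        String fromYarn = yarnProps.getProperty("parallelism");
        if (fromYarn != null) {
            // The yarn properties value overrides the Flink configuration.
            return Integer.parseInt(fromYarn);
        }
        return Integer.parseInt(flinkConf.getProperty("parallelism.default", "1"));
    }

    public static void main(String[] args) {
        Properties yarnProps = new Properties();
        yarnProps.setProperty("parallelism", "8");        // 4 task managers * 2 slots
        Properties flinkConf = new Properties();
        flinkConf.setProperty("parallelism.default", "2");

        System.out.println(resolve(OptionalInt.empty(), yarnProps, flinkConf));        // prints 8
        System.out.println(resolve(OptionalInt.of(3), yarnProps, flinkConf));          // prints 3
        System.out.println(resolve(OptionalInt.empty(), new Properties(), flinkConf)); // prints 2
    }
}
```

Under these assumptions, the default from `flink-conf.yaml` only takes effect when no yarn properties file value exists and no parallelism is given at submission.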