Use of loading flink-conf.yaml in Flink-sql client



Dipanjan Mazumder
Hi,
I was going through the Flink SQL client code and came across a flow where flink-conf.yaml is loaded into the configuration object as a prerequisite for the SQL client to start. The configuration file has properties pertaining to the Flink cluster. As far as I understand, the SQL client only requires the JobManager host and port to connect, which this configuration file provides. However, the file also has other properties that confuse me a bit:
# The heap size for the JobManager JVM
jobmanager.heap.size: 1024m
# The heap size for the TaskManager JVM
taskmanager.heap.size: 1024m
# The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline.
taskmanager.numberOfTaskSlots: 1
# The parallelism used for programs that did not specify and other parallelism.
parallelism.default: 1
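The loading step described above amounts to reading flat `key: value` lines into a map. The sketch below assumes the simple line-oriented format that Flink 1.x uses for flink-conf.yaml (a `#` starts a comment, key and value are separated by `: `). In Flink itself this is done by `GlobalConfiguration.loadConfiguration`; `parse_flink_conf` here is a hypothetical helper for illustration, not Flink's code.

```python
# Hypothetical helper: a simplified reader for the flat "key: value" format of
# flink-conf.yaml. It only illustrates the idea; it is not Flink's own loader.
def parse_flink_conf(text: str) -> dict[str, str]:
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        # Drop everything after '#' (comments) and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            # Skip lines without a "key: value" separator instead of failing.
            continue
        conf[key.strip()] = value.strip()
    return conf


SAMPLE = """
# The heap size for the JobManager JVM
jobmanager.heap.size: 1024m
# The heap size for the TaskManager JVM
taskmanager.heap.size: 1024m
# The number of task slots that each TaskManager offers.
taskmanager.numberOfTaskSlots: 1
# The parallelism used for programs that did not specify and other parallelism.
parallelism.default: 1
"""

conf = parse_flink_conf(SAMPLE)
print(conf["taskmanager.numberOfTaskSlots"])  # prints 1
```

Every key ends up in the same map; the loader itself does not decide which options matter for a given use of the client.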

Are the above properties even required by the SQL client? As far as I understand, the cluster is already up and running, so it will already have allocated its resources. Or is this configuration specific to the current session, with the JobManager allocating resources for the current request based on this config file?

   - "jobmanager.heap.size: 1024m":
      - The Flink standalone cluster is already running, so the JobManager is also already running with a predefined heap size. What does this parameter do when loaded through the SQL client? The SQL client is submitting configs to an already running Flink cluster.
   - "taskmanager.heap.size: 1024m":
      - The Flink standalone cluster is already running, so the TaskManager is also already running with a predefined heap size. What does this parameter do when loaded through the SQL client? The SQL client is submitting configs to an already running Flink cluster.
   - "parallelism.default: 1":
      - This can be used for the current session: any application submitted as part of the current session that defines no parallelism will default to 1.
   - "taskmanager.numberOfTaskSlots: 1":
      - I did not investigate this property and am not sure whether TaskManager slots can be specified per application.
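The `parallelism.default` reading above matches how Flink resolves parallelism: the system-level default from flink-conf.yaml only applies when nothing more specific is set. A minimal sketch of that precedence, assuming the order described in Flink's "Parallel Execution" docs (operator level, then execution-environment level, then client level such as `flink run -p`, then `parallelism.default`); the function and its parameter names are hypothetical:

```python
from typing import Optional


def effective_parallelism(
    conf: dict[str, str],
    operator_p: Optional[int] = None,
    env_p: Optional[int] = None,
    client_p: Optional[int] = None,
) -> int:
    """Return the parallelism an operator would run with (hypothetical helper).

    Precedence, highest first: operator level, execution-environment level,
    client level (e.g. `flink run -p`), then the system-level
    `parallelism.default` read from flink-conf.yaml.
    """
    for candidate in (operator_p, env_p, client_p):
        if candidate is not None:
            return candidate
    # Fall back to the configured default; Flink's built-in default is 1.
    return int(conf.get("parallelism.default", "1"))


conf = {"parallelism.default": "1"}
print(effective_parallelism(conf))                       # prints 1
print(effective_parallelism(conf, operator_p=4))         # prints 4
print(effective_parallelism(conf, env_p=2, client_p=3))  # prints 2
```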

Can I get some help understanding this part? I am trying to extend the SQL client to create an API-based client for my platform's requirements. Awaiting your response.
Regards,
Dipanjan

Re: Use of loading flink-conf.yaml in Flink-sql client

Dipanjan Mazumder
Hi,
I was reading through the Flink docs, and my understanding now is that each application will have its own instance of JobManager and TaskManagers. So every application would need an initial configuration defining the application topology to be drawn in the Flink cluster, which means every application would have a separate flink-conf.yaml that specifically defines the Flink topology for that application.
Am I correct in my understanding? Please kindly confirm.
Waiting for your response.
Regards,
Dipanjan
In reply to: Dipanjan Mazumder, Saturday, September 14, 2019, 02:03:32 PM GMT+5:30.

Re: Use of loading flink-conf.yaml in Flink-sql client

Till Rohrmann
Hi Dipanjan,

Not every configuration option in the flink-conf.yaml is relevant for the
SQL client. If you submit to an already existing cluster, then you only
need to know its address and port or, if it uses high availability,
where ZooKeeper is running. However, in the general case, the
Flink SQL client can also deploy a new per-job mode cluster just for your
job. In order to do this, it needs to know cluster-specific configuration
such as the memory or the number of slots.
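The distinction between submitting to an existing cluster and deploying a per-job cluster can be pictured as a filter over the loaded options. The grouping below is a hypothetical illustration, not taken from the Flink source: the key names are real Flink 1.x options, but exactly which ones a given client version reads is an assumption.

```python
# Hypothetical grouping for illustration; not taken from the Flink source.
# Options needed to reach an existing cluster (directly or via ZooKeeper HA).
CONNECT_KEYS = {
    "jobmanager.rpc.address",
    "jobmanager.rpc.port",
    "rest.address",
    "rest.port",
    "high-availability",
    "high-availability.zookeeper.quorum",
}
# Defaults applied to the submitted job itself, whatever the deployment.
JOB_KEYS = {"parallelism.default"}
# Only needed when the client deploys a new (per-job) cluster.
DEPLOY_KEYS = {
    "jobmanager.heap.size",
    "taskmanager.heap.size",
    "taskmanager.numberOfTaskSlots",
}


def relevant_options(conf: dict[str, str], deploy_new_cluster: bool) -> dict[str, str]:
    """Keep only the options that matter for the chosen deployment mode."""
    wanted = CONNECT_KEYS | JOB_KEYS
    if deploy_new_cluster:
        wanted = wanted | DEPLOY_KEYS
    return {k: v for k, v in conf.items() if k in wanted}


conf = {
    "jobmanager.rpc.address": "localhost",
    "rest.port": "8081",
    "jobmanager.heap.size": "1024m",
    "taskmanager.numberOfTaskSlots": "1",
    "parallelism.default": "1",
}
print(sorted(relevant_options(conf, deploy_new_cluster=False)))
# prints ['jobmanager.rpc.address', 'parallelism.default', 'rest.port']
```

With an existing cluster the heap sizes and slot count are simply ignored; they only take effect when the client starts the JobManager and TaskManagers itself.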

The flink-conf.yaml does not contain any information about the executed
topology. This information is contained in the JobGraph which is submitted
by the client to a cluster.
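To see how the topology from the JobGraph and the cluster configuration combine when a per-job cluster is sized, here is a small worked sketch. It assumes all operators share the default slot-sharing group, in which case a job needs as many slots as its highest operator parallelism. The toy `job_graph` dict and the helper names are hypothetical stand-ins, not Flink's `JobGraph` API.

```python
import math


def slots_needed(job_graph: dict[str, int]) -> int:
    """Slots a job needs if every operator is in the default slot-sharing group."""
    if not job_graph:
        raise ValueError("job graph has no operators")
    # One slot can hold one parallel subtask of each operator, so the
    # operator with the highest parallelism determines the slot count.
    return max(job_graph.values())


def taskmanagers_needed(job_graph: dict[str, int], slots_per_tm: int) -> int:
    """TaskManagers a per-job cluster must start to provide enough slots."""
    if slots_per_tm <= 0:
        raise ValueError("taskmanager.numberOfTaskSlots must be positive")
    return math.ceil(slots_needed(job_graph) / slots_per_tm)


# Toy topology: operator name -> parallelism. This comes from the job,
# not from flink-conf.yaml; only slots_per_tm comes from the configuration.
job_graph = {"source": 2, "map": 4, "sink": 1}
print(slots_needed(job_graph))            # prints 4
print(taskmanagers_needed(job_graph, 1))  # prints 4
print(taskmanagers_needed(job_graph, 3))  # prints 2
```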

Cheers,
Till

In reply to: Dipanjan Mazumder, Sun, Sep 15, 2019 at 9:37 AM.