Hi everyone,
in Ververica Platform we offer a feature to use environment variables in the Flink configuration¹, e.g.

```
s3.access-key: ${S3_ACCESS_KEY}
```

We've been discussing internally whether contributing such a feature to Flink directly would make sense and wanted to start a discussion on this topic.

An alternative to the above would be to pick up environment variables directly based on their name: instead of referencing the variable in the Flink configuration as above, the option would be set automatically if something like $FLINK_CONFIG_S3_ACCESS_KEY was set in the environment. This is somewhat similar to what e.g. Spring does, and faces similar challenges (dealing with "."s etc.).

Although I view both of these approaches as mostly orthogonal, supporting both very likely wouldn't make sense, of course. So I was wondering what your opinion is on whether the project would benefit from environment variable support for the Flink configuration, and whether there are tendencies as to which approach to go with.

¹ https://docs.ververica.com/user_guide/application_operations/deployments/configure_flink.html#environment-variables

Best regards
Ingo
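To make the first approach concrete, here is a minimal sketch of `${NAME}` substitution applied to already-parsed configuration values. It is written in Python purely for illustration and is not Flink or Ververica Platform code; the function name `substitute`, the placeholder grammar, and the choice to leave unknown variables untouched are all assumptions.

```python
import re

# Matches ${NAME}, where NAME looks like a typical environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(value, env):
    """Replace every ${NAME} in a config value with env[NAME].

    Assumption: unknown variables are left as-is. Failing fast would be
    an equally reasonable policy.
    """
    def lookup(match):
        return env.get(match.group(1), match.group(0))

    return _PLACEHOLDER.sub(lookup, value)


# Hypothetical parsed configuration and environment.
config = {"s3.access-key": "${S3_ACCESS_KEY}", "parallelism.default": "4"}
env = {"S3_ACCESS_KEY": "example-key"}

resolved = {key: substitute(value, env) for key, value in config.items()}
print(resolved)
```

In a real process, `env` would simply be `os.environ`; passing it in explicitly keeps the sketch easy to test.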
Hi Ingo,
Thanks a lot for this proposal!

We had a related discussion recently in the context of FLINK-19520 (randomizing tests configuration) [1]. I believe other scenarios will benefit as well.

For the end users, I think substitution in configuration files is preferable over parsing env vars in Flink code. And for cases without such a file, we could have a default one on the classpath with all substitutions defined (and then merge everything from the user-supplied file).

[1] https://issues.apache.org/jira/browse/FLINK-19520

Regards,
Roman

In reply to Ingo Bürk's post of Mon, Jan 18, 2021 at 11:11 AM.
Thanks for kicking off the discussion.
I think supporting environment variable rendering in the Flink configuration YAML file is a good idea, especially for Kubernetes environments, since we use the Secret resource to store authentication information.

But I have some questions about how to do it:
1. Will the environment variables in the Flink configuration YAML be replaced in the client, the JobManager, the TaskManager, or all of them?
2. If users do not want some config options to be replaced, how can they achieve that?

Best,
Yang

In reply to Khachatryan Roman's post of Mon, Jan 18, 2021 at 8:55 PM.
Hi Yang,
thanks for your questions! I'm glad to see this feature is being received positively.

ad 1) We don't distinguish JM/TM, and I can't think of a good reason why a user would want to do so. I'm not very experienced with Flink, however, so please excuse me if I'm overlooking some obvious reason here. :-)
ad 2) Admittedly I don't have a good overview of all the configuration options that exist, but from those that I do know I can't imagine someone wanting to pass a value like "${MY_VAR}" verbatim. In Ververica Platform as of now we ignore this problem. If, however, this needs to be addressed, a possible solution could be to allow escaping syntax such as "\${MY_VAR}".

Another point to consider here is when exactly the substitution takes place: on the "raw" file, or on the parsed key / value separately, and if the latter, should it support both key and value? My current thinking is that substituting only the value of the parsed entry should be sufficient.

Regards
Ingo

In reply to Yang Wang's post of Mon, Jan 18, 2021 at 3:48 PM.
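The escaping idea and the "values only" rule can be sketched together. This is an illustration in Python under stated assumptions, not existing Flink or Ververica Platform behaviour: the names `substitute_value` and `substitute_config` are hypothetical, and a leading backslash is assumed to be the escape marker, as in "\${MY_VAR}".

```python
import re

# An optional leading backslash followed by ${NAME}.
_PLACEHOLDER = re.compile(r"(\\)?\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_value(value, env):
    """Substitute ${NAME}; turn \\${NAME} into a literal ${NAME}."""
    def lookup(match):
        escaped, name = match.group(1), match.group(2)
        if escaped:
            # Escaped placeholder: drop the backslash, skip the lookup.
            return "${" + name + "}"
        # Assumption: unknown variables are left untouched.
        return env.get(name, match.group(0))

    return _PLACEHOLDER.sub(lookup, value)


def substitute_config(config, env):
    """Apply substitution to parsed values only; keys are never rewritten."""
    return {key: substitute_value(value, env) for key, value in config.items()}


env = {"MY_VAR": "secret"}
print(substitute_config({"a.b": "${MY_VAR}", "c.d": r"\${MY_VAR}"}, env))
```

Working on parsed entries rather than the raw file means a substituted value can never change the structure of the YAML document, which is one argument for that option.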
Variable substitution (proposed here) is definitely useful.
For us, hierarchical override is more useful. E.g., we may have the default value of "state.checkpoints.dir=path1" defined in flink-conf.yaml, but in some scenarios we want to override it to "state.checkpoints.dir=path2" via an environment variable. Otherwise, we have to define a corresponding shell variable (like STATE_CHECKPOINTS_DIR) for the Flink config, which is annoying.

As Ingo pointed out, it is also annoying to handle the Java property key naming convention (dot-separated), as dots aren't allowed in shell env var names (all caps, separated with underscores). The shell will complain. To avoid this, we have to bundle all env var overrides (k-v pairs) in a single property value (JSON, base64-encoded).

In reply to Ingo Bürk's post of Mon, Jan 18, 2021 at 8:15 AM.
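The bundling workaround described above might look roughly like the following Python sketch. It only illustrates the general idea of packing all overrides into one base64-encoded JSON value; the variable name `FLINK_CONFIG_OVERRIDES` and both function names are hypothetical and not taken from any real setup.

```python
import base64
import json


def encode_overrides(overrides):
    """Pack key/value overrides into a single env-var-safe string."""
    raw = json.dumps(overrides).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def apply_overrides(config, env, var_name="FLINK_CONFIG_OVERRIDES"):
    """Return a copy of config with the bundled overrides applied on top."""
    merged = dict(config)
    bundle = env.get(var_name)
    if bundle:
        merged.update(json.loads(base64.b64decode(bundle).decode("utf-8")))
    return merged


# Defaults as they would come from flink-conf.yaml.
file_config = {"state.checkpoints.dir": "path1"}
env = {"FLINK_CONFIG_OVERRIDES": encode_overrides({"state.checkpoints.dir": "path2"})}

print(apply_overrides(file_config, env))
```

Because the dotted keys live inside the JSON payload, the environment variable name itself never has to contain a ".".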
Hi Ingo,
Thanks for your response.

1. Not distinguishing JM/TM is reasonable, but what about the client side? For Yarn/K8s deployments, the local flink-conf.yaml will be shipped to the JM/TM. So I am just confused about where the environment variables should be replaced. IIUC, it is not an issue for Ververica Platform since it is always done on the JM/TM side.

2. I believe we should support skipping the substitution for specific keys. A typical use case is "env.java.opts". If the value contains environment variables, they are expected to be replaced exactly when the java command is executed, not after the java process is started. Maybe escaping with single quotes is enough.

3. Substituting only the value makes sense to me.

Best,
Yang

In reply to Steven Wu's post of Tue, Jan 19, 2021 at 12:36 AM.
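One way to picture point 2 is an exclusion list of keys whose values are passed through untouched, so that the shell can expand them when the java command is built. The Python sketch below is illustrative only: `RAW_KEYS` and `substitute_config` are hypothetical names, and whether Flink would use an exclusion list, quoting, or escaping is exactly what is still open in this thread.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

# Hypothetical list of keys that must reach the launch scripts verbatim.
RAW_KEYS = frozenset({"env.java.opts"})


def substitute_config(config, env, raw_keys=RAW_KEYS):
    """Substitute ${NAME} in values, except for keys listed in raw_keys."""
    resolved = {}
    for key, value in config.items():
        if key in raw_keys:
            # Left as-is so the shell can expand it at launch time.
            resolved[key] = value
        else:
            resolved[key] = _PLACEHOLDER.sub(
                lambda m: env.get(m.group(1), m.group(0)), value
            )
    return resolved


config = {
    "env.java.opts": "-Dlog.dir=${LOG_DIR}",
    "s3.access-key": "${S3_ACCESS_KEY}",
}
env = {"LOG_DIR": "/var/log/flink", "S3_ACCESS_KEY": "example-key"}
print(substitute_config(config, env))
```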
In reply to this post by Steven Wu
Hi Steven,
regarding the hierarchical override, we could even expand the substitution solution to support shell syntax with default values like

    state.checkpoints.dir: ${CHECKPOINTS_DIR:-path1}

such that if the environment variable doesn't exist, path1 will be used.

Regards
Ingo

In reply to Steven Wu's post of Mon, Jan 18, 2021 at 5:36 PM.
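A sketch of how the `${NAME:-default}` form could be resolved, again in Python and only as an illustration. It follows the shell rule that `:-` falls back to the default when the variable is unset or empty; the function name `substitute` and the restriction that a default cannot contain "}" are assumptions of this sketch.

```python
import re

# ${NAME} or ${NAME:-default}; the default may not contain "}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(value, env):
    """Resolve ${NAME} and ${NAME:-default} against env."""
    def lookup(match):
        name, default = match.group(1), match.group(2)
        if default is None:
            # Plain ${NAME}: assumption, leave unknown variables untouched.
            return env.get(name, match.group(0))
        # Shell semantics of ":-": use the default if unset or empty.
        return env.get(name) or default

    return _PLACEHOLDER.sub(lookup, value)


line = "${CHECKPOINTS_DIR:-path1}"
print(substitute(line, {}))
print(substitute(line, {"CHECKPOINTS_DIR": "path2"}))
```

With this form, flink-conf.yaml keeps carrying the default while the environment can still override it, which covers part of the hierarchical override use case.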
In reply to this post by Yang Wang
Hi Yang,
1. As you said I think this doesn't affect Ververica Platform, really, so I'm more than happy to hear and follow the thoughts of people more experienced with Flink than me.
2. I wasn't aware of env.java.opts, but that's definitely a candidate where a user may want to "escape" it so it doesn't get substituted immediately, I agree.

Regards
Ingo

In reply to Yang Wang's post of Tue, Jan 19, 2021 at 4:47 AM.
Hi everyone,
Thanks for starting this discussion Ingo. I think being able to use env variables to change Flink's configuration will be a very useful feature.

Concerning the two approaches I would be in favour of the second approach ($FLINK_CONFIG_S3_ACCESS_KEY) because it does not require the user to prepare a special flink-conf.yaml where he inserts env variables for every config value he wants to configure. Since this is not required with the second approach, I think it is more general and easier to use. Also, the user does not have to remember a second set of names (env names) which he has to/can set.

For how to substitute the values, I think it should happen when we load the Flink configuration. First we read the file and then overwrite values specified via an env variable or dynamic properties in some defined order. For env.java.opts and other options which are used for starting the JVM we might need special handling in the bash scripts.

Cheers,
Till
|
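To make the second approach and the load order Till describes more concrete, here is a minimal sketch in Python. It is not Flink code. The `FLINK_CONFIG_` prefix comes from the example in this thread; the name-mapping rule (upper-case, dots and hyphens to underscores), the helper names `env_name_for` and `load_config`, and the choice to let dynamic properties win over env variables are all assumptions, since the thread only asks for "some defined order".

```python
import os
from typing import Iterable, Mapping, Optional

ENV_PREFIX = "FLINK_CONFIG_"  # prefix used in the thread's example


def env_name_for(key: str) -> str:
    """Assumed mapping: 's3.access-key' -> 'FLINK_CONFIG_S3_ACCESS_KEY'."""
    return ENV_PREFIX + key.upper().replace(".", "_").replace("-", "_")


def load_config(
    file_values: Mapping[str, str],
    known_keys: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
    dynamic_props: Optional[Mapping[str, str]] = None,
) -> dict:
    """Layer the sources: file first, then env variables, then dynamic properties."""
    env = os.environ if env is None else env
    config = dict(file_values)
    # The name mapping is lossy ('.' and '-' both become '_'), so we go
    # forward from known option keys instead of reversing env var names.
    for key in known_keys:
        name = env_name_for(key)
        if name in env:
            config[key] = env[name]
    # Assumption: dynamic properties have the highest precedence.
    config.update(dynamic_props or {})
    return config
```

Because the lookup starts from the documented option keys, users only need to know the Flink key and the mapping rule, not a second set of names. Options such as env.java.opts that are consumed by the startup scripts would still need separate handling, as noted above.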
Hey all,
I think that approach 2 is more idiomatic for container deployments where it can be cumbersome to manually map flink-conf.yaml contents to env vars [1]. The precedence order outlined by Till would also cover Steven's hierarchical override requirement.

I'm really excited about this feature as it will make Flink deployments a lot more ergonomic. The implementation does not seem too complicated (which makes me wonder why we didn't tackle this earlier or whether I'm missing something).

I'd also be happy to shepherd this contribution if there is consensus on the need for it and the approach. Does it make sense to formalize this decision a bit with a short FLIP?

– Ufuk

[1] In Ververica Platform, we support approach 1, because the Flink configuration is part of the specification for a single Deployment and it's minimally more convenient to have something like

    flinkConfiguration:
      foo: ${BAR}

for us. I don't think this approach would feel natural when manually deploying Flink. There would be a clear migration path for our customers, so I'm not concerned about this too much.
|
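For comparison, approach 1 (substituting `${VAR}` placeholders in configuration values) could look roughly like the sketch below. This is an illustration only, not the Ververica Platform implementation. It assumes substitution applies to values only, uses the `\${VAR}` escape syntax floated earlier in this thread, and leaves unresolved placeholders untouched; the function names `substitute` and `substitute_config` are hypothetical.

```python
import re
from typing import Mapping

# Optional leading backslash, then ${NAME} with a shell-style variable name.
_PLACEHOLDER = re.compile(r"(\\?)\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(value: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME} with env[NAME]; '\\${NAME}' yields a literal '${NAME}'."""

    def replace(match):
        escaped, name = match.group(1), match.group(2)
        if escaped:
            # Escaped placeholder: drop the backslash, keep the text verbatim.
            return "${" + name + "}"
        if name not in env:
            # Assumption: unknown variables are left as-is rather than failing.
            return match.group(0)
        return env[name]

    return _PLACEHOLDER.sub(replace, value)


def substitute_config(config: Mapping[str, str], env: Mapping[str, str]) -> dict:
    """Apply substitution to values only; keys are never rewritten."""
    return {key: substitute(value, env) for key, value in config.items()}
```

With `env = {"BAR": "baz"}`, the entry `foo: ${BAR}` from the footnote would resolve to `foo: baz`, while a value written as `\${BAR}` would be passed on as the literal text `${BAR}`.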
I think a short FLIP would be awesome.
I guess this feature hasn't been implemented yet because it has not been implemented yet ;-) I agree that this feature will improve configuration ergonomics big time :-)

Cheers,
Till
|
Ingo,
regarding "state.checkpoints.dir: ${CHECKPOINTS_DIR:-path1}", it definitely can work. But now users need to know that they can use the "CHECKPOINTS_DIR" env var to override "state.checkpoints.dir". That is the inconvenience that I am trying to avoid. "state.checkpoints.dir" is well documented on the Flink website. Now, we need to document "CHECKPOINTS_DIR" separately.

Ideally, we want to just let the user define an env var like "state.checkpoints.dir=some/overriden/path". But it suffers from the problem that dots are invalid characters in shell env var names. That is the dilemma. That is why we bundle all the non-conforming env var overrides (those whose variable names contain non-conforming characters) into a single base64-encoded string (like "NON_CONFORMING_OVERRIDES_BASE64=<base64 encoded json string>") in our deployment infrastructure and unpack them at container startup. It is hacky. I am hoping that there is a more elegant solution.

If the configuration key conforms to the shell standard (like S3_ACCESS_KEY), then we don't have a problem. But it would deviate from the Flink config naming convention (Java property style, dot separated).

Thanks,
Steven
> > > > > > > > > > > > > > Another point to consider here is when exactly the substitution > > takes > > > > > > > place: on the "raw" file, or on the parsed key / value > > separately, > > > > and > > > > > if > > > > > > > so, should it support both key and value? My current thinking > is > > that > > > > > > > substituting only the value of the parsed entry should be > > sufficient. > > > > > > > > > > > > > > > > > > > > > Regards > > > > > > > Ingo > > > > > > > > > > > > > > On Mon, Jan 18, 2021 at 3:48 PM Yang Wang < > [hidden email] > > > > > > > > wrote: > > > > > > > > > > > > > > > Thanks for kicking off the discussion. > > > > > > > > > > > > > > > > I think supporting environment variables rendering in the > Flink > > > > > > > > configuration yaml file is a good idea. Especially for > > > > > > > > the Kubernetes environment since we are using the secret > > resource > > > > to > > > > > > > store > > > > > > > > the authentication information. > > > > > > > > > > > > > > > > But I have some questions for how to do it? > > > > > > > > 1. The environments in Flink configuration yaml will be > > replaced in > > > > > > > client, > > > > > > > > JobManager, TaskManager or all of them? > > > > > > > > 2. If users do not want some config options to be replaced, > > how to > > > > > > > > achieve that? > > > > > > > > > > > > > > > > Best, > > > > > > > > Yang > > > > > > > > > > > > > > > > Khachatryan Roman <[hidden email]> > 于2021年1月18日周一 > > > > > > 下午8:55写道: > > > > > > > > > > > > > > > > > Hi Ingo, > > > > > > > > > > > > > > > > > > Thanks a lot for this proposal! > > > > > > > > > > > > > > > > > > We had a related discussion recently in the context of > > > > FLINK-19520 > > > > > > > > > (randomizing tests configuration) [1]. > > > > > > > > > I believe other scenarios will benefit as well. 
> > > > > > > > > > > > > > > > > > For the end users, I think substitution in configuration > > files is > > > > > > > > > preferable over parsing env vars in Flink code. > > > > > > > > > And for cases without such a file, we could have a default > > one on > > > > > the > > > > > > > > > classpath with all substitutions defined (and then merge > > > > everything > > > > > > > from > > > > > > > > > the user-supplied file). > > > > > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-19520 > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > Roman > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Jan 18, 2021 at 11:11 AM Ingo Bürk < > > [hidden email]> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > > in Ververica Platform we offer a feature to use > environment > > > > > > variables > > > > > > > > in > > > > > > > > > > the Flink configuration¹, e.g. > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > > > s3.access-key: ${S3_ACCESS_KEY} > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > We've been discussing internally whether contributing > such > > a > > > > > > feature > > > > > > > to > > > > > > > > > > Flink directly would make sense and wanted to start a > > > > discussion > > > > > on > > > > > > > > this > > > > > > > > > > topic. > > > > > > > > > > > > > > > > > > > > An alternative way to do so from the above would be > parsing > > > > those > > > > > > > > > directly > > > > > > > > > > based on their name, so instead of having it defined in > the > > > > Flink > > > > > > > > > > configuration as above, it would get automatically set if > > > > > something > > > > > > > > like > > > > > > > > > > $FLINK_CONFIG_S3_ACCESS_KEY was set in the environment. > > This is > > > > > > > > somewhat > > > > > > > > > > similar to what e.g. 
Spring does, and faces similar > > challenges > > > > > > > (dealing > > > > > > > > > > with "."s etc.) > > > > > > > > > > > > > > > > > > > > Although I view both of these approaches as mostly > > orthogonal, > > > > > > > > supporting > > > > > > > > > > both very likely wouldn't make sense, of course. So I was > > > > > wondering > > > > > > > > what > > > > > > > > > > your opinion is in terms of whether the project would > > benefit > > > > > from > > > > > > > > > > environment variable support for the Flink configuration, > > and > > > > > > whether > > > > > > > > > there > > > > > > > > > > are tendencies as to which approach to go with. > > > > > > > > > > > > > > > > > > > > ¹ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.ververica.com/user_guide/application_operations/deployments/configure_flink.html#environment-variables > > > > > > > > > > > > > > > > > > > > Best regards > > > > > > > > > > Ingo > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |
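A minimal sketch of the two options Steven compares above, for illustration only. `substitute` resolves `${NAME}` and `${NAME:-default}` placeholders in a config value, and `unpack_overrides` decodes a base64-encoded JSON bundle of dotted-key overrides. The function names, the regular expression, and the flat JSON object layout are assumptions; only the variable name NON_CONFORMING_OVERRIDES_BASE64 and the example keys come from the message, and none of this is Flink or Ververica Platform code.

```python
import base64
import json
import re

# Matches ${NAME} and ${NAME:-default}; NAME must be a shell-safe identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(value, env):
    """Replace placeholders in a config value using the mapping `env`."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if default is None:
            # Plain ${NAME}: leave the placeholder untouched if NAME is unset.
            return env.get(name, match.group(0))
        # ${NAME:-default}: fall back when NAME is unset or empty, as in sh.
        return env.get(name) or default
    return _PLACEHOLDER.sub(repl, value)


def unpack_overrides(env, var="NON_CONFORMING_OVERRIDES_BASE64"):
    """Decode a base64-encoded JSON object of config overrides (assumed layout)."""
    encoded = env.get(var)
    if not encoded:
        return {}
    decoded = json.loads(base64.b64decode(encoded).decode("utf-8"))
    return {str(key): str(val) for key, val in decoded.items()}


if __name__ == "__main__":
    # Option 1: the user must know the extra name CHECKPOINTS_DIR.
    print(substitute("${CHECKPOINTS_DIR:-path1}", {"CHECKPOINTS_DIR": "path2"}))

    # Option 2: dotted keys survive, but only inside an opaque bundle.
    bundle = base64.b64encode(
        json.dumps({"state.checkpoints.dir": "path2"}).encode("utf-8")
    ).decode("ascii")
    conf = {"state.checkpoints.dir": "path1"}  # value from flink-conf.yaml
    conf.update(unpack_overrides({"NON_CONFORMING_OVERRIDES_BASE64": bundle}))
    print(conf)
```

Applying the decoded bundle on top of the values loaded from flink-conf.yaml gives the hierarchical override Steven asks for, at the cost of a variable that is hard to read and edit by hand.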
Hi Steven,
understood, thanks for expanding on your point. I will now prepare a FLIP that goes for S3_ACCESS_KEY-style environment variables, since overall this fits Flink use cases better. I think your point that users have to know how to convert s3.access-key to S3_ACCESS_KEY is valid, but the conversion would follow a logical pattern that is also used by a big project like Spring. I'll update this thread with the FLIP number once I have written the first draft. Thanks everyone! Regards Ingo |
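The conversion from s3.access-key to S3_ACCESS_KEY mentioned above could look roughly like the sketch below. The exact rule is an assumption (every character that is not a letter or digit becomes an underscore, then the name is upper-cased), the FLINK_CONFIG_ prefix is taken from the first message of this thread, and the function names are hypothetical; the FLIP may define the mapping differently.

```python
import re


def env_name_for(key, prefix="FLINK_CONFIG_"):
    """Map a dotted config key to a shell-safe variable name (assumed rule)."""
    return prefix + re.sub(r"[^A-Za-z0-9]", "_", key).upper()


def overrides_from_env(known_keys, env, prefix="FLINK_CONFIG_"):
    """Collect overrides for known config keys from the mapping `env`.

    The mapping is lossy ("." and "-" both become "_"), so this goes from
    known keys to variable names instead of parsing variable names back.
    """
    overrides = {}
    for key in known_keys:
        name = env_name_for(key, prefix)
        if name in env:
            overrides[key] = env[name]
    return overrides


if __name__ == "__main__":
    env = {"FLINK_CONFIG_S3_ACCESS_KEY": "abc"}
    keys = ["s3.access-key", "state.checkpoints.dir"]
    print(env_name_for("s3.access-key"))
    print(overrides_from_env(keys, env))
```

Because two different keys can map to the same variable name under this rule, the sketch never tries to reverse the mapping; that ambiguity is the "dealing with dots" challenge raised at the start of the thread.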
Hi,
sorry, I almost forgot, so just to update this thread: I have now started a new thread for the actual FLIP: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-161-Configuration-through-environment-variables-td48094.html Ingo
>> > > > > > > > > > > >> > > > > > > > > > > An alternative way to do so from the above would be >> > parsing >> > > > > those >> > > > > > > > > > directly >> > > > > > > > > > > based on their name, so instead of having it defined >> in >> > the >> > > > > Flink >> > > > > > > > > > > configuration as above, it would get automatically >> set if >> > > > > > something >> > > > > > > > > like >> > > > > > > > > > > $FLINK_CONFIG_S3_ACCESS_KEY was set in the >> environment. >> > > This is >> > > > > > > > > somewhat >> > > > > > > > > > > similar to what e.g. Spring does, and faces similar >> > > challenges >> > > > > > > > (dealing >> > > > > > > > > > > with "."s etc.) >> > > > > > > > > > > >> > > > > > > > > > > Although I view both of these approaches as mostly >> > > orthogonal, >> > > > > > > > > supporting >> > > > > > > > > > > both very likely wouldn't make sense, of course. So I >> was >> > > > > > wondering >> > > > > > > > > what >> > > > > > > > > > > your opinion is in terms of whether the project would >> > > benefit >> > > > > > from >> > > > > > > > > > > environment variable support for the Flink >> configuration, >> > > and >> > > > > > > whether >> > > > > > > > > > there >> > > > > > > > > > > are tendencies as to which approach to go with. >> > > > > > > > > > > >> > > > > > > > > > > ¹ >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > >> > >> https://docs.ververica.com/user_guide/application_operations/deployments/configure_flink.html#environment-variables >> > > > > > > > > > > >> > > > > > > > > > > Best regards >> > > > > > > > > > > Ingo >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> > |
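
For illustration, the value-only substitution with "\${MY_VAR}" escaping discussed in the quoted messages could look roughly like the sketch below. This is not Flink's or Ververica Platform's implementation; the function names `substitute_value` and `substitute_config` are hypothetical, and raising an error on an unset variable is an assumption, since the thread does not settle that behavior.

```python
import re

# An optional leading backslash followed by ${NAME}.
# The backslash marks the placeholder as escaped.
_PLACEHOLDER = re.compile(r"(\\?)\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_value(value, env):
    """Replace ${NAME} in one config value; \\${NAME} is kept literally."""
    def repl(match):
        escaped, name = match.group(1), match.group(2)
        if escaped:
            # Drop the backslash and keep the placeholder verbatim.
            return "${" + name + "}"
        if name not in env:
            # Assumption: fail loudly instead of silently using an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(repl, value)


def substitute_config(config, env):
    """Substitute in the values of parsed entries only; keys stay untouched."""
    return {key: substitute_value(str(val), env) for key, val in config.items()}
```

Passing `os.environ` as `env` would resolve against the real process environment. An escaped entry such as `env.java.opts: -Dfoo=\${MY_VAR}` would come out as `-Dfoo=${MY_VAR}`, leaving the variable for the shell to expand when the java command is executed.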