Hi everyone,
Dawid and I are working on making parts of ExecutionConfig and TableConfig configurable via config options. This is necessary to make all properties also available in SQL. Additionally, with the new SQL DDL based on properties as well as more connectors and formats coming up, unified configuration becomes more important. We need more features around string-based configuration in the future, which is why Dawid and I would like to propose FLIP-54 for evolving the ConfigOption and Configuration classes: https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit In summary it adds: - documented types and validation - more common types such as memory size, duration, list - simple non-nested object types Looking forward to your feedback, Timo |
Hi Timo,
It looks interesting. Thanks for preparing this FLIP! Client API enhancement benefit from this evolution which hopefully provides a better view of configuration of Flink. In client API enhancement, we likely make the deployment of cluster and submission of job totally defined by configuration. Will take a look at the document in days. Best, tison. Timo Walther <[hidden email]> 于2019年8月16日周五 下午10:12写道: > Hi everyone, > > Dawid and I are working on making parts of ExecutionConfig and > TableConfig configurable via config options. This is necessary to make > all properties also available in SQL. Additionally, with the new SQL DDL > based on properties as well as more connectors and formats coming up, > unified configuration becomes more important. > > We need more features around string-based configuration in the future, > which is why Dawid and I would like to propose FLIP-54 for evolving the > ConfigOption and Configuration classes: > > > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > > In summary it adds: > - documented types and validation > - more common types such as memory size, duration, list > - simple non-nested object types > > Looking forward to your feedback, > Timo > > |
Thanks for starting this design Timo and Dawid,
Improving ConfigOption has been hovering in my mind for a long time. We have seen the benefit when developing blink configurations and connector properties in 1.9 release. Thanks for bringing it up and make such a detailed design. I will leave my thoughts and comments there. Cheers, Jark On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> wrote: > Hi Timo, > > It looks interesting. Thanks for preparing this FLIP! > > Client API enhancement benefit from this evolution which > hopefully provides a better view of configuration of Flink. > In client API enhancement, we likely make the deployment > of cluster and submission of job totally defined by configuration. > > Will take a look at the document in days. > > Best, > tison. > > > Timo Walther <[hidden email]> 于2019年8月16日周五 下午10:12写道: > > > Hi everyone, > > > > Dawid and I are working on making parts of ExecutionConfig and > > TableConfig configurable via config options. This is necessary to make > > all properties also available in SQL. Additionally, with the new SQL DDL > > based on properties as well as more connectors and formats coming up, > > unified configuration becomes more important. > > > > We need more features around string-based configuration in the future, > > which is why Dawid and I would like to propose FLIP-54 for evolving the > > ConfigOption and Configuration classes: > > > > > > > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > > > > In summary it adds: > > - documented types and validation > > - more common types such as memory size, duration, list > > - simple non-nested object types > > > > Looking forward to your feedback, > > Timo > > > > > |
+1 to this, thanks Timo and Dawid for the design.
This allows the currently cluttered configuration of various modules to be unified. This is also first step of one of the keys to making new unified TableEnvironment available for production. Previously, we did encounter complex configurations, such as specifying the skewed values of column in DDL. The skew may be a single field or a combination of multiple fields. So the configuration is very troublesome. We used JSON string to configure it. Best, Jingsong Lee ------------------------------------------------------------------ From:Jark Wu <[hidden email]> Send Time:2019年8月16日(星期五) 16:44 To:dev <[hidden email]> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration Thanks for starting this design Timo and Dawid, Improving ConfigOption has been hovering in my mind for a long time. We have seen the benefit when developing blink configurations and connector properties in 1.9 release. Thanks for bringing it up and make such a detailed design. I will leave my thoughts and comments there. Cheers, Jark On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> wrote: > Hi Timo, > > It looks interesting. Thanks for preparing this FLIP! > > Client API enhancement benefit from this evolution which > hopefully provides a better view of configuration of Flink. > In client API enhancement, we likely make the deployment > of cluster and submission of job totally defined by configuration. > > Will take a look at the document in days. > > Best, > tison. > > > Timo Walther <[hidden email]> 于2019年8月16日周五 下午10:12写道: > > > Hi everyone, > > > > Dawid and I are working on making parts of ExecutionConfig and > > TableConfig configurable via config options. This is necessary to make > > all properties also available in SQL. Additionally, with the new SQL DDL > > based on properties as well as more connectors and formats coming up, > > unified configuration becomes more important. > > > > We need more features around string-based configuration in the future, > > which is why Dawid and I would like to propose FLIP-54 for evolving the > > ConfigOption and Configuration classes: > > > > > > > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > > > > In summary it adds: > > - documented types and validation > > - more common types such as memory size, duration, list > > - simple non-nested object types > > > > Looking forward to your feedback, > > Timo > > > > > |
I like the idea of enhancing the configuration and to do early validation.
I feel that some of the ideas in the FLIP seem a bit ad hoc, though. For example, having a boolean "isList" is a clear indication of not having thought through the type/category system. Also, having a more clear category system makes validation simpler. For example, I have seen systems distinguishing between numeric parameters (valid ranges), category parameters (set of possible values), quantities like duration and memory size (need measure and unit), which results in an elegant system for validation. On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <[hidden email]> wrote: > +1 to this, thanks Timo and Dawid for the design. > This allows the currently cluttered configuration of various > modules to be unified. > This is also first step of one of the keys to making new unified > TableEnvironment available for production. > > Previously, we did encounter complex configurations, such as > specifying the skewed values of column in DDL. The skew may > be a single field or a combination of multiple fields. So the > configuration is very troublesome. We used JSON string to > configure it. > > Best, > Jingsong Lee > > > > ------------------------------------------------------------------ > From:Jark Wu <[hidden email]> > Send Time:2019年8月16日(星期五) 16:44 > To:dev <[hidden email]> > Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration > > Thanks for starting this design Timo and Dawid, > > Improving ConfigOption has been hovering in my mind for a long time. > We have seen the benefit when developing blink configurations and connector > properties in 1.9 release. > Thanks for bringing it up and make such a detailed design. > I will leave my thoughts and comments there. > > Cheers, > Jark > > > On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> wrote: > > > Hi Timo, > > > > It looks interesting. Thanks for preparing this FLIP! > > > > Client API enhancement benefit from this evolution which > > hopefully provides a better view of configuration of Flink. > > In client API enhancement, we likely make the deployment > > of cluster and submission of job totally defined by configuration. > > > > Will take a look at the document in days. > > > > Best, > > tison. > > > > > > Timo Walther <[hidden email]> 于2019年8月16日周五 下午10:12写道: > > > > > Hi everyone, > > > > > > Dawid and I are working on making parts of ExecutionConfig and > > > TableConfig configurable via config options. This is necessary to make > > > all properties also available in SQL. Additionally, with the new SQL > DDL > > > based on properties as well as more connectors and formats coming up, > > > unified configuration becomes more important. > > > > > > We need more features around string-based configuration in the future, > > > which is why Dawid and I would like to propose FLIP-54 for evolving the > > > ConfigOption and Configuration classes: > > > > > > > > > > > > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > > > > > > In summary it adds: > > > - documented types and validation > > > - more common types such as memory size, duration, list > > > - simple non-nested object types > > > > > > Looking forward to your feedback, > > > Timo > > > > > > > > > > |
Hi Stephan,
Thank you for your opinion. Actually list/composite types are the topics we spent the most of the time. I understand that from a perspective of a full blown type system, a field like isList may look weird. Please let me elaborate a bit more on the reason behind it though. Maybe we weren't clear enough about it in the FLIP. The key feature of all the conifg options is that they must have a string representation as they might come from a configuration file. Moreover it must be a human readable format, so that the values might be manually adjusted. Having that in mind we did not want to add a support of an arbitrary nesting and we decided to allow for lists only (and flat objects - I think though in the current design there is a mistake around the Configurable interface). I think though you have a point here and it would be better to have a ListConfigOption instead of this field. Does it make sense to you? As for the second part of your message. I am not sure if I understood it. The validators work with parse/deserialized values from Configuration that means they can be bound to the generic parameter of Configuration. You can have a RangeValidator<? extends Comparable/Number>. I don't think the type hierarchy in the ConfigOption has anything to do with the validation logic. Could you elaborate a bit more what did you mean? Best, Dawid On 18/08/2019 16:42, Stephan Ewen wrote: > I like the idea of enhancing the configuration and to do early validation. > > I feel that some of the ideas in the FLIP seem a bit ad hoc, though. For > example, having a boolean "isList" is a clear indication of not having > thought through the type/category system. > Also, having a more clear category system makes validation simpler. > > For example, I have seen systems distinguishing between numeric parameters > (valid ranges), category parameters (set of possible values), quantities > like duration and memory size (need measure and unit), which results in an > elegant system for validation. > > > On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <[hidden email]> > wrote: > >> +1 to this, thanks Timo and Dawid for the design. >> This allows the currently cluttered configuration of various >> modules to be unified. >> This is also first step of one of the keys to making new unified >> TableEnvironment available for production. >> >> Previously, we did encounter complex configurations, such as >> specifying the skewed values of column in DDL. The skew may >> be a single field or a combination of multiple fields. So the >> configuration is very troublesome. We used JSON string to >> configure it. >> >> Best, >> Jingsong Lee >> >> >> >> ------------------------------------------------------------------ >> From:Jark Wu <[hidden email]> >> Send Time:2019年8月16日(星期五) 16:44 >> To:dev <[hidden email]> >> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration >> >> Thanks for starting this design Timo and Dawid, >> >> Improving ConfigOption has been hovering in my mind for a long time. >> We have seen the benefit when developing blink configurations and connector >> properties in 1.9 release. >> Thanks for bringing it up and make such a detailed design. >> I will leave my thoughts and comments there. >> >> Cheers, >> Jark >> >> >> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> wrote: >> >>> Hi Timo, >>> >>> It looks interesting. Thanks for preparing this FLIP! >>> >>> Client API enhancement benefit from this evolution which >>> hopefully provides a better view of configuration of Flink. >>> In client API enhancement, we likely make the deployment >>> of cluster and submission of job totally defined by configuration. >>> >>> Will take a look at the document in days. >>> >>> Best, >>> tison. >>> >>> >>> Timo Walther <[hidden email]> 于2019年8月16日周五 下午10:12写道: >>> >>>> Hi everyone, >>>> >>>> Dawid and I are working on making parts of ExecutionConfig and >>>> TableConfig configurable via config options. This is necessary to make >>>> all properties also available in SQL. Additionally, with the new SQL >> DDL >>>> based on properties as well as more connectors and formats coming up, >>>> unified configuration becomes more important. >>>> >>>> We need more features around string-based configuration in the future, >>>> which is why Dawid and I would like to propose FLIP-54 for evolving the >>>> ConfigOption and Configuration classes: >>>> >>>> >>>> >> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>> In summary it adds: >>>> - documented types and validation >>>> - more common types such as memory size, duration, list >>>> - simple non-nested object types >>>> >>>> Looking forward to your feedback, >>>> Timo >>>> >>>> >> signature.asc (849 bytes) Download Attachment |
A "List Type" sounds like a good direction to me.
The comment on the type system was a bit brief, I agree. The idea is to see if something like that can ease validation. Especially the correlation system seems quite complex (proxies to work around order of initialization). For example, let's assume we don't think primarily about "java types" but would define types as one of the following (just examples, haven't thought all the details through): (a) category type: implies string, and a fix set of possible values. Those would be passes and naturally make it into the docs and validation. Maps to a String or Enum in Java. (b) numeric integer type: implies long (or optionally integer, if we want to automatically check overflow / underflow). would take typical domain validators, like non-negative, etc. (c) numeric real type: same as above (double or float) (d) numeric interval type: either defined as an interval, or references other parameter by key. validation by valid interval. (e) quantity: a measure and a unit. separately parsable. The measure's type could be any of the numeric types above, with same validation rules. With a system like the above, would we still correlation validators? Are there still cases that we need to catch early (config loading) or are the remaining cases sufficiently rare and runtime or setup specific, that it is fine to handle them in component initialization? On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz <[hidden email]> wrote: > Hi Stephan, > > Thank you for your opinion. > > Actually list/composite types are the topics we spent the most of the > time. I understand that from a perspective of a full blown type system, > a field like isList may look weird. Please let me elaborate a bit more > on the reason behind it though. Maybe we weren't clear enough about it > in the FLIP. The key feature of all the conifg options is that they must > have a string representation as they might come from a configuration > file. Moreover it must be a human readable format, so that the values > might be manually adjusted. Having that in mind we did not want to add a > support of an arbitrary nesting and we decided to allow for lists only > (and flat objects - I think though in the current design there is a > mistake around the Configurable interface). I think though you have a > point here and it would be better to have a ListConfigOption instead of > this field. Does it make sense to you? > > As for the second part of your message. I am not sure if I understood > it. The validators work with parse/deserialized values from > Configuration that means they can be bound to the generic parameter of > Configuration. You can have a RangeValidator<? extends > Comparable/Number>. I don't think the type hierarchy in the ConfigOption > has anything to do with the validation logic. Could you elaborate a bit > more what did you mean? > > Best, > > Dawid > > On 18/08/2019 16:42, Stephan Ewen wrote: > > I like the idea of enhancing the configuration and to do early > validation. > > > > I feel that some of the ideas in the FLIP seem a bit ad hoc, though. For > > example, having a boolean "isList" is a clear indication of not having > > thought through the type/category system. > > Also, having a more clear category system makes validation simpler. > > > > For example, I have seen systems distinguishing between numeric > parameters > > (valid ranges), category parameters (set of possible values), quantities > > like duration and memory size (need measure and unit), which results in > an > > elegant system for validation. > > > > > > On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <[hidden email] > .invalid> > > wrote: > > > >> +1 to this, thanks Timo and Dawid for the design. > >> This allows the currently cluttered configuration of various > >> modules to be unified. > >> This is also first step of one of the keys to making new unified > >> TableEnvironment available for production. > >> > >> Previously, we did encounter complex configurations, such as > >> specifying the skewed values of column in DDL. The skew may > >> be a single field or a combination of multiple fields. So the > >> configuration is very troublesome. We used JSON string to > >> configure it. > >> > >> Best, > >> Jingsong Lee > >> > >> > >> > >> ------------------------------------------------------------------ > >> From:Jark Wu <[hidden email]> > >> Send Time:2019年8月16日(星期五) 16:44 > >> To:dev <[hidden email]> > >> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration > >> > >> Thanks for starting this design Timo and Dawid, > >> > >> Improving ConfigOption has been hovering in my mind for a long time. > >> We have seen the benefit when developing blink configurations and > connector > >> properties in 1.9 release. > >> Thanks for bringing it up and make such a detailed design. > >> I will leave my thoughts and comments there. > >> > >> Cheers, > >> Jark > >> > >> > >> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> wrote: > >> > >>> Hi Timo, > >>> > >>> It looks interesting. Thanks for preparing this FLIP! > >>> > >>> Client API enhancement benefit from this evolution which > >>> hopefully provides a better view of configuration of Flink. > >>> In client API enhancement, we likely make the deployment > >>> of cluster and submission of job totally defined by configuration. > >>> > >>> Will take a look at the document in days. > >>> > >>> Best, > >>> tison. > >>> > >>> > >>> Timo Walther <[hidden email]> 于2019年8月16日周五 下午10:12写道: > >>> > >>>> Hi everyone, > >>>> > >>>> Dawid and I are working on making parts of ExecutionConfig and > >>>> TableConfig configurable via config options. This is necessary to make > >>>> all properties also available in SQL. Additionally, with the new SQL > >> DDL > >>>> based on properties as well as more connectors and formats coming up, > >>>> unified configuration becomes more important. > >>>> > >>>> We need more features around string-based configuration in the future, > >>>> which is why Dawid and I would like to propose FLIP-54 for evolving > the > >>>> ConfigOption and Configuration classes: > >>>> > >>>> > >>>> > >> > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > >>>> In summary it adds: > >>>> - documented types and validation > >>>> - more common types such as memory size, duration, list > >>>> - simple non-nested object types > >>>> > >>>> Looking forward to your feedback, > >>>> Timo > >>>> > >>>> > >> > > |
Hi Stephan,
thanks for your suggestions. Let me give you some background about the decisions made in this FLIP: 1. Goal: The FLIP is labelled "evolve" not "rework" because we did not want to change the entire configuration infrastructure. Both for backwards-compatibility reasons and the amount of work that would be required to update all options. If our goal is to rework the configuration option entirely, I might suggest to switch to JSON format with JSON schema and JSON validator. However, setting properties in a CLI or web interface becomes more tricky the more nested structures are allowed. 2. Class-based Options: The current ConfigOption<T> class is centered around Java classes where T is determined by the default value. The FLIP just makes this more explicit by offering an explicit `intType()` method etc. The current design of validators centered around Java classes makes it possible to have typical domain validators baked by generics as you suggested. If we introduce types such as "quantity with measure and unit" we still need to get a class out of this option at the end, so why changing a proven concept? 3. List Options: The `isList` prevents having arbitrary nesting. As Dawid mentioned, we kept human readability in mind. For every atomic option like "key=12" can be represented by a list "keys=12;13". But we don't want to go further; esp. no nesting. A dedicated list option would start making this more complicated such as "ListOption(ObjectOption(ListOption(IntOption, ...), StringOption(...)))", do we want that? 4. Correlation: The correlation part is one of the suggestions that I like least in the document. We can also discuss removing it entirely, but I think it solves the use case of relating options with each other in a flexible way right next to the actual option. Instead of being hidden in some component initialization, we should put it close to the option to also perform validation eagerly instead of failing at runtime when the option is accessed the first time. Regards, Timo Am 18.08.19 um 23:32 schrieb Stephan Ewen: > A "List Type" sounds like a good direction to me. > > The comment on the type system was a bit brief, I agree. The idea is to see > if something like that can ease validation. Especially the correlation > system seems quite complex (proxies to work around order of initialization). > > For example, let's assume we don't think primarily about "java types" but > would define types as one of the following (just examples, haven't thought > all the details through): > > (a) category type: implies string, and a fix set of possible values. > Those would be passes and naturally make it into the docs and validation. > Maps to a String or Enum in Java. > > (b) numeric integer type: implies long (or optionally integer, if we want > to automatically check overflow / underflow). would take typical domain > validators, like non-negative, etc. > > (c) numeric real type: same as above (double or float) > > (d) numeric interval type: either defined as an interval, or references > other parameter by key. validation by valid interval. > > (e) quantity: a measure and a unit. separately parsable. The measure's > type could be any of the numeric types above, with same validation rules. > > With a system like the above, would we still correlation validators? Are > there still cases that we need to catch early (config loading) or are the > remaining cases sufficiently rare and runtime or setup specific, that it is > fine to handle them in component initialization? > > > On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz <[hidden email]> > wrote: > >> Hi Stephan, >> >> Thank you for your opinion. >> >> Actually list/composite types are the topics we spent the most of the >> time. I understand that from a perspective of a full blown type system, >> a field like isList may look weird. Please let me elaborate a bit more >> on the reason behind it though. Maybe we weren't clear enough about it >> in the FLIP. The key feature of all the conifg options is that they must >> have a string representation as they might come from a configuration >> file. Moreover it must be a human readable format, so that the values >> might be manually adjusted. Having that in mind we did not want to add a >> support of an arbitrary nesting and we decided to allow for lists only >> (and flat objects - I think though in the current design there is a >> mistake around the Configurable interface). I think though you have a >> point here and it would be better to have a ListConfigOption instead of >> this field. Does it make sense to you? >> >> As for the second part of your message. I am not sure if I understood >> it. The validators work with parse/deserialized values from >> Configuration that means they can be bound to the generic parameter of >> Configuration. You can have a RangeValidator<? extends >> Comparable/Number>. I don't think the type hierarchy in the ConfigOption >> has anything to do with the validation logic. Could you elaborate a bit >> more what did you mean? >> >> Best, >> >> Dawid >> >> On 18/08/2019 16:42, Stephan Ewen wrote: >>> I like the idea of enhancing the configuration and to do early >> validation. >>> I feel that some of the ideas in the FLIP seem a bit ad hoc, though. For >>> example, having a boolean "isList" is a clear indication of not having >>> thought through the type/category system. >>> Also, having a more clear category system makes validation simpler. >>> >>> For example, I have seen systems distinguishing between numeric >> parameters >>> (valid ranges), category parameters (set of possible values), quantities >>> like duration and memory size (need measure and unit), which results in >> an >>> elegant system for validation. >>> >>> >>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <[hidden email] >> .invalid> >>> wrote: >>> >>>> +1 to this, thanks Timo and Dawid for the design. >>>> This allows the currently cluttered configuration of various >>>> modules to be unified. >>>> This is also first step of one of the keys to making new unified >>>> TableEnvironment available for production. >>>> >>>> Previously, we did encounter complex configurations, such as >>>> specifying the skewed values of column in DDL. The skew may >>>> be a single field or a combination of multiple fields. So the >>>> configuration is very troublesome. We used JSON string to >>>> configure it. >>>> >>>> Best, >>>> Jingsong Lee >>>> >>>> >>>> >>>> ------------------------------------------------------------------ >>>> From:Jark Wu <[hidden email]> >>>> Send Time:2019年8月16日(星期五) 16:44 >>>> To:dev <[hidden email]> >>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration >>>> >>>> Thanks for starting this design Timo and Dawid, >>>> >>>> Improving ConfigOption has been hovering in my mind for a long time. >>>> We have seen the benefit when developing blink configurations and >> connector >>>> properties in 1.9 release. >>>> Thanks for bringing it up and make such a detailed design. >>>> I will leave my thoughts and comments there. >>>> >>>> Cheers, >>>> Jark >>>> >>>> >>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> wrote: >>>> >>>>> Hi Timo, >>>>> >>>>> It looks interesting. Thanks for preparing this FLIP! >>>>> >>>>> Client API enhancement benefit from this evolution which >>>>> hopefully provides a better view of configuration of Flink. >>>>> In client API enhancement, we likely make the deployment >>>>> of cluster and submission of job totally defined by configuration. >>>>> >>>>> Will take a look at the document in days. >>>>> >>>>> Best, >>>>> tison. >>>>> >>>>> >>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 下午10:12写道: >>>>> >>>>>> Hi everyone, >>>>>> >>>>>> Dawid and I are working on making parts of ExecutionConfig and >>>>>> TableConfig configurable via config options. This is necessary to make >>>>>> all properties also available in SQL. Additionally, with the new SQL >>>> DDL >>>>>> based on properties as well as more connectors and formats coming up, >>>>>> unified configuration becomes more important. >>>>>> >>>>>> We need more features around string-based configuration in the future, >>>>>> which is why Dawid and I would like to propose FLIP-54 for evolving >> the >>>>>> ConfigOption and Configuration classes: >>>>>> >>>>>> >>>>>> >> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>>>> In summary it adds: >>>>>> - documented types and validation >>>>>> - more common types such as memory size, duration, list >>>>>> - simple non-nested object types >>>>>> >>>>>> Looking forward to your feedback, >>>>>> Timo >>>>>> >>>>>> >> |
Hi everyone,
thanks for all the feedback we have received online and offline. It showed that many people support the idea of evolving the Flink configuration functionality. I'm almost sure that this FLIP will not solve all issues but at least will improve the current status. We've updated the document and replaced the Correlation part with the concept of a ConfigOptionGroup that can provide all available options of a group plus custom group validators for eager validation. For now, this eager group validation will only be used at certain locations in the Flink code but it prepares for maybe validating the entire global configuration before submitting a job in the future. Please take another look if you find time. I hope we can proceed with the voting process if there are no objections. Regards, Timo Am 19.08.19 um 12:54 schrieb Timo Walther: > Hi Stephan, > > thanks for your suggestions. Let me give you some background about the > decisions made in this FLIP: > > 1. Goal: The FLIP is labelled "evolve" not "rework" because we did not > want to change the entire configuration infrastructure. Both for > backwards-compatibility reasons and the amount of work that would be > required to update all options. If our goal is to rework the > configuration option entirely, I might suggest to switch to JSON > format with JSON schema and JSON validator. However, setting > properties in a CLI or web interface becomes more tricky the more > nested structures are allowed. > > 2. Class-based Options: The current ConfigOption<T> class is centered > around Java classes where T is determined by the default value. The > FLIP just makes this more explicit by offering an explicit `intType()` > method etc. The current design of validators centered around Java > classes makes it possible to have typical domain validators baked by > generics as you suggested. If we introduce types such as "quantity > with measure and unit" we still need to get a class out of this option > at the end, so why changing a proven concept? > > 3. List Options: The `isList` prevents having arbitrary nesting. As > Dawid mentioned, we kept human readability in mind. For every atomic > option like "key=12" can be represented by a list "keys=12;13". But we > don't want to go further; esp. no nesting. A dedicated list option > would start making this more complicated such as > "ListOption(ObjectOption(ListOption(IntOption, ...), > StringOption(...)))", do we want that? > > 4. Correlation: The correlation part is one of the suggestions that I > like least in the document. We can also discuss removing it entirely, > but I think it solves the use case of relating options with each other > in a flexible way right next to the actual option. Instead of being > hidden in some component initialization, we should put it close to the > option to also perform validation eagerly instead of failing at > runtime when the option is accessed the first time. > > Regards, > Timo > > > Am 18.08.19 um 23:32 schrieb Stephan Ewen: >> A "List Type" sounds like a good direction to me. >> >> The comment on the type system was a bit brief, I agree. The idea is >> to see >> if something like that can ease validation. Especially the correlation >> system seems quite complex (proxies to work around order of >> initialization). >> >> For example, let's assume we don't think primarily about "java types" >> but >> would define types as one of the following (just examples, haven't >> thought >> all the details through): >> >> (a) category type: implies string, and a fix set of possible values. >> Those would be passes and naturally make it into the docs and >> validation. >> Maps to a String or Enum in Java. >> >> (b) numeric integer type: implies long (or optionally integer, if >> we want >> to automatically check overflow / underflow). would take typical domain >> validators, like non-negative, etc. >> >> (c) numeric real type: same as above (double or float) >> >> (d) numeric interval type: either defined as an interval, or >> references >> other parameter by key. validation by valid interval. >> >> (e) quantity: a measure and a unit. separately parsable. The >> measure's >> type could be any of the numeric types above, with same validation >> rules. >> >> With a system like the above, would we still correlation validators? Are >> there still cases that we need to catch early (config loading) or are >> the >> remaining cases sufficiently rare and runtime or setup specific, that >> it is >> fine to handle them in component initialization? >> >> >> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >> <[hidden email]> >> wrote: >> >>> Hi Stephan, >>> >>> Thank you for your opinion. >>> >>> Actually list/composite types are the topics we spent the most of the >>> time. I understand that from a perspective of a full blown type system, >>> a field like isList may look weird. Please let me elaborate a bit more >>> on the reason behind it though. Maybe we weren't clear enough about it >>> in the FLIP. The key feature of all the conifg options is that they >>> must >>> have a string representation as they might come from a configuration >>> file. Moreover it must be a human readable format, so that the values >>> might be manually adjusted. Having that in mind we did not want to >>> add a >>> support of an arbitrary nesting and we decided to allow for lists only >>> (and flat objects - I think though in the current design there is a >>> mistake around the Configurable interface). I think though you have a >>> point here and it would be better to have a ListConfigOption instead of >>> this field. Does it make sense to you? >>> >>> As for the second part of your message. I am not sure if I understood >>> it. The validators work with parse/deserialized values from >>> Configuration that means they can be bound to the generic parameter of >>> Configuration. You can have a RangeValidator<? extends >>> Comparable/Number>. I don't think the type hierarchy in the >>> ConfigOption >>> has anything to do with the validation logic. Could you elaborate a bit >>> more what did you mean? >>> >>> Best, >>> >>> Dawid >>> >>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>> I like the idea of enhancing the configuration and to do early >>> validation. >>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, >>>> though. For >>>> example, having a boolean "isList" is a clear indication of not having >>>> thought through the type/category system. >>>> Also, having a more clear category system makes validation simpler. >>>> >>>> For example, I have seen systems distinguishing between numeric >>> parameters >>>> (valid ranges), category parameters (set of possible values), >>>> quantities >>>> like duration and memory size (need measure and unit), which >>>> results in >>> an >>>> elegant system for validation. >>>> >>>> >>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <[hidden email] >>> .invalid> >>>> wrote: >>>> >>>>> +1 to this, thanks Timo and Dawid for the design. >>>>> This allows the currently cluttered configuration of various >>>>> modules to be unified. >>>>> This is also first step of one of the keys to making new unified >>>>> TableEnvironment available for production. >>>>> >>>>> Previously, we did encounter complex configurations, such as >>>>> specifying the skewed values of column in DDL. The skew may >>>>> be a single field or a combination of multiple fields. So the >>>>> configuration is very troublesome. We used JSON string to >>>>> configure it. >>>>> >>>>> Best, >>>>> Jingsong Lee >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------ >>>>> From:Jark Wu <[hidden email]> >>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>> To:dev <[hidden email]> >>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration >>>>> >>>>> Thanks for starting this design Timo and Dawid, >>>>> >>>>> Improving ConfigOption has been hovering in my mind for a long time. >>>>> We have seen the benefit when developing blink configurations and >>> connector >>>>> properties in 1.9 release. >>>>> Thanks for bringing it up and make such a detailed design. >>>>> I will leave my thoughts and comments there. >>>>> >>>>> Cheers, >>>>> Jark >>>>> >>>>> >>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> wrote: >>>>> >>>>>> Hi Timo, >>>>>> >>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>> >>>>>> Client API enhancement benefit from this evolution which >>>>>> hopefully provides a better view of configuration of Flink. >>>>>> In client API enhancement, we likely make the deployment >>>>>> of cluster and submission of job totally defined by configuration. >>>>>> >>>>>> Will take a look at the document in days. >>>>>> >>>>>> Best, >>>>>> tison. >>>>>> >>>>>> >>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 下午10:12写道: >>>>>> >>>>>> >>>>>>> Hi everyone, >>>>>>> >>>>>>> Dawid and I are working on making parts of ExecutionConfig and >>>>>>> TableConfig configurable via config options. This is necessary >>>>>>> to make >>>>>>> all properties also available in SQL. Additionally, with the new >>>>>>> SQL >>>>> DDL >>>>>>> based on properties as well as more connectors and formats >>>>>>> coming up, >>>>>>> unified configuration becomes more important. >>>>>>> >>>>>>> We need more features around string-based configuration in the >>>>>>> future, >>>>>>> which is why Dawid and I would like to propose FLIP-54 for evolving >>> the >>>>>>> ConfigOption and Configuration classes: >>>>>>> >>>>>>> >>>>>>> >>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>> >>>>>>> In summary it adds: >>>>>>> - documented types and validation >>>>>>> - more common types such as memory size, duration, list >>>>>>> - simple non-nested object types >>>>>>> >>>>>>> Looking forward to your feedback, >>>>>>> Timo >>>>>>> >>>>>>> >>> |
Hi everyone,
I updated the FLIP proposal one more time as mentioned in the voting thread. If there are no objections, I will start a new voting thread tomorrow at 9am Berlin time. Thanks, Timo On 22.08.19 14:19, Timo Walther wrote: > Hi everyone, > > thanks for all the feedback we have received online and offline. It > showed that many people support the idea of evolving the Flink > configuration functionality. I'm almost sure that this FLIP will not > solve all issues but at least will improve the current status. > > We've updated the document and replaced the Correlation part with the > concept of a ConfigOptionGroup that can provide all available options > of a group plus custom group validators for eager validation. For now, > this eager group validation will only be used at certain locations in > the Flink code but it prepares for maybe validating the entire global > configuration before submitting a job in the future. > > Please take another look if you find time. I hope we can proceed with > the voting process if there are no objections. > > Regards, > Timo > > Am 19.08.19 um 12:54 schrieb Timo Walther: >> Hi Stephan, >> >> thanks for your suggestions. Let me give you some background about >> the decisions made in this FLIP: >> >> 1. Goal: The FLIP is labelled "evolve" not "rework" because we did >> not want to change the entire configuration infrastructure. Both for >> backwards-compatibility reasons and the amount of work that would be >> required to update all options. If our goal is to rework the >> configuration option entirely, I might suggest to switch to JSON >> format with JSON schema and JSON validator. However, setting >> properties in a CLI or web interface becomes more tricky the more >> nested structures are allowed. >> >> 2. Class-based Options: The current ConfigOption<T> class is centered >> around Java classes where T is determined by the default value. The >> FLIP just makes this more explicit by offering an explicit >> `intType()` method etc. The current design of validators centered >> around Java classes makes it possible to have typical domain >> validators baked by generics as you suggested. If we introduce types >> such as "quantity with measure and unit" we still need to get a class >> out of this option at the end, so why changing a proven concept? >> >> 3. List Options: The `isList` prevents having arbitrary nesting. As >> Dawid mentioned, we kept human readability in mind. For every atomic >> option like "key=12" can be represented by a list "keys=12;13". But >> we don't want to go further; esp. no nesting. A dedicated list option >> would start making this more complicated such as >> "ListOption(ObjectOption(ListOption(IntOption, ...), >> StringOption(...)))", do we want that? >> >> 4. Correlation: The correlation part is one of the suggestions that I >> like least in the document. We can also discuss removing it entirely, >> but I think it solves the use case of relating options with each >> other in a flexible way right next to the actual option. Instead of >> being hidden in some component initialization, we should put it close >> to the option to also perform validation eagerly instead of failing >> at runtime when the option is accessed the first time. >> >> Regards, >> Timo >> >> >> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>> A "List Type" sounds like a good direction to me. >>> >>> The comment on the type system was a bit brief, I agree. The idea is >>> to see >>> if something like that can ease validation. Especially the correlation >>> system seems quite complex (proxies to work around order of >>> initialization). >>> >>> For example, let's assume we don't think primarily about "java >>> types" but >>> would define types as one of the following (just examples, haven't >>> thought >>> all the details through): >>> >>> (a) category type: implies string, and a fix set of possible values. >>> Those would be passes and naturally make it into the docs and >>> validation. >>> Maps to a String or Enum in Java. >>> >>> (b) numeric integer type: implies long (or optionally integer, if >>> we want >>> to automatically check overflow / underflow). would take typical domain >>> validators, like non-negative, etc. >>> >>> (c) numeric real type: same as above (double or float) >>> >>> (d) numeric interval type: either defined as an interval, or >>> references >>> other parameter by key. validation by valid interval. >>> >>> (e) quantity: a measure and a unit. separately parsable. The >>> measure's >>> type could be any of the numeric types above, with same validation >>> rules. >>> >>> With a system like the above, would we still correlation validators? >>> Are >>> there still cases that we need to catch early (config loading) or >>> are the >>> remaining cases sufficiently rare and runtime or setup specific, >>> that it is >>> fine to handle them in component initialization? >>> >>> >>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>> <[hidden email]> >>> wrote: >>> >>>> Hi Stephan, >>>> >>>> Thank you for your opinion. >>>> >>>> Actually list/composite types are the topics we spent the most of the >>>> time. I understand that from a perspective of a full blown type >>>> system, >>>> a field like isList may look weird. Please let me elaborate a bit more >>>> on the reason behind it though. Maybe we weren't clear enough about it >>>> in the FLIP. The key feature of all the conifg options is that they >>>> must >>>> have a string representation as they might come from a configuration >>>> file. Moreover it must be a human readable format, so that the values >>>> might be manually adjusted. Having that in mind we did not want to >>>> add a >>>> support of an arbitrary nesting and we decided to allow for lists only >>>> (and flat objects - I think though in the current design there is a >>>> mistake around the Configurable interface). I think though you have a >>>> point here and it would be better to have a ListConfigOption >>>> instead of >>>> this field. Does it make sense to you? >>>> >>>> As for the second part of your message. I am not sure if I understood >>>> it. The validators work with parse/deserialized values from >>>> Configuration that means they can be bound to the generic parameter of >>>> Configuration. You can have a RangeValidator<? extends >>>> Comparable/Number>. I don't think the type hierarchy in the >>>> ConfigOption >>>> has anything to do with the validation logic. Could you elaborate a >>>> bit >>>> more what did you mean? >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>> I like the idea of enhancing the configuration and to do early >>>> validation. >>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, >>>>> though. For >>>>> example, having a boolean "isList" is a clear indication of not >>>>> having >>>>> thought through the type/category system. >>>>> Also, having a more clear category system makes validation simpler. >>>>> >>>>> For example, I have seen systems distinguishing between numeric >>>> parameters >>>>> (valid ranges), category parameters (set of possible values), >>>>> quantities >>>>> like duration and memory size (need measure and unit), which >>>>> results in >>>> an >>>>> elegant system for validation. >>>>> >>>>> >>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <[hidden email] >>>> .invalid> >>>>> wrote: >>>>> >>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>> This allows the currently cluttered configuration of various >>>>>> modules to be unified. >>>>>> This is also first step of one of the keys to making new unified >>>>>> TableEnvironment available for production. >>>>>> >>>>>> Previously, we did encounter complex configurations, such as >>>>>> specifying the skewed values of column in DDL. The skew may >>>>>> be a single field or a combination of multiple fields. So the >>>>>> configuration is very troublesome. We used JSON string to >>>>>> configure it. >>>>>> >>>>>> Best, >>>>>> Jingsong Lee >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------ >>>>>> From:Jark Wu <[hidden email]> >>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>> To:dev <[hidden email]> >>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration >>>>>> >>>>>> Thanks for starting this design Timo and Dawid, >>>>>> >>>>>> Improving ConfigOption has been hovering in my mind for a long time. >>>>>> We have seen the benefit when developing blink configurations and >>>> connector >>>>>> properties in 1.9 release. >>>>>> Thanks for bringing it up and make such a detailed design. >>>>>> I will leave my thoughts and comments there. >>>>>> >>>>>> Cheers, >>>>>> Jark >>>>>> >>>>>> >>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> >>>>>> wrote: >>>>>> >>>>>>> Hi Timo, >>>>>>> >>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>> >>>>>>> Client API enhancement benefit from this evolution which >>>>>>> hopefully provides a better view of configuration of Flink. >>>>>>> In client API enhancement, we likely make the deployment >>>>>>> of cluster and submission of job totally defined by configuration. >>>>>>> >>>>>>> Will take a look at the document in days. >>>>>>> >>>>>>> Best, >>>>>>> tison. >>>>>>> >>>>>>> >>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 >>>>>>> 下午10:12写道: >>>>>>> >>>>>>>> Hi everyone, >>>>>>>> >>>>>>>> Dawid and I are working on making parts of ExecutionConfig and >>>>>>>> TableConfig configurable via config options. This is necessary >>>>>>>> to make >>>>>>>> all properties also available in SQL. Additionally, with the >>>>>>>> new SQL >>>>>> DDL >>>>>>>> based on properties as well as more connectors and formats >>>>>>>> coming up, >>>>>>>> unified configuration becomes more important. >>>>>>>> >>>>>>>> We need more features around string-based configuration in the >>>>>>>> future, >>>>>>>> which is why Dawid and I would like to propose FLIP-54 for >>>>>>>> evolving >>>> the >>>>>>>> ConfigOption and Configuration classes: >>>>>>>> >>>>>>>> >>>>>>>> >>>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>> >>>>>>>> In summary it adds: >>>>>>>> - documented types and validation >>>>>>>> - more common types such as memory size, duration, list >>>>>>>> - simple non-nested object types >>>>>>>> >>>>>>>> Looking forward to your feedback, >>>>>>>> Timo >>>>>>>> >>>>>>>> >>>> |
Hi Timo,
Thanks for the proposal. Sorry for the late comments, but I have a few questions / comments. 1. Is a new field of isList necessary in the ConfigOption? Would it be enough to just check the atomicClass to see if it is a List or not? Also, in the ConfigOption<Map> class case, are we always assume both key and value types are String? Can we just apply the same to the ConfigOption<List>? BTW, I did a quick search in the codebase but did not find any usage of ConfigOption<List>. 2. The same config name, but with two ConfigOption with different semantic in different component seems super confusing. For example, when users set both configs, they may have no idea one is overriding the other. There might be two cases: - If it is just the same config used by different components to act accordingly, it might be better to just have one config, but describe clearly on how that config will be used. - If it is actually two configurations that can be set differently, I think the config names should just be different. 3. Regarding the ConfigurableFactory, is the toConfiguration() method pretty much means getConfiguration()? The toConfiguration() method sounds like converting an object to a configuration, which only works if the object does not contain any state / value. I am also wondering if there is a real use case of this method. Because supposedly the configurations could just be passed around to caller of this method. Also, can you put the proposal into the FLIP wiki instead of in the Google doc before voting? The FLIP wiki allows track the modification history and has a more established structure to ensure nothing is missed. Thanks, Jiangjie (Becket) Qin On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <[hidden email]> wrote: > Hi everyone, > > I updated the FLIP proposal one more time as mentioned in the voting > thread. If there are no objections, I will start a new voting thread > tomorrow at 9am Berlin time. > > Thanks, > Timo > > > On 22.08.19 14:19, Timo Walther wrote: > > Hi everyone, > > > > thanks for all the feedback we have received online and offline. It > > showed that many people support the idea of evolving the Flink > > configuration functionality. I'm almost sure that this FLIP will not > > solve all issues but at least will improve the current status. > > > > We've updated the document and replaced the Correlation part with the > > concept of a ConfigOptionGroup that can provide all available options > > of a group plus custom group validators for eager validation. For now, > > this eager group validation will only be used at certain locations in > > the Flink code but it prepares for maybe validating the entire global > > configuration before submitting a job in the future. > > > > Please take another look if you find time. I hope we can proceed with > > the voting process if there are no objections. > > > > Regards, > > Timo > > > > Am 19.08.19 um 12:54 schrieb Timo Walther: > >> Hi Stephan, > >> > >> thanks for your suggestions. Let me give you some background about > >> the decisions made in this FLIP: > >> > >> 1. Goal: The FLIP is labelled "evolve" not "rework" because we did > >> not want to change the entire configuration infrastructure. Both for > >> backwards-compatibility reasons and the amount of work that would be > >> required to update all options. If our goal is to rework the > >> configuration option entirely, I might suggest to switch to JSON > >> format with JSON schema and JSON validator. However, setting > >> properties in a CLI or web interface becomes more tricky the more > >> nested structures are allowed. > >> > >> 2. Class-based Options: The current ConfigOption<T> class is centered > >> around Java classes where T is determined by the default value. The > >> FLIP just makes this more explicit by offering an explicit > >> `intType()` method etc. The current design of validators centered > >> around Java classes makes it possible to have typical domain > >> validators baked by generics as you suggested. If we introduce types > >> such as "quantity with measure and unit" we still need to get a class > >> out of this option at the end, so why changing a proven concept? > >> > >> 3. List Options: The `isList` prevents having arbitrary nesting. As > >> Dawid mentioned, we kept human readability in mind. For every atomic > >> option like "key=12" can be represented by a list "keys=12;13". But > >> we don't want to go further; esp. no nesting. A dedicated list option > >> would start making this more complicated such as > >> "ListOption(ObjectOption(ListOption(IntOption, ...), > >> StringOption(...)))", do we want that? > >> > >> 4. Correlation: The correlation part is one of the suggestions that I > >> like least in the document. We can also discuss removing it entirely, > >> but I think it solves the use case of relating options with each > >> other in a flexible way right next to the actual option. Instead of > >> being hidden in some component initialization, we should put it close > >> to the option to also perform validation eagerly instead of failing > >> at runtime when the option is accessed the first time. > >> > >> Regards, > >> Timo > >> > >> > >> Am 18.08.19 um 23:32 schrieb Stephan Ewen: > >>> A "List Type" sounds like a good direction to me. > >>> > >>> The comment on the type system was a bit brief, I agree. The idea is > >>> to see > >>> if something like that can ease validation. Especially the correlation > >>> system seems quite complex (proxies to work around order of > >>> initialization). > >>> > >>> For example, let's assume we don't think primarily about "java > >>> types" but > >>> would define types as one of the following (just examples, haven't > >>> thought > >>> all the details through): > >>> > >>> (a) category type: implies string, and a fix set of possible values. > >>> Those would be passes and naturally make it into the docs and > >>> validation. > >>> Maps to a String or Enum in Java. > >>> > >>> (b) numeric integer type: implies long (or optionally integer, if > >>> we want > >>> to automatically check overflow / underflow). would take typical domain > >>> validators, like non-negative, etc. > >>> > >>> (c) numeric real type: same as above (double or float) > >>> > >>> (d) numeric interval type: either defined as an interval, or > >>> references > >>> other parameter by key. validation by valid interval. > >>> > >>> (e) quantity: a measure and a unit. separately parsable. The > >>> measure's > >>> type could be any of the numeric types above, with same validation > >>> rules. > >>> > >>> With a system like the above, would we still correlation validators? > >>> Are > >>> there still cases that we need to catch early (config loading) or > >>> are the > >>> remaining cases sufficiently rare and runtime or setup specific, > >>> that it is > >>> fine to handle them in component initialization? > >>> > >>> > >>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz > >>> <[hidden email]> > >>> wrote: > >>> > >>>> Hi Stephan, > >>>> > >>>> Thank you for your opinion. > >>>> > >>>> Actually list/composite types are the topics we spent the most of the > >>>> time. I understand that from a perspective of a full blown type > >>>> system, > >>>> a field like isList may look weird. Please let me elaborate a bit more > >>>> on the reason behind it though. Maybe we weren't clear enough about it > >>>> in the FLIP. The key feature of all the conifg options is that they > >>>> must > >>>> have a string representation as they might come from a configuration > >>>> file. Moreover it must be a human readable format, so that the values > >>>> might be manually adjusted. Having that in mind we did not want to > >>>> add a > >>>> support of an arbitrary nesting and we decided to allow for lists only > >>>> (and flat objects - I think though in the current design there is a > >>>> mistake around the Configurable interface). I think though you have a > >>>> point here and it would be better to have a ListConfigOption > >>>> instead of > >>>> this field. Does it make sense to you? > >>>> > >>>> As for the second part of your message. I am not sure if I understood > >>>> it. The validators work with parse/deserialized values from > >>>> Configuration that means they can be bound to the generic parameter of > >>>> Configuration. You can have a RangeValidator<? extends > >>>> Comparable/Number>. I don't think the type hierarchy in the > >>>> ConfigOption > >>>> has anything to do with the validation logic. Could you elaborate a > >>>> bit > >>>> more what did you mean? > >>>> > >>>> Best, > >>>> > >>>> Dawid > >>>> > >>>> On 18/08/2019 16:42, Stephan Ewen wrote: > >>>>> I like the idea of enhancing the configuration and to do early > >>>> validation. > >>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, > >>>>> though. For > >>>>> example, having a boolean "isList" is a clear indication of not > >>>>> having > >>>>> thought through the type/category system. > >>>>> Also, having a more clear category system makes validation simpler. > >>>>> > >>>>> For example, I have seen systems distinguishing between numeric > >>>> parameters > >>>>> (valid ranges), category parameters (set of possible values), > >>>>> quantities > >>>>> like duration and memory size (need measure and unit), which > >>>>> results in > >>>> an > >>>>> elegant system for validation. > >>>>> > >>>>> > >>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <[hidden email] > >>>> .invalid> > >>>>> wrote: > >>>>> > >>>>>> +1 to this, thanks Timo and Dawid for the design. > >>>>>> This allows the currently cluttered configuration of various > >>>>>> modules to be unified. > >>>>>> This is also first step of one of the keys to making new unified > >>>>>> TableEnvironment available for production. > >>>>>> > >>>>>> Previously, we did encounter complex configurations, such as > >>>>>> specifying the skewed values of column in DDL. The skew may > >>>>>> be a single field or a combination of multiple fields. So the > >>>>>> configuration is very troublesome. We used JSON string to > >>>>>> configure it. > >>>>>> > >>>>>> Best, > >>>>>> Jingsong Lee > >>>>>> > >>>>>> > >>>>>> > >>>>>> ------------------------------------------------------------------ > >>>>>> From:Jark Wu <[hidden email]> > >>>>>> Send Time:2019年8月16日(星期五) 16:44 > >>>>>> To:dev <[hidden email]> > >>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration > >>>>>> > >>>>>> Thanks for starting this design Timo and Dawid, > >>>>>> > >>>>>> Improving ConfigOption has been hovering in my mind for a long time. > >>>>>> We have seen the benefit when developing blink configurations and > >>>> connector > >>>>>> properties in 1.9 release. > >>>>>> Thanks for bringing it up and make such a detailed design. > >>>>>> I will leave my thoughts and comments there. > >>>>>> > >>>>>> Cheers, > >>>>>> Jark > >>>>>> > >>>>>> > >>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> > >>>>>> wrote: > >>>>>> > >>>>>>> Hi Timo, > >>>>>>> > >>>>>>> It looks interesting. Thanks for preparing this FLIP! > >>>>>>> > >>>>>>> Client API enhancement benefit from this evolution which > >>>>>>> hopefully provides a better view of configuration of Flink. > >>>>>>> In client API enhancement, we likely make the deployment > >>>>>>> of cluster and submission of job totally defined by configuration. > >>>>>>> > >>>>>>> Will take a look at the document in days. > >>>>>>> > >>>>>>> Best, > >>>>>>> tison. > >>>>>>> > >>>>>>> > >>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 > >>>>>>> 下午10:12写道: > >>>>>>> > >>>>>>>> Hi everyone, > >>>>>>>> > >>>>>>>> Dawid and I are working on making parts of ExecutionConfig and > >>>>>>>> TableConfig configurable via config options. This is necessary > >>>>>>>> to make > >>>>>>>> all properties also available in SQL. Additionally, with the > >>>>>>>> new SQL > >>>>>> DDL > >>>>>>>> based on properties as well as more connectors and formats > >>>>>>>> coming up, > >>>>>>>> unified configuration becomes more important. > >>>>>>>> > >>>>>>>> We need more features around string-based configuration in the > >>>>>>>> future, > >>>>>>>> which is why Dawid and I would like to propose FLIP-54 for > >>>>>>>> evolving > >>>> the > >>>>>>>> ConfigOption and Configuration classes: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>> > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > >>>> > >>>>>>>> In summary it adds: > >>>>>>>> - documented types and validation > >>>>>>>> - more common types such as memory size, duration, list > >>>>>>>> - simple non-nested object types > >>>>>>>> > >>>>>>>> Looking forward to your feedback, > >>>>>>>> Timo > >>>>>>>> > >>>>>>>> > >>>> > > |
@Becket One thing that may be non-obvious is that the Configuration class
also defines serialization / persistence logic at the moment. So it needs to know the set of types it supports. That stands in the way of an arbitrary generic map type. @Timo I agree though that it seems a bit inconsistent to have one collection orthogonal to the type (List) and another one bound to specific types (Map). On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]> wrote: > Hi Timo, > > Thanks for the proposal. Sorry for the late comments, but I have a few > questions / comments. > > 1. Is a new field of isList necessary in the ConfigOption? > Would it be enough to just check the atomicClass to see if it is a List or > not? > Also, in the ConfigOption<Map> class case, are we always assume both key > and value types are String? Can we just apply the same to the > ConfigOption<List>? > BTW, I did a quick search in the codebase but did not find any usage of > ConfigOption<List>. > > 2. The same config name, but with two ConfigOption with different semantic > in different component seems super confusing. For example, when users set > both configs, they may have no idea one is overriding the other. There > might be two cases: > - If it is just the same config used by different components to act > accordingly, it might be better to just have one config, but describe > clearly on how that config will be used. > - If it is actually two configurations that can be set differently, I > think the config names should just be different. > > 3. Regarding the ConfigurableFactory, is the toConfiguration() method > pretty much means getConfiguration()? The toConfiguration() method sounds > like converting an object to a configuration, which only works if the > object does not contain any state / value. I am also wondering if there is > a real use case of this method. Because supposedly the configurations could > just be passed around to caller of this method. > > Also, can you put the proposal into the FLIP wiki instead of in the Google > doc before voting? The FLIP wiki allows track the modification history and > has a more established structure to ensure nothing is missed. > > Thanks, > > Jiangjie (Becket) Qin > > On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <[hidden email]> wrote: > > > Hi everyone, > > > > I updated the FLIP proposal one more time as mentioned in the voting > > thread. If there are no objections, I will start a new voting thread > > tomorrow at 9am Berlin time. > > > > Thanks, > > Timo > > > > > > On 22.08.19 14:19, Timo Walther wrote: > > > Hi everyone, > > > > > > thanks for all the feedback we have received online and offline. It > > > showed that many people support the idea of evolving the Flink > > > configuration functionality. I'm almost sure that this FLIP will not > > > solve all issues but at least will improve the current status. > > > > > > We've updated the document and replaced the Correlation part with the > > > concept of a ConfigOptionGroup that can provide all available options > > > of a group plus custom group validators for eager validation. For now, > > > this eager group validation will only be used at certain locations in > > > the Flink code but it prepares for maybe validating the entire global > > > configuration before submitting a job in the future. > > > > > > Please take another look if you find time. I hope we can proceed with > > > the voting process if there are no objections. > > > > > > Regards, > > > Timo > > > > > > Am 19.08.19 um 12:54 schrieb Timo Walther: > > >> Hi Stephan, > > >> > > >> thanks for your suggestions. Let me give you some background about > > >> the decisions made in this FLIP: > > >> > > >> 1. Goal: The FLIP is labelled "evolve" not "rework" because we did > > >> not want to change the entire configuration infrastructure. Both for > > >> backwards-compatibility reasons and the amount of work that would be > > >> required to update all options. If our goal is to rework the > > >> configuration option entirely, I might suggest to switch to JSON > > >> format with JSON schema and JSON validator. However, setting > > >> properties in a CLI or web interface becomes more tricky the more > > >> nested structures are allowed. > > >> > > >> 2. Class-based Options: The current ConfigOption<T> class is centered > > >> around Java classes where T is determined by the default value. The > > >> FLIP just makes this more explicit by offering an explicit > > >> `intType()` method etc. The current design of validators centered > > >> around Java classes makes it possible to have typical domain > > >> validators baked by generics as you suggested. If we introduce types > > >> such as "quantity with measure and unit" we still need to get a class > > >> out of this option at the end, so why changing a proven concept? > > >> > > >> 3. List Options: The `isList` prevents having arbitrary nesting. As > > >> Dawid mentioned, we kept human readability in mind. For every atomic > > >> option like "key=12" can be represented by a list "keys=12;13". But > > >> we don't want to go further; esp. no nesting. A dedicated list option > > >> would start making this more complicated such as > > >> "ListOption(ObjectOption(ListOption(IntOption, ...), > > >> StringOption(...)))", do we want that? > > >> > > >> 4. Correlation: The correlation part is one of the suggestions that I > > >> like least in the document. We can also discuss removing it entirely, > > >> but I think it solves the use case of relating options with each > > >> other in a flexible way right next to the actual option. Instead of > > >> being hidden in some component initialization, we should put it close > > >> to the option to also perform validation eagerly instead of failing > > >> at runtime when the option is accessed the first time. > > >> > > >> Regards, > > >> Timo > > >> > > >> > > >> Am 18.08.19 um 23:32 schrieb Stephan Ewen: > > >>> A "List Type" sounds like a good direction to me. > > >>> > > >>> The comment on the type system was a bit brief, I agree. The idea is > > >>> to see > > >>> if something like that can ease validation. Especially the > correlation > > >>> system seems quite complex (proxies to work around order of > > >>> initialization). > > >>> > > >>> For example, let's assume we don't think primarily about "java > > >>> types" but > > >>> would define types as one of the following (just examples, haven't > > >>> thought > > >>> all the details through): > > >>> > > >>> (a) category type: implies string, and a fix set of possible > values. > > >>> Those would be passes and naturally make it into the docs and > > >>> validation. > > >>> Maps to a String or Enum in Java. > > >>> > > >>> (b) numeric integer type: implies long (or optionally integer, if > > >>> we want > > >>> to automatically check overflow / underflow). would take typical > domain > > >>> validators, like non-negative, etc. > > >>> > > >>> (c) numeric real type: same as above (double or float) > > >>> > > >>> (d) numeric interval type: either defined as an interval, or > > >>> references > > >>> other parameter by key. validation by valid interval. > > >>> > > >>> (e) quantity: a measure and a unit. separately parsable. The > > >>> measure's > > >>> type could be any of the numeric types above, with same validation > > >>> rules. > > >>> > > >>> With a system like the above, would we still correlation validators? > > >>> Are > > >>> there still cases that we need to catch early (config loading) or > > >>> are the > > >>> remaining cases sufficiently rare and runtime or setup specific, > > >>> that it is > > >>> fine to handle them in component initialization? > > >>> > > >>> > > >>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz > > >>> <[hidden email]> > > >>> wrote: > > >>> > > >>>> Hi Stephan, > > >>>> > > >>>> Thank you for your opinion. > > >>>> > > >>>> Actually list/composite types are the topics we spent the most of > the > > >>>> time. I understand that from a perspective of a full blown type > > >>>> system, > > >>>> a field like isList may look weird. Please let me elaborate a bit > more > > >>>> on the reason behind it though. Maybe we weren't clear enough about > it > > >>>> in the FLIP. The key feature of all the conifg options is that they > > >>>> must > > >>>> have a string representation as they might come from a configuration > > >>>> file. Moreover it must be a human readable format, so that the > values > > >>>> might be manually adjusted. Having that in mind we did not want to > > >>>> add a > > >>>> support of an arbitrary nesting and we decided to allow for lists > only > > >>>> (and flat objects - I think though in the current design there is a > > >>>> mistake around the Configurable interface). I think though you have > a > > >>>> point here and it would be better to have a ListConfigOption > > >>>> instead of > > >>>> this field. Does it make sense to you? > > >>>> > > >>>> As for the second part of your message. I am not sure if I > understood > > >>>> it. The validators work with parse/deserialized values from > > >>>> Configuration that means they can be bound to the generic parameter > of > > >>>> Configuration. You can have a RangeValidator<? extends > > >>>> Comparable/Number>. I don't think the type hierarchy in the > > >>>> ConfigOption > > >>>> has anything to do with the validation logic. Could you elaborate a > > >>>> bit > > >>>> more what did you mean? > > >>>> > > >>>> Best, > > >>>> > > >>>> Dawid > > >>>> > > >>>> On 18/08/2019 16:42, Stephan Ewen wrote: > > >>>>> I like the idea of enhancing the configuration and to do early > > >>>> validation. > > >>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, > > >>>>> though. For > > >>>>> example, having a boolean "isList" is a clear indication of not > > >>>>> having > > >>>>> thought through the type/category system. > > >>>>> Also, having a more clear category system makes validation simpler. > > >>>>> > > >>>>> For example, I have seen systems distinguishing between numeric > > >>>> parameters > > >>>>> (valid ranges), category parameters (set of possible values), > > >>>>> quantities > > >>>>> like duration and memory size (need measure and unit), which > > >>>>> results in > > >>>> an > > >>>>> elegant system for validation. > > >>>>> > > >>>>> > > >>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < > [hidden email] > > >>>> .invalid> > > >>>>> wrote: > > >>>>> > > >>>>>> +1 to this, thanks Timo and Dawid for the design. > > >>>>>> This allows the currently cluttered configuration of various > > >>>>>> modules to be unified. > > >>>>>> This is also first step of one of the keys to making new unified > > >>>>>> TableEnvironment available for production. > > >>>>>> > > >>>>>> Previously, we did encounter complex configurations, such as > > >>>>>> specifying the skewed values of column in DDL. The skew may > > >>>>>> be a single field or a combination of multiple fields. So the > > >>>>>> configuration is very troublesome. We used JSON string to > > >>>>>> configure it. > > >>>>>> > > >>>>>> Best, > > >>>>>> Jingsong Lee > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> ------------------------------------------------------------------ > > >>>>>> From:Jark Wu <[hidden email]> > > >>>>>> Send Time:2019年8月16日(星期五) 16:44 > > >>>>>> To:dev <[hidden email]> > > >>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and > Configuration > > >>>>>> > > >>>>>> Thanks for starting this design Timo and Dawid, > > >>>>>> > > >>>>>> Improving ConfigOption has been hovering in my mind for a long > time. > > >>>>>> We have seen the benefit when developing blink configurations and > > >>>> connector > > >>>>>> properties in 1.9 release. > > >>>>>> Thanks for bringing it up and make such a detailed design. > > >>>>>> I will leave my thoughts and comments there. > > >>>>>> > > >>>>>> Cheers, > > >>>>>> Jark > > >>>>>> > > >>>>>> > > >>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> Hi Timo, > > >>>>>>> > > >>>>>>> It looks interesting. Thanks for preparing this FLIP! > > >>>>>>> > > >>>>>>> Client API enhancement benefit from this evolution which > > >>>>>>> hopefully provides a better view of configuration of Flink. > > >>>>>>> In client API enhancement, we likely make the deployment > > >>>>>>> of cluster and submission of job totally defined by > configuration. > > >>>>>>> > > >>>>>>> Will take a look at the document in days. > > >>>>>>> > > >>>>>>> Best, > > >>>>>>> tison. > > >>>>>>> > > >>>>>>> > > >>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 > > >>>>>>> 下午10:12写道: > > >>>>>>> > > >>>>>>>> Hi everyone, > > >>>>>>>> > > >>>>>>>> Dawid and I are working on making parts of ExecutionConfig and > > >>>>>>>> TableConfig configurable via config options. This is necessary > > >>>>>>>> to make > > >>>>>>>> all properties also available in SQL. Additionally, with the > > >>>>>>>> new SQL > > >>>>>> DDL > > >>>>>>>> based on properties as well as more connectors and formats > > >>>>>>>> coming up, > > >>>>>>>> unified configuration becomes more important. > > >>>>>>>> > > >>>>>>>> We need more features around string-based configuration in the > > >>>>>>>> future, > > >>>>>>>> which is why Dawid and I would like to propose FLIP-54 for > > >>>>>>>> evolving > > >>>> the > > >>>>>>>> ConfigOption and Configuration classes: > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>> > > > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > > >>>> > > >>>>>>>> In summary it adds: > > >>>>>>>> - documented types and validation > > >>>>>>>> - more common types such as memory size, duration, list > > >>>>>>>> - simple non-nested object types > > >>>>>>>> > > >>>>>>>> Looking forward to your feedback, > > >>>>>>>> Timo > > >>>>>>>> > > >>>>>>>> > > >>>> > > > > > |
Hi Becket,
let me try to clarify some of your questions: 1. For every option, we also needed to think about how to represent it in a human readable format. We do not want to allow arbitrary nesting because that would easily allow to bypass the flattened hierarchy of config options (`session.memory.min`). The current design allows to represent every option type as a list. E.g.: `myIntOption: 12` can be `myIntListOption: 12;12` `myObjectOption: field=12,other=true` can be `myObjectListOption: field=12,other=true; field=12,other=true` `myPropertyOption: key=str0,other=str1` can be `myPropertyListOption: key=str0,other=str1;key=str0,other=str1` We need the atomic class for serialization/deserialization both in binary and string format. ConfigOption<List> is not present in the code base yet, but this FLIP is a preparation of making ExecutionConfig configurable. If you look into this class or also in existing table connectors/formats, you will see that each proposed option type has its requirements. 2. Regarding extending the description of ConfigOptions, the semantic of one option should be a super set of the other option. E.g. in Table API we might use general ExecutionConfig properties. But we would like to a) make external options more prominent in the Table API config docs to link people to properties they should pay attention b) notice about side effects. The core semantic of a property should not change. 3. The factory will not receive the entire configuration but works in a separate key space. For `myObjectOption` above, it would receive a configuration that consists of `field: 12` and `other: true`. I agree. I will convert the document into a Wiki page today. Thanks, Timo On 29.08.19 09:00, Stephan Ewen wrote: > @Becket One thing that may be non-obvious is that the Configuration class > also defines serialization / persistence logic at the moment. So it needs > to know the set of types it supports. That stands in the way of an > arbitrary generic map type. > > @Timo I agree though that it seems a bit inconsistent to have one > collection orthogonal to the type (List) and another one bound to specific > types (Map). > > On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]> wrote: > >> Hi Timo, >> >> Thanks for the proposal. Sorry for the late comments, but I have a few >> questions / comments. >> >> 1. Is a new field of isList necessary in the ConfigOption? >> Would it be enough to just check the atomicClass to see if it is a List or >> not? >> Also, in the ConfigOption<Map> class case, are we always assume both key >> and value types are String? Can we just apply the same to the >> ConfigOption<List>? >> BTW, I did a quick search in the codebase but did not find any usage of >> ConfigOption<List>. >> >> 2. The same config name, but with two ConfigOption with different semantic >> in different component seems super confusing. For example, when users set >> both configs, they may have no idea one is overriding the other. There >> might be two cases: >> - If it is just the same config used by different components to act >> accordingly, it might be better to just have one config, but describe >> clearly on how that config will be used. >> - If it is actually two configurations that can be set differently, I >> think the config names should just be different. >> >> 3. Regarding the ConfigurableFactory, is the toConfiguration() method >> pretty much means getConfiguration()? The toConfiguration() method sounds >> like converting an object to a configuration, which only works if the >> object does not contain any state / value. I am also wondering if there is >> a real use case of this method. Because supposedly the configurations could >> just be passed around to caller of this method. >> >> Also, can you put the proposal into the FLIP wiki instead of in the Google >> doc before voting? The FLIP wiki allows track the modification history and >> has a more established structure to ensure nothing is missed. >> >> Thanks, >> >> Jiangjie (Becket) Qin >> >> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <[hidden email]> wrote: >> >>> Hi everyone, >>> >>> I updated the FLIP proposal one more time as mentioned in the voting >>> thread. If there are no objections, I will start a new voting thread >>> tomorrow at 9am Berlin time. >>> >>> Thanks, >>> Timo >>> >>> >>> On 22.08.19 14:19, Timo Walther wrote: >>>> Hi everyone, >>>> >>>> thanks for all the feedback we have received online and offline. It >>>> showed that many people support the idea of evolving the Flink >>>> configuration functionality. I'm almost sure that this FLIP will not >>>> solve all issues but at least will improve the current status. >>>> >>>> We've updated the document and replaced the Correlation part with the >>>> concept of a ConfigOptionGroup that can provide all available options >>>> of a group plus custom group validators for eager validation. For now, >>>> this eager group validation will only be used at certain locations in >>>> the Flink code but it prepares for maybe validating the entire global >>>> configuration before submitting a job in the future. >>>> >>>> Please take another look if you find time. I hope we can proceed with >>>> the voting process if there are no objections. >>>> >>>> Regards, >>>> Timo >>>> >>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>> Hi Stephan, >>>>> >>>>> thanks for your suggestions. Let me give you some background about >>>>> the decisions made in this FLIP: >>>>> >>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" because we did >>>>> not want to change the entire configuration infrastructure. Both for >>>>> backwards-compatibility reasons and the amount of work that would be >>>>> required to update all options. If our goal is to rework the >>>>> configuration option entirely, I might suggest to switch to JSON >>>>> format with JSON schema and JSON validator. However, setting >>>>> properties in a CLI or web interface becomes more tricky the more >>>>> nested structures are allowed. >>>>> >>>>> 2. Class-based Options: The current ConfigOption<T> class is centered >>>>> around Java classes where T is determined by the default value. The >>>>> FLIP just makes this more explicit by offering an explicit >>>>> `intType()` method etc. The current design of validators centered >>>>> around Java classes makes it possible to have typical domain >>>>> validators baked by generics as you suggested. If we introduce types >>>>> such as "quantity with measure and unit" we still need to get a class >>>>> out of this option at the end, so why changing a proven concept? >>>>> >>>>> 3. List Options: The `isList` prevents having arbitrary nesting. As >>>>> Dawid mentioned, we kept human readability in mind. For every atomic >>>>> option like "key=12" can be represented by a list "keys=12;13". But >>>>> we don't want to go further; esp. no nesting. A dedicated list option >>>>> would start making this more complicated such as >>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>> StringOption(...)))", do we want that? >>>>> >>>>> 4. Correlation: The correlation part is one of the suggestions that I >>>>> like least in the document. We can also discuss removing it entirely, >>>>> but I think it solves the use case of relating options with each >>>>> other in a flexible way right next to the actual option. Instead of >>>>> being hidden in some component initialization, we should put it close >>>>> to the option to also perform validation eagerly instead of failing >>>>> at runtime when the option is accessed the first time. >>>>> >>>>> Regards, >>>>> Timo >>>>> >>>>> >>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>> A "List Type" sounds like a good direction to me. >>>>>> >>>>>> The comment on the type system was a bit brief, I agree. The idea is >>>>>> to see >>>>>> if something like that can ease validation. Especially the >> correlation >>>>>> system seems quite complex (proxies to work around order of >>>>>> initialization). >>>>>> >>>>>> For example, let's assume we don't think primarily about "java >>>>>> types" but >>>>>> would define types as one of the following (just examples, haven't >>>>>> thought >>>>>> all the details through): >>>>>> >>>>>> (a) category type: implies string, and a fix set of possible >> values. >>>>>> Those would be passes and naturally make it into the docs and >>>>>> validation. >>>>>> Maps to a String or Enum in Java. >>>>>> >>>>>> (b) numeric integer type: implies long (or optionally integer, if >>>>>> we want >>>>>> to automatically check overflow / underflow). would take typical >> domain >>>>>> validators, like non-negative, etc. >>>>>> >>>>>> (c) numeric real type: same as above (double or float) >>>>>> >>>>>> (d) numeric interval type: either defined as an interval, or >>>>>> references >>>>>> other parameter by key. validation by valid interval. >>>>>> >>>>>> (e) quantity: a measure and a unit. separately parsable. The >>>>>> measure's >>>>>> type could be any of the numeric types above, with same validation >>>>>> rules. >>>>>> >>>>>> With a system like the above, would we still correlation validators? >>>>>> Are >>>>>> there still cases that we need to catch early (config loading) or >>>>>> are the >>>>>> remaining cases sufficiently rare and runtime or setup specific, >>>>>> that it is >>>>>> fine to handle them in component initialization? >>>>>> >>>>>> >>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>> <[hidden email]> >>>>>> wrote: >>>>>> >>>>>>> Hi Stephan, >>>>>>> >>>>>>> Thank you for your opinion. >>>>>>> >>>>>>> Actually list/composite types are the topics we spent the most of >> the >>>>>>> time. I understand that from a perspective of a full blown type >>>>>>> system, >>>>>>> a field like isList may look weird. Please let me elaborate a bit >> more >>>>>>> on the reason behind it though. Maybe we weren't clear enough about >> it >>>>>>> in the FLIP. The key feature of all the conifg options is that they >>>>>>> must >>>>>>> have a string representation as they might come from a configuration >>>>>>> file. Moreover it must be a human readable format, so that the >> values >>>>>>> might be manually adjusted. Having that in mind we did not want to >>>>>>> add a >>>>>>> support of an arbitrary nesting and we decided to allow for lists >> only >>>>>>> (and flat objects - I think though in the current design there is a >>>>>>> mistake around the Configurable interface). I think though you have >> a >>>>>>> point here and it would be better to have a ListConfigOption >>>>>>> instead of >>>>>>> this field. Does it make sense to you? >>>>>>> >>>>>>> As for the second part of your message. I am not sure if I >> understood >>>>>>> it. The validators work with parse/deserialized values from >>>>>>> Configuration that means they can be bound to the generic parameter >> of >>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>> Comparable/Number>. I don't think the type hierarchy in the >>>>>>> ConfigOption >>>>>>> has anything to do with the validation logic. Could you elaborate a >>>>>>> bit >>>>>>> more what did you mean? >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Dawid >>>>>>> >>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>> I like the idea of enhancing the configuration and to do early >>>>>>> validation. >>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, >>>>>>>> though. For >>>>>>>> example, having a boolean "isList" is a clear indication of not >>>>>>>> having >>>>>>>> thought through the type/category system. >>>>>>>> Also, having a more clear category system makes validation simpler. >>>>>>>> >>>>>>>> For example, I have seen systems distinguishing between numeric >>>>>>> parameters >>>>>>>> (valid ranges), category parameters (set of possible values), >>>>>>>> quantities >>>>>>>> like duration and memory size (need measure and unit), which >>>>>>>> results in >>>>>>> an >>>>>>>> elegant system for validation. >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >> [hidden email] >>>>>>> .invalid> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>> This allows the currently cluttered configuration of various >>>>>>>>> modules to be unified. >>>>>>>>> This is also first step of one of the keys to making new unified >>>>>>>>> TableEnvironment available for production. >>>>>>>>> >>>>>>>>> Previously, we did encounter complex configurations, such as >>>>>>>>> specifying the skewed values of column in DDL. The skew may >>>>>>>>> be a single field or a combination of multiple fields. So the >>>>>>>>> configuration is very troublesome. We used JSON string to >>>>>>>>> configure it. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Jingsong Lee >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ------------------------------------------------------------------ >>>>>>>>> From:Jark Wu <[hidden email]> >>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>> To:dev <[hidden email]> >>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >> Configuration >>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>> >>>>>>>>> Improving ConfigOption has been hovering in my mind for a long >> time. >>>>>>>>> We have seen the benefit when developing blink configurations and >>>>>>> connector >>>>>>>>> properties in 1.9 release. >>>>>>>>> Thanks for bringing it up and make such a detailed design. >>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Jark >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Timo, >>>>>>>>>> >>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>> >>>>>>>>>> Client API enhancement benefit from this evolution which >>>>>>>>>> hopefully provides a better view of configuration of Flink. >>>>>>>>>> In client API enhancement, we likely make the deployment >>>>>>>>>> of cluster and submission of job totally defined by >> configuration. >>>>>>>>>> Will take a look at the document in days. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> tison. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 >>>>>>>>>> 下午10:12写道: >>>>>>>>>> >>>>>>>>>>> Hi everyone, >>>>>>>>>>> >>>>>>>>>>> Dawid and I are working on making parts of ExecutionConfig and >>>>>>>>>>> TableConfig configurable via config options. This is necessary >>>>>>>>>>> to make >>>>>>>>>>> all properties also available in SQL. Additionally, with the >>>>>>>>>>> new SQL >>>>>>>>> DDL >>>>>>>>>>> based on properties as well as more connectors and formats >>>>>>>>>>> coming up, >>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>> >>>>>>>>>>> We need more features around string-based configuration in the >>>>>>>>>>> future, >>>>>>>>>>> which is why Dawid and I would like to propose FLIP-54 for >>>>>>>>>>> evolving >>>>>>> the >>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>>>>>>>>> In summary it adds: >>>>>>>>>>> - documented types and validation >>>>>>>>>>> - more common types such as memory size, duration, list >>>>>>>>>>> - simple non-nested object types >>>>>>>>>>> >>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> >>> |
Hi Timo and Stephan,
Thanks for the detail explanation. 1. I agree that each config should be in a human readable format. My concern is that the current List<Configurable> looks going a little too far from what the configuration is supposed to do. They are essentially creating some Configurable objects instead of defining the config to create those Configurable. This mixes ConfigOption and the usage of it. API wise it would be good to keep the configs and their usages (such as how to create objects using the ConfigOption) apart from each other. I am wondering if we can just make List also only take string. For example, is the following definition of map and list sufficient? A MapConfigOption is ConfigOption<Map<String, String>>. It can be defined as: map_config_name: k1=v1, k2=v2, k3=v3, ... A ListConfigOption is ConfigOption<List<String>>. It can be defined as: list_config_name: v1, v2, v3, ... A ListOfMapConfigOption is ConfigOption<List<Map<String, String>>. It can be defined as: list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... All the key and values in the configuration are String. This also guarantees that the configuration is always serializable. If we want to do one more step, we can allow the ConfigOption to set all the primitive types and parse that for the users. So something like List<Integer>, List<Class<?>> seems fine. The configuration class could also have util methods to create a list of configurable such as: <T> List<T> <Configuration#getConfigurableInstances(ListMapConfigOption, Class<T> clazz). But the configuration class will not take arbitrary Configurable as the value of its config. 2. I might have misunderstood this. But my concern on the description extension is in the following example. public static final ConfigOption<Integer> MAX_PARALLELISM = CoreOptions.MAX_PARALLELISM.withExtendedDescription( "Note: That this property means that a table program has a side-effect XYZ."); In this case, we will have two MAX_PARALLELISM configs now. One is CoreOptions.MAX_PARALLELISM. The other one is defined here. I suppose users will see both configurations. One with an extended description and one without. Let's say there is a third component which also users MAX_PARALLELISM, will there be yet another MAX_PARALLELISM ConfigOption? If so, what would that ConfigOption's description look like? Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM and the description should clearly state all the usage of this ConfigOption. 3. I see, in that case, how about we name it something like extractConfiguration()? I am just trying to see if we can make it clear this is not something like fromBytes() and toBytes(). Thanks, Jiangjie (Becket) Qin On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> wrote: > Hi Becket, > > let me try to clarify some of your questions: > > 1. For every option, we also needed to think about how to represent it > in a human readable format. We do not want to allow arbitrary nesting > because that would easily allow to bypass the flattened hierarchy of > config options (`session.memory.min`). The current design allows to > represent every option type as a list. E.g.: > > `myIntOption: 12` can be `myIntListOption: 12;12` > `myObjectOption: field=12,other=true` can be `myObjectListOption: > field=12,other=true; field=12,other=true` > `myPropertyOption: key=str0,other=str1` can be `myPropertyListOption: > key=str0,other=str1;key=str0,other=str1` > > We need the atomic class for serialization/deserialization both in > binary and string format. > > ConfigOption<List> is not present in the code base yet, but this FLIP is > a preparation of making ExecutionConfig configurable. If you look into > this class or also in existing table connectors/formats, you will see > that each proposed option type has its requirements. > > 2. Regarding extending the description of ConfigOptions, the semantic of > one option should be a super set of the other option. E.g. in Table API > we might use general ExecutionConfig properties. But we would like to a) > make external options more prominent in the Table API config docs to > link people to properties they should pay attention b) notice about side > effects. The core semantic of a property should not change. > > 3. The factory will not receive the entire configuration but works in a > separate key space. For `myObjectOption` above, it would receive a > configuration that consists of `field: 12` and `other: true`. > > I agree. I will convert the document into a Wiki page today. > > Thanks, > Timo > > On 29.08.19 09:00, Stephan Ewen wrote: > > @Becket One thing that may be non-obvious is that the Configuration class > > also defines serialization / persistence logic at the moment. So it needs > > to know the set of types it supports. That stands in the way of an > > arbitrary generic map type. > > > > @Timo I agree though that it seems a bit inconsistent to have one > > collection orthogonal to the type (List) and another one bound to > specific > > types (Map). > > > > On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]> wrote: > > > >> Hi Timo, > >> > >> Thanks for the proposal. Sorry for the late comments, but I have a few > >> questions / comments. > >> > >> 1. Is a new field of isList necessary in the ConfigOption? > >> Would it be enough to just check the atomicClass to see if it is a List > or > >> not? > >> Also, in the ConfigOption<Map> class case, are we always assume both key > >> and value types are String? Can we just apply the same to the > >> ConfigOption<List>? > >> BTW, I did a quick search in the codebase but did not find any usage of > >> ConfigOption<List>. > >> > >> 2. The same config name, but with two ConfigOption with different > semantic > >> in different component seems super confusing. For example, when users > set > >> both configs, they may have no idea one is overriding the other. There > >> might be two cases: > >> - If it is just the same config used by different components to act > >> accordingly, it might be better to just have one config, but describe > >> clearly on how that config will be used. > >> - If it is actually two configurations that can be set differently, I > >> think the config names should just be different. > >> > >> 3. Regarding the ConfigurableFactory, is the toConfiguration() method > >> pretty much means getConfiguration()? The toConfiguration() method > sounds > >> like converting an object to a configuration, which only works if the > >> object does not contain any state / value. I am also wondering if there > is > >> a real use case of this method. Because supposedly the configurations > could > >> just be passed around to caller of this method. > >> > >> Also, can you put the proposal into the FLIP wiki instead of in the > >> doc before voting? The FLIP wiki allows track the modification history > and > >> has a more established structure to ensure nothing is missed. > >> > >> Thanks, > >> > >> Jiangjie (Becket) Qin > >> > >> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <[hidden email]> > wrote: > >> > >>> Hi everyone, > >>> > >>> I updated the FLIP proposal one more time as mentioned in the voting > >>> thread. If there are no objections, I will start a new voting thread > >>> tomorrow at 9am Berlin time. > >>> > >>> Thanks, > >>> Timo > >>> > >>> > >>> On 22.08.19 14:19, Timo Walther wrote: > >>>> Hi everyone, > >>>> > >>>> thanks for all the feedback we have received online and offline. It > >>>> showed that many people support the idea of evolving the Flink > >>>> configuration functionality. I'm almost sure that this FLIP will not > >>>> solve all issues but at least will improve the current status. > >>>> > >>>> We've updated the document and replaced the Correlation part with the > >>>> concept of a ConfigOptionGroup that can provide all available options > >>>> of a group plus custom group validators for eager validation. For now, > >>>> this eager group validation will only be used at certain locations in > >>>> the Flink code but it prepares for maybe validating the entire global > >>>> configuration before submitting a job in the future. > >>>> > >>>> Please take another look if you find time. I hope we can proceed with > >>>> the voting process if there are no objections. > >>>> > >>>> Regards, > >>>> Timo > >>>> > >>>> Am 19.08.19 um 12:54 schrieb Timo Walther: > >>>>> Hi Stephan, > >>>>> > >>>>> thanks for your suggestions. Let me give you some background about > >>>>> the decisions made in this FLIP: > >>>>> > >>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" because we did > >>>>> not want to change the entire configuration infrastructure. Both for > >>>>> backwards-compatibility reasons and the amount of work that would be > >>>>> required to update all options. If our goal is to rework the > >>>>> configuration option entirely, I might suggest to switch to JSON > >>>>> format with JSON schema and JSON validator. However, setting > >>>>> properties in a CLI or web interface becomes more tricky the more > >>>>> nested structures are allowed. > >>>>> > >>>>> 2. Class-based Options: The current ConfigOption<T> class is centered > >>>>> around Java classes where T is determined by the default value. The > >>>>> FLIP just makes this more explicit by offering an explicit > >>>>> `intType()` method etc. The current design of validators centered > >>>>> around Java classes makes it possible to have typical domain > >>>>> validators baked by generics as you suggested. If we introduce types > >>>>> such as "quantity with measure and unit" we still need to get a class > >>>>> out of this option at the end, so why changing a proven concept? > >>>>> > >>>>> 3. List Options: The `isList` prevents having arbitrary nesting. As > >>>>> Dawid mentioned, we kept human readability in mind. For every atomic > >>>>> option like "key=12" can be represented by a list "keys=12;13". But > >>>>> we don't want to go further; esp. no nesting. A dedicated list option > >>>>> would start making this more complicated such as > >>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), > >>>>> StringOption(...)))", do we want that? > >>>>> > >>>>> 4. Correlation: The correlation part is one of the suggestions that I > >>>>> like least in the document. We can also discuss removing it entirely, > >>>>> but I think it solves the use case of relating options with each > >>>>> other in a flexible way right next to the actual option. Instead of > >>>>> being hidden in some component initialization, we should put it close > >>>>> to the option to also perform validation eagerly instead of failing > >>>>> at runtime when the option is accessed the first time. > >>>>> > >>>>> Regards, > >>>>> Timo > >>>>> > >>>>> > >>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: > >>>>>> A "List Type" sounds like a good direction to me. > >>>>>> > >>>>>> The comment on the type system was a bit brief, I agree. The idea is > >>>>>> to see > >>>>>> if something like that can ease validation. Especially the > >> correlation > >>>>>> system seems quite complex (proxies to work around order of > >>>>>> initialization). > >>>>>> > >>>>>> For example, let's assume we don't think primarily about "java > >>>>>> types" but > >>>>>> would define types as one of the following (just examples, haven't > >>>>>> thought > >>>>>> all the details through): > >>>>>> > >>>>>> (a) category type: implies string, and a fix set of possible > >> values. > >>>>>> Those would be passes and naturally make it into the docs and > >>>>>> validation. > >>>>>> Maps to a String or Enum in Java. > >>>>>> > >>>>>> (b) numeric integer type: implies long (or optionally integer, > if > >>>>>> we want > >>>>>> to automatically check overflow / underflow). would take typical > >> domain > >>>>>> validators, like non-negative, etc. > >>>>>> > >>>>>> (c) numeric real type: same as above (double or float) > >>>>>> > >>>>>> (d) numeric interval type: either defined as an interval, or > >>>>>> references > >>>>>> other parameter by key. validation by valid interval. > >>>>>> > >>>>>> (e) quantity: a measure and a unit. separately parsable. The > >>>>>> measure's > >>>>>> type could be any of the numeric types above, with same validation > >>>>>> rules. > >>>>>> > >>>>>> With a system like the above, would we still correlation validators? > >>>>>> Are > >>>>>> there still cases that we need to catch early (config loading) or > >>>>>> are the > >>>>>> remaining cases sufficiently rare and runtime or setup specific, > >>>>>> that it is > >>>>>> fine to handle them in component initialization? > >>>>>> > >>>>>> > >>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz > >>>>>> <[hidden email]> > >>>>>> wrote: > >>>>>> > >>>>>>> Hi Stephan, > >>>>>>> > >>>>>>> Thank you for your opinion. > >>>>>>> > >>>>>>> Actually list/composite types are the topics we spent the most of > >> the > >>>>>>> time. I understand that from a perspective of a full blown type > >>>>>>> system, > >>>>>>> a field like isList may look weird. Please let me elaborate a bit > >> more > >>>>>>> on the reason behind it though. Maybe we weren't clear enough about > >> it > >>>>>>> in the FLIP. The key feature of all the conifg options is that they > >>>>>>> must > >>>>>>> have a string representation as they might come from a > configuration > >>>>>>> file. Moreover it must be a human readable format, so that the > >> values > >>>>>>> might be manually adjusted. Having that in mind we did not want to > >>>>>>> add a > >>>>>>> support of an arbitrary nesting and we decided to allow for lists > >> only > >>>>>>> (and flat objects - I think though in the current design there is a > >>>>>>> mistake around the Configurable interface). I think though you have > >> a > >>>>>>> point here and it would be better to have a ListConfigOption > >>>>>>> instead of > >>>>>>> this field. Does it make sense to you? > >>>>>>> > >>>>>>> As for the second part of your message. I am not sure if I > >> understood > >>>>>>> it. The validators work with parse/deserialized values from > >>>>>>> Configuration that means they can be bound to the generic parameter > >> of > >>>>>>> Configuration. You can have a RangeValidator<? extends > >>>>>>> Comparable/Number>. I don't think the type hierarchy in the > >>>>>>> ConfigOption > >>>>>>> has anything to do with the validation logic. Could you elaborate a > >>>>>>> bit > >>>>>>> more what did you mean? > >>>>>>> > >>>>>>> Best, > >>>>>>> > >>>>>>> Dawid > >>>>>>> > >>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: > >>>>>>>> I like the idea of enhancing the configuration and to do early > >>>>>>> validation. > >>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, > >>>>>>>> though. For > >>>>>>>> example, having a boolean "isList" is a clear indication of not > >>>>>>>> having > >>>>>>>> thought through the type/category system. > >>>>>>>> Also, having a more clear category system makes validation > simpler. > >>>>>>>> > >>>>>>>> For example, I have seen systems distinguishing between numeric > >>>>>>> parameters > >>>>>>>> (valid ranges), category parameters (set of possible values), > >>>>>>>> quantities > >>>>>>>> like duration and memory size (need measure and unit), which > >>>>>>>> results in > >>>>>>> an > >>>>>>>> elegant system for validation. > >>>>>>>> > >>>>>>>> > >>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < > >> [hidden email] > >>>>>>> .invalid> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> +1 to this, thanks Timo and Dawid for the design. > >>>>>>>>> This allows the currently cluttered configuration of various > >>>>>>>>> modules to be unified. > >>>>>>>>> This is also first step of one of the keys to making new unified > >>>>>>>>> TableEnvironment available for production. > >>>>>>>>> > >>>>>>>>> Previously, we did encounter complex configurations, such as > >>>>>>>>> specifying the skewed values of column in DDL. The skew may > >>>>>>>>> be a single field or a combination of multiple fields. So the > >>>>>>>>> configuration is very troublesome. We used JSON string to > >>>>>>>>> configure it. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Jingsong Lee > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > ------------------------------------------------------------------ > >>>>>>>>> From:Jark Wu <[hidden email]> > >>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 > >>>>>>>>> To:dev <[hidden email]> > >>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and > >> Configuration > >>>>>>>>> Thanks for starting this design Timo and Dawid, > >>>>>>>>> > >>>>>>>>> Improving ConfigOption has been hovering in my mind for a long > >> time. > >>>>>>>>> We have seen the benefit when developing blink configurations and > >>>>>>> connector > >>>>>>>>> properties in 1.9 release. > >>>>>>>>> Thanks for bringing it up and make such a detailed design. > >>>>>>>>> I will leave my thoughts and comments there. > >>>>>>>>> > >>>>>>>>> Cheers, > >>>>>>>>> Jark > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> > >>>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Hi Timo, > >>>>>>>>>> > >>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! > >>>>>>>>>> > >>>>>>>>>> Client API enhancement benefit from this evolution which > >>>>>>>>>> hopefully provides a better view of configuration of Flink. > >>>>>>>>>> In client API enhancement, we likely make the deployment > >>>>>>>>>> of cluster and submission of job totally defined by > >> configuration. > >>>>>>>>>> Will take a look at the document in days. > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> tison. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 > >>>>>>>>>> 下午10:12写道: > >>>>>>>>>> > >>>>>>>>>>> Hi everyone, > >>>>>>>>>>> > >>>>>>>>>>> Dawid and I are working on making parts of ExecutionConfig and > >>>>>>>>>>> TableConfig configurable via config options. This is necessary > >>>>>>>>>>> to make > >>>>>>>>>>> all properties also available in SQL. Additionally, with the > >>>>>>>>>>> new SQL > >>>>>>>>> DDL > >>>>>>>>>>> based on properties as well as more connectors and formats > >>>>>>>>>>> coming up, > >>>>>>>>>>> unified configuration becomes more important. > >>>>>>>>>>> > >>>>>>>>>>> We need more features around string-based configuration in the > >>>>>>>>>>> future, > >>>>>>>>>>> which is why Dawid and I would like to propose FLIP-54 for > >>>>>>>>>>> evolving > >>>>>>> the > >>>>>>>>>>> ConfigOption and Configuration classes: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >> > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > >>>>>>>>>>> In summary it adds: > >>>>>>>>>>> - documented types and validation > >>>>>>>>>>> - more common types such as memory size, duration, list > >>>>>>>>>>> - simple non-nested object types > >>>>>>>>>>> > >>>>>>>>>>> Looking forward to your feedback, > >>>>>>>>>>> Timo > >>>>>>>>>>> > >>>>>>>>>>> > >>> > > |
Hi Becket,
thanks for the discussion. 1. ConfigOptions in their current design are bound to classes. Regarding, the option is "creating some Configurable objects instead of defining the config to create those Configurable"? We just moved this logic to a factory, this factory can then also be used for other purposes. However, how the option and objects are serialized to Configuration is still not part of the option. The option is just pure declaration. If we would allow only List<String>, implementers would need to start implementing own parsing and validation logic all the time. We would like to avoid that. Please also keep in mind that Configuration must not consist of only strings, it manages a Map<String, Object> for efficient access. Every map entry can have a string representation for persistence, but in most cases consists of unserialized objects. 2. MAX_PARALLELISM is still defined just once. We don't overwrite keys, types or default values. But different layers might want to add helpful information. In our concrete use case for FLIP-59, ExecutionConfig has 50 properties and many of them are not relevant for the Table layer or have no effect at all. We would like to list and mention the most important config options again in the Table Configuration section, so that users are not confused, but with a strong link to the core option. E.g.: registered kryo serializers are also important also for Table users, we would like to add the comment "This option allows to modify the serialization of the ANY SQL data type.". I think we should not spam the core configuration page with comments from all layers, connectors, or libraries but keep this in the corresponding component documentation. 3. But it is something like fromBytes() and toBytes()? It serializes and deserializes T from a configuration? Regards, Timo On 29.08.19 19:14, Becket Qin wrote: > Hi Timo and Stephan, > > Thanks for the detail explanation. > > 1. I agree that each config should be in a human readable format. My > concern is that the current List<Configurable> looks going a little too far > from what the configuration is supposed to do. They are essentially > creating some Configurable objects instead of defining the config to create > those Configurable. This mixes ConfigOption and the usage of it. API wise > it would be good to keep the configs and their usages (such as how to > create objects using the ConfigOption) apart from each other. > I am wondering if we can just make List also only take string. For example, > is the following definition of map and list sufficient? > > A MapConfigOption is ConfigOption<Map<String, String>>. It can be defined > as: > map_config_name: k1=v1, k2=v2, k3=v3, ... > > A ListConfigOption is ConfigOption<List<String>>. It can be defined as: > list_config_name: v1, v2, v3, ... > > A ListOfMapConfigOption is ConfigOption<List<Map<String, String>>. It can > be defined as: > list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... > > All the key and values in the configuration are String. This also > guarantees that the configuration is always serializable. > If we want to do one more step, we can allow the ConfigOption to set all > the primitive types and parse that for the users. So something like > List<Integer>, List<Class<?>> seems fine. > > The configuration class could also have util methods to create a list of > configurable such as: > <T> List<T> <Configuration#getConfigurableInstances(ListMapConfigOption, > Class<T> clazz). > But the configuration class will not take arbitrary Configurable as the > value of its config. > > 2. I might have misunderstood this. But my concern on the description > extension is in the following example. > > public static final ConfigOption<Integer> MAX_PARALLELISM = > > CoreOptions.MAX_PARALLELISM.withExtendedDescription( > "Note: That this property means that a table program has a side-effect > XYZ."); > > In this case, we will have two MAX_PARALLELISM configs now. One is > CoreOptions.MAX_PARALLELISM. The other one is defined here. I suppose users > will see both configurations. One with an extended description and one > without. Let's say there is a third component which also users > MAX_PARALLELISM, will there be yet another MAX_PARALLELISM ConfigOption? If > so, what would that ConfigOption's description look like? > > Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM and the > description should clearly state all the usage of this ConfigOption. > > 3. I see, in that case, how about we name it something like > extractConfiguration()? I am just trying to see if we can make it clear > this is not something like fromBytes() and toBytes(). > > Thanks, > > Jiangjie (Becket) Qin > > On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> wrote: > >> Hi Becket, >> >> let me try to clarify some of your questions: >> >> 1. For every option, we also needed to think about how to represent it >> in a human readable format. We do not want to allow arbitrary nesting >> because that would easily allow to bypass the flattened hierarchy of >> config options (`session.memory.min`). The current design allows to >> represent every option type as a list. E.g.: >> >> `myIntOption: 12` can be `myIntListOption: 12;12` >> `myObjectOption: field=12,other=true` can be `myObjectListOption: >> field=12,other=true; field=12,other=true` >> `myPropertyOption: key=str0,other=str1` can be `myPropertyListOption: >> key=str0,other=str1;key=str0,other=str1` >> >> We need the atomic class for serialization/deserialization both in >> binary and string format. >> >> ConfigOption<List> is not present in the code base yet, but this FLIP is >> a preparation of making ExecutionConfig configurable. If you look into >> this class or also in existing table connectors/formats, you will see >> that each proposed option type has its requirements. >> >> 2. Regarding extending the description of ConfigOptions, the semantic of >> one option should be a super set of the other option. E.g. in Table API >> we might use general ExecutionConfig properties. But we would like to a) >> make external options more prominent in the Table API config docs to >> link people to properties they should pay attention b) notice about side >> effects. The core semantic of a property should not change. >> >> 3. The factory will not receive the entire configuration but works in a >> separate key space. For `myObjectOption` above, it would receive a >> configuration that consists of `field: 12` and `other: true`. >> >> I agree. I will convert the document into a Wiki page today. >> >> Thanks, >> Timo >> >> On 29.08.19 09:00, Stephan Ewen wrote: >>> @Becket One thing that may be non-obvious is that the Configuration class >>> also defines serialization / persistence logic at the moment. So it needs >>> to know the set of types it supports. That stands in the way of an >>> arbitrary generic map type. >>> >>> @Timo I agree though that it seems a bit inconsistent to have one >>> collection orthogonal to the type (List) and another one bound to >> specific >>> types (Map). >>> >>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]> wrote: >>> >>>> Hi Timo, >>>> >>>> Thanks for the proposal. Sorry for the late comments, but I have a few >>>> questions / comments. >>>> >>>> 1. Is a new field of isList necessary in the ConfigOption? >>>> Would it be enough to just check the atomicClass to see if it is a List >> or >>>> not? >>>> Also, in the ConfigOption<Map> class case, are we always assume both key >>>> and value types are String? Can we just apply the same to the >>>> ConfigOption<List>? >>>> BTW, I did a quick search in the codebase but did not find any usage of >>>> ConfigOption<List>. >>>> >>>> 2. The same config name, but with two ConfigOption with different >> semantic >>>> in different component seems super confusing. For example, when users >> set >>>> both configs, they may have no idea one is overriding the other. There >>>> might be two cases: >>>> - If it is just the same config used by different components to act >>>> accordingly, it might be better to just have one config, but describe >>>> clearly on how that config will be used. >>>> - If it is actually two configurations that can be set differently, I >>>> think the config names should just be different. >>>> >>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() method >>>> pretty much means getConfiguration()? The toConfiguration() method >> sounds >>>> like converting an object to a configuration, which only works if the >>>> object does not contain any state / value. I am also wondering if there >> is >>>> a real use case of this method. Because supposedly the configurations >> could >>>> just be passed around to caller of this method. >>>> >>>> Also, can you put the proposal into the FLIP wiki instead of in the >>>> doc before voting? The FLIP wiki allows track the modification history >> and >>>> has a more established structure to ensure nothing is missed. >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <[hidden email]> >> wrote: >>>>> Hi everyone, >>>>> >>>>> I updated the FLIP proposal one more time as mentioned in the voting >>>>> thread. If there are no objections, I will start a new voting thread >>>>> tomorrow at 9am Berlin time. >>>>> >>>>> Thanks, >>>>> Timo >>>>> >>>>> >>>>> On 22.08.19 14:19, Timo Walther wrote: >>>>>> Hi everyone, >>>>>> >>>>>> thanks for all the feedback we have received online and offline. It >>>>>> showed that many people support the idea of evolving the Flink >>>>>> configuration functionality. I'm almost sure that this FLIP will not >>>>>> solve all issues but at least will improve the current status. >>>>>> >>>>>> We've updated the document and replaced the Correlation part with the >>>>>> concept of a ConfigOptionGroup that can provide all available options >>>>>> of a group plus custom group validators for eager validation. For now, >>>>>> this eager group validation will only be used at certain locations in >>>>>> the Flink code but it prepares for maybe validating the entire global >>>>>> configuration before submitting a job in the future. >>>>>> >>>>>> Please take another look if you find time. I hope we can proceed with >>>>>> the voting process if there are no objections. >>>>>> >>>>>> Regards, >>>>>> Timo >>>>>> >>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>>>> Hi Stephan, >>>>>>> >>>>>>> thanks for your suggestions. Let me give you some background about >>>>>>> the decisions made in this FLIP: >>>>>>> >>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" because we did >>>>>>> not want to change the entire configuration infrastructure. Both for >>>>>>> backwards-compatibility reasons and the amount of work that would be >>>>>>> required to update all options. If our goal is to rework the >>>>>>> configuration option entirely, I might suggest to switch to JSON >>>>>>> format with JSON schema and JSON validator. However, setting >>>>>>> properties in a CLI or web interface becomes more tricky the more >>>>>>> nested structures are allowed. >>>>>>> >>>>>>> 2. Class-based Options: The current ConfigOption<T> class is centered >>>>>>> around Java classes where T is determined by the default value. The >>>>>>> FLIP just makes this more explicit by offering an explicit >>>>>>> `intType()` method etc. The current design of validators centered >>>>>>> around Java classes makes it possible to have typical domain >>>>>>> validators baked by generics as you suggested. If we introduce types >>>>>>> such as "quantity with measure and unit" we still need to get a class >>>>>>> out of this option at the end, so why changing a proven concept? >>>>>>> >>>>>>> 3. List Options: The `isList` prevents having arbitrary nesting. As >>>>>>> Dawid mentioned, we kept human readability in mind. For every atomic >>>>>>> option like "key=12" can be represented by a list "keys=12;13". But >>>>>>> we don't want to go further; esp. no nesting. A dedicated list option >>>>>>> would start making this more complicated such as >>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>>>> StringOption(...)))", do we want that? >>>>>>> >>>>>>> 4. Correlation: The correlation part is one of the suggestions that I >>>>>>> like least in the document. We can also discuss removing it entirely, >>>>>>> but I think it solves the use case of relating options with each >>>>>>> other in a flexible way right next to the actual option. Instead of >>>>>>> being hidden in some component initialization, we should put it close >>>>>>> to the option to also perform validation eagerly instead of failing >>>>>>> at runtime when the option is accessed the first time. >>>>>>> >>>>>>> Regards, >>>>>>> Timo >>>>>>> >>>>>>> >>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>>>> A "List Type" sounds like a good direction to me. >>>>>>>> >>>>>>>> The comment on the type system was a bit brief, I agree. The idea is >>>>>>>> to see >>>>>>>> if something like that can ease validation. Especially the >>>> correlation >>>>>>>> system seems quite complex (proxies to work around order of >>>>>>>> initialization). >>>>>>>> >>>>>>>> For example, let's assume we don't think primarily about "java >>>>>>>> types" but >>>>>>>> would define types as one of the following (just examples, haven't >>>>>>>> thought >>>>>>>> all the details through): >>>>>>>> >>>>>>>> (a) category type: implies string, and a fix set of possible >>>> values. >>>>>>>> Those would be passes and naturally make it into the docs and >>>>>>>> validation. >>>>>>>> Maps to a String or Enum in Java. >>>>>>>> >>>>>>>> (b) numeric integer type: implies long (or optionally integer, >> if >>>>>>>> we want >>>>>>>> to automatically check overflow / underflow). would take typical >>>> domain >>>>>>>> validators, like non-negative, etc. >>>>>>>> >>>>>>>> (c) numeric real type: same as above (double or float) >>>>>>>> >>>>>>>> (d) numeric interval type: either defined as an interval, or >>>>>>>> references >>>>>>>> other parameter by key. validation by valid interval. >>>>>>>> >>>>>>>> (e) quantity: a measure and a unit. separately parsable. The >>>>>>>> measure's >>>>>>>> type could be any of the numeric types above, with same validation >>>>>>>> rules. >>>>>>>> >>>>>>>> With a system like the above, would we still correlation validators? >>>>>>>> Are >>>>>>>> there still cases that we need to catch early (config loading) or >>>>>>>> are the >>>>>>>> remaining cases sufficiently rare and runtime or setup specific, >>>>>>>> that it is >>>>>>>> fine to handle them in component initialization? >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>>>> <[hidden email]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Stephan, >>>>>>>>> >>>>>>>>> Thank you for your opinion. >>>>>>>>> >>>>>>>>> Actually list/composite types are the topics we spent the most of >>>> the >>>>>>>>> time. I understand that from a perspective of a full blown type >>>>>>>>> system, >>>>>>>>> a field like isList may look weird. Please let me elaborate a bit >>>> more >>>>>>>>> on the reason behind it though. Maybe we weren't clear enough about >>>> it >>>>>>>>> in the FLIP. The key feature of all the conifg options is that they >>>>>>>>> must >>>>>>>>> have a string representation as they might come from a >> configuration >>>>>>>>> file. Moreover it must be a human readable format, so that the >>>> values >>>>>>>>> might be manually adjusted. Having that in mind we did not want to >>>>>>>>> add a >>>>>>>>> support of an arbitrary nesting and we decided to allow for lists >>>> only >>>>>>>>> (and flat objects - I think though in the current design there is a >>>>>>>>> mistake around the Configurable interface). I think though you have >>>> a >>>>>>>>> point here and it would be better to have a ListConfigOption >>>>>>>>> instead of >>>>>>>>> this field. Does it make sense to you? >>>>>>>>> >>>>>>>>> As for the second part of your message. I am not sure if I >>>> understood >>>>>>>>> it. The validators work with parse/deserialized values from >>>>>>>>> Configuration that means they can be bound to the generic parameter >>>> of >>>>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the >>>>>>>>> ConfigOption >>>>>>>>> has anything to do with the validation logic. Could you elaborate a >>>>>>>>> bit >>>>>>>>> more what did you mean? >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> >>>>>>>>> Dawid >>>>>>>>> >>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>>>> I like the idea of enhancing the configuration and to do early >>>>>>>>> validation. >>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, >>>>>>>>>> though. For >>>>>>>>>> example, having a boolean "isList" is a clear indication of not >>>>>>>>>> having >>>>>>>>>> thought through the type/category system. >>>>>>>>>> Also, having a more clear category system makes validation >> simpler. >>>>>>>>>> For example, I have seen systems distinguishing between numeric >>>>>>>>> parameters >>>>>>>>>> (valid ranges), category parameters (set of possible values), >>>>>>>>>> quantities >>>>>>>>>> like duration and memory size (need measure and unit), which >>>>>>>>>> results in >>>>>>>>> an >>>>>>>>>> elegant system for validation. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >>>> [hidden email] >>>>>>>>> .invalid> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>>>> This allows the currently cluttered configuration of various >>>>>>>>>>> modules to be unified. >>>>>>>>>>> This is also first step of one of the keys to making new unified >>>>>>>>>>> TableEnvironment available for production. >>>>>>>>>>> >>>>>>>>>>> Previously, we did encounter complex configurations, such as >>>>>>>>>>> specifying the skewed values of column in DDL. The skew may >>>>>>>>>>> be a single field or a combination of multiple fields. So the >>>>>>>>>>> configuration is very troublesome. We used JSON string to >>>>>>>>>>> configure it. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Jingsong Lee >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >> ------------------------------------------------------------------ >>>>>>>>>>> From:Jark Wu <[hidden email]> >>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>>>> To:dev <[hidden email]> >>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >>>> Configuration >>>>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>>>> >>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a long >>>> time. >>>>>>>>>>> We have seen the benefit when developing blink configurations and >>>>>>>>> connector >>>>>>>>>>> properties in 1.9 release. >>>>>>>>>>> Thanks for bringing it up and make such a detailed design. >>>>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Jark >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Timo, >>>>>>>>>>>> >>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>>>> >>>>>>>>>>>> Client API enhancement benefit from this evolution which >>>>>>>>>>>> hopefully provides a better view of configuration of Flink. >>>>>>>>>>>> In client API enhancement, we likely make the deployment >>>>>>>>>>>> of cluster and submission of job totally defined by >>>> configuration. >>>>>>>>>>>> Will take a look at the document in days. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> tison. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 >>>>>>>>>>>> 下午10:12写道: >>>>>>>>>>>> >>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>> >>>>>>>>>>>>> Dawid and I are working on making parts of ExecutionConfig and >>>>>>>>>>>>> TableConfig configurable via config options. This is necessary >>>>>>>>>>>>> to make >>>>>>>>>>>>> all properties also available in SQL. Additionally, with the >>>>>>>>>>>>> new SQL >>>>>>>>>>> DDL >>>>>>>>>>>>> based on properties as well as more connectors and formats >>>>>>>>>>>>> coming up, >>>>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>>>> >>>>>>>>>>>>> We need more features around string-based configuration in the >>>>>>>>>>>>> future, >>>>>>>>>>>>> which is why Dawid and I would like to propose FLIP-54 for >>>>>>>>>>>>> evolving >>>>>>>>> the >>>>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>>>>>>>>>>> In summary it adds: >>>>>>>>>>>>> - documented types and validation >>>>>>>>>>>>> - more common types such as memory size, duration, list >>>>>>>>>>>>> - simple non-nested object types >>>>>>>>>>>>> >>>>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>>>> Timo >>>>>>>>>>>>> >>>>>>>>>>>>> >> |
Hi,
Ad. 1 The advantage of our approach is that you have the type definition close to the option definition. The only difference is that it enables expressing simple pojos with the primitives like ConfigOption<Integer>, ConfigOption<Long> etc. Otherwise as Timo said you will start having the parsing logic scattered everywhere in the code base as it is now. The string representation in our proposal is exactly the same as you explained for those three options. The only difference is that you don't have to parse the elements of a List, Map etc. afterwards. Ad. 2 I think about the withExtendedDescription as a helper getter in a different place, so that the option is easier to find in a different module from it was defined. The MAX_PARALLELISM option in TableOptions would conceptually be equal to: public ConfigOption getMaxParallelismOption() { return CoreOptions.MAX_PARALLELISM; } This allows to further clarify the description of the option in the context of a different module and end up in a seperate page in documentation (but with a link to the original one). In the end it is exactly the same option. It has exactly same key, type, parsing logic, it is in the end forwarded to the same place. Ad. 3 Not sure if I understand your concerns here. As Timo said it is in the end sth similar to toBytes/fromBytes, but it puts itself to a Configuration. Also just wanted to make sure we adjusted this part slightly and now the ConfigOption takes ConfigurableFactory. Best, Dawid On 30/08/2019 09:39, Timo Walther wrote: > Hi Becket, > > thanks for the discussion. > > 1. ConfigOptions in their current design are bound to classes. > Regarding, the option is "creating some Configurable objects instead > of defining the config to create > those Configurable"? We just moved this logic to a factory, this > factory can then also be used for other purposes. However, how the > option and objects are serialized to Configuration is still not part > of the option. The option is just pure declaration. > > If we would allow only List<String>, implementers would need to start > implementing own parsing and validation logic all the time. We would > like to avoid that. > > Please also keep in mind that Configuration must not consist of only > strings, it manages a Map<String, Object> for efficient access. Every > map entry can have a string representation for persistence, but in > most cases consists of unserialized objects. > > 2. MAX_PARALLELISM is still defined just once. We don't overwrite > keys, types or default values. But different layers might want to add > helpful information. In our concrete use case for FLIP-59, > ExecutionConfig has 50 properties and many of them are not relevant > for the Table layer or have no effect at all. We would like to list > and mention the most important config options again in the Table > Configuration section, so that users are not confused, but with a > strong link to the core option. E.g.: registered kryo serializers are > also important also for Table users, we would like to add the comment > "This option allows to modify the serialization of the ANY SQL data > type.". I think we should not spam the core configuration page with > comments from all layers, connectors, or libraries but keep this in > the corresponding component documentation. > > 3. But it is something like fromBytes() and toBytes()? It serializes > and deserializes T from a configuration? > > Regards, > Timo > > On 29.08.19 19:14, Becket Qin wrote: >> Hi Timo and Stephan, >> >> Thanks for the detail explanation. >> >> 1. I agree that each config should be in a human readable format. My >> concern is that the current List<Configurable> looks going a little >> too far >> from what the configuration is supposed to do. They are essentially >> creating some Configurable objects instead of defining the config to >> create >> those Configurable. This mixes ConfigOption and the usage of it. API >> wise >> it would be good to keep the configs and their usages (such as how to >> create objects using the ConfigOption) apart from each other. >> I am wondering if we can just make List also only take string. For >> example, >> is the following definition of map and list sufficient? >> >> A MapConfigOption is ConfigOption<Map<String, String>>. It can be >> defined >> as: >> map_config_name: k1=v1, k2=v2, k3=v3, ... >> >> A ListConfigOption is ConfigOption<List<String>>. It can be defined as: >> list_config_name: v1, v2, v3, ... >> >> A ListOfMapConfigOption is ConfigOption<List<Map<String, String>>. It >> can >> be defined as: >> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... >> >> All the key and values in the configuration are String. This also >> guarantees that the configuration is always serializable. >> If we want to do one more step, we can allow the ConfigOption to set all >> the primitive types and parse that for the users. So something like >> List<Integer>, List<Class<?>> seems fine. >> >> The configuration class could also have util methods to create a list of >> configurable such as: >> <T> List<T> <Configuration#getConfigurableInstances(ListMapConfigOption, >> Class<T> clazz). >> But the configuration class will not take arbitrary Configurable as the >> value of its config. >> >> 2. I might have misunderstood this. But my concern on the description >> extension is in the following example. >> >> public static final ConfigOption<Integer> MAX_PARALLELISM = >> >> CoreOptions.MAX_PARALLELISM.withExtendedDescription( >> "Note: That this property means that a table program has a side-effect >> XYZ."); >> >> In this case, we will have two MAX_PARALLELISM configs now. One is >> CoreOptions.MAX_PARALLELISM. The other one is defined here. I suppose >> users >> will see both configurations. One with an extended description and one >> without. Let's say there is a third component which also users >> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM >> ConfigOption? If >> so, what would that ConfigOption's description look like? >> >> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM >> and the >> description should clearly state all the usage of this ConfigOption. >> >> 3. I see, in that case, how about we name it something like >> extractConfiguration()? I am just trying to see if we can make it clear >> this is not something like fromBytes() and toBytes(). >> >> Thanks, >> >> Jiangjie (Becket) Qin >> >> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> wrote: >> >>> Hi Becket, >>> >>> let me try to clarify some of your questions: >>> >>> 1. For every option, we also needed to think about how to represent it >>> in a human readable format. We do not want to allow arbitrary nesting >>> because that would easily allow to bypass the flattened hierarchy of >>> config options (`session.memory.min`). The current design allows to >>> represent every option type as a list. E.g.: >>> >>> `myIntOption: 12` can be `myIntListOption: 12;12` >>> `myObjectOption: field=12,other=true` can be `myObjectListOption: >>> field=12,other=true; field=12,other=true` >>> `myPropertyOption: key=str0,other=str1` can be `myPropertyListOption: >>> key=str0,other=str1;key=str0,other=str1` >>> >>> We need the atomic class for serialization/deserialization both in >>> binary and string format. >>> >>> ConfigOption<List> is not present in the code base yet, but this >>> FLIP is >>> a preparation of making ExecutionConfig configurable. If you look into >>> this class or also in existing table connectors/formats, you will see >>> that each proposed option type has its requirements. >>> >>> 2. Regarding extending the description of ConfigOptions, the >>> semantic of >>> one option should be a super set of the other option. E.g. in Table API >>> we might use general ExecutionConfig properties. But we would like >>> to a) >>> make external options more prominent in the Table API config docs to >>> link people to properties they should pay attention b) notice about >>> side >>> effects. The core semantic of a property should not change. >>> >>> 3. The factory will not receive the entire configuration but works in a >>> separate key space. For `myObjectOption` above, it would receive a >>> configuration that consists of `field: 12` and `other: true`. >>> >>> I agree. I will convert the document into a Wiki page today. >>> >>> Thanks, >>> Timo >>> >>> On 29.08.19 09:00, Stephan Ewen wrote: >>>> @Becket One thing that may be non-obvious is that the Configuration >>>> class >>>> also defines serialization / persistence logic at the moment. So it >>>> needs >>>> to know the set of types it supports. That stands in the way of an >>>> arbitrary generic map type. >>>> >>>> @Timo I agree though that it seems a bit inconsistent to have one >>>> collection orthogonal to the type (List) and another one bound to >>> specific >>>> types (Map). >>>> >>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]> >>>> wrote: >>>> >>>>> Hi Timo, >>>>> >>>>> Thanks for the proposal. Sorry for the late comments, but I have a >>>>> few >>>>> questions / comments. >>>>> >>>>> 1. Is a new field of isList necessary in the ConfigOption? >>>>> Would it be enough to just check the atomicClass to see if it is a >>>>> List >>> or >>>>> not? >>>>> Also, in the ConfigOption<Map> class case, are we always assume >>>>> both key >>>>> and value types are String? Can we just apply the same to the >>>>> ConfigOption<List>? >>>>> BTW, I did a quick search in the codebase but did not find any >>>>> usage of >>>>> ConfigOption<List>. >>>>> >>>>> 2. The same config name, but with two ConfigOption with different >>> semantic >>>>> in different component seems super confusing. For example, when users >>> set >>>>> both configs, they may have no idea one is overriding the other. >>>>> There >>>>> might be two cases: >>>>> - If it is just the same config used by different components to >>>>> act >>>>> accordingly, it might be better to just have one config, but describe >>>>> clearly on how that config will be used. >>>>> - If it is actually two configurations that can be set >>>>> differently, I >>>>> think the config names should just be different. >>>>> >>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() method >>>>> pretty much means getConfiguration()? The toConfiguration() method >>> sounds >>>>> like converting an object to a configuration, which only works if the >>>>> object does not contain any state / value. I am also wondering if >>>>> there >>> is >>>>> a real use case of this method. Because supposedly the configurations >>> could >>>>> just be passed around to caller of this method. >>>>> >>>>> Also, can you put the proposal into the FLIP wiki instead of in the >>>>> doc before voting? The FLIP wiki allows track the modification >>>>> history >>> and >>>>> has a more established structure to ensure nothing is missed. >>>>> >>>>> Thanks, >>>>> >>>>> Jiangjie (Becket) Qin >>>>> >>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <[hidden email]> >>> wrote: >>>>>> Hi everyone, >>>>>> >>>>>> I updated the FLIP proposal one more time as mentioned in the voting >>>>>> thread. If there are no objections, I will start a new voting thread >>>>>> tomorrow at 9am Berlin time. >>>>>> >>>>>> Thanks, >>>>>> Timo >>>>>> >>>>>> >>>>>> On 22.08.19 14:19, Timo Walther wrote: >>>>>>> Hi everyone, >>>>>>> >>>>>>> thanks for all the feedback we have received online and offline. It >>>>>>> showed that many people support the idea of evolving the Flink >>>>>>> configuration functionality. I'm almost sure that this FLIP will >>>>>>> not >>>>>>> solve all issues but at least will improve the current status. >>>>>>> >>>>>>> We've updated the document and replaced the Correlation part >>>>>>> with the >>>>>>> concept of a ConfigOptionGroup that can provide all available >>>>>>> options >>>>>>> of a group plus custom group validators for eager validation. >>>>>>> For now, >>>>>>> this eager group validation will only be used at certain >>>>>>> locations in >>>>>>> the Flink code but it prepares for maybe validating the entire >>>>>>> global >>>>>>> configuration before submitting a job in the future. >>>>>>> >>>>>>> Please take another look if you find time. I hope we can proceed >>>>>>> with >>>>>>> the voting process if there are no objections. >>>>>>> >>>>>>> Regards, >>>>>>> Timo >>>>>>> >>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>>>>> Hi Stephan, >>>>>>>> >>>>>>>> thanks for your suggestions. Let me give you some background about >>>>>>>> the decisions made in this FLIP: >>>>>>>> >>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" because we did >>>>>>>> not want to change the entire configuration infrastructure. >>>>>>>> Both for >>>>>>>> backwards-compatibility reasons and the amount of work that >>>>>>>> would be >>>>>>>> required to update all options. If our goal is to rework the >>>>>>>> configuration option entirely, I might suggest to switch to JSON >>>>>>>> format with JSON schema and JSON validator. However, setting >>>>>>>> properties in a CLI or web interface becomes more tricky the more >>>>>>>> nested structures are allowed. >>>>>>>> >>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is >>>>>>>> centered >>>>>>>> around Java classes where T is determined by the default value. >>>>>>>> The >>>>>>>> FLIP just makes this more explicit by offering an explicit >>>>>>>> `intType()` method etc. The current design of validators centered >>>>>>>> around Java classes makes it possible to have typical domain >>>>>>>> validators baked by generics as you suggested. If we introduce >>>>>>>> types >>>>>>>> such as "quantity with measure and unit" we still need to get a >>>>>>>> class >>>>>>>> out of this option at the end, so why changing a proven concept? >>>>>>>> >>>>>>>> 3. List Options: The `isList` prevents having arbitrary >>>>>>>> nesting. As >>>>>>>> Dawid mentioned, we kept human readability in mind. For every >>>>>>>> atomic >>>>>>>> option like "key=12" can be represented by a list "keys=12;13". >>>>>>>> But >>>>>>>> we don't want to go further; esp. no nesting. A dedicated list >>>>>>>> option >>>>>>>> would start making this more complicated such as >>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>>>>> StringOption(...)))", do we want that? >>>>>>>> >>>>>>>> 4. Correlation: The correlation part is one of the suggestions >>>>>>>> that I >>>>>>>> like least in the document. We can also discuss removing it >>>>>>>> entirely, >>>>>>>> but I think it solves the use case of relating options with each >>>>>>>> other in a flexible way right next to the actual option. >>>>>>>> Instead of >>>>>>>> being hidden in some component initialization, we should put it >>>>>>>> close >>>>>>>> to the option to also perform validation eagerly instead of >>>>>>>> failing >>>>>>>> at runtime when the option is accessed the first time. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Timo >>>>>>>> >>>>>>>> >>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>>>>> A "List Type" sounds like a good direction to me. >>>>>>>>> >>>>>>>>> The comment on the type system was a bit brief, I agree. The >>>>>>>>> idea is >>>>>>>>> to see >>>>>>>>> if something like that can ease validation. Especially the >>>>> correlation >>>>>>>>> system seems quite complex (proxies to work around order of >>>>>>>>> initialization). >>>>>>>>> >>>>>>>>> For example, let's assume we don't think primarily about "java >>>>>>>>> types" but >>>>>>>>> would define types as one of the following (just examples, >>>>>>>>> haven't >>>>>>>>> thought >>>>>>>>> all the details through): >>>>>>>>> >>>>>>>>> (a) category type: implies string, and a fix set of possible >>>>> values. >>>>>>>>> Those would be passes and naturally make it into the docs and >>>>>>>>> validation. >>>>>>>>> Maps to a String or Enum in Java. >>>>>>>>> >>>>>>>>> (b) numeric integer type: implies long (or optionally >>>>>>>>> integer, >>> if >>>>>>>>> we want >>>>>>>>> to automatically check overflow / underflow). would take typical >>>>> domain >>>>>>>>> validators, like non-negative, etc. >>>>>>>>> >>>>>>>>> (c) numeric real type: same as above (double or float) >>>>>>>>> >>>>>>>>> (d) numeric interval type: either defined as an interval, or >>>>>>>>> references >>>>>>>>> other parameter by key. validation by valid interval. >>>>>>>>> >>>>>>>>> (e) quantity: a measure and a unit. separately parsable. The >>>>>>>>> measure's >>>>>>>>> type could be any of the numeric types above, with same >>>>>>>>> validation >>>>>>>>> rules. >>>>>>>>> >>>>>>>>> With a system like the above, would we still correlation >>>>>>>>> validators? >>>>>>>>> Are >>>>>>>>> there still cases that we need to catch early (config loading) or >>>>>>>>> are the >>>>>>>>> remaining cases sufficiently rare and runtime or setup specific, >>>>>>>>> that it is >>>>>>>>> fine to handle them in component initialization? >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>>>>> <[hidden email]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Stephan, >>>>>>>>>> >>>>>>>>>> Thank you for your opinion. >>>>>>>>>> >>>>>>>>>> Actually list/composite types are the topics we spent the >>>>>>>>>> most of >>>>> the >>>>>>>>>> time. I understand that from a perspective of a full blown type >>>>>>>>>> system, >>>>>>>>>> a field like isList may look weird. Please let me elaborate a >>>>>>>>>> bit >>>>> more >>>>>>>>>> on the reason behind it though. Maybe we weren't clear enough >>>>>>>>>> about >>>>> it >>>>>>>>>> in the FLIP. The key feature of all the conifg options is >>>>>>>>>> that they >>>>>>>>>> must >>>>>>>>>> have a string representation as they might come from a >>> configuration >>>>>>>>>> file. Moreover it must be a human readable format, so that the >>>>> values >>>>>>>>>> might be manually adjusted. Having that in mind we did not >>>>>>>>>> want to >>>>>>>>>> add a >>>>>>>>>> support of an arbitrary nesting and we decided to allow for >>>>>>>>>> lists >>>>> only >>>>>>>>>> (and flat objects - I think though in the current design >>>>>>>>>> there is a >>>>>>>>>> mistake around the Configurable interface). I think though >>>>>>>>>> you have >>>>> a >>>>>>>>>> point here and it would be better to have a ListConfigOption >>>>>>>>>> instead of >>>>>>>>>> this field. Does it make sense to you? >>>>>>>>>> >>>>>>>>>> As for the second part of your message. I am not sure if I >>>>> understood >>>>>>>>>> it. The validators work with parse/deserialized values from >>>>>>>>>> Configuration that means they can be bound to the generic >>>>>>>>>> parameter >>>>> of >>>>>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the >>>>>>>>>> ConfigOption >>>>>>>>>> has anything to do with the validation logic. Could you >>>>>>>>>> elaborate a >>>>>>>>>> bit >>>>>>>>>> more what did you mean? >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> >>>>>>>>>> Dawid >>>>>>>>>> >>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>>>>> I like the idea of enhancing the configuration and to do early >>>>>>>>>> validation. >>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, >>>>>>>>>>> though. For >>>>>>>>>>> example, having a boolean "isList" is a clear indication of not >>>>>>>>>>> having >>>>>>>>>>> thought through the type/category system. >>>>>>>>>>> Also, having a more clear category system makes validation >>> simpler. >>>>>>>>>>> For example, I have seen systems distinguishing between numeric >>>>>>>>>> parameters >>>>>>>>>>> (valid ranges), category parameters (set of possible values), >>>>>>>>>>> quantities >>>>>>>>>>> like duration and memory size (need measure and unit), which >>>>>>>>>>> results in >>>>>>>>>> an >>>>>>>>>>> elegant system for validation. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >>>>> [hidden email] >>>>>>>>>> .invalid> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>>>>> This allows the currently cluttered configuration of various >>>>>>>>>>>> modules to be unified. >>>>>>>>>>>> This is also first step of one of the keys to making new >>>>>>>>>>>> unified >>>>>>>>>>>> TableEnvironment available for production. >>>>>>>>>>>> >>>>>>>>>>>> Previously, we did encounter complex configurations, such as >>>>>>>>>>>> specifying the skewed values of column in DDL. The skew may >>>>>>>>>>>> be a single field or a combination of multiple fields. >>>>>>>>>>>> So the >>>>>>>>>>>> configuration is very troublesome. We used JSON string to >>>>>>>>>>>> configure it. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>> ------------------------------------------------------------------ >>>>>>>>>>>> From:Jark Wu <[hidden email]> >>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>>>>> To:dev <[hidden email]> >>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >>>>> Configuration >>>>>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>>>>> >>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a long >>>>> time. >>>>>>>>>>>> We have seen the benefit when developing blink >>>>>>>>>>>> configurations and >>>>>>>>>> connector >>>>>>>>>>>> properties in 1.9 release. >>>>>>>>>>>> Thanks for bringing it up and make such a detailed design. >>>>>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> Jark >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>> >>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>>>>> >>>>>>>>>>>>> Client API enhancement benefit from this evolution which >>>>>>>>>>>>> hopefully provides a better view of configuration of Flink. >>>>>>>>>>>>> In client API enhancement, we likely make the deployment >>>>>>>>>>>>> of cluster and submission of job totally defined by >>>>> configuration. >>>>>>>>>>>>> Will take a look at the document in days. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> tison. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 >>>>>>>>>>>>> 下午10:12写道: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Dawid and I are working on making parts of >>>>>>>>>>>>>> ExecutionConfig and >>>>>>>>>>>>>> TableConfig configurable via config options. This is >>>>>>>>>>>>>> necessary >>>>>>>>>>>>>> to make >>>>>>>>>>>>>> all properties also available in SQL. Additionally, with the >>>>>>>>>>>>>> new SQL >>>>>>>>>>>> DDL >>>>>>>>>>>>>> based on properties as well as more connectors and formats >>>>>>>>>>>>>> coming up, >>>>>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We need more features around string-based configuration >>>>>>>>>>>>>> in the >>>>>>>>>>>>>> future, >>>>>>>>>>>>>> which is why Dawid and I would like to propose FLIP-54 for >>>>>>>>>>>>>> evolving >>>>>>>>>> the >>>>>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>> >>>>>>>>>>>>>> In summary it adds: >>>>>>>>>>>>>> - documented types and validation >>>>>>>>>>>>>> - more common types such as memory size, duration, list >>>>>>>>>>>>>> - simple non-nested object types >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>>>>> Timo >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>> > signature.asc (849 bytes) Download Attachment |
Hi Timo,
Thanks again for the clarification. Please see a few more questions below. Re: 1 > Please also keep in mind that Configuration must not consist of only > strings, it manages a Map<String, Object> for efficient access. Every > map entry can have a string representation for persistence, but in most > cases consists of unserialized objects. I'd like to understand this a bit more. The reason we have a Map<String, Object> in Configuration was because that Object could be either a String, a List or a Map, right? But they eventually always boil down to Strings, or maybe one of the predefined type that we know how to serialize. In the current design, can the Object also be Configurable? If the value in the config Map<String, Object> can be Configurable objects, how do we serialize them? Calling toConfiguration() seems not quite working because there might be some other internal fields that are not part of the configuration. The modification to those fields will be lost if we simply use toConfiguration(). So the impact of modifying those Configurable objects seems a little undefined. And it would be difficult to prevent users from doing that. If the value in the config Map<String, Object> cannot be Configurable objects, then it seems a little weird to have all the other ConfigType stored in the ConfigMap in their own native type and accessed via getInteger() / getBoolean(), etc, while having ConfigurableType to be different from others because one have to use ConfigurableFactory to get the configurations. Re: 2 > I think about the withExtendedDescription as a helper getter in a > different place, so that the option is easier to find in a different > module from it was defined. > The MAX_PARALLELISM option in TableOptions would conceptually be equal to: > public ConfigOption getMaxParallelismOption() { > return CoreOptions.MAX_PARALLELISM; > } Just to make sure I understand it correctly, does that mean users will see something like following? - CoreOptions.MAX_PARALLELISM with description_1; - TableOptions.MAX_PARALLELISM with description_1 + description_2. - DataStreamOptions.MAX_PARALLELISM with description_1 + description_3. And users will only configure exactly one MAX_PARALLELISM cross the board. So they won't be confused by setting two MAX_PARALLELISM config for two different modules, while they are actually the same config. If that is the case, I don't have further concern. Re: 3 Maybe I am thinking too much. I thought toBytes() / fromBytes() actually restore the original Object. But fromConfiguration() and toConfiguration() does not do that, anything not in the configuration of the original object will be lost. So it would be good to make that clear. Maybe a clear Java doc is sufficient. Thanks, Jiangjie (Becket) Qin On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz <[hidden email]> wrote: > Hi, > > Ad. 1 > > The advantage of our approach is that you have the type definition close > to the option definition. The only difference is that it enables > expressing simple pojos with the primitives like ConfigOption<Integer>, > ConfigOption<Long> etc. Otherwise as Timo said you will start having > > the parsing logic scattered everywhere in the code base as it is now. > The string representation in our proposal is exactly the same as you > explained for those three options. The only difference is that you don't > have to parse the elements of a List, Map etc. afterwards. > > Ad. 2 > > I think about the withExtendedDescription as a helper getter in a > different place, so that the option is easier to find in a different > module from it was defined. > > The MAX_PARALLELISM option in TableOptions would conceptually be equal to: > > public ConfigOption getMaxParallelismOption() { > > return CoreOptions.MAX_PARALLELISM; > > } > > This allows to further clarify the description of the option in the > context of a different module and end up in a seperate page in > documentation (but with a link to the original one). In the end it is > exactly the same option. It has exactly same key, type, parsing logic, > it is in the end forwarded to the same place. > > Ad. 3 > > Not sure if I understand your concerns here. As Timo said it is in the > end sth similar to toBytes/fromBytes, but it puts itself to a > Configuration. Also just wanted to make sure we adjusted this part > slightly and now the ConfigOption takes ConfigurableFactory. > > Best, > > Dawid > > > On 30/08/2019 09:39, Timo Walther wrote: > > Hi Becket, > > > > thanks for the discussion. > > > > 1. ConfigOptions in their current design are bound to classes. > > Regarding, the option is "creating some Configurable objects instead > > of defining the config to create > > those Configurable"? We just moved this logic to a factory, this > > factory can then also be used for other purposes. However, how the > > option and objects are serialized to Configuration is still not part > > of the option. The option is just pure declaration. > > > > If we would allow only List<String>, implementers would need to start > > implementing own parsing and validation logic all the time. We would > > like to avoid that. > > > > Please also keep in mind that Configuration must not consist of only > > strings, it manages a Map<String, Object> for efficient access. Every > > map entry can have a string representation for persistence, but in > > most cases consists of unserialized objects. > > > > 2. MAX_PARALLELISM is still defined just once. We don't overwrite > > keys, types or default values. But different layers might want to add > > helpful information. In our concrete use case for FLIP-59, > > ExecutionConfig has 50 properties and many of them are not relevant > > for the Table layer or have no effect at all. We would like to list > > and mention the most important config options again in the Table > > Configuration section, so that users are not confused, but with a > > strong link to the core option. E.g.: registered kryo serializers are > > also important also for Table users, we would like to add the comment > > "This option allows to modify the serialization of the ANY SQL data > > type.". I think we should not spam the core configuration page with > > comments from all layers, connectors, or libraries but keep this in > > the corresponding component documentation. > > > > 3. But it is something like fromBytes() and toBytes()? It serializes > > and deserializes T from a configuration? > > > > Regards, > > Timo > > > > On 29.08.19 19:14, Becket Qin wrote: > >> Hi Timo and Stephan, > >> > >> Thanks for the detail explanation. > >> > >> 1. I agree that each config should be in a human readable format. My > >> concern is that the current List<Configurable> looks going a little > >> too far > >> from what the configuration is supposed to do. They are essentially > >> creating some Configurable objects instead of defining the config to > >> create > >> those Configurable. This mixes ConfigOption and the usage of it. API > >> wise > >> it would be good to keep the configs and their usages (such as how to > >> create objects using the ConfigOption) apart from each other. > >> I am wondering if we can just make List also only take string. For > >> example, > >> is the following definition of map and list sufficient? > >> > >> A MapConfigOption is ConfigOption<Map<String, String>>. It can be > >> defined > >> as: > >> map_config_name: k1=v1, k2=v2, k3=v3, ... > >> > >> A ListConfigOption is ConfigOption<List<String>>. It can be defined as: > >> list_config_name: v1, v2, v3, ... > >> > >> A ListOfMapConfigOption is ConfigOption<List<Map<String, String>>. It > >> can > >> be defined as: > >> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... > >> > >> All the key and values in the configuration are String. This also > >> guarantees that the configuration is always serializable. > >> If we want to do one more step, we can allow the ConfigOption to set all > >> the primitive types and parse that for the users. So something like > >> List<Integer>, List<Class<?>> seems fine. > >> > >> The configuration class could also have util methods to create a list of > >> configurable such as: > >> <T> List<T> <Configuration#getConfigurableInstances(ListMapConfigOption, > >> Class<T> clazz). > >> But the configuration class will not take arbitrary Configurable as the > >> value of its config. > >> > >> 2. I might have misunderstood this. But my concern on the description > >> extension is in the following example. > >> > >> public static final ConfigOption<Integer> MAX_PARALLELISM = > >> > >> CoreOptions.MAX_PARALLELISM.withExtendedDescription( > >> "Note: That this property means that a table program has a side-effect > >> XYZ."); > >> > >> In this case, we will have two MAX_PARALLELISM configs now. One is > >> CoreOptions.MAX_PARALLELISM. The other one is defined here. I suppose > >> users > >> will see both configurations. One with an extended description and one > >> without. Let's say there is a third component which also users > >> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM > >> ConfigOption? If > >> so, what would that ConfigOption's description look like? > >> > >> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM > >> and the > >> description should clearly state all the usage of this ConfigOption. > >> > >> 3. I see, in that case, how about we name it something like > >> extractConfiguration()? I am just trying to see if we can make it clear > >> this is not something like fromBytes() and toBytes(). > >> > >> Thanks, > >> > >> Jiangjie (Becket) Qin > >> > >> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> > wrote: > >> > >>> Hi Becket, > >>> > >>> let me try to clarify some of your questions: > >>> > >>> 1. For every option, we also needed to think about how to represent it > >>> in a human readable format. We do not want to allow arbitrary nesting > >>> because that would easily allow to bypass the flattened hierarchy of > >>> config options (`session.memory.min`). The current design allows to > >>> represent every option type as a list. E.g.: > >>> > >>> `myIntOption: 12` can be `myIntListOption: 12;12` > >>> `myObjectOption: field=12,other=true` can be `myObjectListOption: > >>> field=12,other=true; field=12,other=true` > >>> `myPropertyOption: key=str0,other=str1` can be `myPropertyListOption: > >>> key=str0,other=str1;key=str0,other=str1` > >>> > >>> We need the atomic class for serialization/deserialization both in > >>> binary and string format. > >>> > >>> ConfigOption<List> is not present in the code base yet, but this > >>> FLIP is > >>> a preparation of making ExecutionConfig configurable. If you look into > >>> this class or also in existing table connectors/formats, you will see > >>> that each proposed option type has its requirements. > >>> > >>> 2. Regarding extending the description of ConfigOptions, the > >>> semantic of > >>> one option should be a super set of the other option. E.g. in Table API > >>> we might use general ExecutionConfig properties. But we would like > >>> to a) > >>> make external options more prominent in the Table API config docs to > >>> link people to properties they should pay attention b) notice about > >>> side > >>> effects. The core semantic of a property should not change. > >>> > >>> 3. The factory will not receive the entire configuration but works in a > >>> separate key space. For `myObjectOption` above, it would receive a > >>> configuration that consists of `field: 12` and `other: true`. > >>> > >>> I agree. I will convert the document into a Wiki page today. > >>> > >>> Thanks, > >>> Timo > >>> > >>> On 29.08.19 09:00, Stephan Ewen wrote: > >>>> @Becket One thing that may be non-obvious is that the Configuration > >>>> class > >>>> also defines serialization / persistence logic at the moment. So it > >>>> needs > >>>> to know the set of types it supports. That stands in the way of an > >>>> arbitrary generic map type. > >>>> > >>>> @Timo I agree though that it seems a bit inconsistent to have one > >>>> collection orthogonal to the type (List) and another one bound to > >>> specific > >>>> types (Map). > >>>> > >>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]> > >>>> wrote: > >>>> > >>>>> Hi Timo, > >>>>> > >>>>> Thanks for the proposal. Sorry for the late comments, but I have a > >>>>> few > >>>>> questions / comments. > >>>>> > >>>>> 1. Is a new field of isList necessary in the ConfigOption? > >>>>> Would it be enough to just check the atomicClass to see if it is a > >>>>> List > >>> or > >>>>> not? > >>>>> Also, in the ConfigOption<Map> class case, are we always assume > >>>>> both key > >>>>> and value types are String? Can we just apply the same to the > >>>>> ConfigOption<List>? > >>>>> BTW, I did a quick search in the codebase but did not find any > >>>>> usage of > >>>>> ConfigOption<List>. > >>>>> > >>>>> 2. The same config name, but with two ConfigOption with different > >>> semantic > >>>>> in different component seems super confusing. For example, when users > >>> set > >>>>> both configs, they may have no idea one is overriding the other. > >>>>> There > >>>>> might be two cases: > >>>>> - If it is just the same config used by different components to > >>>>> act > >>>>> accordingly, it might be better to just have one config, but describe > >>>>> clearly on how that config will be used. > >>>>> - If it is actually two configurations that can be set > >>>>> differently, I > >>>>> think the config names should just be different. > >>>>> > >>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() method > >>>>> pretty much means getConfiguration()? The toConfiguration() method > >>> sounds > >>>>> like converting an object to a configuration, which only works if the > >>>>> object does not contain any state / value. I am also wondering if > >>>>> there > >>> is > >>>>> a real use case of this method. Because supposedly the configurations > >>> could > >>>>> just be passed around to caller of this method. > >>>>> > >>>>> Also, can you put the proposal into the FLIP wiki instead of in the > >>>>> doc before voting? The FLIP wiki allows track the modification > >>>>> history > >>> and > >>>>> has a more established structure to ensure nothing is missed. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Jiangjie (Becket) Qin > >>>>> > >>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <[hidden email]> > >>> wrote: > >>>>>> Hi everyone, > >>>>>> > >>>>>> I updated the FLIP proposal one more time as mentioned in the voting > >>>>>> thread. If there are no objections, I will start a new voting thread > >>>>>> tomorrow at 9am Berlin time. > >>>>>> > >>>>>> Thanks, > >>>>>> Timo > >>>>>> > >>>>>> > >>>>>> On 22.08.19 14:19, Timo Walther wrote: > >>>>>>> Hi everyone, > >>>>>>> > >>>>>>> thanks for all the feedback we have received online and offline. It > >>>>>>> showed that many people support the idea of evolving the Flink > >>>>>>> configuration functionality. I'm almost sure that this FLIP will > >>>>>>> not > >>>>>>> solve all issues but at least will improve the current status. > >>>>>>> > >>>>>>> We've updated the document and replaced the Correlation part > >>>>>>> with the > >>>>>>> concept of a ConfigOptionGroup that can provide all available > >>>>>>> options > >>>>>>> of a group plus custom group validators for eager validation. > >>>>>>> For now, > >>>>>>> this eager group validation will only be used at certain > >>>>>>> locations in > >>>>>>> the Flink code but it prepares for maybe validating the entire > >>>>>>> global > >>>>>>> configuration before submitting a job in the future. > >>>>>>> > >>>>>>> Please take another look if you find time. I hope we can proceed > >>>>>>> with > >>>>>>> the voting process if there are no objections. > >>>>>>> > >>>>>>> Regards, > >>>>>>> Timo > >>>>>>> > >>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: > >>>>>>>> Hi Stephan, > >>>>>>>> > >>>>>>>> thanks for your suggestions. Let me give you some background about > >>>>>>>> the decisions made in this FLIP: > >>>>>>>> > >>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" because we did > >>>>>>>> not want to change the entire configuration infrastructure. > >>>>>>>> Both for > >>>>>>>> backwards-compatibility reasons and the amount of work that > >>>>>>>> would be > >>>>>>>> required to update all options. If our goal is to rework the > >>>>>>>> configuration option entirely, I might suggest to switch to JSON > >>>>>>>> format with JSON schema and JSON validator. However, setting > >>>>>>>> properties in a CLI or web interface becomes more tricky the more > >>>>>>>> nested structures are allowed. > >>>>>>>> > >>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is > >>>>>>>> centered > >>>>>>>> around Java classes where T is determined by the default value. > >>>>>>>> The > >>>>>>>> FLIP just makes this more explicit by offering an explicit > >>>>>>>> `intType()` method etc. The current design of validators centered > >>>>>>>> around Java classes makes it possible to have typical domain > >>>>>>>> validators baked by generics as you suggested. If we introduce > >>>>>>>> types > >>>>>>>> such as "quantity with measure and unit" we still need to get a > >>>>>>>> class > >>>>>>>> out of this option at the end, so why changing a proven concept? > >>>>>>>> > >>>>>>>> 3. List Options: The `isList` prevents having arbitrary > >>>>>>>> nesting. As > >>>>>>>> Dawid mentioned, we kept human readability in mind. For every > >>>>>>>> atomic > >>>>>>>> option like "key=12" can be represented by a list "keys=12;13". > >>>>>>>> But > >>>>>>>> we don't want to go further; esp. no nesting. A dedicated list > >>>>>>>> option > >>>>>>>> would start making this more complicated such as > >>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), > >>>>>>>> StringOption(...)))", do we want that? > >>>>>>>> > >>>>>>>> 4. Correlation: The correlation part is one of the suggestions > >>>>>>>> that I > >>>>>>>> like least in the document. We can also discuss removing it > >>>>>>>> entirely, > >>>>>>>> but I think it solves the use case of relating options with each > >>>>>>>> other in a flexible way right next to the actual option. > >>>>>>>> Instead of > >>>>>>>> being hidden in some component initialization, we should put it > >>>>>>>> close > >>>>>>>> to the option to also perform validation eagerly instead of > >>>>>>>> failing > >>>>>>>> at runtime when the option is accessed the first time. > >>>>>>>> > >>>>>>>> Regards, > >>>>>>>> Timo > >>>>>>>> > >>>>>>>> > >>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: > >>>>>>>>> A "List Type" sounds like a good direction to me. > >>>>>>>>> > >>>>>>>>> The comment on the type system was a bit brief, I agree. The > >>>>>>>>> idea is > >>>>>>>>> to see > >>>>>>>>> if something like that can ease validation. Especially the > >>>>> correlation > >>>>>>>>> system seems quite complex (proxies to work around order of > >>>>>>>>> initialization). > >>>>>>>>> > >>>>>>>>> For example, let's assume we don't think primarily about "java > >>>>>>>>> types" but > >>>>>>>>> would define types as one of the following (just examples, > >>>>>>>>> haven't > >>>>>>>>> thought > >>>>>>>>> all the details through): > >>>>>>>>> > >>>>>>>>> (a) category type: implies string, and a fix set of possible > >>>>> values. > >>>>>>>>> Those would be passes and naturally make it into the docs and > >>>>>>>>> validation. > >>>>>>>>> Maps to a String or Enum in Java. > >>>>>>>>> > >>>>>>>>> (b) numeric integer type: implies long (or optionally > >>>>>>>>> integer, > >>> if > >>>>>>>>> we want > >>>>>>>>> to automatically check overflow / underflow). would take typical > >>>>> domain > >>>>>>>>> validators, like non-negative, etc. > >>>>>>>>> > >>>>>>>>> (c) numeric real type: same as above (double or float) > >>>>>>>>> > >>>>>>>>> (d) numeric interval type: either defined as an interval, or > >>>>>>>>> references > >>>>>>>>> other parameter by key. validation by valid interval. > >>>>>>>>> > >>>>>>>>> (e) quantity: a measure and a unit. separately parsable. The > >>>>>>>>> measure's > >>>>>>>>> type could be any of the numeric types above, with same > >>>>>>>>> validation > >>>>>>>>> rules. > >>>>>>>>> > >>>>>>>>> With a system like the above, would we still correlation > >>>>>>>>> validators? > >>>>>>>>> Are > >>>>>>>>> there still cases that we need to catch early (config loading) or > >>>>>>>>> are the > >>>>>>>>> remaining cases sufficiently rare and runtime or setup specific, > >>>>>>>>> that it is > >>>>>>>>> fine to handle them in component initialization? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz > >>>>>>>>> <[hidden email]> > >>>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Hi Stephan, > >>>>>>>>>> > >>>>>>>>>> Thank you for your opinion. > >>>>>>>>>> > >>>>>>>>>> Actually list/composite types are the topics we spent the > >>>>>>>>>> most of > >>>>> the > >>>>>>>>>> time. I understand that from a perspective of a full blown type > >>>>>>>>>> system, > >>>>>>>>>> a field like isList may look weird. Please let me elaborate a > >>>>>>>>>> bit > >>>>> more > >>>>>>>>>> on the reason behind it though. Maybe we weren't clear enough > >>>>>>>>>> about > >>>>> it > >>>>>>>>>> in the FLIP. The key feature of all the conifg options is > >>>>>>>>>> that they > >>>>>>>>>> must > >>>>>>>>>> have a string representation as they might come from a > >>> configuration > >>>>>>>>>> file. Moreover it must be a human readable format, so that the > >>>>> values > >>>>>>>>>> might be manually adjusted. Having that in mind we did not > >>>>>>>>>> want to > >>>>>>>>>> add a > >>>>>>>>>> support of an arbitrary nesting and we decided to allow for > >>>>>>>>>> lists > >>>>> only > >>>>>>>>>> (and flat objects - I think though in the current design > >>>>>>>>>> there is a > >>>>>>>>>> mistake around the Configurable interface). I think though > >>>>>>>>>> you have > >>>>> a > >>>>>>>>>> point here and it would be better to have a ListConfigOption > >>>>>>>>>> instead of > >>>>>>>>>> this field. Does it make sense to you? > >>>>>>>>>> > >>>>>>>>>> As for the second part of your message. I am not sure if I > >>>>> understood > >>>>>>>>>> it. The validators work with parse/deserialized values from > >>>>>>>>>> Configuration that means they can be bound to the generic > >>>>>>>>>> parameter > >>>>> of > >>>>>>>>>> Configuration. You can have a RangeValidator<? extends > >>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the > >>>>>>>>>> ConfigOption > >>>>>>>>>> has anything to do with the validation logic. Could you > >>>>>>>>>> elaborate a > >>>>>>>>>> bit > >>>>>>>>>> more what did you mean? > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> > >>>>>>>>>> Dawid > >>>>>>>>>> > >>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: > >>>>>>>>>>> I like the idea of enhancing the configuration and to do early > >>>>>>>>>> validation. > >>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, > >>>>>>>>>>> though. For > >>>>>>>>>>> example, having a boolean "isList" is a clear indication of not > >>>>>>>>>>> having > >>>>>>>>>>> thought through the type/category system. > >>>>>>>>>>> Also, having a more clear category system makes validation > >>> simpler. > >>>>>>>>>>> For example, I have seen systems distinguishing between numeric > >>>>>>>>>> parameters > >>>>>>>>>>> (valid ranges), category parameters (set of possible values), > >>>>>>>>>>> quantities > >>>>>>>>>>> like duration and memory size (need measure and unit), which > >>>>>>>>>>> results in > >>>>>>>>>> an > >>>>>>>>>>> elegant system for validation. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < > >>>>> [hidden email] > >>>>>>>>>> .invalid> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. > >>>>>>>>>>>> This allows the currently cluttered configuration of various > >>>>>>>>>>>> modules to be unified. > >>>>>>>>>>>> This is also first step of one of the keys to making new > >>>>>>>>>>>> unified > >>>>>>>>>>>> TableEnvironment available for production. > >>>>>>>>>>>> > >>>>>>>>>>>> Previously, we did encounter complex configurations, such as > >>>>>>>>>>>> specifying the skewed values of column in DDL. The skew may > >>>>>>>>>>>> be a single field or a combination of multiple fields. > >>>>>>>>>>>> So the > >>>>>>>>>>>> configuration is very troublesome. We used JSON string to > >>>>>>>>>>>> configure it. > >>>>>>>>>>>> > >>>>>>>>>>>> Best, > >>>>>>>>>>>> Jingsong Lee > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>> ------------------------------------------------------------------ > >>>>>>>>>>>> From:Jark Wu <[hidden email]> > >>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 > >>>>>>>>>>>> To:dev <[hidden email]> > >>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and > >>>>> Configuration > >>>>>>>>>>>> Thanks for starting this design Timo and Dawid, > >>>>>>>>>>>> > >>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a long > >>>>> time. > >>>>>>>>>>>> We have seen the benefit when developing blink > >>>>>>>>>>>> configurations and > >>>>>>>>>> connector > >>>>>>>>>>>> properties in 1.9 release. > >>>>>>>>>>>> Thanks for bringing it up and make such a detailed design. > >>>>>>>>>>>> I will leave my thoughts and comments there. > >>>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> Jark > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email] > > > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi Timo, > >>>>>>>>>>>>> > >>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! > >>>>>>>>>>>>> > >>>>>>>>>>>>> Client API enhancement benefit from this evolution which > >>>>>>>>>>>>> hopefully provides a better view of configuration of Flink. > >>>>>>>>>>>>> In client API enhancement, we likely make the deployment > >>>>>>>>>>>>> of cluster and submission of job totally defined by > >>>>> configuration. > >>>>>>>>>>>>> Will take a look at the document in days. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Best, > >>>>>>>>>>>>> tison. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 > >>>>>>>>>>>>> 下午10:12写道: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Dawid and I are working on making parts of > >>>>>>>>>>>>>> ExecutionConfig and > >>>>>>>>>>>>>> TableConfig configurable via config options. This is > >>>>>>>>>>>>>> necessary > >>>>>>>>>>>>>> to make > >>>>>>>>>>>>>> all properties also available in SQL. Additionally, with the > >>>>>>>>>>>>>> new SQL > >>>>>>>>>>>> DDL > >>>>>>>>>>>>>> based on properties as well as more connectors and formats > >>>>>>>>>>>>>> coming up, > >>>>>>>>>>>>>> unified configuration becomes more important. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> We need more features around string-based configuration > >>>>>>>>>>>>>> in the > >>>>>>>>>>>>>> future, > >>>>>>>>>>>>>> which is why Dawid and I would like to propose FLIP-54 for > >>>>>>>>>>>>>> evolving > >>>>>>>>>> the > >>>>>>>>>>>>>> ConfigOption and Configuration classes: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>> > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > >>> > >>>>>>>>>>>>>> In summary it adds: > >>>>>>>>>>>>>> - documented types and validation > >>>>>>>>>>>>>> - more common types such as memory size, duration, list > >>>>>>>>>>>>>> - simple non-nested object types > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Looking forward to your feedback, > >>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>> > > > > |
Hi Becket,
1. First of all, you are totally right. The FLIP contains a bug due to the last minute changes that Dawid suggested: by having immutable objects created by a factory we loose the serializability of the Configuration because the factory itself is not stored in the Configuration. I would propose to revert the last change and stick to the original design, which means that a object must implement the Configurable interface and also implements serialization/deserialization methods such that also internal fields can be persisted as you suggested. But in general, people should not use any internal fields. Configurable objects are meant for simple little helper POJOs, not complex arbitrary nested data structures. It is Map<String, Object> because Configuration stores the raw objects. If you put a Boolean option into it, it remains Boolean. This makes the map very efficient for shipping to the cluster and accessing options multiple times. The same for configurable objects. We put the pure objects into the map without any serialization/deserialization. The provided factory allows to convert the Object into a Configuration and we know how to serialize/deserializise a configuration because it is just a key/value map. 2. Yes, this is what we had in mind. It should still be the same configuration option. We would like to avoid specialized option keys across components (exec.max-para and table.exec.max-para) if they are describing basically the same thing. But adding some more description like "TableOptions.MAX_PARALLELISM with description_1 + description_2" does not hurt. 3. They should restore the original object given that the toConfiguration/fromConfiguration methods have been implemented correctly. I will extend the example to make the logic clearer while fixing the bug. Thanks for the healthy discussion, Timo On 30.08.19 15:29, Becket Qin wrote: > Hi Timo, > > Thanks again for the clarification. Please see a few more questions below. > > Re: 1 > >> Please also keep in mind that Configuration must not consist of only >> strings, it manages a Map<String, Object> for efficient access. Every >> map entry can have a string representation for persistence, but in most >> cases consists of unserialized objects. > > I'd like to understand this a bit more. The reason we have a Map<String, > Object> in Configuration was because that Object could be either a String, > a List or a Map, right? But they eventually always boil down to Strings, or > maybe one of the predefined type that we know how to serialize. In the > current design, can the Object also be Configurable? > If the value in the config Map<String, Object> can be Configurable objects, > how do we serialize them? Calling toConfiguration() seems not quite working > because there might be some other internal fields that are not part of the > configuration. The modification to those fields will be lost if we simply > use toConfiguration(). So the impact of modifying those Configurable > objects seems a little undefined. And it would be difficult to prevent > users from doing that. > If the value in the config Map<String, Object> cannot be Configurable > objects, then it seems a little weird to have all the other ConfigType > stored in the ConfigMap in their own native type and accessed via > getInteger() / getBoolean(), etc, while having ConfigurableType to be > different from others because one have to use ConfigurableFactory to get > the configurations. > > Re: 2 > >> I think about the withExtendedDescription as a helper getter in a >> different place, so that the option is easier to find in a different >> module from it was defined. >> The MAX_PARALLELISM option in TableOptions would conceptually be equal to: >> public ConfigOption getMaxParallelismOption() { >> return CoreOptions.MAX_PARALLELISM; >> } > Just to make sure I understand it correctly, does that mean users will see > something like following? > - CoreOptions.MAX_PARALLELISM with description_1; > - TableOptions.MAX_PARALLELISM with description_1 + description_2. > - DataStreamOptions.MAX_PARALLELISM with description_1 + description_3. > And users will only configure exactly one MAX_PARALLELISM cross the board. > So they won't be confused by setting two MAX_PARALLELISM config for two > different modules, while they are actually the same config. If that is the > case, I don't have further concern. > > Re: 3 > Maybe I am thinking too much. I thought toBytes() / fromBytes() actually > restore the original Object. But fromConfiguration() and toConfiguration() > does not do that, anything not in the configuration of the original object > will be lost. So it would be good to make that clear. Maybe a clear Java > doc is sufficient. > > Thanks, > > Jiangjie (Becket) Qin > > On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz <[hidden email]> > wrote: > >> Hi, >> >> Ad. 1 >> >> The advantage of our approach is that you have the type definition close >> to the option definition. The only difference is that it enables >> expressing simple pojos with the primitives like ConfigOption<Integer>, >> ConfigOption<Long> etc. Otherwise as Timo said you will start having >> >> the parsing logic scattered everywhere in the code base as it is now. >> The string representation in our proposal is exactly the same as you >> explained for those three options. The only difference is that you don't >> have to parse the elements of a List, Map etc. afterwards. >> >> Ad. 2 >> >> I think about the withExtendedDescription as a helper getter in a >> different place, so that the option is easier to find in a different >> module from it was defined. >> >> The MAX_PARALLELISM option in TableOptions would conceptually be equal to: >> >> public ConfigOption getMaxParallelismOption() { >> >> return CoreOptions.MAX_PARALLELISM; >> >> } >> >> This allows to further clarify the description of the option in the >> context of a different module and end up in a seperate page in >> documentation (but with a link to the original one). In the end it is >> exactly the same option. It has exactly same key, type, parsing logic, >> it is in the end forwarded to the same place. >> >> Ad. 3 >> >> Not sure if I understand your concerns here. As Timo said it is in the >> end sth similar to toBytes/fromBytes, but it puts itself to a >> Configuration. Also just wanted to make sure we adjusted this part >> slightly and now the ConfigOption takes ConfigurableFactory. >> >> Best, >> >> Dawid >> >> >> On 30/08/2019 09:39, Timo Walther wrote: >>> Hi Becket, >>> >>> thanks for the discussion. >>> >>> 1. ConfigOptions in their current design are bound to classes. >>> Regarding, the option is "creating some Configurable objects instead >>> of defining the config to create >>> those Configurable"? We just moved this logic to a factory, this >>> factory can then also be used for other purposes. However, how the >>> option and objects are serialized to Configuration is still not part >>> of the option. The option is just pure declaration. >>> >>> If we would allow only List<String>, implementers would need to start >>> implementing own parsing and validation logic all the time. We would >>> like to avoid that. >>> >>> Please also keep in mind that Configuration must not consist of only >>> strings, it manages a Map<String, Object> for efficient access. Every >>> map entry can have a string representation for persistence, but in >>> most cases consists of unserialized objects. >>> >>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite >>> keys, types or default values. But different layers might want to add >>> helpful information. In our concrete use case for FLIP-59, >>> ExecutionConfig has 50 properties and many of them are not relevant >>> for the Table layer or have no effect at all. We would like to list >>> and mention the most important config options again in the Table >>> Configuration section, so that users are not confused, but with a >>> strong link to the core option. E.g.: registered kryo serializers are >>> also important also for Table users, we would like to add the comment >>> "This option allows to modify the serialization of the ANY SQL data >>> type.". I think we should not spam the core configuration page with >>> comments from all layers, connectors, or libraries but keep this in >>> the corresponding component documentation. >>> >>> 3. But it is something like fromBytes() and toBytes()? It serializes >>> and deserializes T from a configuration? >>> >>> Regards, >>> Timo >>> >>> On 29.08.19 19:14, Becket Qin wrote: >>>> Hi Timo and Stephan, >>>> >>>> Thanks for the detail explanation. >>>> >>>> 1. I agree that each config should be in a human readable format. My >>>> concern is that the current List<Configurable> looks going a little >>>> too far >>>> from what the configuration is supposed to do. They are essentially >>>> creating some Configurable objects instead of defining the config to >>>> create >>>> those Configurable. This mixes ConfigOption and the usage of it. API >>>> wise >>>> it would be good to keep the configs and their usages (such as how to >>>> create objects using the ConfigOption) apart from each other. >>>> I am wondering if we can just make List also only take string. For >>>> example, >>>> is the following definition of map and list sufficient? >>>> >>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be >>>> defined >>>> as: >>>> map_config_name: k1=v1, k2=v2, k3=v3, ... >>>> >>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined as: >>>> list_config_name: v1, v2, v3, ... >>>> >>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, String>>. It >>>> can >>>> be defined as: >>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... >>>> >>>> All the key and values in the configuration are String. This also >>>> guarantees that the configuration is always serializable. >>>> If we want to do one more step, we can allow the ConfigOption to set all >>>> the primitive types and parse that for the users. So something like >>>> List<Integer>, List<Class<?>> seems fine. >>>> >>>> The configuration class could also have util methods to create a list of >>>> configurable such as: >>>> <T> List<T> <Configuration#getConfigurableInstances(ListMapConfigOption, >>>> Class<T> clazz). >>>> But the configuration class will not take arbitrary Configurable as the >>>> value of its config. >>>> >>>> 2. I might have misunderstood this. But my concern on the description >>>> extension is in the following example. >>>> >>>> public static final ConfigOption<Integer> MAX_PARALLELISM = >>>> >>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( >>>> "Note: That this property means that a table program has a side-effect >>>> XYZ."); >>>> >>>> In this case, we will have two MAX_PARALLELISM configs now. One is >>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I suppose >>>> users >>>> will see both configurations. One with an extended description and one >>>> without. Let's say there is a third component which also users >>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM >>>> ConfigOption? If >>>> so, what would that ConfigOption's description look like? >>>> >>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM >>>> and the >>>> description should clearly state all the usage of this ConfigOption. >>>> >>>> 3. I see, in that case, how about we name it something like >>>> extractConfiguration()? I am just trying to see if we can make it clear >>>> this is not something like fromBytes() and toBytes(). >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> >> wrote: >>>>> Hi Becket, >>>>> >>>>> let me try to clarify some of your questions: >>>>> >>>>> 1. For every option, we also needed to think about how to represent it >>>>> in a human readable format. We do not want to allow arbitrary nesting >>>>> because that would easily allow to bypass the flattened hierarchy of >>>>> config options (`session.memory.min`). The current design allows to >>>>> represent every option type as a list. E.g.: >>>>> >>>>> `myIntOption: 12` can be `myIntListOption: 12;12` >>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption: >>>>> field=12,other=true; field=12,other=true` >>>>> `myPropertyOption: key=str0,other=str1` can be `myPropertyListOption: >>>>> key=str0,other=str1;key=str0,other=str1` >>>>> >>>>> We need the atomic class for serialization/deserialization both in >>>>> binary and string format. >>>>> >>>>> ConfigOption<List> is not present in the code base yet, but this >>>>> FLIP is >>>>> a preparation of making ExecutionConfig configurable. If you look into >>>>> this class or also in existing table connectors/formats, you will see >>>>> that each proposed option type has its requirements. >>>>> >>>>> 2. Regarding extending the description of ConfigOptions, the >>>>> semantic of >>>>> one option should be a super set of the other option. E.g. in Table API >>>>> we might use general ExecutionConfig properties. But we would like >>>>> to a) >>>>> make external options more prominent in the Table API config docs to >>>>> link people to properties they should pay attention b) notice about >>>>> side >>>>> effects. The core semantic of a property should not change. >>>>> >>>>> 3. The factory will not receive the entire configuration but works in a >>>>> separate key space. For `myObjectOption` above, it would receive a >>>>> configuration that consists of `field: 12` and `other: true`. >>>>> >>>>> I agree. I will convert the document into a Wiki page today. >>>>> >>>>> Thanks, >>>>> Timo >>>>> >>>>> On 29.08.19 09:00, Stephan Ewen wrote: >>>>>> @Becket One thing that may be non-obvious is that the Configuration >>>>>> class >>>>>> also defines serialization / persistence logic at the moment. So it >>>>>> needs >>>>>> to know the set of types it supports. That stands in the way of an >>>>>> arbitrary generic map type. >>>>>> >>>>>> @Timo I agree though that it seems a bit inconsistent to have one >>>>>> collection orthogonal to the type (List) and another one bound to >>>>> specific >>>>>> types (Map). >>>>>> >>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]> >>>>>> wrote: >>>>>> >>>>>>> Hi Timo, >>>>>>> >>>>>>> Thanks for the proposal. Sorry for the late comments, but I have a >>>>>>> few >>>>>>> questions / comments. >>>>>>> >>>>>>> 1. Is a new field of isList necessary in the ConfigOption? >>>>>>> Would it be enough to just check the atomicClass to see if it is a >>>>>>> List >>>>> or >>>>>>> not? >>>>>>> Also, in the ConfigOption<Map> class case, are we always assume >>>>>>> both key >>>>>>> and value types are String? Can we just apply the same to the >>>>>>> ConfigOption<List>? >>>>>>> BTW, I did a quick search in the codebase but did not find any >>>>>>> usage of >>>>>>> ConfigOption<List>. >>>>>>> >>>>>>> 2. The same config name, but with two ConfigOption with different >>>>> semantic >>>>>>> in different component seems super confusing. For example, when users >>>>> set >>>>>>> both configs, they may have no idea one is overriding the other. >>>>>>> There >>>>>>> might be two cases: >>>>>>> - If it is just the same config used by different components to >>>>>>> act >>>>>>> accordingly, it might be better to just have one config, but describe >>>>>>> clearly on how that config will be used. >>>>>>> - If it is actually two configurations that can be set >>>>>>> differently, I >>>>>>> think the config names should just be different. >>>>>>> >>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() method >>>>>>> pretty much means getConfiguration()? The toConfiguration() method >>>>> sounds >>>>>>> like converting an object to a configuration, which only works if the >>>>>>> object does not contain any state / value. I am also wondering if >>>>>>> there >>>>> is >>>>>>> a real use case of this method. Because supposedly the configurations >>>>> could >>>>>>> just be passed around to caller of this method. >>>>>>> >>>>>>> Also, can you put the proposal into the FLIP wiki instead of in the >>>>>>> doc before voting? The FLIP wiki allows track the modification >>>>>>> history >>>>> and >>>>>>> has a more established structure to ensure nothing is missed. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Jiangjie (Becket) Qin >>>>>>> >>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <[hidden email]> >>>>> wrote: >>>>>>>> Hi everyone, >>>>>>>> >>>>>>>> I updated the FLIP proposal one more time as mentioned in the voting >>>>>>>> thread. If there are no objections, I will start a new voting thread >>>>>>>> tomorrow at 9am Berlin time. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Timo >>>>>>>> >>>>>>>> >>>>>>>> On 22.08.19 14:19, Timo Walther wrote: >>>>>>>>> Hi everyone, >>>>>>>>> >>>>>>>>> thanks for all the feedback we have received online and offline. It >>>>>>>>> showed that many people support the idea of evolving the Flink >>>>>>>>> configuration functionality. I'm almost sure that this FLIP will >>>>>>>>> not >>>>>>>>> solve all issues but at least will improve the current status. >>>>>>>>> >>>>>>>>> We've updated the document and replaced the Correlation part >>>>>>>>> with the >>>>>>>>> concept of a ConfigOptionGroup that can provide all available >>>>>>>>> options >>>>>>>>> of a group plus custom group validators for eager validation. >>>>>>>>> For now, >>>>>>>>> this eager group validation will only be used at certain >>>>>>>>> locations in >>>>>>>>> the Flink code but it prepares for maybe validating the entire >>>>>>>>> global >>>>>>>>> configuration before submitting a job in the future. >>>>>>>>> >>>>>>>>> Please take another look if you find time. I hope we can proceed >>>>>>>>> with >>>>>>>>> the voting process if there are no objections. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Timo >>>>>>>>> >>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>>>>>>> Hi Stephan, >>>>>>>>>> >>>>>>>>>> thanks for your suggestions. Let me give you some background about >>>>>>>>>> the decisions made in this FLIP: >>>>>>>>>> >>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" because we did >>>>>>>>>> not want to change the entire configuration infrastructure. >>>>>>>>>> Both for >>>>>>>>>> backwards-compatibility reasons and the amount of work that >>>>>>>>>> would be >>>>>>>>>> required to update all options. If our goal is to rework the >>>>>>>>>> configuration option entirely, I might suggest to switch to JSON >>>>>>>>>> format with JSON schema and JSON validator. However, setting >>>>>>>>>> properties in a CLI or web interface becomes more tricky the more >>>>>>>>>> nested structures are allowed. >>>>>>>>>> >>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is >>>>>>>>>> centered >>>>>>>>>> around Java classes where T is determined by the default value. >>>>>>>>>> The >>>>>>>>>> FLIP just makes this more explicit by offering an explicit >>>>>>>>>> `intType()` method etc. The current design of validators centered >>>>>>>>>> around Java classes makes it possible to have typical domain >>>>>>>>>> validators baked by generics as you suggested. If we introduce >>>>>>>>>> types >>>>>>>>>> such as "quantity with measure and unit" we still need to get a >>>>>>>>>> class >>>>>>>>>> out of this option at the end, so why changing a proven concept? >>>>>>>>>> >>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary >>>>>>>>>> nesting. As >>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every >>>>>>>>>> atomic >>>>>>>>>> option like "key=12" can be represented by a list "keys=12;13". >>>>>>>>>> But >>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated list >>>>>>>>>> option >>>>>>>>>> would start making this more complicated such as >>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>>>>>>> StringOption(...)))", do we want that? >>>>>>>>>> >>>>>>>>>> 4. Correlation: The correlation part is one of the suggestions >>>>>>>>>> that I >>>>>>>>>> like least in the document. We can also discuss removing it >>>>>>>>>> entirely, >>>>>>>>>> but I think it solves the use case of relating options with each >>>>>>>>>> other in a flexible way right next to the actual option. >>>>>>>>>> Instead of >>>>>>>>>> being hidden in some component initialization, we should put it >>>>>>>>>> close >>>>>>>>>> to the option to also perform validation eagerly instead of >>>>>>>>>> failing >>>>>>>>>> at runtime when the option is accessed the first time. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Timo >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>>>>>>> A "List Type" sounds like a good direction to me. >>>>>>>>>>> >>>>>>>>>>> The comment on the type system was a bit brief, I agree. The >>>>>>>>>>> idea is >>>>>>>>>>> to see >>>>>>>>>>> if something like that can ease validation. Especially the >>>>>>> correlation >>>>>>>>>>> system seems quite complex (proxies to work around order of >>>>>>>>>>> initialization). >>>>>>>>>>> >>>>>>>>>>> For example, let's assume we don't think primarily about "java >>>>>>>>>>> types" but >>>>>>>>>>> would define types as one of the following (just examples, >>>>>>>>>>> haven't >>>>>>>>>>> thought >>>>>>>>>>> all the details through): >>>>>>>>>>> >>>>>>>>>>> (a) category type: implies string, and a fix set of possible >>>>>>> values. >>>>>>>>>>> Those would be passes and naturally make it into the docs and >>>>>>>>>>> validation. >>>>>>>>>>> Maps to a String or Enum in Java. >>>>>>>>>>> >>>>>>>>>>> (b) numeric integer type: implies long (or optionally >>>>>>>>>>> integer, >>>>> if >>>>>>>>>>> we want >>>>>>>>>>> to automatically check overflow / underflow). would take typical >>>>>>> domain >>>>>>>>>>> validators, like non-negative, etc. >>>>>>>>>>> >>>>>>>>>>> (c) numeric real type: same as above (double or float) >>>>>>>>>>> >>>>>>>>>>> (d) numeric interval type: either defined as an interval, or >>>>>>>>>>> references >>>>>>>>>>> other parameter by key. validation by valid interval. >>>>>>>>>>> >>>>>>>>>>> (e) quantity: a measure and a unit. separately parsable. The >>>>>>>>>>> measure's >>>>>>>>>>> type could be any of the numeric types above, with same >>>>>>>>>>> validation >>>>>>>>>>> rules. >>>>>>>>>>> >>>>>>>>>>> With a system like the above, would we still correlation >>>>>>>>>>> validators? >>>>>>>>>>> Are >>>>>>>>>>> there still cases that we need to catch early (config loading) or >>>>>>>>>>> are the >>>>>>>>>>> remaining cases sufficiently rare and runtime or setup specific, >>>>>>>>>>> that it is >>>>>>>>>>> fine to handle them in component initialization? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>>>>>>> <[hidden email]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>> >>>>>>>>>>>> Thank you for your opinion. >>>>>>>>>>>> >>>>>>>>>>>> Actually list/composite types are the topics we spent the >>>>>>>>>>>> most of >>>>>>> the >>>>>>>>>>>> time. I understand that from a perspective of a full blown type >>>>>>>>>>>> system, >>>>>>>>>>>> a field like isList may look weird. Please let me elaborate a >>>>>>>>>>>> bit >>>>>>> more >>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear enough >>>>>>>>>>>> about >>>>>>> it >>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is >>>>>>>>>>>> that they >>>>>>>>>>>> must >>>>>>>>>>>> have a string representation as they might come from a >>>>> configuration >>>>>>>>>>>> file. Moreover it must be a human readable format, so that the >>>>>>> values >>>>>>>>>>>> might be manually adjusted. Having that in mind we did not >>>>>>>>>>>> want to >>>>>>>>>>>> add a >>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for >>>>>>>>>>>> lists >>>>>>> only >>>>>>>>>>>> (and flat objects - I think though in the current design >>>>>>>>>>>> there is a >>>>>>>>>>>> mistake around the Configurable interface). I think though >>>>>>>>>>>> you have >>>>>>> a >>>>>>>>>>>> point here and it would be better to have a ListConfigOption >>>>>>>>>>>> instead of >>>>>>>>>>>> this field. Does it make sense to you? >>>>>>>>>>>> >>>>>>>>>>>> As for the second part of your message. I am not sure if I >>>>>>> understood >>>>>>>>>>>> it. The validators work with parse/deserialized values from >>>>>>>>>>>> Configuration that means they can be bound to the generic >>>>>>>>>>>> parameter >>>>>>> of >>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the >>>>>>>>>>>> ConfigOption >>>>>>>>>>>> has anything to do with the validation logic. Could you >>>>>>>>>>>> elaborate a >>>>>>>>>>>> bit >>>>>>>>>>>> more what did you mean? >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> >>>>>>>>>>>> Dawid >>>>>>>>>>>> >>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>>>>>>> I like the idea of enhancing the configuration and to do early >>>>>>>>>>>> validation. >>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, >>>>>>>>>>>>> though. For >>>>>>>>>>>>> example, having a boolean "isList" is a clear indication of not >>>>>>>>>>>>> having >>>>>>>>>>>>> thought through the type/category system. >>>>>>>>>>>>> Also, having a more clear category system makes validation >>>>> simpler. >>>>>>>>>>>>> For example, I have seen systems distinguishing between numeric >>>>>>>>>>>> parameters >>>>>>>>>>>>> (valid ranges), category parameters (set of possible values), >>>>>>>>>>>>> quantities >>>>>>>>>>>>> like duration and memory size (need measure and unit), which >>>>>>>>>>>>> results in >>>>>>>>>>>> an >>>>>>>>>>>>> elegant system for validation. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >>>>>>> [hidden email] >>>>>>>>>>>> .invalid> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>>>>>>> This allows the currently cluttered configuration of various >>>>>>>>>>>>>> modules to be unified. >>>>>>>>>>>>>> This is also first step of one of the keys to making new >>>>>>>>>>>>>> unified >>>>>>>>>>>>>> TableEnvironment available for production. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Previously, we did encounter complex configurations, such as >>>>>>>>>>>>>> specifying the skewed values of column in DDL. The skew may >>>>>>>>>>>>>> be a single field or a combination of multiple fields. >>>>>>>>>>>>>> So the >>>>>>>>>>>>>> configuration is very troublesome. We used JSON string to >>>>>>>>>>>>>> configure it. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>> ------------------------------------------------------------------ >>>>>>>>>>>>>> From:Jark Wu <[hidden email]> >>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>>>>>>> To:dev <[hidden email]> >>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >>>>>>> Configuration >>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a long >>>>>>> time. >>>>>>>>>>>>>> We have seen the benefit when developing blink >>>>>>>>>>>>>> configurations and >>>>>>>>>>>> connector >>>>>>>>>>>>>> properties in 1.9 release. >>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed design. >>>>>>>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> Jark >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email] >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which >>>>>>>>>>>>>>> hopefully provides a better view of configuration of Flink. >>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment >>>>>>>>>>>>>>> of cluster and submission of job totally defined by >>>>>>> configuration. >>>>>>>>>>>>>>> Will take a look at the document in days. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> tison. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 >>>>>>>>>>>>>>> 下午10:12写道: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dawid and I are working on making parts of >>>>>>>>>>>>>>>> ExecutionConfig and >>>>>>>>>>>>>>>> TableConfig configurable via config options. This is >>>>>>>>>>>>>>>> necessary >>>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, with the >>>>>>>>>>>>>>>> new SQL >>>>>>>>>>>>>> DDL >>>>>>>>>>>>>>>> based on properties as well as more connectors and formats >>>>>>>>>>>>>>>> coming up, >>>>>>>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We need more features around string-based configuration >>>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>> future, >>>>>>>>>>>>>>>> which is why Dawid and I would like to propose FLIP-54 for >>>>>>>>>>>>>>>> evolving >>>>>>>>>>>> the >>>>>>>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>>>>>>>>>>>>>> In summary it adds: >>>>>>>>>>>>>>>> - documented types and validation >>>>>>>>>>>>>>>> - more common types such as memory size, duration, list >>>>>>>>>>>>>>>> - simple non-nested object types >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >> |
Hi Timo,
Thanks for the reply. I am still a little concerned over the mutability of the Configurable which could be the value in Configuration. Re: 1 > But in general, people should not use any internal fields. > Configurable objects are meant for simple little helper POJOs, not > complex arbitrary nested data structures. This seems difficult to enforce... Ideally the values in configurations should actually be immutable. The value can only be changed by explicitly calling setters in Configuration. Otherwise we may have weird situation where the Configurable in the same configuration are different in two places because the configurable is modified in one places and not modified in another place. So I am a little concerned on putting a Configurable type in the Configuration map, because the value could be modified without explicitly setting the configuration. For example, can users do the following? Configurable configurable = configuration.getConfigurable(myConfigurableOption); configurable.setConfigA(123); // this already changes the configurable object in the configuration. Re: 2 Thanks for confirming. As long as users will not have a situation where they need to set two configurations with the same key but different descriptions, I think it is OK. Re: 3 This is actually kind of related to 1. i.e. Whether toConfiguration() guarantees the exact same object can be rebuilt from the configuration or not. I am still not quite sure about the use case of toConfiguration() though. It seems indicating the Configurable is mutable, which might be dangerous. Thanks, Jiangjie (Becket) Qin On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <[hidden email]> wrote: > Hi Becket, > > 1. First of all, you are totally right. The FLIP contains a bug due to > the last minute changes that Dawid suggested: by having immutable > objects created by a factory we loose the serializability of the > Configuration because the factory itself is not stored in the > Configuration. I would propose to revert the last change and stick to > the original design, which means that a object must implement the > Configurable interface and also implements serialization/deserialization > methods such that also internal fields can be persisted as you > suggested. But in general, people should not use any internal fields. > Configurable objects are meant for simple little helper POJOs, not > complex arbitrary nested data structures. > > It is Map<String, Object> because Configuration stores the raw objects. > If you put a Boolean option into it, it remains Boolean. This makes the > map very efficient for shipping to the cluster and accessing options > multiple times. The same for configurable objects. We put the pure > objects into the map without any serialization/deserialization. The > provided factory allows to convert the Object into a Configuration and > we know how to serialize/deserializise a configuration because it is > just a key/value map. > > 2. Yes, this is what we had in mind. It should still be the same > configuration option. We would like to avoid specialized option keys > across components (exec.max-para and table.exec.max-para) if they are > describing basically the same thing. But adding some more description > like "TableOptions.MAX_PARALLELISM with description_1 + description_2" > does not hurt. > > 3. They should restore the original object given that the > toConfiguration/fromConfiguration methods have been implemented > correctly. I will extend the example to make the logic clearer while > fixing the bug. > > Thanks for the healthy discussion, > Timo > > > On 30.08.19 15:29, Becket Qin wrote: > > Hi Timo, > > > > Thanks again for the clarification. Please see a few more questions > below. > > > > Re: 1 > > > >> Please also keep in mind that Configuration must not consist of only > >> strings, it manages a Map<String, Object> for efficient access. Every > >> map entry can have a string representation for persistence, but in most > >> cases consists of unserialized objects. > > > > I'd like to understand this a bit more. The reason we have a Map<String, > > Object> in Configuration was because that Object could be either a > String, > > a List or a Map, right? But they eventually always boil down to Strings, > or > > maybe one of the predefined type that we know how to serialize. In the > > current design, can the Object also be Configurable? > > If the value in the config Map<String, Object> can be Configurable > objects, > > how do we serialize them? Calling toConfiguration() seems not quite > working > > because there might be some other internal fields that are not part of > the > > configuration. The modification to those fields will be lost if we simply > > use toConfiguration(). So the impact of modifying those Configurable > > objects seems a little undefined. And it would be difficult to prevent > > users from doing that. > > If the value in the config Map<String, Object> cannot be Configurable > > objects, then it seems a little weird to have all the other ConfigType > > stored in the ConfigMap in their own native type and accessed via > > getInteger() / getBoolean(), etc, while having ConfigurableType to be > > different from others because one have to use ConfigurableFactory to get > > the configurations. > > > > Re: 2 > > > >> I think about the withExtendedDescription as a helper getter in a > >> different place, so that the option is easier to find in a different > >> module from it was defined. > >> The MAX_PARALLELISM option in TableOptions would conceptually be equal > to: > >> public ConfigOption getMaxParallelismOption() { > >> return CoreOptions.MAX_PARALLELISM; > >> } > > Just to make sure I understand it correctly, does that mean users will > see > > something like following? > > - CoreOptions.MAX_PARALLELISM with description_1; > > - TableOptions.MAX_PARALLELISM with description_1 + description_2. > > - DataStreamOptions.MAX_PARALLELISM with description_1 + description_3. > > And users will only configure exactly one MAX_PARALLELISM cross the > board. > > So they won't be confused by setting two MAX_PARALLELISM config for two > > different modules, while they are actually the same config. If that is > the > > case, I don't have further concern. > > > > Re: 3 > > Maybe I am thinking too much. I thought toBytes() / fromBytes() actually > > restore the original Object. But fromConfiguration() and > toConfiguration() > > does not do that, anything not in the configuration of the original > object > > will be lost. So it would be good to make that clear. Maybe a clear Java > > doc is sufficient. > > > > Thanks, > > > > Jiangjie (Becket) Qin > > > > On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz <[hidden email] > > > > wrote: > > > >> Hi, > >> > >> Ad. 1 > >> > >> The advantage of our approach is that you have the type definition close > >> to the option definition. The only difference is that it enables > >> expressing simple pojos with the primitives like ConfigOption<Integer>, > >> ConfigOption<Long> etc. Otherwise as Timo said you will start having > >> > >> the parsing logic scattered everywhere in the code base as it is now. > >> The string representation in our proposal is exactly the same as you > >> explained for those three options. The only difference is that you don't > >> have to parse the elements of a List, Map etc. afterwards. > >> > >> Ad. 2 > >> > >> I think about the withExtendedDescription as a helper getter in a > >> different place, so that the option is easier to find in a different > >> module from it was defined. > >> > >> The MAX_PARALLELISM option in TableOptions would conceptually be equal > to: > >> > >> public ConfigOption getMaxParallelismOption() { > >> > >> return CoreOptions.MAX_PARALLELISM; > >> > >> } > >> > >> This allows to further clarify the description of the option in the > >> context of a different module and end up in a seperate page in > >> documentation (but with a link to the original one). In the end it is > >> exactly the same option. It has exactly same key, type, parsing logic, > >> it is in the end forwarded to the same place. > >> > >> Ad. 3 > >> > >> Not sure if I understand your concerns here. As Timo said it is in the > >> end sth similar to toBytes/fromBytes, but it puts itself to a > >> Configuration. Also just wanted to make sure we adjusted this part > >> slightly and now the ConfigOption takes ConfigurableFactory. > >> > >> Best, > >> > >> Dawid > >> > >> > >> On 30/08/2019 09:39, Timo Walther wrote: > >>> Hi Becket, > >>> > >>> thanks for the discussion. > >>> > >>> 1. ConfigOptions in their current design are bound to classes. > >>> Regarding, the option is "creating some Configurable objects instead > >>> of defining the config to create > >>> those Configurable"? We just moved this logic to a factory, this > >>> factory can then also be used for other purposes. However, how the > >>> option and objects are serialized to Configuration is still not part > >>> of the option. The option is just pure declaration. > >>> > >>> If we would allow only List<String>, implementers would need to start > >>> implementing own parsing and validation logic all the time. We would > >>> like to avoid that. > >>> > >>> Please also keep in mind that Configuration must not consist of only > >>> strings, it manages a Map<String, Object> for efficient access. Every > >>> map entry can have a string representation for persistence, but in > >>> most cases consists of unserialized objects. > >>> > >>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite > >>> keys, types or default values. But different layers might want to add > >>> helpful information. In our concrete use case for FLIP-59, > >>> ExecutionConfig has 50 properties and many of them are not relevant > >>> for the Table layer or have no effect at all. We would like to list > >>> and mention the most important config options again in the Table > >>> Configuration section, so that users are not confused, but with a > >>> strong link to the core option. E.g.: registered kryo serializers are > >>> also important also for Table users, we would like to add the comment > >>> "This option allows to modify the serialization of the ANY SQL data > >>> type.". I think we should not spam the core configuration page with > >>> comments from all layers, connectors, or libraries but keep this in > >>> the corresponding component documentation. > >>> > >>> 3. But it is something like fromBytes() and toBytes()? It serializes > >>> and deserializes T from a configuration? > >>> > >>> Regards, > >>> Timo > >>> > >>> On 29.08.19 19:14, Becket Qin wrote: > >>>> Hi Timo and Stephan, > >>>> > >>>> Thanks for the detail explanation. > >>>> > >>>> 1. I agree that each config should be in a human readable format. My > >>>> concern is that the current List<Configurable> looks going a little > >>>> too far > >>>> from what the configuration is supposed to do. They are essentially > >>>> creating some Configurable objects instead of defining the config to > >>>> create > >>>> those Configurable. This mixes ConfigOption and the usage of it. API > >>>> wise > >>>> it would be good to keep the configs and their usages (such as how to > >>>> create objects using the ConfigOption) apart from each other. > >>>> I am wondering if we can just make List also only take string. For > >>>> example, > >>>> is the following definition of map and list sufficient? > >>>> > >>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be > >>>> defined > >>>> as: > >>>> map_config_name: k1=v1, k2=v2, k3=v3, ... > >>>> > >>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined > as: > >>>> list_config_name: v1, v2, v3, ... > >>>> > >>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, String>>. It > >>>> can > >>>> be defined as: > >>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... > >>>> > >>>> All the key and values in the configuration are String. This also > >>>> guarantees that the configuration is always serializable. > >>>> If we want to do one more step, we can allow the ConfigOption to set > all > >>>> the primitive types and parse that for the users. So something like > >>>> List<Integer>, List<Class<?>> seems fine. > >>>> > >>>> The configuration class could also have util methods to create a list > of > >>>> configurable such as: > >>>> <T> List<T> > <Configuration#getConfigurableInstances(ListMapConfigOption, > >>>> Class<T> clazz). > >>>> But the configuration class will not take arbitrary Configurable as > the > >>>> value of its config. > >>>> > >>>> 2. I might have misunderstood this. But my concern on the description > >>>> extension is in the following example. > >>>> > >>>> public static final ConfigOption<Integer> MAX_PARALLELISM = > >>>> > >>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( > >>>> "Note: That this property means that a table program has a side-effect > >>>> XYZ."); > >>>> > >>>> In this case, we will have two MAX_PARALLELISM configs now. One is > >>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I suppose > >>>> users > >>>> will see both configurations. One with an extended description and one > >>>> without. Let's say there is a third component which also users > >>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM > >>>> ConfigOption? If > >>>> so, what would that ConfigOption's description look like? > >>>> > >>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM > >>>> and the > >>>> description should clearly state all the usage of this ConfigOption. > >>>> > >>>> 3. I see, in that case, how about we name it something like > >>>> extractConfiguration()? I am just trying to see if we can make it > clear > >>>> this is not something like fromBytes() and toBytes(). > >>>> > >>>> Thanks, > >>>> > >>>> Jiangjie (Becket) Qin > >>>> > >>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> > >> wrote: > >>>>> Hi Becket, > >>>>> > >>>>> let me try to clarify some of your questions: > >>>>> > >>>>> 1. For every option, we also needed to think about how to represent > it > >>>>> in a human readable format. We do not want to allow arbitrary nesting > >>>>> because that would easily allow to bypass the flattened hierarchy of > >>>>> config options (`session.memory.min`). The current design allows to > >>>>> represent every option type as a list. E.g.: > >>>>> > >>>>> `myIntOption: 12` can be `myIntListOption: 12;12` > >>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption: > >>>>> field=12,other=true; field=12,other=true` > >>>>> `myPropertyOption: key=str0,other=str1` can be `myPropertyListOption: > >>>>> key=str0,other=str1;key=str0,other=str1` > >>>>> > >>>>> We need the atomic class for serialization/deserialization both in > >>>>> binary and string format. > >>>>> > >>>>> ConfigOption<List> is not present in the code base yet, but this > >>>>> FLIP is > >>>>> a preparation of making ExecutionConfig configurable. If you look > into > >>>>> this class or also in existing table connectors/formats, you will see > >>>>> that each proposed option type has its requirements. > >>>>> > >>>>> 2. Regarding extending the description of ConfigOptions, the > >>>>> semantic of > >>>>> one option should be a super set of the other option. E.g. in Table > API > >>>>> we might use general ExecutionConfig properties. But we would like > >>>>> to a) > >>>>> make external options more prominent in the Table API config docs to > >>>>> link people to properties they should pay attention b) notice about > >>>>> side > >>>>> effects. The core semantic of a property should not change. > >>>>> > >>>>> 3. The factory will not receive the entire configuration but works > in a > >>>>> separate key space. For `myObjectOption` above, it would receive a > >>>>> configuration that consists of `field: 12` and `other: true`. > >>>>> > >>>>> I agree. I will convert the document into a Wiki page today. > >>>>> > >>>>> Thanks, > >>>>> Timo > >>>>> > >>>>> On 29.08.19 09:00, Stephan Ewen wrote: > >>>>>> @Becket One thing that may be non-obvious is that the Configuration > >>>>>> class > >>>>>> also defines serialization / persistence logic at the moment. So it > >>>>>> needs > >>>>>> to know the set of types it supports. That stands in the way of an > >>>>>> arbitrary generic map type. > >>>>>> > >>>>>> @Timo I agree though that it seems a bit inconsistent to have one > >>>>>> collection orthogonal to the type (List) and another one bound to > >>>>> specific > >>>>>> types (Map). > >>>>>> > >>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]> > >>>>>> wrote: > >>>>>> > >>>>>>> Hi Timo, > >>>>>>> > >>>>>>> Thanks for the proposal. Sorry for the late comments, but I have a > >>>>>>> few > >>>>>>> questions / comments. > >>>>>>> > >>>>>>> 1. Is a new field of isList necessary in the ConfigOption? > >>>>>>> Would it be enough to just check the atomicClass to see if it is a > >>>>>>> List > >>>>> or > >>>>>>> not? > >>>>>>> Also, in the ConfigOption<Map> class case, are we always assume > >>>>>>> both key > >>>>>>> and value types are String? Can we just apply the same to the > >>>>>>> ConfigOption<List>? > >>>>>>> BTW, I did a quick search in the codebase but did not find any > >>>>>>> usage of > >>>>>>> ConfigOption<List>. > >>>>>>> > >>>>>>> 2. The same config name, but with two ConfigOption with different > >>>>> semantic > >>>>>>> in different component seems super confusing. For example, when > users > >>>>> set > >>>>>>> both configs, they may have no idea one is overriding the other. > >>>>>>> There > >>>>>>> might be two cases: > >>>>>>> - If it is just the same config used by different components to > >>>>>>> act > >>>>>>> accordingly, it might be better to just have one config, but > describe > >>>>>>> clearly on how that config will be used. > >>>>>>> - If it is actually two configurations that can be set > >>>>>>> differently, I > >>>>>>> think the config names should just be different. > >>>>>>> > >>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() > method > >>>>>>> pretty much means getConfiguration()? The toConfiguration() method > >>>>> sounds > >>>>>>> like converting an object to a configuration, which only works if > the > >>>>>>> object does not contain any state / value. I am also wondering if > >>>>>>> there > >>>>> is > >>>>>>> a real use case of this method. Because supposedly the > configurations > >>>>> could > >>>>>>> just be passed around to caller of this method. > >>>>>>> > >>>>>>> Also, can you put the proposal into the FLIP wiki instead of in the > >>>>>>> doc before voting? The FLIP wiki allows track the modification > >>>>>>> history > >>>>> and > >>>>>>> has a more established structure to ensure nothing is missed. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> > >>>>>>> Jiangjie (Becket) Qin > >>>>>>> > >>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <[hidden email]> > >>>>> wrote: > >>>>>>>> Hi everyone, > >>>>>>>> > >>>>>>>> I updated the FLIP proposal one more time as mentioned in the > voting > >>>>>>>> thread. If there are no objections, I will start a new voting > thread > >>>>>>>> tomorrow at 9am Berlin time. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Timo > >>>>>>>> > >>>>>>>> > >>>>>>>> On 22.08.19 14:19, Timo Walther wrote: > >>>>>>>>> Hi everyone, > >>>>>>>>> > >>>>>>>>> thanks for all the feedback we have received online and offline. > It > >>>>>>>>> showed that many people support the idea of evolving the Flink > >>>>>>>>> configuration functionality. I'm almost sure that this FLIP will > >>>>>>>>> not > >>>>>>>>> solve all issues but at least will improve the current status. > >>>>>>>>> > >>>>>>>>> We've updated the document and replaced the Correlation part > >>>>>>>>> with the > >>>>>>>>> concept of a ConfigOptionGroup that can provide all available > >>>>>>>>> options > >>>>>>>>> of a group plus custom group validators for eager validation. > >>>>>>>>> For now, > >>>>>>>>> this eager group validation will only be used at certain > >>>>>>>>> locations in > >>>>>>>>> the Flink code but it prepares for maybe validating the entire > >>>>>>>>> global > >>>>>>>>> configuration before submitting a job in the future. > >>>>>>>>> > >>>>>>>>> Please take another look if you find time. I hope we can proceed > >>>>>>>>> with > >>>>>>>>> the voting process if there are no objections. > >>>>>>>>> > >>>>>>>>> Regards, > >>>>>>>>> Timo > >>>>>>>>> > >>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: > >>>>>>>>>> Hi Stephan, > >>>>>>>>>> > >>>>>>>>>> thanks for your suggestions. Let me give you some background > about > >>>>>>>>>> the decisions made in this FLIP: > >>>>>>>>>> > >>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" because we > did > >>>>>>>>>> not want to change the entire configuration infrastructure. > >>>>>>>>>> Both for > >>>>>>>>>> backwards-compatibility reasons and the amount of work that > >>>>>>>>>> would be > >>>>>>>>>> required to update all options. If our goal is to rework the > >>>>>>>>>> configuration option entirely, I might suggest to switch to JSON > >>>>>>>>>> format with JSON schema and JSON validator. However, setting > >>>>>>>>>> properties in a CLI or web interface becomes more tricky the > more > >>>>>>>>>> nested structures are allowed. > >>>>>>>>>> > >>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is > >>>>>>>>>> centered > >>>>>>>>>> around Java classes where T is determined by the default value. > >>>>>>>>>> The > >>>>>>>>>> FLIP just makes this more explicit by offering an explicit > >>>>>>>>>> `intType()` method etc. The current design of validators > centered > >>>>>>>>>> around Java classes makes it possible to have typical domain > >>>>>>>>>> validators baked by generics as you suggested. If we introduce > >>>>>>>>>> types > >>>>>>>>>> such as "quantity with measure and unit" we still need to get a > >>>>>>>>>> class > >>>>>>>>>> out of this option at the end, so why changing a proven concept? > >>>>>>>>>> > >>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary > >>>>>>>>>> nesting. As > >>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every > >>>>>>>>>> atomic > >>>>>>>>>> option like "key=12" can be represented by a list "keys=12;13". > >>>>>>>>>> But > >>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated list > >>>>>>>>>> option > >>>>>>>>>> would start making this more complicated such as > >>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), > >>>>>>>>>> StringOption(...)))", do we want that? > >>>>>>>>>> > >>>>>>>>>> 4. Correlation: The correlation part is one of the suggestions > >>>>>>>>>> that I > >>>>>>>>>> like least in the document. We can also discuss removing it > >>>>>>>>>> entirely, > >>>>>>>>>> but I think it solves the use case of relating options with each > >>>>>>>>>> other in a flexible way right next to the actual option. > >>>>>>>>>> Instead of > >>>>>>>>>> being hidden in some component initialization, we should put it > >>>>>>>>>> close > >>>>>>>>>> to the option to also perform validation eagerly instead of > >>>>>>>>>> failing > >>>>>>>>>> at runtime when the option is accessed the first time. > >>>>>>>>>> > >>>>>>>>>> Regards, > >>>>>>>>>> Timo > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: > >>>>>>>>>>> A "List Type" sounds like a good direction to me. > >>>>>>>>>>> > >>>>>>>>>>> The comment on the type system was a bit brief, I agree. The > >>>>>>>>>>> idea is > >>>>>>>>>>> to see > >>>>>>>>>>> if something like that can ease validation. Especially the > >>>>>>> correlation > >>>>>>>>>>> system seems quite complex (proxies to work around order of > >>>>>>>>>>> initialization). > >>>>>>>>>>> > >>>>>>>>>>> For example, let's assume we don't think primarily about "java > >>>>>>>>>>> types" but > >>>>>>>>>>> would define types as one of the following (just examples, > >>>>>>>>>>> haven't > >>>>>>>>>>> thought > >>>>>>>>>>> all the details through): > >>>>>>>>>>> > >>>>>>>>>>> (a) category type: implies string, and a fix set of > possible > >>>>>>> values. > >>>>>>>>>>> Those would be passes and naturally make it into the docs and > >>>>>>>>>>> validation. > >>>>>>>>>>> Maps to a String or Enum in Java. > >>>>>>>>>>> > >>>>>>>>>>> (b) numeric integer type: implies long (or optionally > >>>>>>>>>>> integer, > >>>>> if > >>>>>>>>>>> we want > >>>>>>>>>>> to automatically check overflow / underflow). would take > typical > >>>>>>> domain > >>>>>>>>>>> validators, like non-negative, etc. > >>>>>>>>>>> > >>>>>>>>>>> (c) numeric real type: same as above (double or float) > >>>>>>>>>>> > >>>>>>>>>>> (d) numeric interval type: either defined as an > interval, or > >>>>>>>>>>> references > >>>>>>>>>>> other parameter by key. validation by valid interval. > >>>>>>>>>>> > >>>>>>>>>>> (e) quantity: a measure and a unit. separately parsable. > The > >>>>>>>>>>> measure's > >>>>>>>>>>> type could be any of the numeric types above, with same > >>>>>>>>>>> validation > >>>>>>>>>>> rules. > >>>>>>>>>>> > >>>>>>>>>>> With a system like the above, would we still correlation > >>>>>>>>>>> validators? > >>>>>>>>>>> Are > >>>>>>>>>>> there still cases that we need to catch early (config loading) > or > >>>>>>>>>>> are the > >>>>>>>>>>> remaining cases sufficiently rare and runtime or setup > specific, > >>>>>>>>>>> that it is > >>>>>>>>>>> fine to handle them in component initialization? > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz > >>>>>>>>>>> <[hidden email]> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi Stephan, > >>>>>>>>>>>> > >>>>>>>>>>>> Thank you for your opinion. > >>>>>>>>>>>> > >>>>>>>>>>>> Actually list/composite types are the topics we spent the > >>>>>>>>>>>> most of > >>>>>>> the > >>>>>>>>>>>> time. I understand that from a perspective of a full blown > type > >>>>>>>>>>>> system, > >>>>>>>>>>>> a field like isList may look weird. Please let me elaborate a > >>>>>>>>>>>> bit > >>>>>>> more > >>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear enough > >>>>>>>>>>>> about > >>>>>>> it > >>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is > >>>>>>>>>>>> that they > >>>>>>>>>>>> must > >>>>>>>>>>>> have a string representation as they might come from a > >>>>> configuration > >>>>>>>>>>>> file. Moreover it must be a human readable format, so that the > >>>>>>> values > >>>>>>>>>>>> might be manually adjusted. Having that in mind we did not > >>>>>>>>>>>> want to > >>>>>>>>>>>> add a > >>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for > >>>>>>>>>>>> lists > >>>>>>> only > >>>>>>>>>>>> (and flat objects - I think though in the current design > >>>>>>>>>>>> there is a > >>>>>>>>>>>> mistake around the Configurable interface). I think though > >>>>>>>>>>>> you have > >>>>>>> a > >>>>>>>>>>>> point here and it would be better to have a ListConfigOption > >>>>>>>>>>>> instead of > >>>>>>>>>>>> this field. Does it make sense to you? > >>>>>>>>>>>> > >>>>>>>>>>>> As for the second part of your message. I am not sure if I > >>>>>>> understood > >>>>>>>>>>>> it. The validators work with parse/deserialized values from > >>>>>>>>>>>> Configuration that means they can be bound to the generic > >>>>>>>>>>>> parameter > >>>>>>> of > >>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends > >>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the > >>>>>>>>>>>> ConfigOption > >>>>>>>>>>>> has anything to do with the validation logic. Could you > >>>>>>>>>>>> elaborate a > >>>>>>>>>>>> bit > >>>>>>>>>>>> more what did you mean? > >>>>>>>>>>>> > >>>>>>>>>>>> Best, > >>>>>>>>>>>> > >>>>>>>>>>>> Dawid > >>>>>>>>>>>> > >>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: > >>>>>>>>>>>>> I like the idea of enhancing the configuration and to do > early > >>>>>>>>>>>> validation. > >>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, > >>>>>>>>>>>>> though. For > >>>>>>>>>>>>> example, having a boolean "isList" is a clear indication of > not > >>>>>>>>>>>>> having > >>>>>>>>>>>>> thought through the type/category system. > >>>>>>>>>>>>> Also, having a more clear category system makes validation > >>>>> simpler. > >>>>>>>>>>>>> For example, I have seen systems distinguishing between > numeric > >>>>>>>>>>>> parameters > >>>>>>>>>>>>> (valid ranges), category parameters (set of possible values), > >>>>>>>>>>>>> quantities > >>>>>>>>>>>>> like duration and memory size (need measure and unit), which > >>>>>>>>>>>>> results in > >>>>>>>>>>>> an > >>>>>>>>>>>>> elegant system for validation. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < > >>>>>>> [hidden email] > >>>>>>>>>>>> .invalid> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. > >>>>>>>>>>>>>> This allows the currently cluttered configuration of various > >>>>>>>>>>>>>> modules to be unified. > >>>>>>>>>>>>>> This is also first step of one of the keys to making new > >>>>>>>>>>>>>> unified > >>>>>>>>>>>>>> TableEnvironment available for production. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Previously, we did encounter complex configurations, such as > >>>>>>>>>>>>>> specifying the skewed values of column in DDL. The skew may > >>>>>>>>>>>>>> be a single field or a combination of multiple fields. > >>>>>>>>>>>>>> So the > >>>>>>>>>>>>>> configuration is very troublesome. We used JSON string > to > >>>>>>>>>>>>>> configure it. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>> Jingsong Lee > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>> ------------------------------------------------------------------ > >>>>>>>>>>>>>> From:Jark Wu <[hidden email]> > >>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 > >>>>>>>>>>>>>> To:dev <[hidden email]> > >>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and > >>>>>>> Configuration > >>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a > long > >>>>>>> time. > >>>>>>>>>>>>>> We have seen the benefit when developing blink > >>>>>>>>>>>>>> configurations and > >>>>>>>>>>>> connector > >>>>>>>>>>>>>> properties in 1.9 release. > >>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed design. > >>>>>>>>>>>>>> I will leave my thoughts and comments there. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < > [hidden email] > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Timo, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which > >>>>>>>>>>>>>>> hopefully provides a better view of configuration of Flink. > >>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment > >>>>>>>>>>>>>>> of cluster and submission of job totally defined by > >>>>>>> configuration. > >>>>>>>>>>>>>>> Will take a look at the document in days. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> tison. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 > >>>>>>>>>>>>>>> 下午10:12写道: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Dawid and I are working on making parts of > >>>>>>>>>>>>>>>> ExecutionConfig and > >>>>>>>>>>>>>>>> TableConfig configurable via config options. This is > >>>>>>>>>>>>>>>> necessary > >>>>>>>>>>>>>>>> to make > >>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, with > the > >>>>>>>>>>>>>>>> new SQL > >>>>>>>>>>>>>> DDL > >>>>>>>>>>>>>>>> based on properties as well as more connectors and formats > >>>>>>>>>>>>>>>> coming up, > >>>>>>>>>>>>>>>> unified configuration becomes more important. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> We need more features around string-based configuration > >>>>>>>>>>>>>>>> in the > >>>>>>>>>>>>>>>> future, > >>>>>>>>>>>>>>>> which is why Dawid and I would like to propose FLIP-54 for > >>>>>>>>>>>>>>>> evolving > >>>>>>>>>>>> the > >>>>>>>>>>>>>>>> ConfigOption and Configuration classes: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >> > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > >>>>>>>>>>>>>>>> In summary it adds: > >>>>>>>>>>>>>>>> - documented types and validation > >>>>>>>>>>>>>>>> - more common types such as memory size, duration, list > >>>>>>>>>>>>>>>> - simple non-nested object types > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Looking forward to your feedback, > >>>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >> > > |
Hi Becket,
Re 1 & 3: "values in configurations should actually be immutable" I would also prefer immutability but most of our configuration is mutable due to serialization/deserialization. Also maps and list could be mutable in theory. It is difficult to really enforce that for nested structures. I see two options: a) For the original design: How about we force implementers to add a duplicate() method in a Configurable object? This would mean that the object is still mutable but by duplicating the object both during reading and writing we would avoid the problem you described. b) For the current design: We still use the factory approach but let a Configurable object implement a getFactory() method such that we know how to serialize the object. With the help of a factory we can also duplicate the object easily during reading and writing and ensure immutability. I would personally go for approach a) to not over-engineer this topic. But I'm open for option b). Regards, Timo On 31.08.19 04:09, Becket Qin wrote: > Hi Timo, > > Thanks for the reply. I am still a little concerned over the mutability of > the Configurable which could be the value in Configuration. > > Re: 1 > >> But in general, people should not use any internal fields. >> Configurable objects are meant for simple little helper POJOs, not >> complex arbitrary nested data structures. > This seems difficult to enforce... Ideally the values in configurations > should actually be immutable. The value can only be changed by explicitly > calling setters in Configuration. Otherwise we may have weird situation > where the Configurable in the same configuration are different in two > places because the configurable is modified in one places and not modified > in another place. So I am a little concerned on putting a Configurable type > in the Configuration map, because the value could be modified without > explicitly setting the configuration. For example, can users do the > following? > > Configurable configurable = > configuration.getConfigurable(myConfigurableOption); > configurable.setConfigA(123); // this already changes the configurable > object in the configuration. > > Re: 2 > Thanks for confirming. As long as users will not have a situation where > they need to set two configurations with the same key but different > descriptions, I think it is OK. > > Re: 3 > This is actually kind of related to 1. i.e. Whether toConfiguration() > guarantees the exact same object can be rebuilt from the configuration or > not. I am still not quite sure about the use case of toConfiguration() > though. It seems indicating the Configurable is mutable, which might be > dangerous. > > Thanks, > > Jiangjie (Becket) Qin > > On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <[hidden email]> wrote: > >> Hi Becket, >> >> 1. First of all, you are totally right. The FLIP contains a bug due to >> the last minute changes that Dawid suggested: by having immutable >> objects created by a factory we loose the serializability of the >> Configuration because the factory itself is not stored in the >> Configuration. I would propose to revert the last change and stick to >> the original design, which means that a object must implement the >> Configurable interface and also implements serialization/deserialization >> methods such that also internal fields can be persisted as you >> suggested. But in general, people should not use any internal fields. >> Configurable objects are meant for simple little helper POJOs, not >> complex arbitrary nested data structures. >> >> It is Map<String, Object> because Configuration stores the raw objects. >> If you put a Boolean option into it, it remains Boolean. This makes the >> map very efficient for shipping to the cluster and accessing options >> multiple times. The same for configurable objects. We put the pure >> objects into the map without any serialization/deserialization. The >> provided factory allows to convert the Object into a Configuration and >> we know how to serialize/deserializise a configuration because it is >> just a key/value map. >> >> 2. Yes, this is what we had in mind. It should still be the same >> configuration option. We would like to avoid specialized option keys >> across components (exec.max-para and table.exec.max-para) if they are >> describing basically the same thing. But adding some more description >> like "TableOptions.MAX_PARALLELISM with description_1 + description_2" >> does not hurt. >> >> 3. They should restore the original object given that the >> toConfiguration/fromConfiguration methods have been implemented >> correctly. I will extend the example to make the logic clearer while >> fixing the bug. >> >> Thanks for the healthy discussion, >> Timo >> >> >> On 30.08.19 15:29, Becket Qin wrote: >>> Hi Timo, >>> >>> Thanks again for the clarification. Please see a few more questions >> below. >>> Re: 1 >>> >>>> Please also keep in mind that Configuration must not consist of only >>>> strings, it manages a Map<String, Object> for efficient access. Every >>>> map entry can have a string representation for persistence, but in most >>>> cases consists of unserialized objects. >>> I'd like to understand this a bit more. The reason we have a Map<String, >>> Object> in Configuration was because that Object could be either a >> String, >>> a List or a Map, right? But they eventually always boil down to Strings, >> or >>> maybe one of the predefined type that we know how to serialize. In the >>> current design, can the Object also be Configurable? >>> If the value in the config Map<String, Object> can be Configurable >> objects, >>> how do we serialize them? Calling toConfiguration() seems not quite >> working >>> because there might be some other internal fields that are not part of >> the >>> configuration. The modification to those fields will be lost if we simply >>> use toConfiguration(). So the impact of modifying those Configurable >>> objects seems a little undefined. And it would be difficult to prevent >>> users from doing that. >>> If the value in the config Map<String, Object> cannot be Configurable >>> objects, then it seems a little weird to have all the other ConfigType >>> stored in the ConfigMap in their own native type and accessed via >>> getInteger() / getBoolean(), etc, while having ConfigurableType to be >>> different from others because one have to use ConfigurableFactory to get >>> the configurations. >>> >>> Re: 2 >>> >>>> I think about the withExtendedDescription as a helper getter in a >>>> different place, so that the option is easier to find in a different >>>> module from it was defined. >>>> The MAX_PARALLELISM option in TableOptions would conceptually be equal >> to: >>>> public ConfigOption getMaxParallelismOption() { >>>> return CoreOptions.MAX_PARALLELISM; >>>> } >>> Just to make sure I understand it correctly, does that mean users will >> see >>> something like following? >>> - CoreOptions.MAX_PARALLELISM with description_1; >>> - TableOptions.MAX_PARALLELISM with description_1 + description_2. >>> - DataStreamOptions.MAX_PARALLELISM with description_1 + description_3. >>> And users will only configure exactly one MAX_PARALLELISM cross the >> board. >>> So they won't be confused by setting two MAX_PARALLELISM config for two >>> different modules, while they are actually the same config. If that is >> the >>> case, I don't have further concern. >>> >>> Re: 3 >>> Maybe I am thinking too much. I thought toBytes() / fromBytes() actually >>> restore the original Object. But fromConfiguration() and >> toConfiguration() >>> does not do that, anything not in the configuration of the original >> object >>> will be lost. So it would be good to make that clear. Maybe a clear Java >>> doc is sufficient. >>> >>> Thanks, >>> >>> Jiangjie (Becket) Qin >>> >>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz <[hidden email] >>> >>> wrote: >>> >>>> Hi, >>>> >>>> Ad. 1 >>>> >>>> The advantage of our approach is that you have the type definition close >>>> to the option definition. The only difference is that it enables >>>> expressing simple pojos with the primitives like ConfigOption<Integer>, >>>> ConfigOption<Long> etc. Otherwise as Timo said you will start having >>>> >>>> the parsing logic scattered everywhere in the code base as it is now. >>>> The string representation in our proposal is exactly the same as you >>>> explained for those three options. The only difference is that you don't >>>> have to parse the elements of a List, Map etc. afterwards. >>>> >>>> Ad. 2 >>>> >>>> I think about the withExtendedDescription as a helper getter in a >>>> different place, so that the option is easier to find in a different >>>> module from it was defined. >>>> >>>> The MAX_PARALLELISM option in TableOptions would conceptually be equal >> to: >>>> public ConfigOption getMaxParallelismOption() { >>>> >>>> return CoreOptions.MAX_PARALLELISM; >>>> >>>> } >>>> >>>> This allows to further clarify the description of the option in the >>>> context of a different module and end up in a seperate page in >>>> documentation (but with a link to the original one). In the end it is >>>> exactly the same option. It has exactly same key, type, parsing logic, >>>> it is in the end forwarded to the same place. >>>> >>>> Ad. 3 >>>> >>>> Not sure if I understand your concerns here. As Timo said it is in the >>>> end sth similar to toBytes/fromBytes, but it puts itself to a >>>> Configuration. Also just wanted to make sure we adjusted this part >>>> slightly and now the ConfigOption takes ConfigurableFactory. >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> >>>> On 30/08/2019 09:39, Timo Walther wrote: >>>>> Hi Becket, >>>>> >>>>> thanks for the discussion. >>>>> >>>>> 1. ConfigOptions in their current design are bound to classes. >>>>> Regarding, the option is "creating some Configurable objects instead >>>>> of defining the config to create >>>>> those Configurable"? We just moved this logic to a factory, this >>>>> factory can then also be used for other purposes. However, how the >>>>> option and objects are serialized to Configuration is still not part >>>>> of the option. The option is just pure declaration. >>>>> >>>>> If we would allow only List<String>, implementers would need to start >>>>> implementing own parsing and validation logic all the time. We would >>>>> like to avoid that. >>>>> >>>>> Please also keep in mind that Configuration must not consist of only >>>>> strings, it manages a Map<String, Object> for efficient access. Every >>>>> map entry can have a string representation for persistence, but in >>>>> most cases consists of unserialized objects. >>>>> >>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite >>>>> keys, types or default values. But different layers might want to add >>>>> helpful information. In our concrete use case for FLIP-59, >>>>> ExecutionConfig has 50 properties and many of them are not relevant >>>>> for the Table layer or have no effect at all. We would like to list >>>>> and mention the most important config options again in the Table >>>>> Configuration section, so that users are not confused, but with a >>>>> strong link to the core option. E.g.: registered kryo serializers are >>>>> also important also for Table users, we would like to add the comment >>>>> "This option allows to modify the serialization of the ANY SQL data >>>>> type.". I think we should not spam the core configuration page with >>>>> comments from all layers, connectors, or libraries but keep this in >>>>> the corresponding component documentation. >>>>> >>>>> 3. But it is something like fromBytes() and toBytes()? It serializes >>>>> and deserializes T from a configuration? >>>>> >>>>> Regards, >>>>> Timo >>>>> >>>>> On 29.08.19 19:14, Becket Qin wrote: >>>>>> Hi Timo and Stephan, >>>>>> >>>>>> Thanks for the detail explanation. >>>>>> >>>>>> 1. I agree that each config should be in a human readable format. My >>>>>> concern is that the current List<Configurable> looks going a little >>>>>> too far >>>>>> from what the configuration is supposed to do. They are essentially >>>>>> creating some Configurable objects instead of defining the config to >>>>>> create >>>>>> those Configurable. This mixes ConfigOption and the usage of it. API >>>>>> wise >>>>>> it would be good to keep the configs and their usages (such as how to >>>>>> create objects using the ConfigOption) apart from each other. >>>>>> I am wondering if we can just make List also only take string. For >>>>>> example, >>>>>> is the following definition of map and list sufficient? >>>>>> >>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be >>>>>> defined >>>>>> as: >>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ... >>>>>> >>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined >> as: >>>>>> list_config_name: v1, v2, v3, ... >>>>>> >>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, String>>. It >>>>>> can >>>>>> be defined as: >>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... >>>>>> >>>>>> All the key and values in the configuration are String. This also >>>>>> guarantees that the configuration is always serializable. >>>>>> If we want to do one more step, we can allow the ConfigOption to set >> all >>>>>> the primitive types and parse that for the users. So something like >>>>>> List<Integer>, List<Class<?>> seems fine. >>>>>> >>>>>> The configuration class could also have util methods to create a list >> of >>>>>> configurable such as: >>>>>> <T> List<T> >> <Configuration#getConfigurableInstances(ListMapConfigOption, >>>>>> Class<T> clazz). >>>>>> But the configuration class will not take arbitrary Configurable as >> the >>>>>> value of its config. >>>>>> >>>>>> 2. I might have misunderstood this. But my concern on the description >>>>>> extension is in the following example. >>>>>> >>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM = >>>>>> >>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( >>>>>> "Note: That this property means that a table program has a side-effect >>>>>> XYZ."); >>>>>> >>>>>> In this case, we will have two MAX_PARALLELISM configs now. One is >>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I suppose >>>>>> users >>>>>> will see both configurations. One with an extended description and one >>>>>> without. Let's say there is a third component which also users >>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM >>>>>> ConfigOption? If >>>>>> so, what would that ConfigOption's description look like? >>>>>> >>>>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM >>>>>> and the >>>>>> description should clearly state all the usage of this ConfigOption. >>>>>> >>>>>> 3. I see, in that case, how about we name it something like >>>>>> extractConfiguration()? I am just trying to see if we can make it >> clear >>>>>> this is not something like fromBytes() and toBytes(). >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Jiangjie (Becket) Qin >>>>>> >>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> >>>> wrote: >>>>>>> Hi Becket, >>>>>>> >>>>>>> let me try to clarify some of your questions: >>>>>>> >>>>>>> 1. For every option, we also needed to think about how to represent >> it >>>>>>> in a human readable format. We do not want to allow arbitrary nesting >>>>>>> because that would easily allow to bypass the flattened hierarchy of >>>>>>> config options (`session.memory.min`). The current design allows to >>>>>>> represent every option type as a list. E.g.: >>>>>>> >>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12` >>>>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption: >>>>>>> field=12,other=true; field=12,other=true` >>>>>>> `myPropertyOption: key=str0,other=str1` can be `myPropertyListOption: >>>>>>> key=str0,other=str1;key=str0,other=str1` >>>>>>> >>>>>>> We need the atomic class for serialization/deserialization both in >>>>>>> binary and string format. >>>>>>> >>>>>>> ConfigOption<List> is not present in the code base yet, but this >>>>>>> FLIP is >>>>>>> a preparation of making ExecutionConfig configurable. If you look >> into >>>>>>> this class or also in existing table connectors/formats, you will see >>>>>>> that each proposed option type has its requirements. >>>>>>> >>>>>>> 2. Regarding extending the description of ConfigOptions, the >>>>>>> semantic of >>>>>>> one option should be a super set of the other option. E.g. in Table >> API >>>>>>> we might use general ExecutionConfig properties. But we would like >>>>>>> to a) >>>>>>> make external options more prominent in the Table API config docs to >>>>>>> link people to properties they should pay attention b) notice about >>>>>>> side >>>>>>> effects. The core semantic of a property should not change. >>>>>>> >>>>>>> 3. The factory will not receive the entire configuration but works >> in a >>>>>>> separate key space. For `myObjectOption` above, it would receive a >>>>>>> configuration that consists of `field: 12` and `other: true`. >>>>>>> >>>>>>> I agree. I will convert the document into a Wiki page today. >>>>>>> >>>>>>> Thanks, >>>>>>> Timo >>>>>>> >>>>>>> On 29.08.19 09:00, Stephan Ewen wrote: >>>>>>>> @Becket One thing that may be non-obvious is that the Configuration >>>>>>>> class >>>>>>>> also defines serialization / persistence logic at the moment. So it >>>>>>>> needs >>>>>>>> to know the set of types it supports. That stands in the way of an >>>>>>>> arbitrary generic map type. >>>>>>>> >>>>>>>> @Timo I agree though that it seems a bit inconsistent to have one >>>>>>>> collection orthogonal to the type (List) and another one bound to >>>>>>> specific >>>>>>>> types (Map). >>>>>>>> >>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Timo, >>>>>>>>> >>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I have a >>>>>>>>> few >>>>>>>>> questions / comments. >>>>>>>>> >>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption? >>>>>>>>> Would it be enough to just check the atomicClass to see if it is a >>>>>>>>> List >>>>>>> or >>>>>>>>> not? >>>>>>>>> Also, in the ConfigOption<Map> class case, are we always assume >>>>>>>>> both key >>>>>>>>> and value types are String? Can we just apply the same to the >>>>>>>>> ConfigOption<List>? >>>>>>>>> BTW, I did a quick search in the codebase but did not find any >>>>>>>>> usage of >>>>>>>>> ConfigOption<List>. >>>>>>>>> >>>>>>>>> 2. The same config name, but with two ConfigOption with different >>>>>>> semantic >>>>>>>>> in different component seems super confusing. For example, when >> users >>>>>>> set >>>>>>>>> both configs, they may have no idea one is overriding the other. >>>>>>>>> There >>>>>>>>> might be two cases: >>>>>>>>> - If it is just the same config used by different components to >>>>>>>>> act >>>>>>>>> accordingly, it might be better to just have one config, but >> describe >>>>>>>>> clearly on how that config will be used. >>>>>>>>> - If it is actually two configurations that can be set >>>>>>>>> differently, I >>>>>>>>> think the config names should just be different. >>>>>>>>> >>>>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() >> method >>>>>>>>> pretty much means getConfiguration()? The toConfiguration() method >>>>>>> sounds >>>>>>>>> like converting an object to a configuration, which only works if >> the >>>>>>>>> object does not contain any state / value. I am also wondering if >>>>>>>>> there >>>>>>> is >>>>>>>>> a real use case of this method. Because supposedly the >> configurations >>>>>>> could >>>>>>>>> just be passed around to caller of this method. >>>>>>>>> >>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of in the >>>>>>>>> doc before voting? The FLIP wiki allows track the modification >>>>>>>>> history >>>>>>> and >>>>>>>>> has a more established structure to ensure nothing is missed. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>> >>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <[hidden email]> >>>>>>> wrote: >>>>>>>>>> Hi everyone, >>>>>>>>>> >>>>>>>>>> I updated the FLIP proposal one more time as mentioned in the >> voting >>>>>>>>>> thread. If there are no objections, I will start a new voting >> thread >>>>>>>>>> tomorrow at 9am Berlin time. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Timo >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote: >>>>>>>>>>> Hi everyone, >>>>>>>>>>> >>>>>>>>>>> thanks for all the feedback we have received online and offline. >> It >>>>>>>>>>> showed that many people support the idea of evolving the Flink >>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP will >>>>>>>>>>> not >>>>>>>>>>> solve all issues but at least will improve the current status. >>>>>>>>>>> >>>>>>>>>>> We've updated the document and replaced the Correlation part >>>>>>>>>>> with the >>>>>>>>>>> concept of a ConfigOptionGroup that can provide all available >>>>>>>>>>> options >>>>>>>>>>> of a group plus custom group validators for eager validation. >>>>>>>>>>> For now, >>>>>>>>>>> this eager group validation will only be used at certain >>>>>>>>>>> locations in >>>>>>>>>>> the Flink code but it prepares for maybe validating the entire >>>>>>>>>>> global >>>>>>>>>>> configuration before submitting a job in the future. >>>>>>>>>>> >>>>>>>>>>> Please take another look if you find time. I hope we can proceed >>>>>>>>>>> with >>>>>>>>>>> the voting process if there are no objections. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>> >>>>>>>>>>>> thanks for your suggestions. Let me give you some background >> about >>>>>>>>>>>> the decisions made in this FLIP: >>>>>>>>>>>> >>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" because we >> did >>>>>>>>>>>> not want to change the entire configuration infrastructure. >>>>>>>>>>>> Both for >>>>>>>>>>>> backwards-compatibility reasons and the amount of work that >>>>>>>>>>>> would be >>>>>>>>>>>> required to update all options. If our goal is to rework the >>>>>>>>>>>> configuration option entirely, I might suggest to switch to JSON >>>>>>>>>>>> format with JSON schema and JSON validator. However, setting >>>>>>>>>>>> properties in a CLI or web interface becomes more tricky the >> more >>>>>>>>>>>> nested structures are allowed. >>>>>>>>>>>> >>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is >>>>>>>>>>>> centered >>>>>>>>>>>> around Java classes where T is determined by the default value. >>>>>>>>>>>> The >>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit >>>>>>>>>>>> `intType()` method etc. The current design of validators >> centered >>>>>>>>>>>> around Java classes makes it possible to have typical domain >>>>>>>>>>>> validators baked by generics as you suggested. If we introduce >>>>>>>>>>>> types >>>>>>>>>>>> such as "quantity with measure and unit" we still need to get a >>>>>>>>>>>> class >>>>>>>>>>>> out of this option at the end, so why changing a proven concept? >>>>>>>>>>>> >>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary >>>>>>>>>>>> nesting. As >>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every >>>>>>>>>>>> atomic >>>>>>>>>>>> option like "key=12" can be represented by a list "keys=12;13". >>>>>>>>>>>> But >>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated list >>>>>>>>>>>> option >>>>>>>>>>>> would start making this more complicated such as >>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>>>>>>>>> StringOption(...)))", do we want that? >>>>>>>>>>>> >>>>>>>>>>>> 4. Correlation: The correlation part is one of the suggestions >>>>>>>>>>>> that I >>>>>>>>>>>> like least in the document. We can also discuss removing it >>>>>>>>>>>> entirely, >>>>>>>>>>>> but I think it solves the use case of relating options with each >>>>>>>>>>>> other in a flexible way right next to the actual option. >>>>>>>>>>>> Instead of >>>>>>>>>>>> being hidden in some component initialization, we should put it >>>>>>>>>>>> close >>>>>>>>>>>> to the option to also perform validation eagerly instead of >>>>>>>>>>>> failing >>>>>>>>>>>> at runtime when the option is accessed the first time. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Timo >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>>>>>>>>> A "List Type" sounds like a good direction to me. >>>>>>>>>>>>> >>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. The >>>>>>>>>>>>> idea is >>>>>>>>>>>>> to see >>>>>>>>>>>>> if something like that can ease validation. Especially the >>>>>>>>> correlation >>>>>>>>>>>>> system seems quite complex (proxies to work around order of >>>>>>>>>>>>> initialization). >>>>>>>>>>>>> >>>>>>>>>>>>> For example, let's assume we don't think primarily about "java >>>>>>>>>>>>> types" but >>>>>>>>>>>>> would define types as one of the following (just examples, >>>>>>>>>>>>> haven't >>>>>>>>>>>>> thought >>>>>>>>>>>>> all the details through): >>>>>>>>>>>>> >>>>>>>>>>>>> (a) category type: implies string, and a fix set of >> possible >>>>>>>>> values. >>>>>>>>>>>>> Those would be passes and naturally make it into the docs and >>>>>>>>>>>>> validation. >>>>>>>>>>>>> Maps to a String or Enum in Java. >>>>>>>>>>>>> >>>>>>>>>>>>> (b) numeric integer type: implies long (or optionally >>>>>>>>>>>>> integer, >>>>>>> if >>>>>>>>>>>>> we want >>>>>>>>>>>>> to automatically check overflow / underflow). would take >> typical >>>>>>>>> domain >>>>>>>>>>>>> validators, like non-negative, etc. >>>>>>>>>>>>> >>>>>>>>>>>>> (c) numeric real type: same as above (double or float) >>>>>>>>>>>>> >>>>>>>>>>>>> (d) numeric interval type: either defined as an >> interval, or >>>>>>>>>>>>> references >>>>>>>>>>>>> other parameter by key. validation by valid interval. >>>>>>>>>>>>> >>>>>>>>>>>>> (e) quantity: a measure and a unit. separately parsable. >> The >>>>>>>>>>>>> measure's >>>>>>>>>>>>> type could be any of the numeric types above, with same >>>>>>>>>>>>> validation >>>>>>>>>>>>> rules. >>>>>>>>>>>>> >>>>>>>>>>>>> With a system like the above, would we still correlation >>>>>>>>>>>>> validators? >>>>>>>>>>>>> Are >>>>>>>>>>>>> there still cases that we need to catch early (config loading) >> or >>>>>>>>>>>>> are the >>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup >> specific, >>>>>>>>>>>>> that it is >>>>>>>>>>>>> fine to handle them in component initialization? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>>>>>>>>> <[hidden email]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you for your opinion. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Actually list/composite types are the topics we spent the >>>>>>>>>>>>>> most of >>>>>>>>> the >>>>>>>>>>>>>> time. I understand that from a perspective of a full blown >> type >>>>>>>>>>>>>> system, >>>>>>>>>>>>>> a field like isList may look weird. Please let me elaborate a >>>>>>>>>>>>>> bit >>>>>>>>> more >>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear enough >>>>>>>>>>>>>> about >>>>>>>>> it >>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is >>>>>>>>>>>>>> that they >>>>>>>>>>>>>> must >>>>>>>>>>>>>> have a string representation as they might come from a >>>>>>> configuration >>>>>>>>>>>>>> file. Moreover it must be a human readable format, so that the >>>>>>>>> values >>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did not >>>>>>>>>>>>>> want to >>>>>>>>>>>>>> add a >>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for >>>>>>>>>>>>>> lists >>>>>>>>> only >>>>>>>>>>>>>> (and flat objects - I think though in the current design >>>>>>>>>>>>>> there is a >>>>>>>>>>>>>> mistake around the Configurable interface). I think though >>>>>>>>>>>>>> you have >>>>>>>>> a >>>>>>>>>>>>>> point here and it would be better to have a ListConfigOption >>>>>>>>>>>>>> instead of >>>>>>>>>>>>>> this field. Does it make sense to you? >>>>>>>>>>>>>> >>>>>>>>>>>>>> As for the second part of your message. I am not sure if I >>>>>>>>> understood >>>>>>>>>>>>>> it. The validators work with parse/deserialized values from >>>>>>>>>>>>>> Configuration that means they can be bound to the generic >>>>>>>>>>>>>> parameter >>>>>>>>> of >>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the >>>>>>>>>>>>>> ConfigOption >>>>>>>>>>>>>> has anything to do with the validation logic. Could you >>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>> bit >>>>>>>>>>>>>> more what did you mean? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Dawid >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to do >> early >>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad hoc, >>>>>>>>>>>>>>> though. For >>>>>>>>>>>>>>> example, having a boolean "isList" is a clear indication of >> not >>>>>>>>>>>>>>> having >>>>>>>>>>>>>>> thought through the type/category system. >>>>>>>>>>>>>>> Also, having a more clear category system makes validation >>>>>>> simpler. >>>>>>>>>>>>>>> For example, I have seen systems distinguishing between >> numeric >>>>>>>>>>>>>> parameters >>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible values), >>>>>>>>>>>>>>> quantities >>>>>>>>>>>>>>> like duration and memory size (need measure and unit), which >>>>>>>>>>>>>>> results in >>>>>>>>>>>>>> an >>>>>>>>>>>>>>> elegant system for validation. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >>>>>>>>> [hidden email] >>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>>>>>>>>> This allows the currently cluttered configuration of various >>>>>>>>>>>>>>>> modules to be unified. >>>>>>>>>>>>>>>> This is also first step of one of the keys to making new >>>>>>>>>>>>>>>> unified >>>>>>>>>>>>>>>> TableEnvironment available for production. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Previously, we did encounter complex configurations, such as >>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The skew may >>>>>>>>>>>>>>>> be a single field or a combination of multiple fields. >>>>>>>>>>>>>>>> So the >>>>>>>>>>>>>>>> configuration is very troublesome. We used JSON string >> to >>>>>>>>>>>>>>>> configure it. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>> ------------------------------------------------------------------ >>>>>>>>>>>>>>>> From:Jark Wu <[hidden email]> >>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>>>>>>>>> To:dev <[hidden email]> >>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >>>>>>>>> Configuration >>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a >> long >>>>>>>>> time. >>>>>>>>>>>>>>>> We have seen the benefit when developing blink >>>>>>>>>>>>>>>> configurations and >>>>>>>>>>>>>> connector >>>>>>>>>>>>>>>> properties in 1.9 release. >>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed design. >>>>>>>>>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < >> [hidden email] >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which >>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of Flink. >>>>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment >>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by >>>>>>>>> configuration. >>>>>>>>>>>>>>>>> Will take a look at the document in days. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> tison. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 >>>>>>>>>>>>>>>>> 下午10:12写道: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of >>>>>>>>>>>>>>>>>> ExecutionConfig and >>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is >>>>>>>>>>>>>>>>>> necessary >>>>>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, with >> the >>>>>>>>>>>>>>>>>> new SQL >>>>>>>>>>>>>>>> DDL >>>>>>>>>>>>>>>>>> based on properties as well as more connectors and formats >>>>>>>>>>>>>>>>>> coming up, >>>>>>>>>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> We need more features around string-based configuration >>>>>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>>>> future, >>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose FLIP-54 for >>>>>>>>>>>>>>>>>> evolving >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>>>>>>>>>>>>>>>> In summary it adds: >>>>>>>>>>>>>>>>>> - documented types and validation >>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, list >>>>>>>>>>>>>>>>>> - simple non-nested object types >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >> |
Free forum by Nabble | Edit this page |