Hi Timo, Becket
From the options that Timo suggested for improving the mutability situation I would prefer option a) as this is the more explicit option and simpler option. Just as a remark, I think in general Configurable types for options will be rather very rare for some special use-cases, as the majority of options are rather simpler parameters of primitive types (+duration, memory) @Becket for the toConfiguration this is required for shipping the Configuration to TaskManager, so that we do not have to use java serializability. Best, Dawid On 02/09/2019 10:05, Timo Walther wrote: > Hi Becket, > > Re 1 & 3: "values in configurations should actually be immutable" > > I would also prefer immutability but most of our configuration is > mutable due to serialization/deserialization. Also maps and list could > be mutable in theory. It is difficult to really enforce that for > nested structures. I see two options: > > a) For the original design: How about we force implementers to add a > duplicate() method in a Configurable object? This would mean that the > object is still mutable but by duplicating the object both during > reading and writing we would avoid the problem you described. > > b) For the current design: We still use the factory approach but let a > Configurable object implement a getFactory() method such that we know > how to serialize the object. With the help of a factory we can also > duplicate the object easily during reading and writing and ensure > immutability. > > I would personally go for approach a) to not over-engineer this topic. > But I'm open for option b). > > Regards, > Timo > > > On 31.08.19 04:09, Becket Qin wrote: >> Hi Timo, >> >> Thanks for the reply. I am still a little concerned over the >> mutability of >> the Configurable which could be the value in Configuration. >> >> Re: 1 >> >>> But in general, people should not use any internal fields. >>> Configurable objects are meant for simple little helper POJOs, not >>> complex arbitrary nested data structures. >> This seems difficult to enforce... Ideally the values in configurations >> should actually be immutable. The value can only be changed by >> explicitly >> calling setters in Configuration. Otherwise we may have weird situation >> where the Configurable in the same configuration are different in two >> places because the configurable is modified in one places and not >> modified >> in another place. So I am a little concerned on putting a >> Configurable type >> in the Configuration map, because the value could be modified without >> explicitly setting the configuration. For example, can users do the >> following? >> >> Configurable configurable = >> configuration.getConfigurable(myConfigurableOption); >> configurable.setConfigA(123); // this already changes the configurable >> object in the configuration. >> >> Re: 2 >> Thanks for confirming. As long as users will not have a situation where >> they need to set two configurations with the same key but different >> descriptions, I think it is OK. >> >> Re: 3 >> This is actually kind of related to 1. i.e. Whether toConfiguration() >> guarantees the exact same object can be rebuilt from the >> configuration or >> not. I am still not quite sure about the use case of toConfiguration() >> though. It seems indicating the Configurable is mutable, which might be >> dangerous. >> >> Thanks, >> >> Jiangjie (Becket) Qin >> >> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <[hidden email]> >> wrote: >> >>> Hi Becket, >>> >>> 1. First of all, you are totally right. The FLIP contains a bug due to >>> the last minute changes that Dawid suggested: by having immutable >>> objects created by a factory we loose the serializability of the >>> Configuration because the factory itself is not stored in the >>> Configuration. I would propose to revert the last change and stick to >>> the original design, which means that a object must implement the >>> Configurable interface and also implements >>> serialization/deserialization >>> methods such that also internal fields can be persisted as you >>> suggested. But in general, people should not use any internal fields. >>> Configurable objects are meant for simple little helper POJOs, not >>> complex arbitrary nested data structures. >>> >>> It is Map<String, Object> because Configuration stores the raw objects. >>> If you put a Boolean option into it, it remains Boolean. This makes the >>> map very efficient for shipping to the cluster and accessing options >>> multiple times. The same for configurable objects. We put the pure >>> objects into the map without any serialization/deserialization. The >>> provided factory allows to convert the Object into a Configuration and >>> we know how to serialize/deserializise a configuration because it is >>> just a key/value map. >>> >>> 2. Yes, this is what we had in mind. It should still be the same >>> configuration option. We would like to avoid specialized option keys >>> across components (exec.max-para and table.exec.max-para) if they are >>> describing basically the same thing. But adding some more description >>> like "TableOptions.MAX_PARALLELISM with description_1 + description_2" >>> does not hurt. >>> >>> 3. They should restore the original object given that the >>> toConfiguration/fromConfiguration methods have been implemented >>> correctly. I will extend the example to make the logic clearer while >>> fixing the bug. >>> >>> Thanks for the healthy discussion, >>> Timo >>> >>> >>> On 30.08.19 15:29, Becket Qin wrote: >>>> Hi Timo, >>>> >>>> Thanks again for the clarification. Please see a few more questions >>> below. >>>> Re: 1 >>>> >>>>> Please also keep in mind that Configuration must not consist of only >>>>> strings, it manages a Map<String, Object> for efficient access. Every >>>>> map entry can have a string representation for persistence, but in >>>>> most >>>>> cases consists of unserialized objects. >>>> I'd like to understand this a bit more. The reason we have a >>>> Map<String, >>>> Object> in Configuration was because that Object could be either a >>> String, >>>> a List or a Map, right? But they eventually always boil down to >>>> Strings, >>> or >>>> maybe one of the predefined type that we know how to serialize. In the >>>> current design, can the Object also be Configurable? >>>> If the value in the config Map<String, Object> can be Configurable >>> objects, >>>> how do we serialize them? Calling toConfiguration() seems not quite >>> working >>>> because there might be some other internal fields that are not part of >>> the >>>> configuration. The modification to those fields will be lost if we >>>> simply >>>> use toConfiguration(). So the impact of modifying those Configurable >>>> objects seems a little undefined. And it would be difficult to prevent >>>> users from doing that. >>>> If the value in the config Map<String, Object> cannot be Configurable >>>> objects, then it seems a little weird to have all the other ConfigType >>>> stored in the ConfigMap in their own native type and accessed via >>>> getInteger() / getBoolean(), etc, while having ConfigurableType to be >>>> different from others because one have to use ConfigurableFactory >>>> to get >>>> the configurations. >>>> >>>> Re: 2 >>>> >>>>> I think about the withExtendedDescription as a helper getter in a >>>>> different place, so that the option is easier to find in a different >>>>> module from it was defined. >>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>> equal >>> to: >>>>> public ConfigOption getMaxParallelismOption() { >>>>> return CoreOptions.MAX_PARALLELISM; >>>>> } >>>> Just to make sure I understand it correctly, does that mean users will >>> see >>>> something like following? >>>> - CoreOptions.MAX_PARALLELISM with description_1; >>>> - TableOptions.MAX_PARALLELISM with description_1 + description_2. >>>> - DataStreamOptions.MAX_PARALLELISM with description_1 + >>>> description_3. >>>> And users will only configure exactly one MAX_PARALLELISM cross the >>> board. >>>> So they won't be confused by setting two MAX_PARALLELISM config for >>>> two >>>> different modules, while they are actually the same config. If that is >>> the >>>> case, I don't have further concern. >>>> >>>> Re: 3 >>>> Maybe I am thinking too much. I thought toBytes() / fromBytes() >>>> actually >>>> restore the original Object. But fromConfiguration() and >>> toConfiguration() >>>> does not do that, anything not in the configuration of the original >>> object >>>> will be lost. So it would be good to make that clear. Maybe a clear >>>> Java >>>> doc is sufficient. >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz >>>> <[hidden email] >>>> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> Ad. 1 >>>>> >>>>> The advantage of our approach is that you have the type definition >>>>> close >>>>> to the option definition. The only difference is that it enables >>>>> expressing simple pojos with the primitives like >>>>> ConfigOption<Integer>, >>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start having >>>>> >>>>> the parsing logic scattered everywhere in the code base as it is now. >>>>> The string representation in our proposal is exactly the same as you >>>>> explained for those three options. The only difference is that you >>>>> don't >>>>> have to parse the elements of a List, Map etc. afterwards. >>>>> >>>>> Ad. 2 >>>>> >>>>> I think about the withExtendedDescription as a helper getter in a >>>>> different place, so that the option is easier to find in a different >>>>> module from it was defined. >>>>> >>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>> equal >>> to: >>>>> public ConfigOption getMaxParallelismOption() { >>>>> >>>>> return CoreOptions.MAX_PARALLELISM; >>>>> >>>>> } >>>>> >>>>> This allows to further clarify the description of the option in the >>>>> context of a different module and end up in a seperate page in >>>>> documentation (but with a link to the original one). In the end it is >>>>> exactly the same option. It has exactly same key, type, parsing >>>>> logic, >>>>> it is in the end forwarded to the same place. >>>>> >>>>> Ad. 3 >>>>> >>>>> Not sure if I understand your concerns here. As Timo said it is in >>>>> the >>>>> end sth similar to toBytes/fromBytes, but it puts itself to a >>>>> Configuration. Also just wanted to make sure we adjusted this part >>>>> slightly and now the ConfigOption takes ConfigurableFactory. >>>>> >>>>> Best, >>>>> >>>>> Dawid >>>>> >>>>> >>>>> On 30/08/2019 09:39, Timo Walther wrote: >>>>>> Hi Becket, >>>>>> >>>>>> thanks for the discussion. >>>>>> >>>>>> 1. ConfigOptions in their current design are bound to classes. >>>>>> Regarding, the option is "creating some Configurable objects instead >>>>>> of defining the config to create >>>>>> those Configurable"? We just moved this logic to a factory, this >>>>>> factory can then also be used for other purposes. However, how the >>>>>> option and objects are serialized to Configuration is still not part >>>>>> of the option. The option is just pure declaration. >>>>>> >>>>>> If we would allow only List<String>, implementers would need to >>>>>> start >>>>>> implementing own parsing and validation logic all the time. We would >>>>>> like to avoid that. >>>>>> >>>>>> Please also keep in mind that Configuration must not consist of only >>>>>> strings, it manages a Map<String, Object> for efficient access. >>>>>> Every >>>>>> map entry can have a string representation for persistence, but in >>>>>> most cases consists of unserialized objects. >>>>>> >>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite >>>>>> keys, types or default values. But different layers might want to >>>>>> add >>>>>> helpful information. In our concrete use case for FLIP-59, >>>>>> ExecutionConfig has 50 properties and many of them are not relevant >>>>>> for the Table layer or have no effect at all. We would like to list >>>>>> and mention the most important config options again in the Table >>>>>> Configuration section, so that users are not confused, but with a >>>>>> strong link to the core option. E.g.: registered kryo serializers >>>>>> are >>>>>> also important also for Table users, we would like to add the >>>>>> comment >>>>>> "This option allows to modify the serialization of the ANY SQL data >>>>>> type.". I think we should not spam the core configuration page with >>>>>> comments from all layers, connectors, or libraries but keep this in >>>>>> the corresponding component documentation. >>>>>> >>>>>> 3. But it is something like fromBytes() and toBytes()? It serializes >>>>>> and deserializes T from a configuration? >>>>>> >>>>>> Regards, >>>>>> Timo >>>>>> >>>>>> On 29.08.19 19:14, Becket Qin wrote: >>>>>>> Hi Timo and Stephan, >>>>>>> >>>>>>> Thanks for the detail explanation. >>>>>>> >>>>>>> 1. I agree that each config should be in a human readable >>>>>>> format. My >>>>>>> concern is that the current List<Configurable> looks going a little >>>>>>> too far >>>>>>> from what the configuration is supposed to do. They are essentially >>>>>>> creating some Configurable objects instead of defining the >>>>>>> config to >>>>>>> create >>>>>>> those Configurable. This mixes ConfigOption and the usage of it. >>>>>>> API >>>>>>> wise >>>>>>> it would be good to keep the configs and their usages (such as >>>>>>> how to >>>>>>> create objects using the ConfigOption) apart from each other. >>>>>>> I am wondering if we can just make List also only take string. For >>>>>>> example, >>>>>>> is the following definition of map and list sufficient? >>>>>>> >>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be >>>>>>> defined >>>>>>> as: >>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ... >>>>>>> >>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined >>> as: >>>>>>> list_config_name: v1, v2, v3, ... >>>>>>> >>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, >>>>>>> String>>. It >>>>>>> can >>>>>>> be defined as: >>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... >>>>>>> >>>>>>> All the key and values in the configuration are String. This also >>>>>>> guarantees that the configuration is always serializable. >>>>>>> If we want to do one more step, we can allow the ConfigOption to >>>>>>> set >>> all >>>>>>> the primitive types and parse that for the users. So something like >>>>>>> List<Integer>, List<Class<?>> seems fine. >>>>>>> >>>>>>> The configuration class could also have util methods to create a >>>>>>> list >>> of >>>>>>> configurable such as: >>>>>>> <T> List<T> >>> <Configuration#getConfigurableInstances(ListMapConfigOption, >>>>>>> Class<T> clazz). >>>>>>> But the configuration class will not take arbitrary Configurable as >>> the >>>>>>> value of its config. >>>>>>> >>>>>>> 2. I might have misunderstood this. But my concern on the >>>>>>> description >>>>>>> extension is in the following example. >>>>>>> >>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM = >>>>>>> >>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( >>>>>>> "Note: That this property means that a table program has a >>>>>>> side-effect >>>>>>> XYZ."); >>>>>>> >>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One is >>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I >>>>>>> suppose >>>>>>> users >>>>>>> will see both configurations. One with an extended description >>>>>>> and one >>>>>>> without. Let's say there is a third component which also users >>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM >>>>>>> ConfigOption? If >>>>>>> so, what would that ConfigOption's description look like? >>>>>>> >>>>>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM >>>>>>> and the >>>>>>> description should clearly state all the usage of this >>>>>>> ConfigOption. >>>>>>> >>>>>>> 3. I see, in that case, how about we name it something like >>>>>>> extractConfiguration()? I am just trying to see if we can make it >>> clear >>>>>>> this is not something like fromBytes() and toBytes(). >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Jiangjie (Becket) Qin >>>>>>> >>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> >>>>> wrote: >>>>>>>> Hi Becket, >>>>>>>> >>>>>>>> let me try to clarify some of your questions: >>>>>>>> >>>>>>>> 1. For every option, we also needed to think about how to >>>>>>>> represent >>> it >>>>>>>> in a human readable format. We do not want to allow arbitrary >>>>>>>> nesting >>>>>>>> because that would easily allow to bypass the flattened >>>>>>>> hierarchy of >>>>>>>> config options (`session.memory.min`). The current design >>>>>>>> allows to >>>>>>>> represent every option type as a list. E.g.: >>>>>>>> >>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12` >>>>>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption: >>>>>>>> field=12,other=true; field=12,other=true` >>>>>>>> `myPropertyOption: key=str0,other=str1` can be >>>>>>>> `myPropertyListOption: >>>>>>>> key=str0,other=str1;key=str0,other=str1` >>>>>>>> >>>>>>>> We need the atomic class for serialization/deserialization both in >>>>>>>> binary and string format. >>>>>>>> >>>>>>>> ConfigOption<List> is not present in the code base yet, but this >>>>>>>> FLIP is >>>>>>>> a preparation of making ExecutionConfig configurable. If you look >>> into >>>>>>>> this class or also in existing table connectors/formats, you >>>>>>>> will see >>>>>>>> that each proposed option type has its requirements. >>>>>>>> >>>>>>>> 2. Regarding extending the description of ConfigOptions, the >>>>>>>> semantic of >>>>>>>> one option should be a super set of the other option. E.g. in >>>>>>>> Table >>> API >>>>>>>> we might use general ExecutionConfig properties. But we would like >>>>>>>> to a) >>>>>>>> make external options more prominent in the Table API config >>>>>>>> docs to >>>>>>>> link people to properties they should pay attention b) notice >>>>>>>> about >>>>>>>> side >>>>>>>> effects. The core semantic of a property should not change. >>>>>>>> >>>>>>>> 3. The factory will not receive the entire configuration but works >>> in a >>>>>>>> separate key space. For `myObjectOption` above, it would receive a >>>>>>>> configuration that consists of `field: 12` and `other: true`. >>>>>>>> >>>>>>>> I agree. I will convert the document into a Wiki page today. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Timo >>>>>>>> >>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote: >>>>>>>>> @Becket One thing that may be non-obvious is that the >>>>>>>>> Configuration >>>>>>>>> class >>>>>>>>> also defines serialization / persistence logic at the moment. >>>>>>>>> So it >>>>>>>>> needs >>>>>>>>> to know the set of types it supports. That stands in the way >>>>>>>>> of an >>>>>>>>> arbitrary generic map type. >>>>>>>>> >>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have one >>>>>>>>> collection orthogonal to the type (List) and another one bound to >>>>>>>> specific >>>>>>>>> types (Map). >>>>>>>>> >>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Timo, >>>>>>>>>> >>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I >>>>>>>>>> have a >>>>>>>>>> few >>>>>>>>>> questions / comments. >>>>>>>>>> >>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption? >>>>>>>>>> Would it be enough to just check the atomicClass to see if it >>>>>>>>>> is a >>>>>>>>>> List >>>>>>>> or >>>>>>>>>> not? >>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always assume >>>>>>>>>> both key >>>>>>>>>> and value types are String? Can we just apply the same to the >>>>>>>>>> ConfigOption<List>? >>>>>>>>>> BTW, I did a quick search in the codebase but did not find any >>>>>>>>>> usage of >>>>>>>>>> ConfigOption<List>. >>>>>>>>>> >>>>>>>>>> 2. The same config name, but with two ConfigOption with >>>>>>>>>> different >>>>>>>> semantic >>>>>>>>>> in different component seems super confusing. For example, when >>> users >>>>>>>> set >>>>>>>>>> both configs, they may have no idea one is overriding the other. >>>>>>>>>> There >>>>>>>>>> might be two cases: >>>>>>>>>> - If it is just the same config used by different >>>>>>>>>> components to >>>>>>>>>> act >>>>>>>>>> accordingly, it might be better to just have one config, but >>> describe >>>>>>>>>> clearly on how that config will be used. >>>>>>>>>> - If it is actually two configurations that can be set >>>>>>>>>> differently, I >>>>>>>>>> think the config names should just be different. >>>>>>>>>> >>>>>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() >>> method >>>>>>>>>> pretty much means getConfiguration()? The toConfiguration() >>>>>>>>>> method >>>>>>>> sounds >>>>>>>>>> like converting an object to a configuration, which only >>>>>>>>>> works if >>> the >>>>>>>>>> object does not contain any state / value. I am also >>>>>>>>>> wondering if >>>>>>>>>> there >>>>>>>> is >>>>>>>>>> a real use case of this method. Because supposedly the >>> configurations >>>>>>>> could >>>>>>>>>> just be passed around to caller of this method. >>>>>>>>>> >>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of >>>>>>>>>> in the >>>>>>>>>> doc before voting? The FLIP wiki allows track the modification >>>>>>>>>> history >>>>>>>> and >>>>>>>>>> has a more established structure to ensure nothing is missed. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>> >>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther >>>>>>>>>> <[hidden email]> >>>>>>>> wrote: >>>>>>>>>>> Hi everyone, >>>>>>>>>>> >>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in the >>> voting >>>>>>>>>>> thread. If there are no objections, I will start a new voting >>> thread >>>>>>>>>>> tomorrow at 9am Berlin time. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote: >>>>>>>>>>>> Hi everyone, >>>>>>>>>>>> >>>>>>>>>>>> thanks for all the feedback we have received online and >>>>>>>>>>>> offline. >>> It >>>>>>>>>>>> showed that many people support the idea of evolving the Flink >>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP >>>>>>>>>>>> will >>>>>>>>>>>> not >>>>>>>>>>>> solve all issues but at least will improve the current status. >>>>>>>>>>>> >>>>>>>>>>>> We've updated the document and replaced the Correlation part >>>>>>>>>>>> with the >>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all available >>>>>>>>>>>> options >>>>>>>>>>>> of a group plus custom group validators for eager validation. >>>>>>>>>>>> For now, >>>>>>>>>>>> this eager group validation will only be used at certain >>>>>>>>>>>> locations in >>>>>>>>>>>> the Flink code but it prepares for maybe validating the entire >>>>>>>>>>>> global >>>>>>>>>>>> configuration before submitting a job in the future. >>>>>>>>>>>> >>>>>>>>>>>> Please take another look if you find time. I hope we can >>>>>>>>>>>> proceed >>>>>>>>>>>> with >>>>>>>>>>>> the voting process if there are no objections. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Timo >>>>>>>>>>>> >>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks for your suggestions. Let me give you some background >>> about >>>>>>>>>>>>> the decisions made in this FLIP: >>>>>>>>>>>>> >>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" >>>>>>>>>>>>> because we >>> did >>>>>>>>>>>>> not want to change the entire configuration infrastructure. >>>>>>>>>>>>> Both for >>>>>>>>>>>>> backwards-compatibility reasons and the amount of work that >>>>>>>>>>>>> would be >>>>>>>>>>>>> required to update all options. If our goal is to rework the >>>>>>>>>>>>> configuration option entirely, I might suggest to switch >>>>>>>>>>>>> to JSON >>>>>>>>>>>>> format with JSON schema and JSON validator. However, setting >>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky the >>> more >>>>>>>>>>>>> nested structures are allowed. >>>>>>>>>>>>> >>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is >>>>>>>>>>>>> centered >>>>>>>>>>>>> around Java classes where T is determined by the default >>>>>>>>>>>>> value. >>>>>>>>>>>>> The >>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit >>>>>>>>>>>>> `intType()` method etc. The current design of validators >>> centered >>>>>>>>>>>>> around Java classes makes it possible to have typical domain >>>>>>>>>>>>> validators baked by generics as you suggested. If we >>>>>>>>>>>>> introduce >>>>>>>>>>>>> types >>>>>>>>>>>>> such as "quantity with measure and unit" we still need to >>>>>>>>>>>>> get a >>>>>>>>>>>>> class >>>>>>>>>>>>> out of this option at the end, so why changing a proven >>>>>>>>>>>>> concept? >>>>>>>>>>>>> >>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary >>>>>>>>>>>>> nesting. As >>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every >>>>>>>>>>>>> atomic >>>>>>>>>>>>> option like "key=12" can be represented by a list >>>>>>>>>>>>> "keys=12;13". >>>>>>>>>>>>> But >>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated >>>>>>>>>>>>> list >>>>>>>>>>>>> option >>>>>>>>>>>>> would start making this more complicated such as >>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>>>>>>>>>> StringOption(...)))", do we want that? >>>>>>>>>>>>> >>>>>>>>>>>>> 4. Correlation: The correlation part is one of the >>>>>>>>>>>>> suggestions >>>>>>>>>>>>> that I >>>>>>>>>>>>> like least in the document. We can also discuss removing it >>>>>>>>>>>>> entirely, >>>>>>>>>>>>> but I think it solves the use case of relating options >>>>>>>>>>>>> with each >>>>>>>>>>>>> other in a flexible way right next to the actual option. >>>>>>>>>>>>> Instead of >>>>>>>>>>>>> being hidden in some component initialization, we should >>>>>>>>>>>>> put it >>>>>>>>>>>>> close >>>>>>>>>>>>> to the option to also perform validation eagerly instead of >>>>>>>>>>>>> failing >>>>>>>>>>>>> at runtime when the option is accessed the first time. >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Timo >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>>>>>>>>>> A "List Type" sounds like a good direction to me. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. The >>>>>>>>>>>>>> idea is >>>>>>>>>>>>>> to see >>>>>>>>>>>>>> if something like that can ease validation. Especially the >>>>>>>>>> correlation >>>>>>>>>>>>>> system seems quite complex (proxies to work around order of >>>>>>>>>>>>>> initialization). >>>>>>>>>>>>>> >>>>>>>>>>>>>> For example, let's assume we don't think primarily about >>>>>>>>>>>>>> "java >>>>>>>>>>>>>> types" but >>>>>>>>>>>>>> would define types as one of the following (just examples, >>>>>>>>>>>>>> haven't >>>>>>>>>>>>>> thought >>>>>>>>>>>>>> all the details through): >>>>>>>>>>>>>> >>>>>>>>>>>>>> (a) category type: implies string, and a fix set of >>> possible >>>>>>>>>> values. >>>>>>>>>>>>>> Those would be passes and naturally make it into the docs >>>>>>>>>>>>>> and >>>>>>>>>>>>>> validation. >>>>>>>>>>>>>> Maps to a String or Enum in Java. >>>>>>>>>>>>>> >>>>>>>>>>>>>> (b) numeric integer type: implies long (or optionally >>>>>>>>>>>>>> integer, >>>>>>>> if >>>>>>>>>>>>>> we want >>>>>>>>>>>>>> to automatically check overflow / underflow). would take >>> typical >>>>>>>>>> domain >>>>>>>>>>>>>> validators, like non-negative, etc. >>>>>>>>>>>>>> >>>>>>>>>>>>>> (c) numeric real type: same as above (double or >>>>>>>>>>>>>> float) >>>>>>>>>>>>>> >>>>>>>>>>>>>> (d) numeric interval type: either defined as an >>> interval, or >>>>>>>>>>>>>> references >>>>>>>>>>>>>> other parameter by key. validation by valid interval. >>>>>>>>>>>>>> >>>>>>>>>>>>>> (e) quantity: a measure and a unit. separately >>>>>>>>>>>>>> parsable. >>> The >>>>>>>>>>>>>> measure's >>>>>>>>>>>>>> type could be any of the numeric types above, with same >>>>>>>>>>>>>> validation >>>>>>>>>>>>>> rules. >>>>>>>>>>>>>> >>>>>>>>>>>>>> With a system like the above, would we still correlation >>>>>>>>>>>>>> validators? >>>>>>>>>>>>>> Are >>>>>>>>>>>>>> there still cases that we need to catch early (config >>>>>>>>>>>>>> loading) >>> or >>>>>>>>>>>>>> are the >>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup >>> specific, >>>>>>>>>>>>>> that it is >>>>>>>>>>>>>> fine to handle them in component initialization? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>>>>>>>>>> <[hidden email]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you for your opinion. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Actually list/composite types are the topics we spent the >>>>>>>>>>>>>>> most of >>>>>>>>>> the >>>>>>>>>>>>>>> time. I understand that from a perspective of a full blown >>> type >>>>>>>>>>>>>>> system, >>>>>>>>>>>>>>> a field like isList may look weird. Please let me >>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>> bit >>>>>>>>>> more >>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear >>>>>>>>>>>>>>> enough >>>>>>>>>>>>>>> about >>>>>>>>>> it >>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is >>>>>>>>>>>>>>> that they >>>>>>>>>>>>>>> must >>>>>>>>>>>>>>> have a string representation as they might come from a >>>>>>>> configuration >>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so >>>>>>>>>>>>>>> that the >>>>>>>>>> values >>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did not >>>>>>>>>>>>>>> want to >>>>>>>>>>>>>>> add a >>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for >>>>>>>>>>>>>>> lists >>>>>>>>>> only >>>>>>>>>>>>>>> (and flat objects - I think though in the current design >>>>>>>>>>>>>>> there is a >>>>>>>>>>>>>>> mistake around the Configurable interface). I think though >>>>>>>>>>>>>>> you have >>>>>>>>>> a >>>>>>>>>>>>>>> point here and it would be better to have a >>>>>>>>>>>>>>> ListConfigOption >>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>> this field. Does it make sense to you? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As for the second part of your message. I am not sure if I >>>>>>>>>> understood >>>>>>>>>>>>>>> it. The validators work with parse/deserialized values from >>>>>>>>>>>>>>> Configuration that means they can be bound to the generic >>>>>>>>>>>>>>> parameter >>>>>>>>>> of >>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the >>>>>>>>>>>>>>> ConfigOption >>>>>>>>>>>>>>> has anything to do with the validation logic. Could you >>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>> bit >>>>>>>>>>>>>>> more what did you mean? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dawid >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to do >>> early >>>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad >>>>>>>>>>>>>>>> hoc, >>>>>>>>>>>>>>>> though. For >>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear >>>>>>>>>>>>>>>> indication of >>> not >>>>>>>>>>>>>>>> having >>>>>>>>>>>>>>>> thought through the type/category system. >>>>>>>>>>>>>>>> Also, having a more clear category system makes validation >>>>>>>> simpler. >>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between >>> numeric >>>>>>>>>>>>>>> parameters >>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible >>>>>>>>>>>>>>>> values), >>>>>>>>>>>>>>>> quantities >>>>>>>>>>>>>>>> like duration and memory size (need measure and unit), >>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>> results in >>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>> elegant system for validation. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >>>>>>>>>> [hidden email] >>>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of >>>>>>>>>>>>>>>>> various >>>>>>>>>>>>>>>>> modules to be unified. >>>>>>>>>>>>>>>>> This is also first step of one of the keys to making new >>>>>>>>>>>>>>>>> unified >>>>>>>>>>>>>>>>> TableEnvironment available for production. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations, >>>>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The >>>>>>>>>>>>>>>>> skew may >>>>>>>>>>>>>>>>> be a single field or a combination of multiple >>>>>>>>>>>>>>>>> fields. >>>>>>>>>>>>>>>>> So the >>>>>>>>>>>>>>>>> configuration is very troublesome. We used JSON >>>>>>>>>>>>>>>>> string >>> to >>>>>>>>>>>>>>>>> configure it. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>> ------------------------------------------------------------------ >>>>>>>>>>>>>>>>> From:Jark Wu <[hidden email]> >>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>>>>>>>>>> To:dev <[hidden email]> >>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >>>>>>>>>> Configuration >>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a >>> long >>>>>>>>>> time. >>>>>>>>>>>>>>>>> We have seen the benefit when developing blink >>>>>>>>>>>>>>>>> configurations and >>>>>>>>>>>>>>> connector >>>>>>>>>>>>>>>>> properties in 1.9 release. >>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed >>>>>>>>>>>>>>>>> design. >>>>>>>>>>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < >>> [hidden email] >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which >>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of >>>>>>>>>>>>>>>>>> Flink. >>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment >>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by >>>>>>>>>> configuration. >>>>>>>>>>>>>>>>>> Will take a look at the document in days. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>> tison. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 >>>>>>>>>>>>>>>>>> 下午10:12写道: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of >>>>>>>>>>>>>>>>>>> ExecutionConfig and >>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is >>>>>>>>>>>>>>>>>>> necessary >>>>>>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, >>>>>>>>>>>>>>>>>>> with >>> the >>>>>>>>>>>>>>>>>>> new SQL >>>>>>>>>>>>>>>>> DDL >>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and >>>>>>>>>>>>>>>>>>> formats >>>>>>>>>>>>>>>>>>> coming up, >>>>>>>>>>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> We need more features around string-based configuration >>>>>>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>>>>> future, >>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose >>>>>>>>>>>>>>>>>>> FLIP-54 for >>>>>>>>>>>>>>>>>>> evolving >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>> >>>>>>>>>>>>>>>>>>> In summary it adds: >>>>>>>>>>>>>>>>>>> - documented types and validation >>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, list >>>>>>>>>>>>>>>>>>> - simple non-nested object types >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>> > signature.asc (849 bytes) Download Attachment |
Hi Timo and Dawid,
Thanks for the patient reply. I agree that both option a) and option b) can solve the mutability problem. For option a), is it a little intrusive to add a duplicate() method for a Configurable? It would be great if we don't put this burden on users if possible. For option b), I actually feel it is slightly better than option a) from API perspective as getFactory() seems a more understandable method of a Configurable compared with duplicate(). And users do not need to implement much more logic. I am curious what is the downside of keeping the Configuration simple to only have primitive types, and always create the Configurable using a util method? If Configurables are rare, do we need to put the instantiation / bookkeeping logic in Configuration? @Becket for the toConfiguration this is required for shipping the > Configuration to TaskManager, so that we do not have to use java > serializability. Do you mean a Configurable may be created and configured directly without reading settings from a Configuration instance? I thought a Configurable will always be created via a ConfigurableFactory by extracting required configs from a Configuration. Am I missing something? Thanks, Jiangjie (Becket) Qin On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz <[hidden email]> wrote: > Hi Timo, Becket > > From the options that Timo suggested for improving the mutability > situation I would prefer option a) as this is the more explicit option > and simpler option. Just as a remark, I think in general Configurable > types for options will be rather very rare for some special use-cases, > as the majority of options are rather simpler parameters of primitive > types (+duration, memory) > > @Becket for the toConfiguration this is required for shipping the > Configuration to TaskManager, so that we do not have to use java > serializability. > > Best, > > Dawid > > > On 02/09/2019 10:05, Timo Walther wrote: > > Hi Becket, > > > > Re 1 & 3: "values in configurations should actually be immutable" > > > > I would also prefer immutability but most of our configuration is > > mutable due to serialization/deserialization. Also maps and list could > > be mutable in theory. It is difficult to really enforce that for > > nested structures. I see two options: > > > > a) For the original design: How about we force implementers to add a > > duplicate() method in a Configurable object? This would mean that the > > object is still mutable but by duplicating the object both during > > reading and writing we would avoid the problem you described. > > > > b) For the current design: We still use the factory approach but let a > > Configurable object implement a getFactory() method such that we know > > how to serialize the object. With the help of a factory we can also > > duplicate the object easily during reading and writing and ensure > > immutability. > > > > I would personally go for approach a) to not over-engineer this topic. > > But I'm open for option b). > > > > Regards, > > Timo > > > > > > On 31.08.19 04:09, Becket Qin wrote: > >> Hi Timo, > >> > >> Thanks for the reply. I am still a little concerned over the > >> mutability of > >> the Configurable which could be the value in Configuration. > >> > >> Re: 1 > >> > >>> But in general, people should not use any internal fields. > >>> Configurable objects are meant for simple little helper POJOs, not > >>> complex arbitrary nested data structures. > >> This seems difficult to enforce... Ideally the values in configurations > >> should actually be immutable. The value can only be changed by > >> explicitly > >> calling setters in Configuration. Otherwise we may have weird situation > >> where the Configurable in the same configuration are different in two > >> places because the configurable is modified in one places and not > >> modified > >> in another place. So I am a little concerned on putting a > >> Configurable type > >> in the Configuration map, because the value could be modified without > >> explicitly setting the configuration. For example, can users do the > >> following? > >> > >> Configurable configurable = > >> configuration.getConfigurable(myConfigurableOption); > >> configurable.setConfigA(123); // this already changes the configurable > >> object in the configuration. > >> > >> Re: 2 > >> Thanks for confirming. As long as users will not have a situation where > >> they need to set two configurations with the same key but different > >> descriptions, I think it is OK. > >> > >> Re: 3 > >> This is actually kind of related to 1. i.e. Whether toConfiguration() > >> guarantees the exact same object can be rebuilt from the > >> configuration or > >> not. I am still not quite sure about the use case of toConfiguration() > >> though. It seems indicating the Configurable is mutable, which might be > >> dangerous. > >> > >> Thanks, > >> > >> Jiangjie (Becket) Qin > >> > >> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <[hidden email]> > >> wrote: > >> > >>> Hi Becket, > >>> > >>> 1. First of all, you are totally right. The FLIP contains a bug due to > >>> the last minute changes that Dawid suggested: by having immutable > >>> objects created by a factory we loose the serializability of the > >>> Configuration because the factory itself is not stored in the > >>> Configuration. I would propose to revert the last change and stick to > >>> the original design, which means that a object must implement the > >>> Configurable interface and also implements > >>> serialization/deserialization > >>> methods such that also internal fields can be persisted as you > >>> suggested. But in general, people should not use any internal fields. > >>> Configurable objects are meant for simple little helper POJOs, not > >>> complex arbitrary nested data structures. > >>> > >>> It is Map<String, Object> because Configuration stores the raw objects. > >>> If you put a Boolean option into it, it remains Boolean. This makes the > >>> map very efficient for shipping to the cluster and accessing options > >>> multiple times. The same for configurable objects. We put the pure > >>> objects into the map without any serialization/deserialization. The > >>> provided factory allows to convert the Object into a Configuration and > >>> we know how to serialize/deserializise a configuration because it is > >>> just a key/value map. > >>> > >>> 2. Yes, this is what we had in mind. It should still be the same > >>> configuration option. We would like to avoid specialized option keys > >>> across components (exec.max-para and table.exec.max-para) if they are > >>> describing basically the same thing. But adding some more description > >>> like "TableOptions.MAX_PARALLELISM with description_1 + description_2" > >>> does not hurt. > >>> > >>> 3. They should restore the original object given that the > >>> toConfiguration/fromConfiguration methods have been implemented > >>> correctly. I will extend the example to make the logic clearer while > >>> fixing the bug. > >>> > >>> Thanks for the healthy discussion, > >>> Timo > >>> > >>> > >>> On 30.08.19 15:29, Becket Qin wrote: > >>>> Hi Timo, > >>>> > >>>> Thanks again for the clarification. Please see a few more questions > >>> below. > >>>> Re: 1 > >>>> > >>>>> Please also keep in mind that Configuration must not consist of only > >>>>> strings, it manages a Map<String, Object> for efficient access. Every > >>>>> map entry can have a string representation for persistence, but in > >>>>> most > >>>>> cases consists of unserialized objects. > >>>> I'd like to understand this a bit more. The reason we have a > >>>> Map<String, > >>>> Object> in Configuration was because that Object could be either a > >>> String, > >>>> a List or a Map, right? But they eventually always boil down to > >>>> Strings, > >>> or > >>>> maybe one of the predefined type that we know how to serialize. In the > >>>> current design, can the Object also be Configurable? > >>>> If the value in the config Map<String, Object> can be Configurable > >>> objects, > >>>> how do we serialize them? Calling toConfiguration() seems not quite > >>> working > >>>> because there might be some other internal fields that are not part of > >>> the > >>>> configuration. The modification to those fields will be lost if we > >>>> simply > >>>> use toConfiguration(). So the impact of modifying those Configurable > >>>> objects seems a little undefined. And it would be difficult to prevent > >>>> users from doing that. > >>>> If the value in the config Map<String, Object> cannot be Configurable > >>>> objects, then it seems a little weird to have all the other ConfigType > >>>> stored in the ConfigMap in their own native type and accessed via > >>>> getInteger() / getBoolean(), etc, while having ConfigurableType to be > >>>> different from others because one have to use ConfigurableFactory > >>>> to get > >>>> the configurations. > >>>> > >>>> Re: 2 > >>>> > >>>>> I think about the withExtendedDescription as a helper getter in a > >>>>> different place, so that the option is easier to find in a different > >>>>> module from it was defined. > >>>>> The MAX_PARALLELISM option in TableOptions would conceptually be > >>>>> equal > >>> to: > >>>>> public ConfigOption getMaxParallelismOption() { > >>>>> return CoreOptions.MAX_PARALLELISM; > >>>>> } > >>>> Just to make sure I understand it correctly, does that mean users will > >>> see > >>>> something like following? > >>>> - CoreOptions.MAX_PARALLELISM with description_1; > >>>> - TableOptions.MAX_PARALLELISM with description_1 + description_2. > >>>> - DataStreamOptions.MAX_PARALLELISM with description_1 + > >>>> description_3. > >>>> And users will only configure exactly one MAX_PARALLELISM cross the > >>> board. > >>>> So they won't be confused by setting two MAX_PARALLELISM config for > >>>> two > >>>> different modules, while they are actually the same config. If that is > >>> the > >>>> case, I don't have further concern. > >>>> > >>>> Re: 3 > >>>> Maybe I am thinking too much. I thought toBytes() / fromBytes() > >>>> actually > >>>> restore the original Object. But fromConfiguration() and > >>> toConfiguration() > >>>> does not do that, anything not in the configuration of the original > >>> object > >>>> will be lost. So it would be good to make that clear. Maybe a clear > >>>> Java > >>>> doc is sufficient. > >>>> > >>>> Thanks, > >>>> > >>>> Jiangjie (Becket) Qin > >>>> > >>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz > >>>> <[hidden email] > >>>> > >>>> wrote: > >>>> > >>>>> Hi, > >>>>> > >>>>> Ad. 1 > >>>>> > >>>>> The advantage of our approach is that you have the type definition > >>>>> close > >>>>> to the option definition. The only difference is that it enables > >>>>> expressing simple pojos with the primitives like > >>>>> ConfigOption<Integer>, > >>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start having > >>>>> > >>>>> the parsing logic scattered everywhere in the code base as it is now. > >>>>> The string representation in our proposal is exactly the same as you > >>>>> explained for those three options. The only difference is that you > >>>>> don't > >>>>> have to parse the elements of a List, Map etc. afterwards. > >>>>> > >>>>> Ad. 2 > >>>>> > >>>>> I think about the withExtendedDescription as a helper getter in a > >>>>> different place, so that the option is easier to find in a different > >>>>> module from it was defined. > >>>>> > >>>>> The MAX_PARALLELISM option in TableOptions would conceptually be > >>>>> equal > >>> to: > >>>>> public ConfigOption getMaxParallelismOption() { > >>>>> > >>>>> return CoreOptions.MAX_PARALLELISM; > >>>>> > >>>>> } > >>>>> > >>>>> This allows to further clarify the description of the option in the > >>>>> context of a different module and end up in a seperate page in > >>>>> documentation (but with a link to the original one). In the end it is > >>>>> exactly the same option. It has exactly same key, type, parsing > >>>>> logic, > >>>>> it is in the end forwarded to the same place. > >>>>> > >>>>> Ad. 3 > >>>>> > >>>>> Not sure if I understand your concerns here. As Timo said it is in > >>>>> the > >>>>> end sth similar to toBytes/fromBytes, but it puts itself to a > >>>>> Configuration. Also just wanted to make sure we adjusted this part > >>>>> slightly and now the ConfigOption takes ConfigurableFactory. > >>>>> > >>>>> Best, > >>>>> > >>>>> Dawid > >>>>> > >>>>> > >>>>> On 30/08/2019 09:39, Timo Walther wrote: > >>>>>> Hi Becket, > >>>>>> > >>>>>> thanks for the discussion. > >>>>>> > >>>>>> 1. ConfigOptions in their current design are bound to classes. > >>>>>> Regarding, the option is "creating some Configurable objects instead > >>>>>> of defining the config to create > >>>>>> those Configurable"? We just moved this logic to a factory, this > >>>>>> factory can then also be used for other purposes. However, how the > >>>>>> option and objects are serialized to Configuration is still not part > >>>>>> of the option. The option is just pure declaration. > >>>>>> > >>>>>> If we would allow only List<String>, implementers would need to > >>>>>> start > >>>>>> implementing own parsing and validation logic all the time. We would > >>>>>> like to avoid that. > >>>>>> > >>>>>> Please also keep in mind that Configuration must not consist of only > >>>>>> strings, it manages a Map<String, Object> for efficient access. > >>>>>> Every > >>>>>> map entry can have a string representation for persistence, but in > >>>>>> most cases consists of unserialized objects. > >>>>>> > >>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite > >>>>>> keys, types or default values. But different layers might want to > >>>>>> add > >>>>>> helpful information. In our concrete use case for FLIP-59, > >>>>>> ExecutionConfig has 50 properties and many of them are not relevant > >>>>>> for the Table layer or have no effect at all. We would like to list > >>>>>> and mention the most important config options again in the Table > >>>>>> Configuration section, so that users are not confused, but with a > >>>>>> strong link to the core option. E.g.: registered kryo serializers > >>>>>> are > >>>>>> also important also for Table users, we would like to add the > >>>>>> comment > >>>>>> "This option allows to modify the serialization of the ANY SQL data > >>>>>> type.". I think we should not spam the core configuration page with > >>>>>> comments from all layers, connectors, or libraries but keep this in > >>>>>> the corresponding component documentation. > >>>>>> > >>>>>> 3. But it is something like fromBytes() and toBytes()? It serializes > >>>>>> and deserializes T from a configuration? > >>>>>> > >>>>>> Regards, > >>>>>> Timo > >>>>>> > >>>>>> On 29.08.19 19:14, Becket Qin wrote: > >>>>>>> Hi Timo and Stephan, > >>>>>>> > >>>>>>> Thanks for the detail explanation. > >>>>>>> > >>>>>>> 1. I agree that each config should be in a human readable > >>>>>>> format. My > >>>>>>> concern is that the current List<Configurable> looks going a little > >>>>>>> too far > >>>>>>> from what the configuration is supposed to do. They are essentially > >>>>>>> creating some Configurable objects instead of defining the > >>>>>>> config to > >>>>>>> create > >>>>>>> those Configurable. This mixes ConfigOption and the usage of it. > >>>>>>> API > >>>>>>> wise > >>>>>>> it would be good to keep the configs and their usages (such as > >>>>>>> how to > >>>>>>> create objects using the ConfigOption) apart from each other. > >>>>>>> I am wondering if we can just make List also only take string. For > >>>>>>> example, > >>>>>>> is the following definition of map and list sufficient? > >>>>>>> > >>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be > >>>>>>> defined > >>>>>>> as: > >>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ... > >>>>>>> > >>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined > >>> as: > >>>>>>> list_config_name: v1, v2, v3, ... > >>>>>>> > >>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, > >>>>>>> String>>. It > >>>>>>> can > >>>>>>> be defined as: > >>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... > >>>>>>> > >>>>>>> All the key and values in the configuration are String. This also > >>>>>>> guarantees that the configuration is always serializable. > >>>>>>> If we want to do one more step, we can allow the ConfigOption to > >>>>>>> set > >>> all > >>>>>>> the primitive types and parse that for the users. So something like > >>>>>>> List<Integer>, List<Class<?>> seems fine. > >>>>>>> > >>>>>>> The configuration class could also have util methods to create a > >>>>>>> list > >>> of > >>>>>>> configurable such as: > >>>>>>> <T> List<T> > >>> <Configuration#getConfigurableInstances(ListMapConfigOption, > >>>>>>> Class<T> clazz). > >>>>>>> But the configuration class will not take arbitrary Configurable as > >>> the > >>>>>>> value of its config. > >>>>>>> > >>>>>>> 2. I might have misunderstood this. But my concern on the > >>>>>>> description > >>>>>>> extension is in the following example. > >>>>>>> > >>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM = > >>>>>>> > >>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( > >>>>>>> "Note: That this property means that a table program has a > >>>>>>> side-effect > >>>>>>> XYZ."); > >>>>>>> > >>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One is > >>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I > >>>>>>> suppose > >>>>>>> users > >>>>>>> will see both configurations. One with an extended description > >>>>>>> and one > >>>>>>> without. Let's say there is a third component which also users > >>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM > >>>>>>> ConfigOption? If > >>>>>>> so, what would that ConfigOption's description look like? > >>>>>>> > >>>>>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM > >>>>>>> and the > >>>>>>> description should clearly state all the usage of this > >>>>>>> ConfigOption. > >>>>>>> > >>>>>>> 3. I see, in that case, how about we name it something like > >>>>>>> extractConfiguration()? I am just trying to see if we can make it > >>> clear > >>>>>>> this is not something like fromBytes() and toBytes(). > >>>>>>> > >>>>>>> Thanks, > >>>>>>> > >>>>>>> Jiangjie (Becket) Qin > >>>>>>> > >>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> > >>>>> wrote: > >>>>>>>> Hi Becket, > >>>>>>>> > >>>>>>>> let me try to clarify some of your questions: > >>>>>>>> > >>>>>>>> 1. For every option, we also needed to think about how to > >>>>>>>> represent > >>> it > >>>>>>>> in a human readable format. We do not want to allow arbitrary > >>>>>>>> nesting > >>>>>>>> because that would easily allow to bypass the flattened > >>>>>>>> hierarchy of > >>>>>>>> config options (`session.memory.min`). The current design > >>>>>>>> allows to > >>>>>>>> represent every option type as a list. E.g.: > >>>>>>>> > >>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12` > >>>>>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption: > >>>>>>>> field=12,other=true; field=12,other=true` > >>>>>>>> `myPropertyOption: key=str0,other=str1` can be > >>>>>>>> `myPropertyListOption: > >>>>>>>> key=str0,other=str1;key=str0,other=str1` > >>>>>>>> > >>>>>>>> We need the atomic class for serialization/deserialization both in > >>>>>>>> binary and string format. > >>>>>>>> > >>>>>>>> ConfigOption<List> is not present in the code base yet, but this > >>>>>>>> FLIP is > >>>>>>>> a preparation of making ExecutionConfig configurable. If you look > >>> into > >>>>>>>> this class or also in existing table connectors/formats, you > >>>>>>>> will see > >>>>>>>> that each proposed option type has its requirements. > >>>>>>>> > >>>>>>>> 2. Regarding extending the description of ConfigOptions, the > >>>>>>>> semantic of > >>>>>>>> one option should be a super set of the other option. E.g. in > >>>>>>>> Table > >>> API > >>>>>>>> we might use general ExecutionConfig properties. But we would like > >>>>>>>> to a) > >>>>>>>> make external options more prominent in the Table API config > >>>>>>>> docs to > >>>>>>>> link people to properties they should pay attention b) notice > >>>>>>>> about > >>>>>>>> side > >>>>>>>> effects. The core semantic of a property should not change. > >>>>>>>> > >>>>>>>> 3. The factory will not receive the entire configuration but works > >>> in a > >>>>>>>> separate key space. For `myObjectOption` above, it would receive a > >>>>>>>> configuration that consists of `field: 12` and `other: true`. > >>>>>>>> > >>>>>>>> I agree. I will convert the document into a Wiki page today. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Timo > >>>>>>>> > >>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote: > >>>>>>>>> @Becket One thing that may be non-obvious is that the > >>>>>>>>> Configuration > >>>>>>>>> class > >>>>>>>>> also defines serialization / persistence logic at the moment. > >>>>>>>>> So it > >>>>>>>>> needs > >>>>>>>>> to know the set of types it supports. That stands in the way > >>>>>>>>> of an > >>>>>>>>> arbitrary generic map type. > >>>>>>>>> > >>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have one > >>>>>>>>> collection orthogonal to the type (List) and another one bound to > >>>>>>>> specific > >>>>>>>>> types (Map). > >>>>>>>>> > >>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email] > > > >>>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Hi Timo, > >>>>>>>>>> > >>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I > >>>>>>>>>> have a > >>>>>>>>>> few > >>>>>>>>>> questions / comments. > >>>>>>>>>> > >>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption? > >>>>>>>>>> Would it be enough to just check the atomicClass to see if it > >>>>>>>>>> is a > >>>>>>>>>> List > >>>>>>>> or > >>>>>>>>>> not? > >>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always assume > >>>>>>>>>> both key > >>>>>>>>>> and value types are String? Can we just apply the same to the > >>>>>>>>>> ConfigOption<List>? > >>>>>>>>>> BTW, I did a quick search in the codebase but did not find any > >>>>>>>>>> usage of > >>>>>>>>>> ConfigOption<List>. > >>>>>>>>>> > >>>>>>>>>> 2. The same config name, but with two ConfigOption with > >>>>>>>>>> different > >>>>>>>> semantic > >>>>>>>>>> in different component seems super confusing. For example, when > >>> users > >>>>>>>> set > >>>>>>>>>> both configs, they may have no idea one is overriding the other. > >>>>>>>>>> There > >>>>>>>>>> might be two cases: > >>>>>>>>>> - If it is just the same config used by different > >>>>>>>>>> components to > >>>>>>>>>> act > >>>>>>>>>> accordingly, it might be better to just have one config, but > >>> describe > >>>>>>>>>> clearly on how that config will be used. > >>>>>>>>>> - If it is actually two configurations that can be set > >>>>>>>>>> differently, I > >>>>>>>>>> think the config names should just be different. > >>>>>>>>>> > >>>>>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() > >>> method > >>>>>>>>>> pretty much means getConfiguration()? The toConfiguration() > >>>>>>>>>> method > >>>>>>>> sounds > >>>>>>>>>> like converting an object to a configuration, which only > >>>>>>>>>> works if > >>> the > >>>>>>>>>> object does not contain any state / value. I am also > >>>>>>>>>> wondering if > >>>>>>>>>> there > >>>>>>>> is > >>>>>>>>>> a real use case of this method. Because supposedly the > >>> configurations > >>>>>>>> could > >>>>>>>>>> just be passed around to caller of this method. > >>>>>>>>>> > >>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of > >>>>>>>>>> in the > >>>>>>>>>> doc before voting? The FLIP wiki allows track the modification > >>>>>>>>>> history > >>>>>>>> and > >>>>>>>>>> has a more established structure to ensure nothing is missed. > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> > >>>>>>>>>> Jiangjie (Becket) Qin > >>>>>>>>>> > >>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther > >>>>>>>>>> <[hidden email]> > >>>>>>>> wrote: > >>>>>>>>>>> Hi everyone, > >>>>>>>>>>> > >>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in the > >>> voting > >>>>>>>>>>> thread. If there are no objections, I will start a new voting > >>> thread > >>>>>>>>>>> tomorrow at 9am Berlin time. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Timo > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote: > >>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>> > >>>>>>>>>>>> thanks for all the feedback we have received online and > >>>>>>>>>>>> offline. > >>> It > >>>>>>>>>>>> showed that many people support the idea of evolving the Flink > >>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP > >>>>>>>>>>>> will > >>>>>>>>>>>> not > >>>>>>>>>>>> solve all issues but at least will improve the current status. > >>>>>>>>>>>> > >>>>>>>>>>>> We've updated the document and replaced the Correlation part > >>>>>>>>>>>> with the > >>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all available > >>>>>>>>>>>> options > >>>>>>>>>>>> of a group plus custom group validators for eager validation. > >>>>>>>>>>>> For now, > >>>>>>>>>>>> this eager group validation will only be used at certain > >>>>>>>>>>>> locations in > >>>>>>>>>>>> the Flink code but it prepares for maybe validating the entire > >>>>>>>>>>>> global > >>>>>>>>>>>> configuration before submitting a job in the future. > >>>>>>>>>>>> > >>>>>>>>>>>> Please take another look if you find time. I hope we can > >>>>>>>>>>>> proceed > >>>>>>>>>>>> with > >>>>>>>>>>>> the voting process if there are no objections. > >>>>>>>>>>>> > >>>>>>>>>>>> Regards, > >>>>>>>>>>>> Timo > >>>>>>>>>>>> > >>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: > >>>>>>>>>>>>> Hi Stephan, > >>>>>>>>>>>>> > >>>>>>>>>>>>> thanks for your suggestions. Let me give you some background > >>> about > >>>>>>>>>>>>> the decisions made in this FLIP: > >>>>>>>>>>>>> > >>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" > >>>>>>>>>>>>> because we > >>> did > >>>>>>>>>>>>> not want to change the entire configuration infrastructure. > >>>>>>>>>>>>> Both for > >>>>>>>>>>>>> backwards-compatibility reasons and the amount of work that > >>>>>>>>>>>>> would be > >>>>>>>>>>>>> required to update all options. If our goal is to rework the > >>>>>>>>>>>>> configuration option entirely, I might suggest to switch > >>>>>>>>>>>>> to JSON > >>>>>>>>>>>>> format with JSON schema and JSON validator. However, setting > >>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky the > >>> more > >>>>>>>>>>>>> nested structures are allowed. > >>>>>>>>>>>>> > >>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is > >>>>>>>>>>>>> centered > >>>>>>>>>>>>> around Java classes where T is determined by the default > >>>>>>>>>>>>> value. > >>>>>>>>>>>>> The > >>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit > >>>>>>>>>>>>> `intType()` method etc. The current design of validators > >>> centered > >>>>>>>>>>>>> around Java classes makes it possible to have typical domain > >>>>>>>>>>>>> validators baked by generics as you suggested. If we > >>>>>>>>>>>>> introduce > >>>>>>>>>>>>> types > >>>>>>>>>>>>> such as "quantity with measure and unit" we still need to > >>>>>>>>>>>>> get a > >>>>>>>>>>>>> class > >>>>>>>>>>>>> out of this option at the end, so why changing a proven > >>>>>>>>>>>>> concept? > >>>>>>>>>>>>> > >>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary > >>>>>>>>>>>>> nesting. As > >>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every > >>>>>>>>>>>>> atomic > >>>>>>>>>>>>> option like "key=12" can be represented by a list > >>>>>>>>>>>>> "keys=12;13". > >>>>>>>>>>>>> But > >>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated > >>>>>>>>>>>>> list > >>>>>>>>>>>>> option > >>>>>>>>>>>>> would start making this more complicated such as > >>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), > >>>>>>>>>>>>> StringOption(...)))", do we want that? > >>>>>>>>>>>>> > >>>>>>>>>>>>> 4. Correlation: The correlation part is one of the > >>>>>>>>>>>>> suggestions > >>>>>>>>>>>>> that I > >>>>>>>>>>>>> like least in the document. We can also discuss removing it > >>>>>>>>>>>>> entirely, > >>>>>>>>>>>>> but I think it solves the use case of relating options > >>>>>>>>>>>>> with each > >>>>>>>>>>>>> other in a flexible way right next to the actual option. > >>>>>>>>>>>>> Instead of > >>>>>>>>>>>>> being hidden in some component initialization, we should > >>>>>>>>>>>>> put it > >>>>>>>>>>>>> close > >>>>>>>>>>>>> to the option to also perform validation eagerly instead of > >>>>>>>>>>>>> failing > >>>>>>>>>>>>> at runtime when the option is accessed the first time. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Regards, > >>>>>>>>>>>>> Timo > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: > >>>>>>>>>>>>>> A "List Type" sounds like a good direction to me. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. The > >>>>>>>>>>>>>> idea is > >>>>>>>>>>>>>> to see > >>>>>>>>>>>>>> if something like that can ease validation. Especially the > >>>>>>>>>> correlation > >>>>>>>>>>>>>> system seems quite complex (proxies to work around order of > >>>>>>>>>>>>>> initialization). > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> For example, let's assume we don't think primarily about > >>>>>>>>>>>>>> "java > >>>>>>>>>>>>>> types" but > >>>>>>>>>>>>>> would define types as one of the following (just examples, > >>>>>>>>>>>>>> haven't > >>>>>>>>>>>>>> thought > >>>>>>>>>>>>>> all the details through): > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> (a) category type: implies string, and a fix set of > >>> possible > >>>>>>>>>> values. > >>>>>>>>>>>>>> Those would be passes and naturally make it into the docs > >>>>>>>>>>>>>> and > >>>>>>>>>>>>>> validation. > >>>>>>>>>>>>>> Maps to a String or Enum in Java. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> (b) numeric integer type: implies long (or optionally > >>>>>>>>>>>>>> integer, > >>>>>>>> if > >>>>>>>>>>>>>> we want > >>>>>>>>>>>>>> to automatically check overflow / underflow). would take > >>> typical > >>>>>>>>>> domain > >>>>>>>>>>>>>> validators, like non-negative, etc. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> (c) numeric real type: same as above (double or > >>>>>>>>>>>>>> float) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> (d) numeric interval type: either defined as an > >>> interval, or > >>>>>>>>>>>>>> references > >>>>>>>>>>>>>> other parameter by key. validation by valid interval. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> (e) quantity: a measure and a unit. separately > >>>>>>>>>>>>>> parsable. > >>> The > >>>>>>>>>>>>>> measure's > >>>>>>>>>>>>>> type could be any of the numeric types above, with same > >>>>>>>>>>>>>> validation > >>>>>>>>>>>>>> rules. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> With a system like the above, would we still correlation > >>>>>>>>>>>>>> validators? > >>>>>>>>>>>>>> Are > >>>>>>>>>>>>>> there still cases that we need to catch early (config > >>>>>>>>>>>>>> loading) > >>> or > >>>>>>>>>>>>>> are the > >>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup > >>> specific, > >>>>>>>>>>>>>> that it is > >>>>>>>>>>>>>> fine to handle them in component initialization? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz > >>>>>>>>>>>>>> <[hidden email]> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Stephan, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thank you for your opinion. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Actually list/composite types are the topics we spent the > >>>>>>>>>>>>>>> most of > >>>>>>>>>> the > >>>>>>>>>>>>>>> time. I understand that from a perspective of a full blown > >>> type > >>>>>>>>>>>>>>> system, > >>>>>>>>>>>>>>> a field like isList may look weird. Please let me > >>>>>>>>>>>>>>> elaborate a > >>>>>>>>>>>>>>> bit > >>>>>>>>>> more > >>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear > >>>>>>>>>>>>>>> enough > >>>>>>>>>>>>>>> about > >>>>>>>>>> it > >>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is > >>>>>>>>>>>>>>> that they > >>>>>>>>>>>>>>> must > >>>>>>>>>>>>>>> have a string representation as they might come from a > >>>>>>>> configuration > >>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so > >>>>>>>>>>>>>>> that the > >>>>>>>>>> values > >>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did not > >>>>>>>>>>>>>>> want to > >>>>>>>>>>>>>>> add a > >>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for > >>>>>>>>>>>>>>> lists > >>>>>>>>>> only > >>>>>>>>>>>>>>> (and flat objects - I think though in the current design > >>>>>>>>>>>>>>> there is a > >>>>>>>>>>>>>>> mistake around the Configurable interface). I think though > >>>>>>>>>>>>>>> you have > >>>>>>>>>> a > >>>>>>>>>>>>>>> point here and it would be better to have a > >>>>>>>>>>>>>>> ListConfigOption > >>>>>>>>>>>>>>> instead of > >>>>>>>>>>>>>>> this field. Does it make sense to you? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> As for the second part of your message. I am not sure if I > >>>>>>>>>> understood > >>>>>>>>>>>>>>> it. The validators work with parse/deserialized values from > >>>>>>>>>>>>>>> Configuration that means they can be bound to the generic > >>>>>>>>>>>>>>> parameter > >>>>>>>>>> of > >>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends > >>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the > >>>>>>>>>>>>>>> ConfigOption > >>>>>>>>>>>>>>> has anything to do with the validation logic. Could you > >>>>>>>>>>>>>>> elaborate a > >>>>>>>>>>>>>>> bit > >>>>>>>>>>>>>>> more what did you mean? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Dawid > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: > >>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to do > >>> early > >>>>>>>>>>>>>>> validation. > >>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad > >>>>>>>>>>>>>>>> hoc, > >>>>>>>>>>>>>>>> though. For > >>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear > >>>>>>>>>>>>>>>> indication of > >>> not > >>>>>>>>>>>>>>>> having > >>>>>>>>>>>>>>>> thought through the type/category system. > >>>>>>>>>>>>>>>> Also, having a more clear category system makes validation > >>>>>>>> simpler. > >>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between > >>> numeric > >>>>>>>>>>>>>>> parameters > >>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible > >>>>>>>>>>>>>>>> values), > >>>>>>>>>>>>>>>> quantities > >>>>>>>>>>>>>>>> like duration and memory size (need measure and unit), > >>>>>>>>>>>>>>>> which > >>>>>>>>>>>>>>>> results in > >>>>>>>>>>>>>>> an > >>>>>>>>>>>>>>>> elegant system for validation. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < > >>>>>>>>>> [hidden email] > >>>>>>>>>>>>>>> .invalid> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. > >>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of > >>>>>>>>>>>>>>>>> various > >>>>>>>>>>>>>>>>> modules to be unified. > >>>>>>>>>>>>>>>>> This is also first step of one of the keys to making new > >>>>>>>>>>>>>>>>> unified > >>>>>>>>>>>>>>>>> TableEnvironment available for production. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations, > >>>>>>>>>>>>>>>>> such as > >>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The > >>>>>>>>>>>>>>>>> skew may > >>>>>>>>>>>>>>>>> be a single field or a combination of multiple > >>>>>>>>>>>>>>>>> fields. > >>>>>>>>>>>>>>>>> So the > >>>>>>>>>>>>>>>>> configuration is very troublesome. We used JSON > >>>>>>>>>>>>>>>>> string > >>> to > >>>>>>>>>>>>>>>>> configure it. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>> Jingsong Lee > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>> ------------------------------------------------------------------ > >>>>>>>>>>>>>>>>> From:Jark Wu <[hidden email]> > >>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 > >>>>>>>>>>>>>>>>> To:dev <[hidden email]> > >>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and > >>>>>>>>>> Configuration > >>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a > >>> long > >>>>>>>>>> time. > >>>>>>>>>>>>>>>>> We have seen the benefit when developing blink > >>>>>>>>>>>>>>>>> configurations and > >>>>>>>>>>>>>>> connector > >>>>>>>>>>>>>>>>> properties in 1.9 release. > >>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed > >>>>>>>>>>>>>>>>> design. > >>>>>>>>>>>>>>>>> I will leave my thoughts and comments there. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < > >>> [hidden email] > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hi Timo, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which > >>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of > >>>>>>>>>>>>>>>>>> Flink. > >>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment > >>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by > >>>>>>>>>> configuration. > >>>>>>>>>>>>>>>>>> Will take a look at the document in days. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>> tison. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 > >>>>>>>>>>>>>>>>>> 下午10:12写道: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of > >>>>>>>>>>>>>>>>>>> ExecutionConfig and > >>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is > >>>>>>>>>>>>>>>>>>> necessary > >>>>>>>>>>>>>>>>>>> to make > >>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, > >>>>>>>>>>>>>>>>>>> with > >>> the > >>>>>>>>>>>>>>>>>>> new SQL > >>>>>>>>>>>>>>>>> DDL > >>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and > >>>>>>>>>>>>>>>>>>> formats > >>>>>>>>>>>>>>>>>>> coming up, > >>>>>>>>>>>>>>>>>>> unified configuration becomes more important. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> We need more features around string-based configuration > >>>>>>>>>>>>>>>>>>> in the > >>>>>>>>>>>>>>>>>>> future, > >>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose > >>>>>>>>>>>>>>>>>>> FLIP-54 for > >>>>>>>>>>>>>>>>>>> evolving > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>> > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > >>> > >>>>>>>>>>>>>>>>>>> In summary it adds: > >>>>>>>>>>>>>>>>>>> - documented types and validation > >>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, list > >>>>>>>>>>>>>>>>>>> - simple non-nested object types > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Looking forward to your feedback, > >>>>>>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>> > > > > |
Hi,
Regarding the factory and duplicate() and whatnot, wouldn’t it work to have a factory like this: /** * Allows to read and write an instance from and to {@link Configuration}. A configurable instance * operates in its own key space in {@link Configuration} and will be (de)prefixed by the framework. It cannot access keys from other options. A factory must have a default constructor. * */ public interface ConfigurableFactory<T extends Configurable> { /** * Creates an instance from the given configuration. */ T fromConfiguration(ConfigurationReader configuration); } with Configurable being: /** * Allows to read and write an instance from and to {@link Configuration}. A configurable instance * operates in its own key space in {@link Configuration} and will be (de)prefixed by the framework. It cannot access keys from other options. A factory must have a default constructor. * */ public interface Configurable { /** * Writes this instance to the given configuration. */ void writeToConfiguration(ConfigurationWriter configuration); // method name TBD } This would make the Configurable immutable and we wouldn’t need a duplicate() method. Best, Aljoscha > On 2. Sep 2019, at 14:40, Becket Qin <[hidden email]> wrote: > > Hi Timo and Dawid, > > Thanks for the patient reply. I agree that both option a) and option b) can > solve the mutability problem. > > For option a), is it a little intrusive to add a duplicate() method for a > Configurable? It would be great if we don't put this burden on users if > possible. > > For option b), I actually feel it is slightly better than option a) from > API perspective as getFactory() seems a more understandable method of a > Configurable compared with duplicate(). And users do not need to implement > much more logic. > > I am curious what is the downside of keeping the Configuration simple to > only have primitive types, and always create the Configurable using a util > method? If Configurables are rare, do we need to put the instantiation / > bookkeeping logic in Configuration? > > @Becket for the toConfiguration this is required for shipping the >> Configuration to TaskManager, so that we do not have to use java >> serializability. > > Do you mean a Configurable may be created and configured directly without > reading settings from a Configuration instance? I thought a Configurable > will always be created via a ConfigurableFactory by extracting required > configs from a Configuration. Am I missing something? > > Thanks, > > Jiangjie (Becket) Qin > > On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz <[hidden email]> > wrote: > >> Hi Timo, Becket >> >> From the options that Timo suggested for improving the mutability >> situation I would prefer option a) as this is the more explicit option >> and simpler option. Just as a remark, I think in general Configurable >> types for options will be rather very rare for some special use-cases, >> as the majority of options are rather simpler parameters of primitive >> types (+duration, memory) >> >> @Becket for the toConfiguration this is required for shipping the >> Configuration to TaskManager, so that we do not have to use java >> serializability. >> >> Best, >> >> Dawid >> >> >> On 02/09/2019 10:05, Timo Walther wrote: >>> Hi Becket, >>> >>> Re 1 & 3: "values in configurations should actually be immutable" >>> >>> I would also prefer immutability but most of our configuration is >>> mutable due to serialization/deserialization. Also maps and list could >>> be mutable in theory. It is difficult to really enforce that for >>> nested structures. I see two options: >>> >>> a) For the original design: How about we force implementers to add a >>> duplicate() method in a Configurable object? This would mean that the >>> object is still mutable but by duplicating the object both during >>> reading and writing we would avoid the problem you described. >>> >>> b) For the current design: We still use the factory approach but let a >>> Configurable object implement a getFactory() method such that we know >>> how to serialize the object. With the help of a factory we can also >>> duplicate the object easily during reading and writing and ensure >>> immutability. >>> >>> I would personally go for approach a) to not over-engineer this topic. >>> But I'm open for option b). >>> >>> Regards, >>> Timo >>> >>> >>> On 31.08.19 04:09, Becket Qin wrote: >>>> Hi Timo, >>>> >>>> Thanks for the reply. I am still a little concerned over the >>>> mutability of >>>> the Configurable which could be the value in Configuration. >>>> >>>> Re: 1 >>>> >>>>> But in general, people should not use any internal fields. >>>>> Configurable objects are meant for simple little helper POJOs, not >>>>> complex arbitrary nested data structures. >>>> This seems difficult to enforce... Ideally the values in configurations >>>> should actually be immutable. The value can only be changed by >>>> explicitly >>>> calling setters in Configuration. Otherwise we may have weird situation >>>> where the Configurable in the same configuration are different in two >>>> places because the configurable is modified in one places and not >>>> modified >>>> in another place. So I am a little concerned on putting a >>>> Configurable type >>>> in the Configuration map, because the value could be modified without >>>> explicitly setting the configuration. For example, can users do the >>>> following? >>>> >>>> Configurable configurable = >>>> configuration.getConfigurable(myConfigurableOption); >>>> configurable.setConfigA(123); // this already changes the configurable >>>> object in the configuration. >>>> >>>> Re: 2 >>>> Thanks for confirming. As long as users will not have a situation where >>>> they need to set two configurations with the same key but different >>>> descriptions, I think it is OK. >>>> >>>> Re: 3 >>>> This is actually kind of related to 1. i.e. Whether toConfiguration() >>>> guarantees the exact same object can be rebuilt from the >>>> configuration or >>>> not. I am still not quite sure about the use case of toConfiguration() >>>> though. It seems indicating the Configurable is mutable, which might be >>>> dangerous. >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <[hidden email]> >>>> wrote: >>>> >>>>> Hi Becket, >>>>> >>>>> 1. First of all, you are totally right. The FLIP contains a bug due to >>>>> the last minute changes that Dawid suggested: by having immutable >>>>> objects created by a factory we loose the serializability of the >>>>> Configuration because the factory itself is not stored in the >>>>> Configuration. I would propose to revert the last change and stick to >>>>> the original design, which means that a object must implement the >>>>> Configurable interface and also implements >>>>> serialization/deserialization >>>>> methods such that also internal fields can be persisted as you >>>>> suggested. But in general, people should not use any internal fields. >>>>> Configurable objects are meant for simple little helper POJOs, not >>>>> complex arbitrary nested data structures. >>>>> >>>>> It is Map<String, Object> because Configuration stores the raw objects. >>>>> If you put a Boolean option into it, it remains Boolean. This makes the >>>>> map very efficient for shipping to the cluster and accessing options >>>>> multiple times. The same for configurable objects. We put the pure >>>>> objects into the map without any serialization/deserialization. The >>>>> provided factory allows to convert the Object into a Configuration and >>>>> we know how to serialize/deserializise a configuration because it is >>>>> just a key/value map. >>>>> >>>>> 2. Yes, this is what we had in mind. It should still be the same >>>>> configuration option. We would like to avoid specialized option keys >>>>> across components (exec.max-para and table.exec.max-para) if they are >>>>> describing basically the same thing. But adding some more description >>>>> like "TableOptions.MAX_PARALLELISM with description_1 + description_2" >>>>> does not hurt. >>>>> >>>>> 3. They should restore the original object given that the >>>>> toConfiguration/fromConfiguration methods have been implemented >>>>> correctly. I will extend the example to make the logic clearer while >>>>> fixing the bug. >>>>> >>>>> Thanks for the healthy discussion, >>>>> Timo >>>>> >>>>> >>>>> On 30.08.19 15:29, Becket Qin wrote: >>>>>> Hi Timo, >>>>>> >>>>>> Thanks again for the clarification. Please see a few more questions >>>>> below. >>>>>> Re: 1 >>>>>> >>>>>>> Please also keep in mind that Configuration must not consist of only >>>>>>> strings, it manages a Map<String, Object> for efficient access. Every >>>>>>> map entry can have a string representation for persistence, but in >>>>>>> most >>>>>>> cases consists of unserialized objects. >>>>>> I'd like to understand this a bit more. The reason we have a >>>>>> Map<String, >>>>>> Object> in Configuration was because that Object could be either a >>>>> String, >>>>>> a List or a Map, right? But they eventually always boil down to >>>>>> Strings, >>>>> or >>>>>> maybe one of the predefined type that we know how to serialize. In the >>>>>> current design, can the Object also be Configurable? >>>>>> If the value in the config Map<String, Object> can be Configurable >>>>> objects, >>>>>> how do we serialize them? Calling toConfiguration() seems not quite >>>>> working >>>>>> because there might be some other internal fields that are not part of >>>>> the >>>>>> configuration. The modification to those fields will be lost if we >>>>>> simply >>>>>> use toConfiguration(). So the impact of modifying those Configurable >>>>>> objects seems a little undefined. And it would be difficult to prevent >>>>>> users from doing that. >>>>>> If the value in the config Map<String, Object> cannot be Configurable >>>>>> objects, then it seems a little weird to have all the other ConfigType >>>>>> stored in the ConfigMap in their own native type and accessed via >>>>>> getInteger() / getBoolean(), etc, while having ConfigurableType to be >>>>>> different from others because one have to use ConfigurableFactory >>>>>> to get >>>>>> the configurations. >>>>>> >>>>>> Re: 2 >>>>>> >>>>>>> I think about the withExtendedDescription as a helper getter in a >>>>>>> different place, so that the option is easier to find in a different >>>>>>> module from it was defined. >>>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>>>> equal >>>>> to: >>>>>>> public ConfigOption getMaxParallelismOption() { >>>>>>> return CoreOptions.MAX_PARALLELISM; >>>>>>> } >>>>>> Just to make sure I understand it correctly, does that mean users will >>>>> see >>>>>> something like following? >>>>>> - CoreOptions.MAX_PARALLELISM with description_1; >>>>>> - TableOptions.MAX_PARALLELISM with description_1 + description_2. >>>>>> - DataStreamOptions.MAX_PARALLELISM with description_1 + >>>>>> description_3. >>>>>> And users will only configure exactly one MAX_PARALLELISM cross the >>>>> board. >>>>>> So they won't be confused by setting two MAX_PARALLELISM config for >>>>>> two >>>>>> different modules, while they are actually the same config. If that is >>>>> the >>>>>> case, I don't have further concern. >>>>>> >>>>>> Re: 3 >>>>>> Maybe I am thinking too much. I thought toBytes() / fromBytes() >>>>>> actually >>>>>> restore the original Object. But fromConfiguration() and >>>>> toConfiguration() >>>>>> does not do that, anything not in the configuration of the original >>>>> object >>>>>> will be lost. So it would be good to make that clear. Maybe a clear >>>>>> Java >>>>>> doc is sufficient. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Jiangjie (Becket) Qin >>>>>> >>>>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz >>>>>> <[hidden email] >>>>>> >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Ad. 1 >>>>>>> >>>>>>> The advantage of our approach is that you have the type definition >>>>>>> close >>>>>>> to the option definition. The only difference is that it enables >>>>>>> expressing simple pojos with the primitives like >>>>>>> ConfigOption<Integer>, >>>>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start having >>>>>>> >>>>>>> the parsing logic scattered everywhere in the code base as it is now. >>>>>>> The string representation in our proposal is exactly the same as you >>>>>>> explained for those three options. The only difference is that you >>>>>>> don't >>>>>>> have to parse the elements of a List, Map etc. afterwards. >>>>>>> >>>>>>> Ad. 2 >>>>>>> >>>>>>> I think about the withExtendedDescription as a helper getter in a >>>>>>> different place, so that the option is easier to find in a different >>>>>>> module from it was defined. >>>>>>> >>>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>>>> equal >>>>> to: >>>>>>> public ConfigOption getMaxParallelismOption() { >>>>>>> >>>>>>> return CoreOptions.MAX_PARALLELISM; >>>>>>> >>>>>>> } >>>>>>> >>>>>>> This allows to further clarify the description of the option in the >>>>>>> context of a different module and end up in a seperate page in >>>>>>> documentation (but with a link to the original one). In the end it is >>>>>>> exactly the same option. It has exactly same key, type, parsing >>>>>>> logic, >>>>>>> it is in the end forwarded to the same place. >>>>>>> >>>>>>> Ad. 3 >>>>>>> >>>>>>> Not sure if I understand your concerns here. As Timo said it is in >>>>>>> the >>>>>>> end sth similar to toBytes/fromBytes, but it puts itself to a >>>>>>> Configuration. Also just wanted to make sure we adjusted this part >>>>>>> slightly and now the ConfigOption takes ConfigurableFactory. >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Dawid >>>>>>> >>>>>>> >>>>>>> On 30/08/2019 09:39, Timo Walther wrote: >>>>>>>> Hi Becket, >>>>>>>> >>>>>>>> thanks for the discussion. >>>>>>>> >>>>>>>> 1. ConfigOptions in their current design are bound to classes. >>>>>>>> Regarding, the option is "creating some Configurable objects instead >>>>>>>> of defining the config to create >>>>>>>> those Configurable"? We just moved this logic to a factory, this >>>>>>>> factory can then also be used for other purposes. However, how the >>>>>>>> option and objects are serialized to Configuration is still not part >>>>>>>> of the option. The option is just pure declaration. >>>>>>>> >>>>>>>> If we would allow only List<String>, implementers would need to >>>>>>>> start >>>>>>>> implementing own parsing and validation logic all the time. We would >>>>>>>> like to avoid that. >>>>>>>> >>>>>>>> Please also keep in mind that Configuration must not consist of only >>>>>>>> strings, it manages a Map<String, Object> for efficient access. >>>>>>>> Every >>>>>>>> map entry can have a string representation for persistence, but in >>>>>>>> most cases consists of unserialized objects. >>>>>>>> >>>>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite >>>>>>>> keys, types or default values. But different layers might want to >>>>>>>> add >>>>>>>> helpful information. In our concrete use case for FLIP-59, >>>>>>>> ExecutionConfig has 50 properties and many of them are not relevant >>>>>>>> for the Table layer or have no effect at all. We would like to list >>>>>>>> and mention the most important config options again in the Table >>>>>>>> Configuration section, so that users are not confused, but with a >>>>>>>> strong link to the core option. E.g.: registered kryo serializers >>>>>>>> are >>>>>>>> also important also for Table users, we would like to add the >>>>>>>> comment >>>>>>>> "This option allows to modify the serialization of the ANY SQL data >>>>>>>> type.". I think we should not spam the core configuration page with >>>>>>>> comments from all layers, connectors, or libraries but keep this in >>>>>>>> the corresponding component documentation. >>>>>>>> >>>>>>>> 3. But it is something like fromBytes() and toBytes()? It serializes >>>>>>>> and deserializes T from a configuration? >>>>>>>> >>>>>>>> Regards, >>>>>>>> Timo >>>>>>>> >>>>>>>> On 29.08.19 19:14, Becket Qin wrote: >>>>>>>>> Hi Timo and Stephan, >>>>>>>>> >>>>>>>>> Thanks for the detail explanation. >>>>>>>>> >>>>>>>>> 1. I agree that each config should be in a human readable >>>>>>>>> format. My >>>>>>>>> concern is that the current List<Configurable> looks going a little >>>>>>>>> too far >>>>>>>>> from what the configuration is supposed to do. They are essentially >>>>>>>>> creating some Configurable objects instead of defining the >>>>>>>>> config to >>>>>>>>> create >>>>>>>>> those Configurable. This mixes ConfigOption and the usage of it. >>>>>>>>> API >>>>>>>>> wise >>>>>>>>> it would be good to keep the configs and their usages (such as >>>>>>>>> how to >>>>>>>>> create objects using the ConfigOption) apart from each other. >>>>>>>>> I am wondering if we can just make List also only take string. For >>>>>>>>> example, >>>>>>>>> is the following definition of map and list sufficient? >>>>>>>>> >>>>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be >>>>>>>>> defined >>>>>>>>> as: >>>>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ... >>>>>>>>> >>>>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined >>>>> as: >>>>>>>>> list_config_name: v1, v2, v3, ... >>>>>>>>> >>>>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, >>>>>>>>> String>>. It >>>>>>>>> can >>>>>>>>> be defined as: >>>>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... >>>>>>>>> >>>>>>>>> All the key and values in the configuration are String. This also >>>>>>>>> guarantees that the configuration is always serializable. >>>>>>>>> If we want to do one more step, we can allow the ConfigOption to >>>>>>>>> set >>>>> all >>>>>>>>> the primitive types and parse that for the users. So something like >>>>>>>>> List<Integer>, List<Class<?>> seems fine. >>>>>>>>> >>>>>>>>> The configuration class could also have util methods to create a >>>>>>>>> list >>>>> of >>>>>>>>> configurable such as: >>>>>>>>> <T> List<T> >>>>> <Configuration#getConfigurableInstances(ListMapConfigOption, >>>>>>>>> Class<T> clazz). >>>>>>>>> But the configuration class will not take arbitrary Configurable as >>>>> the >>>>>>>>> value of its config. >>>>>>>>> >>>>>>>>> 2. I might have misunderstood this. But my concern on the >>>>>>>>> description >>>>>>>>> extension is in the following example. >>>>>>>>> >>>>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM = >>>>>>>>> >>>>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( >>>>>>>>> "Note: That this property means that a table program has a >>>>>>>>> side-effect >>>>>>>>> XYZ."); >>>>>>>>> >>>>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One is >>>>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I >>>>>>>>> suppose >>>>>>>>> users >>>>>>>>> will see both configurations. One with an extended description >>>>>>>>> and one >>>>>>>>> without. Let's say there is a third component which also users >>>>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM >>>>>>>>> ConfigOption? If >>>>>>>>> so, what would that ConfigOption's description look like? >>>>>>>>> >>>>>>>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM >>>>>>>>> and the >>>>>>>>> description should clearly state all the usage of this >>>>>>>>> ConfigOption. >>>>>>>>> >>>>>>>>> 3. I see, in that case, how about we name it something like >>>>>>>>> extractConfiguration()? I am just trying to see if we can make it >>>>> clear >>>>>>>>> this is not something like fromBytes() and toBytes(). >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>> >>>>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> >>>>>>> wrote: >>>>>>>>>> Hi Becket, >>>>>>>>>> >>>>>>>>>> let me try to clarify some of your questions: >>>>>>>>>> >>>>>>>>>> 1. For every option, we also needed to think about how to >>>>>>>>>> represent >>>>> it >>>>>>>>>> in a human readable format. We do not want to allow arbitrary >>>>>>>>>> nesting >>>>>>>>>> because that would easily allow to bypass the flattened >>>>>>>>>> hierarchy of >>>>>>>>>> config options (`session.memory.min`). The current design >>>>>>>>>> allows to >>>>>>>>>> represent every option type as a list. E.g.: >>>>>>>>>> >>>>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12` >>>>>>>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption: >>>>>>>>>> field=12,other=true; field=12,other=true` >>>>>>>>>> `myPropertyOption: key=str0,other=str1` can be >>>>>>>>>> `myPropertyListOption: >>>>>>>>>> key=str0,other=str1;key=str0,other=str1` >>>>>>>>>> >>>>>>>>>> We need the atomic class for serialization/deserialization both in >>>>>>>>>> binary and string format. >>>>>>>>>> >>>>>>>>>> ConfigOption<List> is not present in the code base yet, but this >>>>>>>>>> FLIP is >>>>>>>>>> a preparation of making ExecutionConfig configurable. If you look >>>>> into >>>>>>>>>> this class or also in existing table connectors/formats, you >>>>>>>>>> will see >>>>>>>>>> that each proposed option type has its requirements. >>>>>>>>>> >>>>>>>>>> 2. Regarding extending the description of ConfigOptions, the >>>>>>>>>> semantic of >>>>>>>>>> one option should be a super set of the other option. E.g. in >>>>>>>>>> Table >>>>> API >>>>>>>>>> we might use general ExecutionConfig properties. But we would like >>>>>>>>>> to a) >>>>>>>>>> make external options more prominent in the Table API config >>>>>>>>>> docs to >>>>>>>>>> link people to properties they should pay attention b) notice >>>>>>>>>> about >>>>>>>>>> side >>>>>>>>>> effects. The core semantic of a property should not change. >>>>>>>>>> >>>>>>>>>> 3. The factory will not receive the entire configuration but works >>>>> in a >>>>>>>>>> separate key space. For `myObjectOption` above, it would receive a >>>>>>>>>> configuration that consists of `field: 12` and `other: true`. >>>>>>>>>> >>>>>>>>>> I agree. I will convert the document into a Wiki page today. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Timo >>>>>>>>>> >>>>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote: >>>>>>>>>>> @Becket One thing that may be non-obvious is that the >>>>>>>>>>> Configuration >>>>>>>>>>> class >>>>>>>>>>> also defines serialization / persistence logic at the moment. >>>>>>>>>>> So it >>>>>>>>>>> needs >>>>>>>>>>> to know the set of types it supports. That stands in the way >>>>>>>>>>> of an >>>>>>>>>>> arbitrary generic map type. >>>>>>>>>>> >>>>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have one >>>>>>>>>>> collection orthogonal to the type (List) and another one bound to >>>>>>>>>> specific >>>>>>>>>>> types (Map). >>>>>>>>>>> >>>>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email] >>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Timo, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I >>>>>>>>>>>> have a >>>>>>>>>>>> few >>>>>>>>>>>> questions / comments. >>>>>>>>>>>> >>>>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption? >>>>>>>>>>>> Would it be enough to just check the atomicClass to see if it >>>>>>>>>>>> is a >>>>>>>>>>>> List >>>>>>>>>> or >>>>>>>>>>>> not? >>>>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always assume >>>>>>>>>>>> both key >>>>>>>>>>>> and value types are String? Can we just apply the same to the >>>>>>>>>>>> ConfigOption<List>? >>>>>>>>>>>> BTW, I did a quick search in the codebase but did not find any >>>>>>>>>>>> usage of >>>>>>>>>>>> ConfigOption<List>. >>>>>>>>>>>> >>>>>>>>>>>> 2. The same config name, but with two ConfigOption with >>>>>>>>>>>> different >>>>>>>>>> semantic >>>>>>>>>>>> in different component seems super confusing. For example, when >>>>> users >>>>>>>>>> set >>>>>>>>>>>> both configs, they may have no idea one is overriding the other. >>>>>>>>>>>> There >>>>>>>>>>>> might be two cases: >>>>>>>>>>>> - If it is just the same config used by different >>>>>>>>>>>> components to >>>>>>>>>>>> act >>>>>>>>>>>> accordingly, it might be better to just have one config, but >>>>> describe >>>>>>>>>>>> clearly on how that config will be used. >>>>>>>>>>>> - If it is actually two configurations that can be set >>>>>>>>>>>> differently, I >>>>>>>>>>>> think the config names should just be different. >>>>>>>>>>>> >>>>>>>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() >>>>> method >>>>>>>>>>>> pretty much means getConfiguration()? The toConfiguration() >>>>>>>>>>>> method >>>>>>>>>> sounds >>>>>>>>>>>> like converting an object to a configuration, which only >>>>>>>>>>>> works if >>>>> the >>>>>>>>>>>> object does not contain any state / value. I am also >>>>>>>>>>>> wondering if >>>>>>>>>>>> there >>>>>>>>>> is >>>>>>>>>>>> a real use case of this method. Because supposedly the >>>>> configurations >>>>>>>>>> could >>>>>>>>>>>> just be passed around to caller of this method. >>>>>>>>>>>> >>>>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of >>>>>>>>>>>> in the >>>>>>>>>>>> doc before voting? The FLIP wiki allows track the modification >>>>>>>>>>>> history >>>>>>>>>> and >>>>>>>>>>>> has a more established structure to ensure nothing is missed. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther >>>>>>>>>>>> <[hidden email]> >>>>>>>>>> wrote: >>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>> >>>>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in the >>>>> voting >>>>>>>>>>>>> thread. If there are no objections, I will start a new voting >>>>> thread >>>>>>>>>>>>> tomorrow at 9am Berlin time. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Timo >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote: >>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>> >>>>>>>>>>>>>> thanks for all the feedback we have received online and >>>>>>>>>>>>>> offline. >>>>> It >>>>>>>>>>>>>> showed that many people support the idea of evolving the Flink >>>>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP >>>>>>>>>>>>>> will >>>>>>>>>>>>>> not >>>>>>>>>>>>>> solve all issues but at least will improve the current status. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We've updated the document and replaced the Correlation part >>>>>>>>>>>>>> with the >>>>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all available >>>>>>>>>>>>>> options >>>>>>>>>>>>>> of a group plus custom group validators for eager validation. >>>>>>>>>>>>>> For now, >>>>>>>>>>>>>> this eager group validation will only be used at certain >>>>>>>>>>>>>> locations in >>>>>>>>>>>>>> the Flink code but it prepares for maybe validating the entire >>>>>>>>>>>>>> global >>>>>>>>>>>>>> configuration before submitting a job in the future. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please take another look if you find time. I hope we can >>>>>>>>>>>>>> proceed >>>>>>>>>>>>>> with >>>>>>>>>>>>>> the voting process if there are no objections. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Timo >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> thanks for your suggestions. Let me give you some background >>>>> about >>>>>>>>>>>>>>> the decisions made in this FLIP: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" >>>>>>>>>>>>>>> because we >>>>> did >>>>>>>>>>>>>>> not want to change the entire configuration infrastructure. >>>>>>>>>>>>>>> Both for >>>>>>>>>>>>>>> backwards-compatibility reasons and the amount of work that >>>>>>>>>>>>>>> would be >>>>>>>>>>>>>>> required to update all options. If our goal is to rework the >>>>>>>>>>>>>>> configuration option entirely, I might suggest to switch >>>>>>>>>>>>>>> to JSON >>>>>>>>>>>>>>> format with JSON schema and JSON validator. However, setting >>>>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky the >>>>> more >>>>>>>>>>>>>>> nested structures are allowed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is >>>>>>>>>>>>>>> centered >>>>>>>>>>>>>>> around Java classes where T is determined by the default >>>>>>>>>>>>>>> value. >>>>>>>>>>>>>>> The >>>>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit >>>>>>>>>>>>>>> `intType()` method etc. The current design of validators >>>>> centered >>>>>>>>>>>>>>> around Java classes makes it possible to have typical domain >>>>>>>>>>>>>>> validators baked by generics as you suggested. If we >>>>>>>>>>>>>>> introduce >>>>>>>>>>>>>>> types >>>>>>>>>>>>>>> such as "quantity with measure and unit" we still need to >>>>>>>>>>>>>>> get a >>>>>>>>>>>>>>> class >>>>>>>>>>>>>>> out of this option at the end, so why changing a proven >>>>>>>>>>>>>>> concept? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary >>>>>>>>>>>>>>> nesting. As >>>>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every >>>>>>>>>>>>>>> atomic >>>>>>>>>>>>>>> option like "key=12" can be represented by a list >>>>>>>>>>>>>>> "keys=12;13". >>>>>>>>>>>>>>> But >>>>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated >>>>>>>>>>>>>>> list >>>>>>>>>>>>>>> option >>>>>>>>>>>>>>> would start making this more complicated such as >>>>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>>>>>>>>>>>> StringOption(...)))", do we want that? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 4. Correlation: The correlation part is one of the >>>>>>>>>>>>>>> suggestions >>>>>>>>>>>>>>> that I >>>>>>>>>>>>>>> like least in the document. We can also discuss removing it >>>>>>>>>>>>>>> entirely, >>>>>>>>>>>>>>> but I think it solves the use case of relating options >>>>>>>>>>>>>>> with each >>>>>>>>>>>>>>> other in a flexible way right next to the actual option. >>>>>>>>>>>>>>> Instead of >>>>>>>>>>>>>>> being hidden in some component initialization, we should >>>>>>>>>>>>>>> put it >>>>>>>>>>>>>>> close >>>>>>>>>>>>>>> to the option to also perform validation eagerly instead of >>>>>>>>>>>>>>> failing >>>>>>>>>>>>>>> at runtime when the option is accessed the first time. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>>>>>>>>>>>> A "List Type" sounds like a good direction to me. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. The >>>>>>>>>>>>>>>> idea is >>>>>>>>>>>>>>>> to see >>>>>>>>>>>>>>>> if something like that can ease validation. Especially the >>>>>>>>>>>> correlation >>>>>>>>>>>>>>>> system seems quite complex (proxies to work around order of >>>>>>>>>>>>>>>> initialization). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For example, let's assume we don't think primarily about >>>>>>>>>>>>>>>> "java >>>>>>>>>>>>>>>> types" but >>>>>>>>>>>>>>>> would define types as one of the following (just examples, >>>>>>>>>>>>>>>> haven't >>>>>>>>>>>>>>>> thought >>>>>>>>>>>>>>>> all the details through): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (a) category type: implies string, and a fix set of >>>>> possible >>>>>>>>>>>> values. >>>>>>>>>>>>>>>> Those would be passes and naturally make it into the docs >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>>> Maps to a String or Enum in Java. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (b) numeric integer type: implies long (or optionally >>>>>>>>>>>>>>>> integer, >>>>>>>>>> if >>>>>>>>>>>>>>>> we want >>>>>>>>>>>>>>>> to automatically check overflow / underflow). would take >>>>> typical >>>>>>>>>>>> domain >>>>>>>>>>>>>>>> validators, like non-negative, etc. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (c) numeric real type: same as above (double or >>>>>>>>>>>>>>>> float) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (d) numeric interval type: either defined as an >>>>> interval, or >>>>>>>>>>>>>>>> references >>>>>>>>>>>>>>>> other parameter by key. validation by valid interval. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (e) quantity: a measure and a unit. separately >>>>>>>>>>>>>>>> parsable. >>>>> The >>>>>>>>>>>>>>>> measure's >>>>>>>>>>>>>>>> type could be any of the numeric types above, with same >>>>>>>>>>>>>>>> validation >>>>>>>>>>>>>>>> rules. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> With a system like the above, would we still correlation >>>>>>>>>>>>>>>> validators? >>>>>>>>>>>>>>>> Are >>>>>>>>>>>>>>>> there still cases that we need to catch early (config >>>>>>>>>>>>>>>> loading) >>>>> or >>>>>>>>>>>>>>>> are the >>>>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup >>>>> specific, >>>>>>>>>>>>>>>> that it is >>>>>>>>>>>>>>>> fine to handle them in component initialization? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>>>>>>>>>>>> <[hidden email]> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank you for your opinion. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Actually list/composite types are the topics we spent the >>>>>>>>>>>>>>>>> most of >>>>>>>>>>>> the >>>>>>>>>>>>>>>>> time. I understand that from a perspective of a full blown >>>>> type >>>>>>>>>>>>>>>>> system, >>>>>>>>>>>>>>>>> a field like isList may look weird. Please let me >>>>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>>>> bit >>>>>>>>>>>> more >>>>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear >>>>>>>>>>>>>>>>> enough >>>>>>>>>>>>>>>>> about >>>>>>>>>>>> it >>>>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is >>>>>>>>>>>>>>>>> that they >>>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>>> have a string representation as they might come from a >>>>>>>>>> configuration >>>>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so >>>>>>>>>>>>>>>>> that the >>>>>>>>>>>> values >>>>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did not >>>>>>>>>>>>>>>>> want to >>>>>>>>>>>>>>>>> add a >>>>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for >>>>>>>>>>>>>>>>> lists >>>>>>>>>>>> only >>>>>>>>>>>>>>>>> (and flat objects - I think though in the current design >>>>>>>>>>>>>>>>> there is a >>>>>>>>>>>>>>>>> mistake around the Configurable interface). I think though >>>>>>>>>>>>>>>>> you have >>>>>>>>>>>> a >>>>>>>>>>>>>>>>> point here and it would be better to have a >>>>>>>>>>>>>>>>> ListConfigOption >>>>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>>>> this field. Does it make sense to you? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> As for the second part of your message. I am not sure if I >>>>>>>>>>>> understood >>>>>>>>>>>>>>>>> it. The validators work with parse/deserialized values from >>>>>>>>>>>>>>>>> Configuration that means they can be bound to the generic >>>>>>>>>>>>>>>>> parameter >>>>>>>>>>>> of >>>>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the >>>>>>>>>>>>>>>>> ConfigOption >>>>>>>>>>>>>>>>> has anything to do with the validation logic. Could you >>>>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>>>> bit >>>>>>>>>>>>>>>>> more what did you mean? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dawid >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to do >>>>> early >>>>>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad >>>>>>>>>>>>>>>>>> hoc, >>>>>>>>>>>>>>>>>> though. For >>>>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear >>>>>>>>>>>>>>>>>> indication of >>>>> not >>>>>>>>>>>>>>>>>> having >>>>>>>>>>>>>>>>>> thought through the type/category system. >>>>>>>>>>>>>>>>>> Also, having a more clear category system makes validation >>>>>>>>>> simpler. >>>>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between >>>>> numeric >>>>>>>>>>>>>>>>> parameters >>>>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible >>>>>>>>>>>>>>>>>> values), >>>>>>>>>>>>>>>>>> quantities >>>>>>>>>>>>>>>>>> like duration and memory size (need measure and unit), >>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>> results in >>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>> elegant system for validation. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >>>>>>>>>>>> [hidden email] >>>>>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of >>>>>>>>>>>>>>>>>>> various >>>>>>>>>>>>>>>>>>> modules to be unified. >>>>>>>>>>>>>>>>>>> This is also first step of one of the keys to making new >>>>>>>>>>>>>>>>>>> unified >>>>>>>>>>>>>>>>>>> TableEnvironment available for production. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations, >>>>>>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The >>>>>>>>>>>>>>>>>>> skew may >>>>>>>>>>>>>>>>>>> be a single field or a combination of multiple >>>>>>>>>>>>>>>>>>> fields. >>>>>>>>>>>>>>>>>>> So the >>>>>>>>>>>>>>>>>>> configuration is very troublesome. We used JSON >>>>>>>>>>>>>>>>>>> string >>>>> to >>>>>>>>>>>>>>>>>>> configure it. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------ >>>>>>>>>>>>>>>>>>> From:Jark Wu <[hidden email]> >>>>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>>>>>>>>>>>> To:dev <[hidden email]> >>>>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >>>>>>>>>>>> Configuration >>>>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a >>>>> long >>>>>>>>>>>> time. >>>>>>>>>>>>>>>>>>> We have seen the benefit when developing blink >>>>>>>>>>>>>>>>>>> configurations and >>>>>>>>>>>>>>>>> connector >>>>>>>>>>>>>>>>>>> properties in 1.9 release. >>>>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed >>>>>>>>>>>>>>>>>>> design. >>>>>>>>>>>>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < >>>>> [hidden email] >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which >>>>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of >>>>>>>>>>>>>>>>>>>> Flink. >>>>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment >>>>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by >>>>>>>>>>>> configuration. >>>>>>>>>>>>>>>>>>>> Will take a look at the document in days. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>> tison. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 >>>>>>>>>>>>>>>>>>>> 下午10:12写道: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of >>>>>>>>>>>>>>>>>>>>> ExecutionConfig and >>>>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is >>>>>>>>>>>>>>>>>>>>> necessary >>>>>>>>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, >>>>>>>>>>>>>>>>>>>>> with >>>>> the >>>>>>>>>>>>>>>>>>>>> new SQL >>>>>>>>>>>>>>>>>>> DDL >>>>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and >>>>>>>>>>>>>>>>>>>>> formats >>>>>>>>>>>>>>>>>>>>> coming up, >>>>>>>>>>>>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> We need more features around string-based configuration >>>>>>>>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>>>>>>> future, >>>>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose >>>>>>>>>>>>>>>>>>>>> FLIP-54 for >>>>>>>>>>>>>>>>>>>>> evolving >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> >> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>>> >>>>>>>>>>>>>>>>>>>>> In summary it adds: >>>>>>>>>>>>>>>>>>>>> - documented types and validation >>>>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, list >>>>>>>>>>>>>>>>>>>>> - simple non-nested object types >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> >>> >> >> |
@Becket:
Regarding "great if we don't put this burden on users", we should consider who is actually using this API. It is not first-level API but mostly API for Flink contributors. Most of the users will use API classes ike ExecutionConfig or TableConfig or other builders for performing configuration. They will never use the ConfigOptions classes directly. So enforcing a duplicate method does not sound like a burden to me. How would the util that you are suggesting look like? It would always need to serialize/deserialize an object into an immutable string. This is not very efficient, given that the instance already exists and can be made immutable by the implementer by not exposing setters. Furthermore, we would loose the declarative approach and could not generate documentation. The current approach specifies the static final sub-ConfigOptions either in Configurable object (initial design) or in the ConfigurableFactory (current design) such that the docs generator can read them. Regarding "Configurable may be created and configured directly without reading settings from a Configuration instance", there seems to be a misunderstanding. This is a very common case if not the most common. As mentioned before, take ExecutionConfig. This configuration is currently only used in a programmatic-way and needs a way to be expressed as ConfigOptions. CachedFile for instance will be a Configurable object that will binary serialized most of the time when sending it to the cluster but due to the Configurable design it is possible to store it in a string representation as well. @Aljoscha: Yes, this approach would also work. We still would need to call writeToConf/readFromConf for duplicate() and ensure immutable semantics, if this is really an important use case. But IMHO all configuration is currently mutable (all API classes like ExecutionConfig, CheckpointConfig, Configuration itself), I don't understand why immutability needs to be discussed here. Regards, Timo On 02.09.19 16:22, Aljoscha Krettek wrote: > Hi, > > Regarding the factory and duplicate() and whatnot, wouldn’t it work to have a factory like this: > > /** > * Allows to read and write an instance from and to {@link Configuration}. A configurable instance > * operates in its own key space in {@link Configuration} and will be (de)prefixed by the framework. It cannot access keys from other options. A factory must have a default constructor. > * > */ > public interface ConfigurableFactory<T extends Configurable> { > > /** > * Creates an instance from the given configuration. > */ > T fromConfiguration(ConfigurationReader configuration); > } > > with Configurable being: > > /** > * Allows to read and write an instance from and to {@link Configuration}. A configurable instance > * operates in its own key space in {@link Configuration} and will be (de)prefixed by the framework. It cannot access keys from other options. A factory must have a default constructor. > * > */ > public interface Configurable { > > /** > * Writes this instance to the given configuration. > */ > void writeToConfiguration(ConfigurationWriter configuration); // method name TBD > } > > This would make the Configurable immutable and we wouldn’t need a duplicate() method. > > Best, > Aljoscha > >> On 2. Sep 2019, at 14:40, Becket Qin <[hidden email]> wrote: >> >> Hi Timo and Dawid, >> >> Thanks for the patient reply. I agree that both option a) and option b) can >> solve the mutability problem. >> >> For option a), is it a little intrusive to add a duplicate() method for a >> Configurable? It would be great if we don't put this burden on users if >> possible. >> >> For option b), I actually feel it is slightly better than option a) from >> API perspective as getFactory() seems a more understandable method of a >> Configurable compared with duplicate(). And users do not need to implement >> much more logic. >> >> I am curious what is the downside of keeping the Configuration simple to >> only have primitive types, and always create the Configurable using a util >> method? If Configurables are rare, do we need to put the instantiation / >> bookkeeping logic in Configuration? >> >> @Becket for the toConfiguration this is required for shipping the >>> Configuration to TaskManager, so that we do not have to use java >>> serializability. >> Do you mean a Configurable may be created and configured directly without >> reading settings from a Configuration instance? I thought a Configurable >> will always be created via a ConfigurableFactory by extracting required >> configs from a Configuration. Am I missing something? >> >> Thanks, >> >> Jiangjie (Becket) Qin >> >> On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz <[hidden email]> >> wrote: >> >>> Hi Timo, Becket >>> >>> From the options that Timo suggested for improving the mutability >>> situation I would prefer option a) as this is the more explicit option >>> and simpler option. Just as a remark, I think in general Configurable >>> types for options will be rather very rare for some special use-cases, >>> as the majority of options are rather simpler parameters of primitive >>> types (+duration, memory) >>> >>> @Becket for the toConfiguration this is required for shipping the >>> Configuration to TaskManager, so that we do not have to use java >>> serializability. >>> >>> Best, >>> >>> Dawid >>> >>> >>> On 02/09/2019 10:05, Timo Walther wrote: >>>> Hi Becket, >>>> >>>> Re 1 & 3: "values in configurations should actually be immutable" >>>> >>>> I would also prefer immutability but most of our configuration is >>>> mutable due to serialization/deserialization. Also maps and list could >>>> be mutable in theory. It is difficult to really enforce that for >>>> nested structures. I see two options: >>>> >>>> a) For the original design: How about we force implementers to add a >>>> duplicate() method in a Configurable object? This would mean that the >>>> object is still mutable but by duplicating the object both during >>>> reading and writing we would avoid the problem you described. >>>> >>>> b) For the current design: We still use the factory approach but let a >>>> Configurable object implement a getFactory() method such that we know >>>> how to serialize the object. With the help of a factory we can also >>>> duplicate the object easily during reading and writing and ensure >>>> immutability. >>>> >>>> I would personally go for approach a) to not over-engineer this topic. >>>> But I'm open for option b). >>>> >>>> Regards, >>>> Timo >>>> >>>> >>>> On 31.08.19 04:09, Becket Qin wrote: >>>>> Hi Timo, >>>>> >>>>> Thanks for the reply. I am still a little concerned over the >>>>> mutability of >>>>> the Configurable which could be the value in Configuration. >>>>> >>>>> Re: 1 >>>>> >>>>>> But in general, people should not use any internal fields. >>>>>> Configurable objects are meant for simple little helper POJOs, not >>>>>> complex arbitrary nested data structures. >>>>> This seems difficult to enforce... Ideally the values in configurations >>>>> should actually be immutable. The value can only be changed by >>>>> explicitly >>>>> calling setters in Configuration. Otherwise we may have weird situation >>>>> where the Configurable in the same configuration are different in two >>>>> places because the configurable is modified in one places and not >>>>> modified >>>>> in another place. So I am a little concerned on putting a >>>>> Configurable type >>>>> in the Configuration map, because the value could be modified without >>>>> explicitly setting the configuration. For example, can users do the >>>>> following? >>>>> >>>>> Configurable configurable = >>>>> configuration.getConfigurable(myConfigurableOption); >>>>> configurable.setConfigA(123); // this already changes the configurable >>>>> object in the configuration. >>>>> >>>>> Re: 2 >>>>> Thanks for confirming. As long as users will not have a situation where >>>>> they need to set two configurations with the same key but different >>>>> descriptions, I think it is OK. >>>>> >>>>> Re: 3 >>>>> This is actually kind of related to 1. i.e. Whether toConfiguration() >>>>> guarantees the exact same object can be rebuilt from the >>>>> configuration or >>>>> not. I am still not quite sure about the use case of toConfiguration() >>>>> though. It seems indicating the Configurable is mutable, which might be >>>>> dangerous. >>>>> >>>>> Thanks, >>>>> >>>>> Jiangjie (Becket) Qin >>>>> >>>>> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <[hidden email]> >>>>> wrote: >>>>> >>>>>> Hi Becket, >>>>>> >>>>>> 1. First of all, you are totally right. The FLIP contains a bug due to >>>>>> the last minute changes that Dawid suggested: by having immutable >>>>>> objects created by a factory we loose the serializability of the >>>>>> Configuration because the factory itself is not stored in the >>>>>> Configuration. I would propose to revert the last change and stick to >>>>>> the original design, which means that a object must implement the >>>>>> Configurable interface and also implements >>>>>> serialization/deserialization >>>>>> methods such that also internal fields can be persisted as you >>>>>> suggested. But in general, people should not use any internal fields. >>>>>> Configurable objects are meant for simple little helper POJOs, not >>>>>> complex arbitrary nested data structures. >>>>>> >>>>>> It is Map<String, Object> because Configuration stores the raw objects. >>>>>> If you put a Boolean option into it, it remains Boolean. This makes the >>>>>> map very efficient for shipping to the cluster and accessing options >>>>>> multiple times. The same for configurable objects. We put the pure >>>>>> objects into the map without any serialization/deserialization. The >>>>>> provided factory allows to convert the Object into a Configuration and >>>>>> we know how to serialize/deserializise a configuration because it is >>>>>> just a key/value map. >>>>>> >>>>>> 2. Yes, this is what we had in mind. It should still be the same >>>>>> configuration option. We would like to avoid specialized option keys >>>>>> across components (exec.max-para and table.exec.max-para) if they are >>>>>> describing basically the same thing. But adding some more description >>>>>> like "TableOptions.MAX_PARALLELISM with description_1 + description_2" >>>>>> does not hurt. >>>>>> >>>>>> 3. They should restore the original object given that the >>>>>> toConfiguration/fromConfiguration methods have been implemented >>>>>> correctly. I will extend the example to make the logic clearer while >>>>>> fixing the bug. >>>>>> >>>>>> Thanks for the healthy discussion, >>>>>> Timo >>>>>> >>>>>> >>>>>> On 30.08.19 15:29, Becket Qin wrote: >>>>>>> Hi Timo, >>>>>>> >>>>>>> Thanks again for the clarification. Please see a few more questions >>>>>> below. >>>>>>> Re: 1 >>>>>>> >>>>>>>> Please also keep in mind that Configuration must not consist of only >>>>>>>> strings, it manages a Map<String, Object> for efficient access. Every >>>>>>>> map entry can have a string representation for persistence, but in >>>>>>>> most >>>>>>>> cases consists of unserialized objects. >>>>>>> I'd like to understand this a bit more. The reason we have a >>>>>>> Map<String, >>>>>>> Object> in Configuration was because that Object could be either a >>>>>> String, >>>>>>> a List or a Map, right? But they eventually always boil down to >>>>>>> Strings, >>>>>> or >>>>>>> maybe one of the predefined type that we know how to serialize. In the >>>>>>> current design, can the Object also be Configurable? >>>>>>> If the value in the config Map<String, Object> can be Configurable >>>>>> objects, >>>>>>> how do we serialize them? Calling toConfiguration() seems not quite >>>>>> working >>>>>>> because there might be some other internal fields that are not part of >>>>>> the >>>>>>> configuration. The modification to those fields will be lost if we >>>>>>> simply >>>>>>> use toConfiguration(). So the impact of modifying those Configurable >>>>>>> objects seems a little undefined. And it would be difficult to prevent >>>>>>> users from doing that. >>>>>>> If the value in the config Map<String, Object> cannot be Configurable >>>>>>> objects, then it seems a little weird to have all the other ConfigType >>>>>>> stored in the ConfigMap in their own native type and accessed via >>>>>>> getInteger() / getBoolean(), etc, while having ConfigurableType to be >>>>>>> different from others because one have to use ConfigurableFactory >>>>>>> to get >>>>>>> the configurations. >>>>>>> >>>>>>> Re: 2 >>>>>>> >>>>>>>> I think about the withExtendedDescription as a helper getter in a >>>>>>>> different place, so that the option is easier to find in a different >>>>>>>> module from it was defined. >>>>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>>>>> equal >>>>>> to: >>>>>>>> public ConfigOption getMaxParallelismOption() { >>>>>>>> return CoreOptions.MAX_PARALLELISM; >>>>>>>> } >>>>>>> Just to make sure I understand it correctly, does that mean users will >>>>>> see >>>>>>> something like following? >>>>>>> - CoreOptions.MAX_PARALLELISM with description_1; >>>>>>> - TableOptions.MAX_PARALLELISM with description_1 + description_2. >>>>>>> - DataStreamOptions.MAX_PARALLELISM with description_1 + >>>>>>> description_3. >>>>>>> And users will only configure exactly one MAX_PARALLELISM cross the >>>>>> board. >>>>>>> So they won't be confused by setting two MAX_PARALLELISM config for >>>>>>> two >>>>>>> different modules, while they are actually the same config. If that is >>>>>> the >>>>>>> case, I don't have further concern. >>>>>>> >>>>>>> Re: 3 >>>>>>> Maybe I am thinking too much. I thought toBytes() / fromBytes() >>>>>>> actually >>>>>>> restore the original Object. But fromConfiguration() and >>>>>> toConfiguration() >>>>>>> does not do that, anything not in the configuration of the original >>>>>> object >>>>>>> will be lost. So it would be good to make that clear. Maybe a clear >>>>>>> Java >>>>>>> doc is sufficient. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Jiangjie (Becket) Qin >>>>>>> >>>>>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz >>>>>>> <[hidden email] >>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Ad. 1 >>>>>>>> >>>>>>>> The advantage of our approach is that you have the type definition >>>>>>>> close >>>>>>>> to the option definition. The only difference is that it enables >>>>>>>> expressing simple pojos with the primitives like >>>>>>>> ConfigOption<Integer>, >>>>>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start having >>>>>>>> >>>>>>>> the parsing logic scattered everywhere in the code base as it is now. >>>>>>>> The string representation in our proposal is exactly the same as you >>>>>>>> explained for those three options. The only difference is that you >>>>>>>> don't >>>>>>>> have to parse the elements of a List, Map etc. afterwards. >>>>>>>> >>>>>>>> Ad. 2 >>>>>>>> >>>>>>>> I think about the withExtendedDescription as a helper getter in a >>>>>>>> different place, so that the option is easier to find in a different >>>>>>>> module from it was defined. >>>>>>>> >>>>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>>>>> equal >>>>>> to: >>>>>>>> public ConfigOption getMaxParallelismOption() { >>>>>>>> >>>>>>>> return CoreOptions.MAX_PARALLELISM; >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> This allows to further clarify the description of the option in the >>>>>>>> context of a different module and end up in a seperate page in >>>>>>>> documentation (but with a link to the original one). In the end it is >>>>>>>> exactly the same option. It has exactly same key, type, parsing >>>>>>>> logic, >>>>>>>> it is in the end forwarded to the same place. >>>>>>>> >>>>>>>> Ad. 3 >>>>>>>> >>>>>>>> Not sure if I understand your concerns here. As Timo said it is in >>>>>>>> the >>>>>>>> end sth similar to toBytes/fromBytes, but it puts itself to a >>>>>>>> Configuration. Also just wanted to make sure we adjusted this part >>>>>>>> slightly and now the ConfigOption takes ConfigurableFactory. >>>>>>>> >>>>>>>> Best, >>>>>>>> >>>>>>>> Dawid >>>>>>>> >>>>>>>> >>>>>>>> On 30/08/2019 09:39, Timo Walther wrote: >>>>>>>>> Hi Becket, >>>>>>>>> >>>>>>>>> thanks for the discussion. >>>>>>>>> >>>>>>>>> 1. ConfigOptions in their current design are bound to classes. >>>>>>>>> Regarding, the option is "creating some Configurable objects instead >>>>>>>>> of defining the config to create >>>>>>>>> those Configurable"? We just moved this logic to a factory, this >>>>>>>>> factory can then also be used for other purposes. However, how the >>>>>>>>> option and objects are serialized to Configuration is still not part >>>>>>>>> of the option. The option is just pure declaration. >>>>>>>>> >>>>>>>>> If we would allow only List<String>, implementers would need to >>>>>>>>> start >>>>>>>>> implementing own parsing and validation logic all the time. We would >>>>>>>>> like to avoid that. >>>>>>>>> >>>>>>>>> Please also keep in mind that Configuration must not consist of only >>>>>>>>> strings, it manages a Map<String, Object> for efficient access. >>>>>>>>> Every >>>>>>>>> map entry can have a string representation for persistence, but in >>>>>>>>> most cases consists of unserialized objects. >>>>>>>>> >>>>>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite >>>>>>>>> keys, types or default values. But different layers might want to >>>>>>>>> add >>>>>>>>> helpful information. In our concrete use case for FLIP-59, >>>>>>>>> ExecutionConfig has 50 properties and many of them are not relevant >>>>>>>>> for the Table layer or have no effect at all. We would like to list >>>>>>>>> and mention the most important config options again in the Table >>>>>>>>> Configuration section, so that users are not confused, but with a >>>>>>>>> strong link to the core option. E.g.: registered kryo serializers >>>>>>>>> are >>>>>>>>> also important also for Table users, we would like to add the >>>>>>>>> comment >>>>>>>>> "This option allows to modify the serialization of the ANY SQL data >>>>>>>>> type.". I think we should not spam the core configuration page with >>>>>>>>> comments from all layers, connectors, or libraries but keep this in >>>>>>>>> the corresponding component documentation. >>>>>>>>> >>>>>>>>> 3. But it is something like fromBytes() and toBytes()? It serializes >>>>>>>>> and deserializes T from a configuration? >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Timo >>>>>>>>> >>>>>>>>> On 29.08.19 19:14, Becket Qin wrote: >>>>>>>>>> Hi Timo and Stephan, >>>>>>>>>> >>>>>>>>>> Thanks for the detail explanation. >>>>>>>>>> >>>>>>>>>> 1. I agree that each config should be in a human readable >>>>>>>>>> format. My >>>>>>>>>> concern is that the current List<Configurable> looks going a little >>>>>>>>>> too far >>>>>>>>>> from what the configuration is supposed to do. They are essentially >>>>>>>>>> creating some Configurable objects instead of defining the >>>>>>>>>> config to >>>>>>>>>> create >>>>>>>>>> those Configurable. This mixes ConfigOption and the usage of it. >>>>>>>>>> API >>>>>>>>>> wise >>>>>>>>>> it would be good to keep the configs and their usages (such as >>>>>>>>>> how to >>>>>>>>>> create objects using the ConfigOption) apart from each other. >>>>>>>>>> I am wondering if we can just make List also only take string. For >>>>>>>>>> example, >>>>>>>>>> is the following definition of map and list sufficient? >>>>>>>>>> >>>>>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be >>>>>>>>>> defined >>>>>>>>>> as: >>>>>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ... >>>>>>>>>> >>>>>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined >>>>>> as: >>>>>>>>>> list_config_name: v1, v2, v3, ... >>>>>>>>>> >>>>>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, >>>>>>>>>> String>>. It >>>>>>>>>> can >>>>>>>>>> be defined as: >>>>>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... >>>>>>>>>> >>>>>>>>>> All the key and values in the configuration are String. This also >>>>>>>>>> guarantees that the configuration is always serializable. >>>>>>>>>> If we want to do one more step, we can allow the ConfigOption to >>>>>>>>>> set >>>>>> all >>>>>>>>>> the primitive types and parse that for the users. So something like >>>>>>>>>> List<Integer>, List<Class<?>> seems fine. >>>>>>>>>> >>>>>>>>>> The configuration class could also have util methods to create a >>>>>>>>>> list >>>>>> of >>>>>>>>>> configurable such as: >>>>>>>>>> <T> List<T> >>>>>> <Configuration#getConfigurableInstances(ListMapConfigOption, >>>>>>>>>> Class<T> clazz). >>>>>>>>>> But the configuration class will not take arbitrary Configurable as >>>>>> the >>>>>>>>>> value of its config. >>>>>>>>>> >>>>>>>>>> 2. I might have misunderstood this. But my concern on the >>>>>>>>>> description >>>>>>>>>> extension is in the following example. >>>>>>>>>> >>>>>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM = >>>>>>>>>> >>>>>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( >>>>>>>>>> "Note: That this property means that a table program has a >>>>>>>>>> side-effect >>>>>>>>>> XYZ."); >>>>>>>>>> >>>>>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One is >>>>>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I >>>>>>>>>> suppose >>>>>>>>>> users >>>>>>>>>> will see both configurations. One with an extended description >>>>>>>>>> and one >>>>>>>>>> without. Let's say there is a third component which also users >>>>>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM >>>>>>>>>> ConfigOption? If >>>>>>>>>> so, what would that ConfigOption's description look like? >>>>>>>>>> >>>>>>>>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM >>>>>>>>>> and the >>>>>>>>>> description should clearly state all the usage of this >>>>>>>>>> ConfigOption. >>>>>>>>>> >>>>>>>>>> 3. I see, in that case, how about we name it something like >>>>>>>>>> extractConfiguration()? I am just trying to see if we can make it >>>>>> clear >>>>>>>>>> this is not something like fromBytes() and toBytes(). >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>> >>>>>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]> >>>>>>>> wrote: >>>>>>>>>>> Hi Becket, >>>>>>>>>>> >>>>>>>>>>> let me try to clarify some of your questions: >>>>>>>>>>> >>>>>>>>>>> 1. For every option, we also needed to think about how to >>>>>>>>>>> represent >>>>>> it >>>>>>>>>>> in a human readable format. We do not want to allow arbitrary >>>>>>>>>>> nesting >>>>>>>>>>> because that would easily allow to bypass the flattened >>>>>>>>>>> hierarchy of >>>>>>>>>>> config options (`session.memory.min`). The current design >>>>>>>>>>> allows to >>>>>>>>>>> represent every option type as a list. E.g.: >>>>>>>>>>> >>>>>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12` >>>>>>>>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption: >>>>>>>>>>> field=12,other=true; field=12,other=true` >>>>>>>>>>> `myPropertyOption: key=str0,other=str1` can be >>>>>>>>>>> `myPropertyListOption: >>>>>>>>>>> key=str0,other=str1;key=str0,other=str1` >>>>>>>>>>> >>>>>>>>>>> We need the atomic class for serialization/deserialization both in >>>>>>>>>>> binary and string format. >>>>>>>>>>> >>>>>>>>>>> ConfigOption<List> is not present in the code base yet, but this >>>>>>>>>>> FLIP is >>>>>>>>>>> a preparation of making ExecutionConfig configurable. If you look >>>>>> into >>>>>>>>>>> this class or also in existing table connectors/formats, you >>>>>>>>>>> will see >>>>>>>>>>> that each proposed option type has its requirements. >>>>>>>>>>> >>>>>>>>>>> 2. Regarding extending the description of ConfigOptions, the >>>>>>>>>>> semantic of >>>>>>>>>>> one option should be a super set of the other option. E.g. in >>>>>>>>>>> Table >>>>>> API >>>>>>>>>>> we might use general ExecutionConfig properties. But we would like >>>>>>>>>>> to a) >>>>>>>>>>> make external options more prominent in the Table API config >>>>>>>>>>> docs to >>>>>>>>>>> link people to properties they should pay attention b) notice >>>>>>>>>>> about >>>>>>>>>>> side >>>>>>>>>>> effects. The core semantic of a property should not change. >>>>>>>>>>> >>>>>>>>>>> 3. The factory will not receive the entire configuration but works >>>>>> in a >>>>>>>>>>> separate key space. For `myObjectOption` above, it would receive a >>>>>>>>>>> configuration that consists of `field: 12` and `other: true`. >>>>>>>>>>> >>>>>>>>>>> I agree. I will convert the document into a Wiki page today. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote: >>>>>>>>>>>> @Becket One thing that may be non-obvious is that the >>>>>>>>>>>> Configuration >>>>>>>>>>>> class >>>>>>>>>>>> also defines serialization / persistence logic at the moment. >>>>>>>>>>>> So it >>>>>>>>>>>> needs >>>>>>>>>>>> to know the set of types it supports. That stands in the way >>>>>>>>>>>> of an >>>>>>>>>>>> arbitrary generic map type. >>>>>>>>>>>> >>>>>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have one >>>>>>>>>>>> collection orthogonal to the type (List) and another one bound to >>>>>>>>>>> specific >>>>>>>>>>>> types (Map). >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email] >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I >>>>>>>>>>>>> have a >>>>>>>>>>>>> few >>>>>>>>>>>>> questions / comments. >>>>>>>>>>>>> >>>>>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption? >>>>>>>>>>>>> Would it be enough to just check the atomicClass to see if it >>>>>>>>>>>>> is a >>>>>>>>>>>>> List >>>>>>>>>>> or >>>>>>>>>>>>> not? >>>>>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always assume >>>>>>>>>>>>> both key >>>>>>>>>>>>> and value types are String? Can we just apply the same to the >>>>>>>>>>>>> ConfigOption<List>? >>>>>>>>>>>>> BTW, I did a quick search in the codebase but did not find any >>>>>>>>>>>>> usage of >>>>>>>>>>>>> ConfigOption<List>. >>>>>>>>>>>>> >>>>>>>>>>>>> 2. The same config name, but with two ConfigOption with >>>>>>>>>>>>> different >>>>>>>>>>> semantic >>>>>>>>>>>>> in different component seems super confusing. For example, when >>>>>> users >>>>>>>>>>> set >>>>>>>>>>>>> both configs, they may have no idea one is overriding the other. >>>>>>>>>>>>> There >>>>>>>>>>>>> might be two cases: >>>>>>>>>>>>> - If it is just the same config used by different >>>>>>>>>>>>> components to >>>>>>>>>>>>> act >>>>>>>>>>>>> accordingly, it might be better to just have one config, but >>>>>> describe >>>>>>>>>>>>> clearly on how that config will be used. >>>>>>>>>>>>> - If it is actually two configurations that can be set >>>>>>>>>>>>> differently, I >>>>>>>>>>>>> think the config names should just be different. >>>>>>>>>>>>> >>>>>>>>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() >>>>>> method >>>>>>>>>>>>> pretty much means getConfiguration()? The toConfiguration() >>>>>>>>>>>>> method >>>>>>>>>>> sounds >>>>>>>>>>>>> like converting an object to a configuration, which only >>>>>>>>>>>>> works if >>>>>> the >>>>>>>>>>>>> object does not contain any state / value. I am also >>>>>>>>>>>>> wondering if >>>>>>>>>>>>> there >>>>>>>>>>> is >>>>>>>>>>>>> a real use case of this method. Because supposedly the >>>>>> configurations >>>>>>>>>>> could >>>>>>>>>>>>> just be passed around to caller of this method. >>>>>>>>>>>>> >>>>>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of >>>>>>>>>>>>> in the >>>>>>>>>>>>> doc before voting? The FLIP wiki allows track the modification >>>>>>>>>>>>> history >>>>>>>>>>> and >>>>>>>>>>>>> has a more established structure to ensure nothing is missed. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther >>>>>>>>>>>>> <[hidden email]> >>>>>>>>>>> wrote: >>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in the >>>>>> voting >>>>>>>>>>>>>> thread. If there are no objections, I will start a new voting >>>>>> thread >>>>>>>>>>>>>> tomorrow at 9am Berlin time. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Timo >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote: >>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> thanks for all the feedback we have received online and >>>>>>>>>>>>>>> offline. >>>>>> It >>>>>>>>>>>>>>> showed that many people support the idea of evolving the Flink >>>>>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP >>>>>>>>>>>>>>> will >>>>>>>>>>>>>>> not >>>>>>>>>>>>>>> solve all issues but at least will improve the current status. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We've updated the document and replaced the Correlation part >>>>>>>>>>>>>>> with the >>>>>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all available >>>>>>>>>>>>>>> options >>>>>>>>>>>>>>> of a group plus custom group validators for eager validation. >>>>>>>>>>>>>>> For now, >>>>>>>>>>>>>>> this eager group validation will only be used at certain >>>>>>>>>>>>>>> locations in >>>>>>>>>>>>>>> the Flink code but it prepares for maybe validating the entire >>>>>>>>>>>>>>> global >>>>>>>>>>>>>>> configuration before submitting a job in the future. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please take another look if you find time. I hope we can >>>>>>>>>>>>>>> proceed >>>>>>>>>>>>>>> with >>>>>>>>>>>>>>> the voting process if there are no objections. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> thanks for your suggestions. Let me give you some background >>>>>> about >>>>>>>>>>>>>>>> the decisions made in this FLIP: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" >>>>>>>>>>>>>>>> because we >>>>>> did >>>>>>>>>>>>>>>> not want to change the entire configuration infrastructure. >>>>>>>>>>>>>>>> Both for >>>>>>>>>>>>>>>> backwards-compatibility reasons and the amount of work that >>>>>>>>>>>>>>>> would be >>>>>>>>>>>>>>>> required to update all options. If our goal is to rework the >>>>>>>>>>>>>>>> configuration option entirely, I might suggest to switch >>>>>>>>>>>>>>>> to JSON >>>>>>>>>>>>>>>> format with JSON schema and JSON validator. However, setting >>>>>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky the >>>>>> more >>>>>>>>>>>>>>>> nested structures are allowed. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is >>>>>>>>>>>>>>>> centered >>>>>>>>>>>>>>>> around Java classes where T is determined by the default >>>>>>>>>>>>>>>> value. >>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit >>>>>>>>>>>>>>>> `intType()` method etc. The current design of validators >>>>>> centered >>>>>>>>>>>>>>>> around Java classes makes it possible to have typical domain >>>>>>>>>>>>>>>> validators baked by generics as you suggested. If we >>>>>>>>>>>>>>>> introduce >>>>>>>>>>>>>>>> types >>>>>>>>>>>>>>>> such as "quantity with measure and unit" we still need to >>>>>>>>>>>>>>>> get a >>>>>>>>>>>>>>>> class >>>>>>>>>>>>>>>> out of this option at the end, so why changing a proven >>>>>>>>>>>>>>>> concept? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary >>>>>>>>>>>>>>>> nesting. As >>>>>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every >>>>>>>>>>>>>>>> atomic >>>>>>>>>>>>>>>> option like "key=12" can be represented by a list >>>>>>>>>>>>>>>> "keys=12;13". >>>>>>>>>>>>>>>> But >>>>>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated >>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>> option >>>>>>>>>>>>>>>> would start making this more complicated such as >>>>>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>>>>>>>>>>>>> StringOption(...)))", do we want that? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 4. Correlation: The correlation part is one of the >>>>>>>>>>>>>>>> suggestions >>>>>>>>>>>>>>>> that I >>>>>>>>>>>>>>>> like least in the document. We can also discuss removing it >>>>>>>>>>>>>>>> entirely, >>>>>>>>>>>>>>>> but I think it solves the use case of relating options >>>>>>>>>>>>>>>> with each >>>>>>>>>>>>>>>> other in a flexible way right next to the actual option. >>>>>>>>>>>>>>>> Instead of >>>>>>>>>>>>>>>> being hidden in some component initialization, we should >>>>>>>>>>>>>>>> put it >>>>>>>>>>>>>>>> close >>>>>>>>>>>>>>>> to the option to also perform validation eagerly instead of >>>>>>>>>>>>>>>> failing >>>>>>>>>>>>>>>> at runtime when the option is accessed the first time. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>>>>>>>>>>>>> A "List Type" sounds like a good direction to me. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. The >>>>>>>>>>>>>>>>> idea is >>>>>>>>>>>>>>>>> to see >>>>>>>>>>>>>>>>> if something like that can ease validation. Especially the >>>>>>>>>>>>> correlation >>>>>>>>>>>>>>>>> system seems quite complex (proxies to work around order of >>>>>>>>>>>>>>>>> initialization). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For example, let's assume we don't think primarily about >>>>>>>>>>>>>>>>> "java >>>>>>>>>>>>>>>>> types" but >>>>>>>>>>>>>>>>> would define types as one of the following (just examples, >>>>>>>>>>>>>>>>> haven't >>>>>>>>>>>>>>>>> thought >>>>>>>>>>>>>>>>> all the details through): >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> (a) category type: implies string, and a fix set of >>>>>> possible >>>>>>>>>>>>> values. >>>>>>>>>>>>>>>>> Those would be passes and naturally make it into the docs >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>>>> Maps to a String or Enum in Java. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> (b) numeric integer type: implies long (or optionally >>>>>>>>>>>>>>>>> integer, >>>>>>>>>>> if >>>>>>>>>>>>>>>>> we want >>>>>>>>>>>>>>>>> to automatically check overflow / underflow). would take >>>>>> typical >>>>>>>>>>>>> domain >>>>>>>>>>>>>>>>> validators, like non-negative, etc. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> (c) numeric real type: same as above (double or >>>>>>>>>>>>>>>>> float) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> (d) numeric interval type: either defined as an >>>>>> interval, or >>>>>>>>>>>>>>>>> references >>>>>>>>>>>>>>>>> other parameter by key. validation by valid interval. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> (e) quantity: a measure and a unit. separately >>>>>>>>>>>>>>>>> parsable. >>>>>> The >>>>>>>>>>>>>>>>> measure's >>>>>>>>>>>>>>>>> type could be any of the numeric types above, with same >>>>>>>>>>>>>>>>> validation >>>>>>>>>>>>>>>>> rules. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> With a system like the above, would we still correlation >>>>>>>>>>>>>>>>> validators? >>>>>>>>>>>>>>>>> Are >>>>>>>>>>>>>>>>> there still cases that we need to catch early (config >>>>>>>>>>>>>>>>> loading) >>>>>> or >>>>>>>>>>>>>>>>> are the >>>>>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup >>>>>> specific, >>>>>>>>>>>>>>>>> that it is >>>>>>>>>>>>>>>>> fine to handle them in component initialization? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>>>>>>>>>>>>> <[hidden email]> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you for your opinion. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Actually list/composite types are the topics we spent the >>>>>>>>>>>>>>>>>> most of >>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> time. I understand that from a perspective of a full blown >>>>>> type >>>>>>>>>>>>>>>>>> system, >>>>>>>>>>>>>>>>>> a field like isList may look weird. Please let me >>>>>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>>>>> bit >>>>>>>>>>>>> more >>>>>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear >>>>>>>>>>>>>>>>>> enough >>>>>>>>>>>>>>>>>> about >>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is >>>>>>>>>>>>>>>>>> that they >>>>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>>>> have a string representation as they might come from a >>>>>>>>>>> configuration >>>>>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so >>>>>>>>>>>>>>>>>> that the >>>>>>>>>>>>> values >>>>>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did not >>>>>>>>>>>>>>>>>> want to >>>>>>>>>>>>>>>>>> add a >>>>>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for >>>>>>>>>>>>>>>>>> lists >>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>> (and flat objects - I think though in the current design >>>>>>>>>>>>>>>>>> there is a >>>>>>>>>>>>>>>>>> mistake around the Configurable interface). I think though >>>>>>>>>>>>>>>>>> you have >>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>> point here and it would be better to have a >>>>>>>>>>>>>>>>>> ListConfigOption >>>>>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>>>>> this field. Does it make sense to you? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> As for the second part of your message. I am not sure if I >>>>>>>>>>>>> understood >>>>>>>>>>>>>>>>>> it. The validators work with parse/deserialized values from >>>>>>>>>>>>>>>>>> Configuration that means they can be bound to the generic >>>>>>>>>>>>>>>>>> parameter >>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the >>>>>>>>>>>>>>>>>> ConfigOption >>>>>>>>>>>>>>>>>> has anything to do with the validation logic. Could you >>>>>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>>>>> bit >>>>>>>>>>>>>>>>>> more what did you mean? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Dawid >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to do >>>>>> early >>>>>>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad >>>>>>>>>>>>>>>>>>> hoc, >>>>>>>>>>>>>>>>>>> though. For >>>>>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear >>>>>>>>>>>>>>>>>>> indication of >>>>>> not >>>>>>>>>>>>>>>>>>> having >>>>>>>>>>>>>>>>>>> thought through the type/category system. >>>>>>>>>>>>>>>>>>> Also, having a more clear category system makes validation >>>>>>>>>>> simpler. >>>>>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between >>>>>> numeric >>>>>>>>>>>>>>>>>> parameters >>>>>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible >>>>>>>>>>>>>>>>>>> values), >>>>>>>>>>>>>>>>>>> quantities >>>>>>>>>>>>>>>>>>> like duration and memory size (need measure and unit), >>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>> results in >>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>> elegant system for validation. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >>>>>>>>>>>>> [hidden email] >>>>>>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of >>>>>>>>>>>>>>>>>>>> various >>>>>>>>>>>>>>>>>>>> modules to be unified. >>>>>>>>>>>>>>>>>>>> This is also first step of one of the keys to making new >>>>>>>>>>>>>>>>>>>> unified >>>>>>>>>>>>>>>>>>>> TableEnvironment available for production. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations, >>>>>>>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The >>>>>>>>>>>>>>>>>>>> skew may >>>>>>>>>>>>>>>>>>>> be a single field or a combination of multiple >>>>>>>>>>>>>>>>>>>> fields. >>>>>>>>>>>>>>>>>>>> So the >>>>>>>>>>>>>>>>>>>> configuration is very troublesome. We used JSON >>>>>>>>>>>>>>>>>>>> string >>>>>> to >>>>>>>>>>>>>>>>>>>> configure it. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>> ------------------------------------------------------------------ >>>>>>>>>>>>>>>>>>>> From:Jark Wu <[hidden email]> >>>>>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>>>>>>>>>>>>> To:dev <[hidden email]> >>>>>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >>>>>>>>>>>>> Configuration >>>>>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a >>>>>> long >>>>>>>>>>>>> time. >>>>>>>>>>>>>>>>>>>> We have seen the benefit when developing blink >>>>>>>>>>>>>>>>>>>> configurations and >>>>>>>>>>>>>>>>>> connector >>>>>>>>>>>>>>>>>>>> properties in 1.9 release. >>>>>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed >>>>>>>>>>>>>>>>>>>> design. >>>>>>>>>>>>>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < >>>>>> [hidden email] >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which >>>>>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of >>>>>>>>>>>>>>>>>>>>> Flink. >>>>>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment >>>>>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by >>>>>>>>>>>>> configuration. >>>>>>>>>>>>>>>>>>>>> Will take a look at the document in days. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>> tison. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 >>>>>>>>>>>>>>>>>>>>> 下午10:12写道: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of >>>>>>>>>>>>>>>>>>>>>> ExecutionConfig and >>>>>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is >>>>>>>>>>>>>>>>>>>>>> necessary >>>>>>>>>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, >>>>>>>>>>>>>>>>>>>>>> with >>>>>> the >>>>>>>>>>>>>>>>>>>>>> new SQL >>>>>>>>>>>>>>>>>>>> DDL >>>>>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and >>>>>>>>>>>>>>>>>>>>>> formats >>>>>>>>>>>>>>>>>>>>>> coming up, >>>>>>>>>>>>>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> We need more features around string-based configuration >>>>>>>>>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>>>>>>>> future, >>>>>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose >>>>>>>>>>>>>>>>>>>>>> FLIP-54 for >>>>>>>>>>>>>>>>>>>>>> evolving >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>>>>>>>>>>>>>>>>>>>> In summary it adds: >>>>>>>>>>>>>>>>>>>>>> - documented types and validation >>>>>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, list >>>>>>>>>>>>>>>>>>>>>> - simple non-nested object types >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>> |
Hi Timo,
I think I might have misunderstood the scope or motivation of the FLIP a little bit. Please let me clarify a little bit. Regarding "great if we don't put this burden on users", we should > consider who is actually using this API. It is not first-level API but > mostly API for Flink contributors. Most of the users will use API > classes ike ExecutionConfig or TableConfig or other builders for > performing configuration. They will never use the ConfigOptions classes > directly. So enforcing a duplicate method does not sound like a burden > to me. I thought the configuration will be used by all the plugins and components such as connectors. If that is the case, the Configuration sounds a public API just like UDF. So one may implement a plugin and set that plugin class in the configuration. Flink will do something like following: // Define the Plugin ConfigOption. SomePluginInterface extends Configurable. ConfigOption<SomePluginInterface> option = ConfigOptions.key("pluginConfigKey") .ConfigurableType(org.apache.flink.SomePluginInterface.class); // Instantiate the user configured plugin SomePluginInterface plugin = configuration.getConfigurableInstance(pluginConfigKey); // Programmatically, users will do the following to set the plugin. // Set the plugin class, alternatively, we can always require a ConfigurableFactory class as value. configurations.setConfigurable("pluginConfigKey", MyPluginClass.class); configurations.setInt("configKey1ForMyPluginClass", 123); // set the configurations required by my plugin class. // Non-programmatically, users can do the following to set the plugin in a config file, e.g. yaml pluginConfigKey: PATH_TO_MyPluginClass // Or PATH_TO_MyPluginClassFactory configKey1ForMyPluginClass: 123 Internally, the configuration may discover the MyPluginClassFactory, call MyPluginClassFactory.create(Configuration) and pass in itself as the configuration argument. From user's perspective, the way to use Configurable is the following: 1. Set a class type of the Plugin in the configuration via Configuration interface. 2. Provide a factory class for the Plugin, either by config value or by service provider mechanism. 3. Set the configurations consumed by the plugin, via something like a yaml file, or programmatically via Configuration interface. How would the util that you are suggesting look like? It would always > need to serialize/deserialize an object into an immutable string. This > is not very efficient, given that the instance already exists and can be > made immutable by the implementer by not exposing setters. Furthermore, > we would loose the declarative approach and could not generate > documentation. The current approach specifies the static final > sub-ConfigOptions either in Configurable object (initial design) or in > the ConfigurableFactory (current design) such that the docs generator > can read them. I'd imagine that in most cases, after a concrete Configurable (say ExecutionConfig) instance is created from the Configuration instance, we will just pass around the ExecutionConfig instead of the Configuration object. If so, the serialization to / deserialization from String will only happen once per JVM, which seems fine. I am not sure why the doc generation would be impacted. As long as the ConfigOptions go into the Configurable or ConfigurableFactory, the docs generator can still read them, right? Regarding "Configurable may be created and configured directly without > reading settings from a Configuration instance", there seems to be a > misunderstanding. This is a very common case if not the most common. As > mentioned before, take ExecutionConfig. This configuration is currently > only used in a programmatic-way and needs a way to be expressed as > ConfigOptions. CachedFile for instance will be a Configurable object > that will binary serialized most of the time when sending it to the > cluster but due to the Configurable design it is possible to store it in > a string representation as well. Thanks for the explanation. I feel this creating object then serialize / deserialize using configuration is more of an internal use case. We are essentially using the configurations to pass some arbitrary string around. Technically speaking we can use this way to send and receive any object. But I am not sure if this is something that we want to generalize and impact more public use cases. Personally I feel that Configurable As for CachedFile, it seems we do not plan to use configuration to pass that on? It would be good to avoid letting the Configurations to host arbitrary objects beyond the primitive types. To summarize, I am thinking if we should consider the following: 1. Design the Config mechanism as a cross-board API for not only internal usage, but for broader use cases. 2. If writeToConfiguration is only for internal use cases, maybe we can avoid adding it to the configurable interface. We can add another interface such as ExtractableConfigurable for internal usage. What do you think? Thanks, Jiangjie (Becket) Qin On Mon, Sep 2, 2019 at 11:59 PM Timo Walther <[hidden email]> wrote: > @Becket: > > Regarding "great if we don't put this burden on users", we should > consider who is actually using this API. It is not first-level API but > mostly API for Flink contributors. Most of the users will use API > classes ike ExecutionConfig or TableConfig or other builders for > performing configuration. They will never use the ConfigOptions classes > directly. So enforcing a duplicate method does not sound like a burden > to me. > > How would the util that you are suggesting look like? It would always > need to serialize/deserialize an object into an immutable string. This > is not very efficient, given that the instance already exists and can be > made immutable by the implementer by not exposing setters. Furthermore, > we would loose the declarative approach and could not generate > documentation. The current approach specifies the static final > sub-ConfigOptions either in Configurable object (initial design) or in > the ConfigurableFactory (current design) such that the docs generator > can read them. > > Regarding "Configurable may be created and configured directly without > reading settings from a Configuration instance", there seems to be a > misunderstanding. This is a very common case if not the most common. As > mentioned before, take ExecutionConfig. This configuration is currently > only used in a programmatic-way and needs a way to be expressed as > ConfigOptions. CachedFile for instance will be a Configurable object > that will binary serialized most of the time when sending it to the > cluster but due to the Configurable design it is possible to store it in > a string representation as well. > > @Aljoscha: > > Yes, this approach would also work. We still would need to call > writeToConf/readFromConf for duplicate() and ensure immutable semantics, > if this is really an important use case. But IMHO all configuration is > currently mutable (all API classes like ExecutionConfig, > CheckpointConfig, Configuration itself), I don't understand why > immutability needs to be discussed here. > > Regards, > Timo > > > On 02.09.19 16:22, Aljoscha Krettek wrote: > > Hi, > > > > Regarding the factory and duplicate() and whatnot, wouldn’t it work to > have a factory like this: > > > > /** > > * Allows to read and write an instance from and to {@link > Configuration}. A configurable instance > > * operates in its own key space in {@link Configuration} and will be > (de)prefixed by the framework. It cannot access keys from other options. A > factory must have a default constructor. > > * > > */ > > public interface ConfigurableFactory<T extends Configurable> { > > > > /** > > * Creates an instance from the given configuration. > > */ > > T fromConfiguration(ConfigurationReader configuration); > > } > > > > with Configurable being: > > > > /** > > * Allows to read and write an instance from and to {@link > Configuration}. A configurable instance > > * operates in its own key space in {@link Configuration} and will be > (de)prefixed by the framework. It cannot access keys from other options. A > factory must have a default constructor. > > * > > */ > > public interface Configurable { > > > > /** > > * Writes this instance to the given configuration. > > */ > > void writeToConfiguration(ConfigurationWriter configuration); // > method name TBD > > } > > > > This would make the Configurable immutable and we wouldn’t need a > duplicate() method. > > > > Best, > > Aljoscha > > > >> On 2. Sep 2019, at 14:40, Becket Qin <[hidden email]> wrote: > >> > >> Hi Timo and Dawid, > >> > >> Thanks for the patient reply. I agree that both option a) and option b) > can > >> solve the mutability problem. > >> > >> For option a), is it a little intrusive to add a duplicate() method for > a > >> Configurable? It would be great if we don't put this burden on users if > >> possible. > >> > >> For option b), I actually feel it is slightly better than option a) from > >> API perspective as getFactory() seems a more understandable method of a > >> Configurable compared with duplicate(). And users do not need to > implement > >> much more logic. > >> > >> I am curious what is the downside of keeping the Configuration simple to > >> only have primitive types, and always create the Configurable using a > util > >> method? If Configurables are rare, do we need to put the instantiation / > >> bookkeeping logic in Configuration? > >> > >> @Becket for the toConfiguration this is required for shipping the > >>> Configuration to TaskManager, so that we do not have to use java > >>> serializability. > >> Do you mean a Configurable may be created and configured directly > without > >> reading settings from a Configuration instance? I thought a Configurable > >> will always be created via a ConfigurableFactory by extracting required > >> configs from a Configuration. Am I missing something? > >> > >> Thanks, > >> > >> Jiangjie (Becket) Qin > >> > >> On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz <[hidden email] > > > >> wrote: > >> > >>> Hi Timo, Becket > >>> > >>> From the options that Timo suggested for improving the mutability > >>> situation I would prefer option a) as this is the more explicit option > >>> and simpler option. Just as a remark, I think in general Configurable > >>> types for options will be rather very rare for some special use-cases, > >>> as the majority of options are rather simpler parameters of primitive > >>> types (+duration, memory) > >>> > >>> @Becket for the toConfiguration this is required for shipping the > >>> Configuration to TaskManager, so that we do not have to use java > >>> serializability. > >>> > >>> Best, > >>> > >>> Dawid > >>> > >>> > >>> On 02/09/2019 10:05, Timo Walther wrote: > >>>> Hi Becket, > >>>> > >>>> Re 1 & 3: "values in configurations should actually be immutable" > >>>> > >>>> I would also prefer immutability but most of our configuration is > >>>> mutable due to serialization/deserialization. Also maps and list could > >>>> be mutable in theory. It is difficult to really enforce that for > >>>> nested structures. I see two options: > >>>> > >>>> a) For the original design: How about we force implementers to add a > >>>> duplicate() method in a Configurable object? This would mean that the > >>>> object is still mutable but by duplicating the object both during > >>>> reading and writing we would avoid the problem you described. > >>>> > >>>> b) For the current design: We still use the factory approach but let a > >>>> Configurable object implement a getFactory() method such that we know > >>>> how to serialize the object. With the help of a factory we can also > >>>> duplicate the object easily during reading and writing and ensure > >>>> immutability. > >>>> > >>>> I would personally go for approach a) to not over-engineer this topic. > >>>> But I'm open for option b). > >>>> > >>>> Regards, > >>>> Timo > >>>> > >>>> > >>>> On 31.08.19 04:09, Becket Qin wrote: > >>>>> Hi Timo, > >>>>> > >>>>> Thanks for the reply. I am still a little concerned over the > >>>>> mutability of > >>>>> the Configurable which could be the value in Configuration. > >>>>> > >>>>> Re: 1 > >>>>> > >>>>>> But in general, people should not use any internal fields. > >>>>>> Configurable objects are meant for simple little helper POJOs, not > >>>>>> complex arbitrary nested data structures. > >>>>> This seems difficult to enforce... Ideally the values in > configurations > >>>>> should actually be immutable. The value can only be changed by > >>>>> explicitly > >>>>> calling setters in Configuration. Otherwise we may have weird > situation > >>>>> where the Configurable in the same configuration are different in two > >>>>> places because the configurable is modified in one places and not > >>>>> modified > >>>>> in another place. So I am a little concerned on putting a > >>>>> Configurable type > >>>>> in the Configuration map, because the value could be modified without > >>>>> explicitly setting the configuration. For example, can users do the > >>>>> following? > >>>>> > >>>>> Configurable configurable = > >>>>> configuration.getConfigurable(myConfigurableOption); > >>>>> configurable.setConfigA(123); // this already changes the > configurable > >>>>> object in the configuration. > >>>>> > >>>>> Re: 2 > >>>>> Thanks for confirming. As long as users will not have a situation > where > >>>>> they need to set two configurations with the same key but different > >>>>> descriptions, I think it is OK. > >>>>> > >>>>> Re: 3 > >>>>> This is actually kind of related to 1. i.e. Whether toConfiguration() > >>>>> guarantees the exact same object can be rebuilt from the > >>>>> configuration or > >>>>> not. I am still not quite sure about the use case of > toConfiguration() > >>>>> though. It seems indicating the Configurable is mutable, which might > be > >>>>> dangerous. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Jiangjie (Becket) Qin > >>>>> > >>>>> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <[hidden email]> > >>>>> wrote: > >>>>> > >>>>>> Hi Becket, > >>>>>> > >>>>>> 1. First of all, you are totally right. The FLIP contains a bug due > to > >>>>>> the last minute changes that Dawid suggested: by having immutable > >>>>>> objects created by a factory we loose the serializability of the > >>>>>> Configuration because the factory itself is not stored in the > >>>>>> Configuration. I would propose to revert the last change and stick > to > >>>>>> the original design, which means that a object must implement the > >>>>>> Configurable interface and also implements > >>>>>> serialization/deserialization > >>>>>> methods such that also internal fields can be persisted as you > >>>>>> suggested. But in general, people should not use any internal > fields. > >>>>>> Configurable objects are meant for simple little helper POJOs, not > >>>>>> complex arbitrary nested data structures. > >>>>>> > >>>>>> It is Map<String, Object> because Configuration stores the raw > objects. > >>>>>> If you put a Boolean option into it, it remains Boolean. This makes > the > >>>>>> map very efficient for shipping to the cluster and accessing options > >>>>>> multiple times. The same for configurable objects. We put the pure > >>>>>> objects into the map without any serialization/deserialization. The > >>>>>> provided factory allows to convert the Object into a Configuration > and > >>>>>> we know how to serialize/deserializise a configuration because it is > >>>>>> just a key/value map. > >>>>>> > >>>>>> 2. Yes, this is what we had in mind. It should still be the same > >>>>>> configuration option. We would like to avoid specialized option keys > >>>>>> across components (exec.max-para and table.exec.max-para) if they > are > >>>>>> describing basically the same thing. But adding some more > description > >>>>>> like "TableOptions.MAX_PARALLELISM with description_1 + > description_2" > >>>>>> does not hurt. > >>>>>> > >>>>>> 3. They should restore the original object given that the > >>>>>> toConfiguration/fromConfiguration methods have been implemented > >>>>>> correctly. I will extend the example to make the logic clearer while > >>>>>> fixing the bug. > >>>>>> > >>>>>> Thanks for the healthy discussion, > >>>>>> Timo > >>>>>> > >>>>>> > >>>>>> On 30.08.19 15:29, Becket Qin wrote: > >>>>>>> Hi Timo, > >>>>>>> > >>>>>>> Thanks again for the clarification. Please see a few more questions > >>>>>> below. > >>>>>>> Re: 1 > >>>>>>> > >>>>>>>> Please also keep in mind that Configuration must not consist of > only > >>>>>>>> strings, it manages a Map<String, Object> for efficient access. > Every > >>>>>>>> map entry can have a string representation for persistence, but in > >>>>>>>> most > >>>>>>>> cases consists of unserialized objects. > >>>>>>> I'd like to understand this a bit more. The reason we have a > >>>>>>> Map<String, > >>>>>>> Object> in Configuration was because that Object could be either a > >>>>>> String, > >>>>>>> a List or a Map, right? But they eventually always boil down to > >>>>>>> Strings, > >>>>>> or > >>>>>>> maybe one of the predefined type that we know how to serialize. In > the > >>>>>>> current design, can the Object also be Configurable? > >>>>>>> If the value in the config Map<String, Object> can be Configurable > >>>>>> objects, > >>>>>>> how do we serialize them? Calling toConfiguration() seems not quite > >>>>>> working > >>>>>>> because there might be some other internal fields that are not > part of > >>>>>> the > >>>>>>> configuration. The modification to those fields will be lost if we > >>>>>>> simply > >>>>>>> use toConfiguration(). So the impact of modifying those > Configurable > >>>>>>> objects seems a little undefined. And it would be difficult to > prevent > >>>>>>> users from doing that. > >>>>>>> If the value in the config Map<String, Object> cannot be > Configurable > >>>>>>> objects, then it seems a little weird to have all the other > ConfigType > >>>>>>> stored in the ConfigMap in their own native type and accessed via > >>>>>>> getInteger() / getBoolean(), etc, while having ConfigurableType to > be > >>>>>>> different from others because one have to use ConfigurableFactory > >>>>>>> to get > >>>>>>> the configurations. > >>>>>>> > >>>>>>> Re: 2 > >>>>>>> > >>>>>>>> I think about the withExtendedDescription as a helper getter in a > >>>>>>>> different place, so that the option is easier to find in a > different > >>>>>>>> module from it was defined. > >>>>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be > >>>>>>>> equal > >>>>>> to: > >>>>>>>> public ConfigOption getMaxParallelismOption() { > >>>>>>>> return CoreOptions.MAX_PARALLELISM; > >>>>>>>> } > >>>>>>> Just to make sure I understand it correctly, does that mean users > will > >>>>>> see > >>>>>>> something like following? > >>>>>>> - CoreOptions.MAX_PARALLELISM with description_1; > >>>>>>> - TableOptions.MAX_PARALLELISM with description_1 + > description_2. > >>>>>>> - DataStreamOptions.MAX_PARALLELISM with description_1 + > >>>>>>> description_3. > >>>>>>> And users will only configure exactly one MAX_PARALLELISM cross the > >>>>>> board. > >>>>>>> So they won't be confused by setting two MAX_PARALLELISM config for > >>>>>>> two > >>>>>>> different modules, while they are actually the same config. If > that is > >>>>>> the > >>>>>>> case, I don't have further concern. > >>>>>>> > >>>>>>> Re: 3 > >>>>>>> Maybe I am thinking too much. I thought toBytes() / fromBytes() > >>>>>>> actually > >>>>>>> restore the original Object. But fromConfiguration() and > >>>>>> toConfiguration() > >>>>>>> does not do that, anything not in the configuration of the original > >>>>>> object > >>>>>>> will be lost. So it would be good to make that clear. Maybe a clear > >>>>>>> Java > >>>>>>> doc is sufficient. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> > >>>>>>> Jiangjie (Becket) Qin > >>>>>>> > >>>>>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz > >>>>>>> <[hidden email] > >>>>>>> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> Ad. 1 > >>>>>>>> > >>>>>>>> The advantage of our approach is that you have the type definition > >>>>>>>> close > >>>>>>>> to the option definition. The only difference is that it enables > >>>>>>>> expressing simple pojos with the primitives like > >>>>>>>> ConfigOption<Integer>, > >>>>>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start > having > >>>>>>>> > >>>>>>>> the parsing logic scattered everywhere in the code base as it is > now. > >>>>>>>> The string representation in our proposal is exactly the same as > you > >>>>>>>> explained for those three options. The only difference is that you > >>>>>>>> don't > >>>>>>>> have to parse the elements of a List, Map etc. afterwards. > >>>>>>>> > >>>>>>>> Ad. 2 > >>>>>>>> > >>>>>>>> I think about the withExtendedDescription as a helper getter in a > >>>>>>>> different place, so that the option is easier to find in a > different > >>>>>>>> module from it was defined. > >>>>>>>> > >>>>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be > >>>>>>>> equal > >>>>>> to: > >>>>>>>> public ConfigOption getMaxParallelismOption() { > >>>>>>>> > >>>>>>>> return CoreOptions.MAX_PARALLELISM; > >>>>>>>> > >>>>>>>> } > >>>>>>>> > >>>>>>>> This allows to further clarify the description of the option in > the > >>>>>>>> context of a different module and end up in a seperate page in > >>>>>>>> documentation (but with a link to the original one). In the end > it is > >>>>>>>> exactly the same option. It has exactly same key, type, parsing > >>>>>>>> logic, > >>>>>>>> it is in the end forwarded to the same place. > >>>>>>>> > >>>>>>>> Ad. 3 > >>>>>>>> > >>>>>>>> Not sure if I understand your concerns here. As Timo said it is in > >>>>>>>> the > >>>>>>>> end sth similar to toBytes/fromBytes, but it puts itself to a > >>>>>>>> Configuration. Also just wanted to make sure we adjusted this part > >>>>>>>> slightly and now the ConfigOption takes ConfigurableFactory. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> > >>>>>>>> Dawid > >>>>>>>> > >>>>>>>> > >>>>>>>> On 30/08/2019 09:39, Timo Walther wrote: > >>>>>>>>> Hi Becket, > >>>>>>>>> > >>>>>>>>> thanks for the discussion. > >>>>>>>>> > >>>>>>>>> 1. ConfigOptions in their current design are bound to classes. > >>>>>>>>> Regarding, the option is "creating some Configurable objects > instead > >>>>>>>>> of defining the config to create > >>>>>>>>> those Configurable"? We just moved this logic to a factory, this > >>>>>>>>> factory can then also be used for other purposes. However, how > the > >>>>>>>>> option and objects are serialized to Configuration is still not > part > >>>>>>>>> of the option. The option is just pure declaration. > >>>>>>>>> > >>>>>>>>> If we would allow only List<String>, implementers would need to > >>>>>>>>> start > >>>>>>>>> implementing own parsing and validation logic all the time. We > would > >>>>>>>>> like to avoid that. > >>>>>>>>> > >>>>>>>>> Please also keep in mind that Configuration must not consist of > only > >>>>>>>>> strings, it manages a Map<String, Object> for efficient access. > >>>>>>>>> Every > >>>>>>>>> map entry can have a string representation for persistence, but > in > >>>>>>>>> most cases consists of unserialized objects. > >>>>>>>>> > >>>>>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite > >>>>>>>>> keys, types or default values. But different layers might want to > >>>>>>>>> add > >>>>>>>>> helpful information. In our concrete use case for FLIP-59, > >>>>>>>>> ExecutionConfig has 50 properties and many of them are not > relevant > >>>>>>>>> for the Table layer or have no effect at all. We would like to > list > >>>>>>>>> and mention the most important config options again in the Table > >>>>>>>>> Configuration section, so that users are not confused, but with a > >>>>>>>>> strong link to the core option. E.g.: registered kryo serializers > >>>>>>>>> are > >>>>>>>>> also important also for Table users, we would like to add the > >>>>>>>>> comment > >>>>>>>>> "This option allows to modify the serialization of the ANY SQL > data > >>>>>>>>> type.". I think we should not spam the core configuration page > with > >>>>>>>>> comments from all layers, connectors, or libraries but keep this > in > >>>>>>>>> the corresponding component documentation. > >>>>>>>>> > >>>>>>>>> 3. But it is something like fromBytes() and toBytes()? It > serializes > >>>>>>>>> and deserializes T from a configuration? > >>>>>>>>> > >>>>>>>>> Regards, > >>>>>>>>> Timo > >>>>>>>>> > >>>>>>>>> On 29.08.19 19:14, Becket Qin wrote: > >>>>>>>>>> Hi Timo and Stephan, > >>>>>>>>>> > >>>>>>>>>> Thanks for the detail explanation. > >>>>>>>>>> > >>>>>>>>>> 1. I agree that each config should be in a human readable > >>>>>>>>>> format. My > >>>>>>>>>> concern is that the current List<Configurable> looks going a > little > >>>>>>>>>> too far > >>>>>>>>>> from what the configuration is supposed to do. They are > essentially > >>>>>>>>>> creating some Configurable objects instead of defining the > >>>>>>>>>> config to > >>>>>>>>>> create > >>>>>>>>>> those Configurable. This mixes ConfigOption and the usage of it. > >>>>>>>>>> API > >>>>>>>>>> wise > >>>>>>>>>> it would be good to keep the configs and their usages (such as > >>>>>>>>>> how to > >>>>>>>>>> create objects using the ConfigOption) apart from each other. > >>>>>>>>>> I am wondering if we can just make List also only take string. > For > >>>>>>>>>> example, > >>>>>>>>>> is the following definition of map and list sufficient? > >>>>>>>>>> > >>>>>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can > be > >>>>>>>>>> defined > >>>>>>>>>> as: > >>>>>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ... > >>>>>>>>>> > >>>>>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be > defined > >>>>>> as: > >>>>>>>>>> list_config_name: v1, v2, v3, ... > >>>>>>>>>> > >>>>>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, > >>>>>>>>>> String>>. It > >>>>>>>>>> can > >>>>>>>>>> be defined as: > >>>>>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... > >>>>>>>>>> > >>>>>>>>>> All the key and values in the configuration are String. This > also > >>>>>>>>>> guarantees that the configuration is always serializable. > >>>>>>>>>> If we want to do one more step, we can allow the ConfigOption to > >>>>>>>>>> set > >>>>>> all > >>>>>>>>>> the primitive types and parse that for the users. So something > like > >>>>>>>>>> List<Integer>, List<Class<?>> seems fine. > >>>>>>>>>> > >>>>>>>>>> The configuration class could also have util methods to create a > >>>>>>>>>> list > >>>>>> of > >>>>>>>>>> configurable such as: > >>>>>>>>>> <T> List<T> > >>>>>> <Configuration#getConfigurableInstances(ListMapConfigOption, > >>>>>>>>>> Class<T> clazz). > >>>>>>>>>> But the configuration class will not take arbitrary > Configurable as > >>>>>> the > >>>>>>>>>> value of its config. > >>>>>>>>>> > >>>>>>>>>> 2. I might have misunderstood this. But my concern on the > >>>>>>>>>> description > >>>>>>>>>> extension is in the following example. > >>>>>>>>>> > >>>>>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM = > >>>>>>>>>> > >>>>>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( > >>>>>>>>>> "Note: That this property means that a table program has a > >>>>>>>>>> side-effect > >>>>>>>>>> XYZ."); > >>>>>>>>>> > >>>>>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One > is > >>>>>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I > >>>>>>>>>> suppose > >>>>>>>>>> users > >>>>>>>>>> will see both configurations. One with an extended description > >>>>>>>>>> and one > >>>>>>>>>> without. Let's say there is a third component which also users > >>>>>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM > >>>>>>>>>> ConfigOption? If > >>>>>>>>>> so, what would that ConfigOption's description look like? > >>>>>>>>>> > >>>>>>>>>> Ideally, we would want to have just one > CoreOptions.MAX_PARALLELISM > >>>>>>>>>> and the > >>>>>>>>>> description should clearly state all the usage of this > >>>>>>>>>> ConfigOption. > >>>>>>>>>> > >>>>>>>>>> 3. I see, in that case, how about we name it something like > >>>>>>>>>> extractConfiguration()? I am just trying to see if we can make > it > >>>>>> clear > >>>>>>>>>> this is not something like fromBytes() and toBytes(). > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> > >>>>>>>>>> Jiangjie (Becket) Qin > >>>>>>>>>> > >>>>>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther < > [hidden email]> > >>>>>>>> wrote: > >>>>>>>>>>> Hi Becket, > >>>>>>>>>>> > >>>>>>>>>>> let me try to clarify some of your questions: > >>>>>>>>>>> > >>>>>>>>>>> 1. For every option, we also needed to think about how to > >>>>>>>>>>> represent > >>>>>> it > >>>>>>>>>>> in a human readable format. We do not want to allow arbitrary > >>>>>>>>>>> nesting > >>>>>>>>>>> because that would easily allow to bypass the flattened > >>>>>>>>>>> hierarchy of > >>>>>>>>>>> config options (`session.memory.min`). The current design > >>>>>>>>>>> allows to > >>>>>>>>>>> represent every option type as a list. E.g.: > >>>>>>>>>>> > >>>>>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12` > >>>>>>>>>>> `myObjectOption: field=12,other=true` can be > `myObjectListOption: > >>>>>>>>>>> field=12,other=true; field=12,other=true` > >>>>>>>>>>> `myPropertyOption: key=str0,other=str1` can be > >>>>>>>>>>> `myPropertyListOption: > >>>>>>>>>>> key=str0,other=str1;key=str0,other=str1` > >>>>>>>>>>> > >>>>>>>>>>> We need the atomic class for serialization/deserialization > both in > >>>>>>>>>>> binary and string format. > >>>>>>>>>>> > >>>>>>>>>>> ConfigOption<List> is not present in the code base yet, but > this > >>>>>>>>>>> FLIP is > >>>>>>>>>>> a preparation of making ExecutionConfig configurable. If you > look > >>>>>> into > >>>>>>>>>>> this class or also in existing table connectors/formats, you > >>>>>>>>>>> will see > >>>>>>>>>>> that each proposed option type has its requirements. > >>>>>>>>>>> > >>>>>>>>>>> 2. Regarding extending the description of ConfigOptions, the > >>>>>>>>>>> semantic of > >>>>>>>>>>> one option should be a super set of the other option. E.g. in > >>>>>>>>>>> Table > >>>>>> API > >>>>>>>>>>> we might use general ExecutionConfig properties. But we would > like > >>>>>>>>>>> to a) > >>>>>>>>>>> make external options more prominent in the Table API config > >>>>>>>>>>> docs to > >>>>>>>>>>> link people to properties they should pay attention b) notice > >>>>>>>>>>> about > >>>>>>>>>>> side > >>>>>>>>>>> effects. The core semantic of a property should not change. > >>>>>>>>>>> > >>>>>>>>>>> 3. The factory will not receive the entire configuration but > works > >>>>>> in a > >>>>>>>>>>> separate key space. For `myObjectOption` above, it would > receive a > >>>>>>>>>>> configuration that consists of `field: 12` and `other: true`. > >>>>>>>>>>> > >>>>>>>>>>> I agree. I will convert the document into a Wiki page today. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Timo > >>>>>>>>>>> > >>>>>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote: > >>>>>>>>>>>> @Becket One thing that may be non-obvious is that the > >>>>>>>>>>>> Configuration > >>>>>>>>>>>> class > >>>>>>>>>>>> also defines serialization / persistence logic at the moment. > >>>>>>>>>>>> So it > >>>>>>>>>>>> needs > >>>>>>>>>>>> to know the set of types it supports. That stands in the way > >>>>>>>>>>>> of an > >>>>>>>>>>>> arbitrary generic map type. > >>>>>>>>>>>> > >>>>>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have > one > >>>>>>>>>>>> collection orthogonal to the type (List) and another one > bound to > >>>>>>>>>>> specific > >>>>>>>>>>>> types (Map). > >>>>>>>>>>>> > >>>>>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin < > [hidden email] > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi Timo, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I > >>>>>>>>>>>>> have a > >>>>>>>>>>>>> few > >>>>>>>>>>>>> questions / comments. > >>>>>>>>>>>>> > >>>>>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption? > >>>>>>>>>>>>> Would it be enough to just check the atomicClass to see if it > >>>>>>>>>>>>> is a > >>>>>>>>>>>>> List > >>>>>>>>>>> or > >>>>>>>>>>>>> not? > >>>>>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always > assume > >>>>>>>>>>>>> both key > >>>>>>>>>>>>> and value types are String? Can we just apply the same to the > >>>>>>>>>>>>> ConfigOption<List>? > >>>>>>>>>>>>> BTW, I did a quick search in the codebase but did not find > any > >>>>>>>>>>>>> usage of > >>>>>>>>>>>>> ConfigOption<List>. > >>>>>>>>>>>>> > >>>>>>>>>>>>> 2. The same config name, but with two ConfigOption with > >>>>>>>>>>>>> different > >>>>>>>>>>> semantic > >>>>>>>>>>>>> in different component seems super confusing. For example, > when > >>>>>> users > >>>>>>>>>>> set > >>>>>>>>>>>>> both configs, they may have no idea one is overriding the > other. > >>>>>>>>>>>>> There > >>>>>>>>>>>>> might be two cases: > >>>>>>>>>>>>> - If it is just the same config used by different > >>>>>>>>>>>>> components to > >>>>>>>>>>>>> act > >>>>>>>>>>>>> accordingly, it might be better to just have one config, but > >>>>>> describe > >>>>>>>>>>>>> clearly on how that config will be used. > >>>>>>>>>>>>> - If it is actually two configurations that can be set > >>>>>>>>>>>>> differently, I > >>>>>>>>>>>>> think the config names should just be different. > >>>>>>>>>>>>> > >>>>>>>>>>>>> 3. Regarding the ConfigurableFactory, is the > toConfiguration() > >>>>>> method > >>>>>>>>>>>>> pretty much means getConfiguration()? The toConfiguration() > >>>>>>>>>>>>> method > >>>>>>>>>>> sounds > >>>>>>>>>>>>> like converting an object to a configuration, which only > >>>>>>>>>>>>> works if > >>>>>> the > >>>>>>>>>>>>> object does not contain any state / value. I am also > >>>>>>>>>>>>> wondering if > >>>>>>>>>>>>> there > >>>>>>>>>>> is > >>>>>>>>>>>>> a real use case of this method. Because supposedly the > >>>>>> configurations > >>>>>>>>>>> could > >>>>>>>>>>>>> just be passed around to caller of this method. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of > >>>>>>>>>>>>> in the > >>>>>>>>>>>>> doc before voting? The FLIP wiki allows track the > modification > >>>>>>>>>>>>> history > >>>>>>>>>>> and > >>>>>>>>>>>>> has a more established structure to ensure nothing is missed. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Jiangjie (Becket) Qin > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther > >>>>>>>>>>>>> <[hidden email]> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in > the > >>>>>> voting > >>>>>>>>>>>>>> thread. If there are no objections, I will start a new > voting > >>>>>> thread > >>>>>>>>>>>>>> tomorrow at 9am Berlin time. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote: > >>>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> thanks for all the feedback we have received online and > >>>>>>>>>>>>>>> offline. > >>>>>> It > >>>>>>>>>>>>>>> showed that many people support the idea of evolving the > Flink > >>>>>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP > >>>>>>>>>>>>>>> will > >>>>>>>>>>>>>>> not > >>>>>>>>>>>>>>> solve all issues but at least will improve the current > status. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> We've updated the document and replaced the Correlation > part > >>>>>>>>>>>>>>> with the > >>>>>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all > available > >>>>>>>>>>>>>>> options > >>>>>>>>>>>>>>> of a group plus custom group validators for eager > validation. > >>>>>>>>>>>>>>> For now, > >>>>>>>>>>>>>>> this eager group validation will only be used at certain > >>>>>>>>>>>>>>> locations in > >>>>>>>>>>>>>>> the Flink code but it prepares for maybe validating the > entire > >>>>>>>>>>>>>>> global > >>>>>>>>>>>>>>> configuration before submitting a job in the future. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Please take another look if you find time. I hope we can > >>>>>>>>>>>>>>> proceed > >>>>>>>>>>>>>>> with > >>>>>>>>>>>>>>> the voting process if there are no objections. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: > >>>>>>>>>>>>>>>> Hi Stephan, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> thanks for your suggestions. Let me give you some > background > >>>>>> about > >>>>>>>>>>>>>>>> the decisions made in this FLIP: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" > >>>>>>>>>>>>>>>> because we > >>>>>> did > >>>>>>>>>>>>>>>> not want to change the entire configuration > infrastructure. > >>>>>>>>>>>>>>>> Both for > >>>>>>>>>>>>>>>> backwards-compatibility reasons and the amount of work > that > >>>>>>>>>>>>>>>> would be > >>>>>>>>>>>>>>>> required to update all options. If our goal is to rework > the > >>>>>>>>>>>>>>>> configuration option entirely, I might suggest to switch > >>>>>>>>>>>>>>>> to JSON > >>>>>>>>>>>>>>>> format with JSON schema and JSON validator. However, > setting > >>>>>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky > the > >>>>>> more > >>>>>>>>>>>>>>>> nested structures are allowed. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class > is > >>>>>>>>>>>>>>>> centered > >>>>>>>>>>>>>>>> around Java classes where T is determined by the default > >>>>>>>>>>>>>>>> value. > >>>>>>>>>>>>>>>> The > >>>>>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit > >>>>>>>>>>>>>>>> `intType()` method etc. The current design of validators > >>>>>> centered > >>>>>>>>>>>>>>>> around Java classes makes it possible to have typical > domain > >>>>>>>>>>>>>>>> validators baked by generics as you suggested. If we > >>>>>>>>>>>>>>>> introduce > >>>>>>>>>>>>>>>> types > >>>>>>>>>>>>>>>> such as "quantity with measure and unit" we still need to > >>>>>>>>>>>>>>>> get a > >>>>>>>>>>>>>>>> class > >>>>>>>>>>>>>>>> out of this option at the end, so why changing a proven > >>>>>>>>>>>>>>>> concept? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary > >>>>>>>>>>>>>>>> nesting. As > >>>>>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For > every > >>>>>>>>>>>>>>>> atomic > >>>>>>>>>>>>>>>> option like "key=12" can be represented by a list > >>>>>>>>>>>>>>>> "keys=12;13". > >>>>>>>>>>>>>>>> But > >>>>>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated > >>>>>>>>>>>>>>>> list > >>>>>>>>>>>>>>>> option > >>>>>>>>>>>>>>>> would start making this more complicated such as > >>>>>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), > >>>>>>>>>>>>>>>> StringOption(...)))", do we want that? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> 4. Correlation: The correlation part is one of the > >>>>>>>>>>>>>>>> suggestions > >>>>>>>>>>>>>>>> that I > >>>>>>>>>>>>>>>> like least in the document. We can also discuss removing > it > >>>>>>>>>>>>>>>> entirely, > >>>>>>>>>>>>>>>> but I think it solves the use case of relating options > >>>>>>>>>>>>>>>> with each > >>>>>>>>>>>>>>>> other in a flexible way right next to the actual option. > >>>>>>>>>>>>>>>> Instead of > >>>>>>>>>>>>>>>> being hidden in some component initialization, we should > >>>>>>>>>>>>>>>> put it > >>>>>>>>>>>>>>>> close > >>>>>>>>>>>>>>>> to the option to also perform validation eagerly instead > of > >>>>>>>>>>>>>>>> failing > >>>>>>>>>>>>>>>> at runtime when the option is accessed the first time. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: > >>>>>>>>>>>>>>>>> A "List Type" sounds like a good direction to me. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. > The > >>>>>>>>>>>>>>>>> idea is > >>>>>>>>>>>>>>>>> to see > >>>>>>>>>>>>>>>>> if something like that can ease validation. Especially > the > >>>>>>>>>>>>> correlation > >>>>>>>>>>>>>>>>> system seems quite complex (proxies to work around order > of > >>>>>>>>>>>>>>>>> initialization). > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> For example, let's assume we don't think primarily about > >>>>>>>>>>>>>>>>> "java > >>>>>>>>>>>>>>>>> types" but > >>>>>>>>>>>>>>>>> would define types as one of the following (just > examples, > >>>>>>>>>>>>>>>>> haven't > >>>>>>>>>>>>>>>>> thought > >>>>>>>>>>>>>>>>> all the details through): > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> (a) category type: implies string, and a fix set > of > >>>>>> possible > >>>>>>>>>>>>> values. > >>>>>>>>>>>>>>>>> Those would be passes and naturally make it into the docs > >>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>> validation. > >>>>>>>>>>>>>>>>> Maps to a String or Enum in Java. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> (b) numeric integer type: implies long (or > optionally > >>>>>>>>>>>>>>>>> integer, > >>>>>>>>>>> if > >>>>>>>>>>>>>>>>> we want > >>>>>>>>>>>>>>>>> to automatically check overflow / underflow). would take > >>>>>> typical > >>>>>>>>>>>>> domain > >>>>>>>>>>>>>>>>> validators, like non-negative, etc. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> (c) numeric real type: same as above (double or > >>>>>>>>>>>>>>>>> float) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> (d) numeric interval type: either defined as an > >>>>>> interval, or > >>>>>>>>>>>>>>>>> references > >>>>>>>>>>>>>>>>> other parameter by key. validation by valid interval. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> (e) quantity: a measure and a unit. separately > >>>>>>>>>>>>>>>>> parsable. > >>>>>> The > >>>>>>>>>>>>>>>>> measure's > >>>>>>>>>>>>>>>>> type could be any of the numeric types above, with same > >>>>>>>>>>>>>>>>> validation > >>>>>>>>>>>>>>>>> rules. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> With a system like the above, would we still correlation > >>>>>>>>>>>>>>>>> validators? > >>>>>>>>>>>>>>>>> Are > >>>>>>>>>>>>>>>>> there still cases that we need to catch early (config > >>>>>>>>>>>>>>>>> loading) > >>>>>> or > >>>>>>>>>>>>>>>>> are the > >>>>>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup > >>>>>> specific, > >>>>>>>>>>>>>>>>> that it is > >>>>>>>>>>>>>>>>> fine to handle them in component initialization? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz > >>>>>>>>>>>>>>>>> <[hidden email]> > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hi Stephan, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thank you for your opinion. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Actually list/composite types are the topics we spent > the > >>>>>>>>>>>>>>>>>> most of > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>> time. I understand that from a perspective of a full > blown > >>>>>> type > >>>>>>>>>>>>>>>>>> system, > >>>>>>>>>>>>>>>>>> a field like isList may look weird. Please let me > >>>>>>>>>>>>>>>>>> elaborate a > >>>>>>>>>>>>>>>>>> bit > >>>>>>>>>>>>> more > >>>>>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear > >>>>>>>>>>>>>>>>>> enough > >>>>>>>>>>>>>>>>>> about > >>>>>>>>>>>>> it > >>>>>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options > is > >>>>>>>>>>>>>>>>>> that they > >>>>>>>>>>>>>>>>>> must > >>>>>>>>>>>>>>>>>> have a string representation as they might come from a > >>>>>>>>>>> configuration > >>>>>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so > >>>>>>>>>>>>>>>>>> that the > >>>>>>>>>>>>> values > >>>>>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did > not > >>>>>>>>>>>>>>>>>> want to > >>>>>>>>>>>>>>>>>> add a > >>>>>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow > for > >>>>>>>>>>>>>>>>>> lists > >>>>>>>>>>>>> only > >>>>>>>>>>>>>>>>>> (and flat objects - I think though in the current design > >>>>>>>>>>>>>>>>>> there is a > >>>>>>>>>>>>>>>>>> mistake around the Configurable interface). I think > though > >>>>>>>>>>>>>>>>>> you have > >>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>> point here and it would be better to have a > >>>>>>>>>>>>>>>>>> ListConfigOption > >>>>>>>>>>>>>>>>>> instead of > >>>>>>>>>>>>>>>>>> this field. Does it make sense to you? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> As for the second part of your message. I am not sure > if I > >>>>>>>>>>>>> understood > >>>>>>>>>>>>>>>>>> it. The validators work with parse/deserialized values > from > >>>>>>>>>>>>>>>>>> Configuration that means they can be bound to the > generic > >>>>>>>>>>>>>>>>>> parameter > >>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends > >>>>>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in > the > >>>>>>>>>>>>>>>>>> ConfigOption > >>>>>>>>>>>>>>>>>> has anything to do with the validation logic. Could you > >>>>>>>>>>>>>>>>>> elaborate a > >>>>>>>>>>>>>>>>>> bit > >>>>>>>>>>>>>>>>>> more what did you mean? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Dawid > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: > >>>>>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to > do > >>>>>> early > >>>>>>>>>>>>>>>>>> validation. > >>>>>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad > >>>>>>>>>>>>>>>>>>> hoc, > >>>>>>>>>>>>>>>>>>> though. For > >>>>>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear > >>>>>>>>>>>>>>>>>>> indication of > >>>>>> not > >>>>>>>>>>>>>>>>>>> having > >>>>>>>>>>>>>>>>>>> thought through the type/category system. > >>>>>>>>>>>>>>>>>>> Also, having a more clear category system makes > validation > >>>>>>>>>>> simpler. > >>>>>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between > >>>>>> numeric > >>>>>>>>>>>>>>>>>> parameters > >>>>>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible > >>>>>>>>>>>>>>>>>>> values), > >>>>>>>>>>>>>>>>>>> quantities > >>>>>>>>>>>>>>>>>>> like duration and memory size (need measure and unit), > >>>>>>>>>>>>>>>>>>> which > >>>>>>>>>>>>>>>>>>> results in > >>>>>>>>>>>>>>>>>> an > >>>>>>>>>>>>>>>>>>> elegant system for validation. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < > >>>>>>>>>>>>> [hidden email] > >>>>>>>>>>>>>>>>>> .invalid> > >>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. > >>>>>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of > >>>>>>>>>>>>>>>>>>>> various > >>>>>>>>>>>>>>>>>>>> modules to be unified. > >>>>>>>>>>>>>>>>>>>> This is also first step of one of the keys to making > new > >>>>>>>>>>>>>>>>>>>> unified > >>>>>>>>>>>>>>>>>>>> TableEnvironment available for production. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations, > >>>>>>>>>>>>>>>>>>>> such as > >>>>>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The > >>>>>>>>>>>>>>>>>>>> skew may > >>>>>>>>>>>>>>>>>>>> be a single field or a combination of multiple > >>>>>>>>>>>>>>>>>>>> fields. > >>>>>>>>>>>>>>>>>>>> So the > >>>>>>>>>>>>>>>>>>>> configuration is very troublesome. We used JSON > >>>>>>>>>>>>>>>>>>>> string > >>>>>> to > >>>>>>>>>>>>>>>>>>>> configure it. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>> Jingsong Lee > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>> > ------------------------------------------------------------------ > >>>>>>>>>>>>>>>>>>>> From:Jark Wu <[hidden email]> > >>>>>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 > >>>>>>>>>>>>>>>>>>>> To:dev <[hidden email]> > >>>>>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and > >>>>>>>>>>>>> Configuration > >>>>>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind > for a > >>>>>> long > >>>>>>>>>>>>> time. > >>>>>>>>>>>>>>>>>>>> We have seen the benefit when developing blink > >>>>>>>>>>>>>>>>>>>> configurations and > >>>>>>>>>>>>>>>>>> connector > >>>>>>>>>>>>>>>>>>>> properties in 1.9 release. > >>>>>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed > >>>>>>>>>>>>>>>>>>>> design. > >>>>>>>>>>>>>>>>>>>> I will leave my thoughts and comments there. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < > >>>>>> [hidden email] > >>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Hi Timo, > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution > which > >>>>>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of > >>>>>>>>>>>>>>>>>>>>> Flink. > >>>>>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the > deployment > >>>>>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by > >>>>>>>>>>>>> configuration. > >>>>>>>>>>>>>>>>>>>>> Will take a look at the document in days. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>>> tison. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Timo Walther <[hidden email]> 于2019年8月16日周五 > >>>>>>>>>>>>>>>>>>>>> 下午10:12写道: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of > >>>>>>>>>>>>>>>>>>>>>> ExecutionConfig and > >>>>>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is > >>>>>>>>>>>>>>>>>>>>>> necessary > >>>>>>>>>>>>>>>>>>>>>> to make > >>>>>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, > >>>>>>>>>>>>>>>>>>>>>> with > >>>>>> the > >>>>>>>>>>>>>>>>>>>>>> new SQL > >>>>>>>>>>>>>>>>>>>> DDL > >>>>>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and > >>>>>>>>>>>>>>>>>>>>>> formats > >>>>>>>>>>>>>>>>>>>>>> coming up, > >>>>>>>>>>>>>>>>>>>>>> unified configuration becomes more important. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> We need more features around string-based > configuration > >>>>>>>>>>>>>>>>>>>>>> in the > >>>>>>>>>>>>>>>>>>>>>> future, > >>>>>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose > >>>>>>>>>>>>>>>>>>>>>> FLIP-54 for > >>>>>>>>>>>>>>>>>>>>>> evolving > >>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes: > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>> > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > >>>>>>>>>>>>>>>>>>>>>> In summary it adds: > >>>>>>>>>>>>>>>>>>>>>> - documented types and validation > >>>>>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, > list > >>>>>>>>>>>>>>>>>>>>>> - simple non-nested object types > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback, > >>>>>>>>>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>> > > |
Hi Becket, Regarding your example of the Configurable, this is not what we envisioned. What we think of a Configurable is a way to store simple POJOs that might be parameters for your some part of your system. The example you described should be rather expressed in a following way: // Define the Plugin class ConfigOption. ConfigOption<Class<SomePluginInterface>> option = ConfigOptions.key("pluginConfigKey") .classType(org.apache.flink.SomePluginInterface.class); class Host extends Configurable { private String address; private Int port; private static final ConfigOption<String> addressOption private static final ConfigOption<Int> portOption; //let's assume the original solution for now private void fromConfiguration(ConfigurationReader config) { this.address = config.get(addressOption); this.port = config.get(portOption); } } ConfigOption<Host> hostOption = ConfigOptions.key("configKey1ForMyPluginClass") .classType(org.apache.flink.SomePluginInterface.class); // Instantiate the user configured plugin Class<SomePluginInterface> pluginClass = configuration.get(pluginConfigKey); Host host = configuration.get(hostOption); pluginClass.newInstance(host); // Programmatically, users will do the following to set the plugin. configurations.set(option, MyPluginClass.class); configurations.set(hostOption, new Host()); // set the configurations required by my plugin class. // Non-programmatically, users can do the following to set the plugin in a config file, e.g. yaml pluginConfigKey: PATH_TO_MyPluginClass configKey1ForMyPluginClass: host: localhost, port: 1234 // see that all options of Configurable are stored in a single key "I'd imagine that in most cases, after a concrete Configurable (say ExecutionConfig) instance is created from the Configuration instance, we will just pass around the ExecutionConfig instead of the Configuration object." That's not the case. It's the
Configuration object that's sent around in most of the cases. It
would be a major rework of the Configuration handling to
actually change that. Best, Dawid On 03/09/2019 05:59, Becket Qin wrote:
Hi Timo, I think I might have misunderstood the scope or motivation of the FLIP a little bit. Please let me clarify a little bit. Regarding "great if we don't put this burden on users", we shouldconsider who is actually using this API. It is not first-level API but mostly API for Flink contributors. Most of the users will use API classes ike ExecutionConfig or TableConfig or other builders for performing configuration. They will never use the ConfigOptions classes directly. So enforcing a duplicate method does not sound like a burden to me.I thought the configuration will be used by all the plugins and components such as connectors. If that is the case, the Configuration sounds a public API just like UDF. So one may implement a plugin and set that plugin class in the configuration. Flink will do something like following: // Define the Plugin ConfigOption. SomePluginInterface extends Configurable. ConfigOption<SomePluginInterface> option = ConfigOptions.key("pluginConfigKey") .ConfigurableType(org.apache.flink.SomePluginInterface.class); // Instantiate the user configured plugin SomePluginInterface plugin = configuration.getConfigurableInstance(pluginConfigKey); // Programmatically, users will do the following to set the plugin. // Set the plugin class, alternatively, we can always require a ConfigurableFactory class as value. configurations.setConfigurable("pluginConfigKey", MyPluginClass.class); configurations.setInt("configKey1ForMyPluginClass", 123); // set the configurations required by my plugin class. // Non-programmatically, users can do the following to set the plugin in a config file, e.g. yaml pluginConfigKey: PATH_TO_MyPluginClass // Or PATH_TO_MyPluginClassFactory configKey1ForMyPluginClass: 123 Internally, the configuration may discover the MyPluginClassFactory, call MyPluginClassFactory.create(Configuration) and pass in itself as the configuration argument. From user's perspective, the way to use Configurable is the following: 1. Set a class type of the Plugin in the configuration via Configuration interface. 2. Provide a factory class for the Plugin, either by config value or by service provider mechanism. 3. Set the configurations consumed by the plugin, via something like a yaml file, or programmatically via Configuration interface. How would the util that you are suggesting look like? It would alwaysneed to serialize/deserialize an object into an immutable string. This is not very efficient, given that the instance already exists and can be made immutable by the implementer by not exposing setters. Furthermore, we would loose the declarative approach and could not generate documentation. The current approach specifies the static final sub-ConfigOptions either in Configurable object (initial design) or in the ConfigurableFactory (current design) such that the docs generator can read them.I'd imagine that in most cases, after a concrete Configurable (say ExecutionConfig) instance is created from the Configuration instance, we will just pass around the ExecutionConfig instead of the Configuration object. If so, the serialization to / deserialization from String will only happen once per JVM, which seems fine. I am not sure why the doc generation would be impacted. As long as the ConfigOptions go into the Configurable or ConfigurableFactory, the docs generator can still read them, right? Regarding "Configurable may be created and configured directly withoutreading settings from a Configuration instance", there seems to be a misunderstanding. This is a very common case if not the most common. As mentioned before, take ExecutionConfig. This configuration is currently only used in a programmatic-way and needs a way to be expressed as ConfigOptions. CachedFile for instance will be a Configurable object that will binary serialized most of the time when sending it to the cluster but due to the Configurable design it is possible to store it in a string representation as well.Thanks for the explanation. I feel this creating object then serialize / deserialize using configuration is more of an internal use case. We are essentially using the configurations to pass some arbitrary string around. Technically speaking we can use this way to send and receive any object. But I am not sure if this is something that we want to generalize and impact more public use cases. Personally I feel that Configurable As for CachedFile, it seems we do not plan to use configuration to pass that on? It would be good to avoid letting the Configurations to host arbitrary objects beyond the primitive types. To summarize, I am thinking if we should consider the following: 1. Design the Config mechanism as a cross-board API for not only internal usage, but for broader use cases. 2. If writeToConfiguration is only for internal use cases, maybe we can avoid adding it to the configurable interface. We can add another interface such as ExtractableConfigurable for internal usage. What do you think? Thanks, Jiangjie (Becket) Qin On Mon, Sep 2, 2019 at 11:59 PM Timo Walther [hidden email] wrote:@Becket: Regarding "great if we don't put this burden on users", we should consider who is actually using this API. It is not first-level API but mostly API for Flink contributors. Most of the users will use API classes ike ExecutionConfig or TableConfig or other builders for performing configuration. They will never use the ConfigOptions classes directly. So enforcing a duplicate method does not sound like a burden to me. How would the util that you are suggesting look like? It would always need to serialize/deserialize an object into an immutable string. This is not very efficient, given that the instance already exists and can be made immutable by the implementer by not exposing setters. Furthermore, we would loose the declarative approach and could not generate documentation. The current approach specifies the static final sub-ConfigOptions either in Configurable object (initial design) or in the ConfigurableFactory (current design) such that the docs generator can read them. Regarding "Configurable may be created and configured directly without reading settings from a Configuration instance", there seems to be a misunderstanding. This is a very common case if not the most common. As mentioned before, take ExecutionConfig. This configuration is currently only used in a programmatic-way and needs a way to be expressed as ConfigOptions. CachedFile for instance will be a Configurable object that will binary serialized most of the time when sending it to the cluster but due to the Configurable design it is possible to store it in a string representation as well. @Aljoscha: Yes, this approach would also work. We still would need to call writeToConf/readFromConf for duplicate() and ensure immutable semantics, if this is really an important use case. But IMHO all configuration is currently mutable (all API classes like ExecutionConfig, CheckpointConfig, Configuration itself), I don't understand why immutability needs to be discussed here. Regards, Timo On 02.09.19 16:22, Aljoscha Krettek wrote:Hi, Regarding the factory and duplicate() and whatnot, wouldn’t it work tohave a factory like this:/** * Allows to read and write an instance from and to {@linkConfiguration}. A configurable instance* operates in its own key space in {@link Configuration} and will be(de)prefixed by the framework. It cannot access keys from other options. A factory must have a default constructor.* */ public interface ConfigurableFactory<T extends Configurable> { /** * Creates an instance from the given configuration. */ T fromConfiguration(ConfigurationReader configuration); } with Configurable being: /** * Allows to read and write an instance from and to {@linkConfiguration}. A configurable instance* operates in its own key space in {@link Configuration} and will be(de)prefixed by the framework. It cannot access keys from other options. A factory must have a default constructor.* */ public interface Configurable { /** * Writes this instance to the given configuration. */ void writeToConfiguration(ConfigurationWriter configuration); //method name TBD} This would make the Configurable immutable and we wouldn’t need aduplicate() method.Best, AljoschaOn 2. Sep 2019, at 14:40, Becket Qin [hidden email] wrote: Hi Timo and Dawid, Thanks for the patient reply. I agree that both option a) and option b)cansolve the mutability problem. For option a), is it a little intrusive to add a duplicate() method foraConfigurable? It would be great if we don't put this burden on users if possible. For option b), I actually feel it is slightly better than option a) from API perspective as getFactory() seems a more understandable method of a Configurable compared with duplicate(). And users do not need toimplementmuch more logic. I am curious what is the downside of keeping the Configuration simple to only have primitive types, and always create the Configurable using autilmethod? If Configurables are rare, do we need to put the instantiation / bookkeeping logic in Configuration? @Becket for the toConfiguration this is required for shipping theConfiguration to TaskManager, so that we do not have to use java serializability.Do you mean a Configurable may be created and configured directlywithoutreading settings from a Configuration instance? I thought a Configurable will always be created via a ConfigurableFactory by extracting required configs from a Configuration. Am I missing something? Thanks, Jiangjie (Becket) Qin On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz <[hidden email]wrote:Hi Timo, Becket From the options that Timo suggested for improving the mutability situation I would prefer option a) as this is the more explicit option and simpler option. Just as a remark, I think in general Configurable types for options will be rather very rare for some special use-cases, as the majority of options are rather simpler parameters of primitive types (+duration, memory) @Becket for the toConfiguration this is required for shipping the Configuration to TaskManager, so that we do not have to use java serializability. Best, Dawid On 02/09/2019 10:05, Timo Walther wrote:Hi Becket, Re 1 & 3: "values in configurations should actually be immutable" I would also prefer immutability but most of our configuration is mutable due to serialization/deserialization. Also maps and list could be mutable in theory. It is difficult to really enforce that for nested structures. I see two options: a) For the original design: How about we force implementers to add a duplicate() method in a Configurable object? This would mean that the object is still mutable but by duplicating the object both during reading and writing we would avoid the problem you described. b) For the current design: We still use the factory approach but let a Configurable object implement a getFactory() method such that we know how to serialize the object. With the help of a factory we can also duplicate the object easily during reading and writing and ensure immutability. I would personally go for approach a) to not over-engineer this topic. But I'm open for option b). Regards, Timo On 31.08.19 04:09, Becket Qin wrote:Hi Timo, Thanks for the reply. I am still a little concerned over the mutability of the Configurable which could be the value in Configuration. Re: 1But in general, people should not use any internal fields. Configurable objects are meant for simple little helper POJOs, not complex arbitrary nested data structures.This seems difficult to enforce... Ideally the values inconfigurationsshould actually be immutable. The value can only be changed by explicitly calling setters in Configuration. Otherwise we may have weirdsituationwhere the Configurable in the same configuration are different in two places because the configurable is modified in one places and not modified in another place. So I am a little concerned on putting a Configurable type in the Configuration map, because the value could be modified without explicitly setting the configuration. For example, can users do the following? Configurable configurable = configuration.getConfigurable(myConfigurableOption); configurable.setConfigA(123); // this already changes theconfigurableobject in the configuration. Re: 2 Thanks for confirming. As long as users will not have a situationwherethey need to set two configurations with the same key but different descriptions, I think it is OK. Re: 3 This is actually kind of related to 1. i.e. Whether toConfiguration() guarantees the exact same object can be rebuilt from the configuration or not. I am still not quite sure about the use case oftoConfiguration()though. It seems indicating the Configurable is mutable, which mightbedangerous. Thanks, Jiangjie (Becket) Qin On Fri, Aug 30, 2019 at 10:04 PM Timo Walther [hidden email] wrote:Hi Becket, 1. First of all, you are totally right. The FLIP contains a bug duetothe last minute changes that Dawid suggested: by having immutable objects created by a factory we loose the serializability of the Configuration because the factory itself is not stored in the Configuration. I would propose to revert the last change and sticktothe original design, which means that a object must implement the Configurable interface and also implements serialization/deserialization methods such that also internal fields can be persisted as you suggested. But in general, people should not use any internalfields.Configurable objects are meant for simple little helper POJOs, not complex arbitrary nested data structures. It is Map<String, Object> because Configuration stores the rawobjects.If you put a Boolean option into it, it remains Boolean. This makesthemap very efficient for shipping to the cluster and accessing options multiple times. The same for configurable objects. We put the pure objects into the map without any serialization/deserialization. The provided factory allows to convert the Object into a Configurationandwe know how to serialize/deserializise a configuration because it is just a key/value map. 2. Yes, this is what we had in mind. It should still be the same configuration option. We would like to avoid specialized option keys across components (exec.max-para and table.exec.max-para) if theyaredescribing basically the same thing. But adding some moredescriptionlike "TableOptions.MAX_PARALLELISM with description_1 +description_2"does not hurt. 3. They should restore the original object given that the toConfiguration/fromConfiguration methods have been implemented correctly. I will extend the example to make the logic clearer while fixing the bug. Thanks for the healthy discussion, Timo On 30.08.19 15:29, Becket Qin wrote:Hi Timo, Thanks again for the clarification. Please see a few more questionsbelow.Re: 1Please also keep in mind that Configuration must not consist ofonlystrings, it manages a Map<String, Object> for efficient access.Everymap entry can have a string representation for persistence, but in most cases consists of unserialized objects.I'd like to understand this a bit more. The reason we have a Map<String, Object> in Configuration was because that Object could be either aString,a List or a Map, right? But they eventually always boil down to Strings,ormaybe one of the predefined type that we know how to serialize. Inthecurrent design, can the Object also be Configurable? If the value in the config Map<String, Object> can be Configurableobjects,how do we serialize them? Calling toConfiguration() seems not quiteworkingbecause there might be some other internal fields that are notpart oftheconfiguration. The modification to those fields will be lost if we simply use toConfiguration(). So the impact of modifying thoseConfigurableobjects seems a little undefined. And it would be difficult topreventusers from doing that. If the value in the config Map<String, Object> cannot beConfigurableobjects, then it seems a little weird to have all the otherConfigTypestored in the ConfigMap in their own native type and accessed via getInteger() / getBoolean(), etc, while having ConfigurableType tobedifferent from others because one have to use ConfigurableFactory to get the configurations. Re: 2I think about the withExtendedDescription as a helper getter in a different place, so that the option is easier to find in adifferentmodule from it was defined. The MAX_PARALLELISM option in TableOptions would conceptually be equalto:public ConfigOption getMaxParallelismOption() { return CoreOptions.MAX_PARALLELISM; }Just to make sure I understand it correctly, does that mean userswillseesomething like following? - CoreOptions.MAX_PARALLELISM with description_1; - TableOptions.MAX_PARALLELISM with description_1 +description_2.- DataStreamOptions.MAX_PARALLELISM with description_1 + description_3. And users will only configure exactly one MAX_PARALLELISM cross theboard.So they won't be confused by setting two MAX_PARALLELISM config for two different modules, while they are actually the same config. Ifthat isthecase, I don't have further concern. Re: 3 Maybe I am thinking too much. I thought toBytes() / fromBytes() actually restore the original Object. But fromConfiguration() andtoConfiguration()does not do that, anything not in the configuration of the originalobjectwill be lost. So it would be good to make that clear. Maybe a clear Java doc is sufficient. Thanks, Jiangjie (Becket) Qin On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz <[hidden email] wrote:Hi, Ad. 1 The advantage of our approach is that you have the type definition close to the option definition. The only difference is that it enables expressing simple pojos with the primitives like ConfigOption<Integer>, ConfigOption<Long> etc. Otherwise as Timo said you will starthavingthe parsing logic scattered everywhere in the code base as it isnow.The string representation in our proposal is exactly the same asyouexplained for those three options. The only difference is that you don't have to parse the elements of a List, Map etc. afterwards. Ad. 2 I think about the withExtendedDescription as a helper getter in a different place, so that the option is easier to find in adifferentmodule from it was defined. The MAX_PARALLELISM option in TableOptions would conceptually be equalto:public ConfigOption getMaxParallelismOption() { return CoreOptions.MAX_PARALLELISM; } This allows to further clarify the description of the option inthecontext of a different module and end up in a seperate page in documentation (but with a link to the original one). In the endit isexactly the same option. It has exactly same key, type, parsing logic, it is in the end forwarded to the same place. Ad. 3 Not sure if I understand your concerns here. As Timo said it is in the end sth similar to toBytes/fromBytes, but it puts itself to a Configuration. Also just wanted to make sure we adjusted this part slightly and now the ConfigOption takes ConfigurableFactory. Best, Dawid On 30/08/2019 09:39, Timo Walther wrote:Hi Becket, thanks for the discussion. 1. ConfigOptions in their current design are bound to classes. Regarding, the option is "creating some Configurable objectsinsteadof defining the config to create those Configurable"? We just moved this logic to a factory, this factory can then also be used for other purposes. However, howtheoption and objects are serialized to Configuration is still notpartof the option. The option is just pure declaration. If we would allow only List<String>, implementers would need to start implementing own parsing and validation logic all the time. Wewouldlike to avoid that. Please also keep in mind that Configuration must not consist ofonlystrings, it manages a Map<String, Object> for efficient access. Every map entry can have a string representation for persistence, butinmost cases consists of unserialized objects. 2. MAX_PARALLELISM is still defined just once. We don't overwrite keys, types or default values. But different layers might want to add helpful information. In our concrete use case for FLIP-59, ExecutionConfig has 50 properties and many of them are notrelevantfor the Table layer or have no effect at all. We would like tolistand mention the most important config options again in the Table Configuration section, so that users are not confused, but with a strong link to the core option. E.g.: registered kryo serializers are also important also for Table users, we would like to add the comment "This option allows to modify the serialization of the ANY SQLdatatype.". I think we should not spam the core configuration pagewithcomments from all layers, connectors, or libraries but keep thisinthe corresponding component documentation. 3. But it is something like fromBytes() and toBytes()? Itserializesand deserializes T from a configuration? Regards, Timo On 29.08.19 19:14, Becket Qin wrote:Hi Timo and Stephan, Thanks for the detail explanation. 1. I agree that each config should be in a human readable format. My concern is that the current List<Configurable> looks going alittletoo far from what the configuration is supposed to do. They areessentiallycreating some Configurable objects instead of defining the config to create those Configurable. This mixes ConfigOption and the usage of it. API wise it would be good to keep the configs and their usages (such as how to create objects using the ConfigOption) apart from each other. I am wondering if we can just make List also only take string.Forexample, is the following definition of map and list sufficient? A MapConfigOption is ConfigOption<Map<String, String>>. It canbedefined as: map_config_name: k1=v1, k2=v2, k3=v3, ... A ListConfigOption is ConfigOption<List<String>>. It can bedefinedas:list_config_name: v1, v2, v3, ... A ListOfMapConfigOption is ConfigOption<List<Map<String, String>>. It can be defined as: list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... All the key and values in the configuration are String. Thisalsoguarantees that the configuration is always serializable. If we want to do one more step, we can allow the ConfigOption to setallthe primitive types and parse that for the users. So somethinglikeList<Integer>, List<Class<?>> seems fine. The configuration class could also have util methods to create a listofconfigurable such as: <T> List<T><Configuration#getConfigurableInstances(ListMapConfigOption,Class<T> clazz). But the configuration class will not take arbitraryConfigurable asthevalue of its config. 2. I might have misunderstood this. But my concern on the description extension is in the following example. public static final ConfigOption<Integer> MAX_PARALLELISM = CoreOptions.MAX_PARALLELISM.withExtendedDescription( "Note: That this property means that a table program has a side-effect XYZ."); In this case, we will have two MAX_PARALLELISM configs now. OneisCoreOptions.MAX_PARALLELISM. The other one is defined here. I suppose users will see both configurations. One with an extended description and one without. Let's say there is a third component which also users MAX_PARALLELISM, will there be yet another MAX_PARALLELISM ConfigOption? If so, what would that ConfigOption's description look like? Ideally, we would want to have just oneCoreOptions.MAX_PARALLELISMand the description should clearly state all the usage of this ConfigOption. 3. I see, in that case, how about we name it something like extractConfiguration()? I am just trying to see if we can makeitclearthis is not something like fromBytes() and toBytes(). Thanks, Jiangjie (Becket) Qin On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[hidden email]>wrote:Hi Becket, let me try to clarify some of your questions: 1. For every option, we also needed to think about how to representitin a human readable format. We do not want to allow arbitrary nesting because that would easily allow to bypass the flattened hierarchy of config options (`session.memory.min`). The current design allows to represent every option type as a list. E.g.: `myIntOption: 12` can be `myIntListOption: 12;12` `myObjectOption: field=12,other=true` can be`myObjectListOption:field=12,other=true; field=12,other=true` `myPropertyOption: key=str0,other=str1` can be `myPropertyListOption: key=str0,other=str1;key=str0,other=str1` We need the atomic class for serialization/deserializationboth inbinary and string format. ConfigOption<List> is not present in the code base yet, butthisFLIP is a preparation of making ExecutionConfig configurable. If youlookintothis class or also in existing table connectors/formats, you will see that each proposed option type has its requirements. 2. Regarding extending the description of ConfigOptions, the semantic of one option should be a super set of the other option. E.g. in TableAPIwe might use general ExecutionConfig properties. But we wouldliketo a) make external options more prominent in the Table API config docs to link people to properties they should pay attention b) notice about side effects. The core semantic of a property should not change. 3. The factory will not receive the entire configuration butworksin aseparate key space. For `myObjectOption` above, it wouldreceive aconfiguration that consists of `field: 12` and `other: true`. I agree. I will convert the document into a Wiki page today. Thanks, Timo On 29.08.19 09:00, Stephan Ewen wrote:@Becket One thing that may be non-obvious is that the Configuration class also defines serialization / persistence logic at the moment. So it needs to know the set of types it supports. That stands in the way of an arbitrary generic map type. @Timo I agree though that it seems a bit inconsistent to haveonecollection orthogonal to the type (List) and another onebound tospecifictypes (Map). On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[hidden email]wrote:Hi Timo, Thanks for the proposal. Sorry for the late comments, but I have a few questions / comments. 1. Is a new field of isList necessary in the ConfigOption? Would it be enough to just check the atomicClass to see if it is a Listornot? Also, in the ConfigOption<Map> class case, are we alwaysassumeboth key and value types are String? Can we just apply the same to the ConfigOption<List>? BTW, I did a quick search in the codebase but did not findanyusage of ConfigOption<List>. 2. The same config name, but with two ConfigOption with differentsemanticin different component seems super confusing. For example,whenuserssetboth configs, they may have no idea one is overriding theother.There might be two cases: - If it is just the same config used by different components to act accordingly, it might be better to just have one config, butdescribeclearly on how that config will be used. - If it is actually two configurations that can be set differently, I think the config names should just be different. 3. Regarding the ConfigurableFactory, is thetoConfiguration()methodpretty much means getConfiguration()? The toConfiguration() methodsoundslike converting an object to a configuration, which only works iftheobject does not contain any state / value. I am also wondering if thereisa real use case of this method. Because supposedly theconfigurationscouldjust be passed around to caller of this method. Also, can you put the proposal into the FLIP wiki instead of in thedoc before voting? The FLIP wiki allows track themodificationhistoryandhas a more established structure to ensure nothing is missed. Thanks, Jiangjie (Becket) Qin On Tue, Aug 27, 2019 at 11:34 PM Timo Walther [hidden email]wrote:Hi everyone, I updated the FLIP proposal one more time as mentioned inthevotingthread. If there are no objections, I will start a newvotingthreadtomorrow at 9am Berlin time. Thanks, Timo On 22.08.19 14:19, Timo Walther wrote:Hi everyone, thanks for all the feedback we have received online and offline.Itshowed that many people support the idea of evolving theFlinkconfiguration functionality. I'm almost sure that this FLIP will not solve all issues but at least will improve the currentstatus.We've updated the document and replaced the Correlationpartwith the concept of a ConfigOptionGroup that can provide allavailableoptions of a group plus custom group validators for eagervalidation.For now, this eager group validation will only be used at certain locations in the Flink code but it prepares for maybe validating theentireglobal configuration before submitting a job in the future. Please take another look if you find time. I hope we can proceed with the voting process if there are no objections. Regards, Timo Am 19.08.19 um 12:54 schrieb Timo Walther:Hi Stephan, thanks for your suggestions. Let me give you somebackgroundaboutthe decisions made in this FLIP: 1. Goal: The FLIP is labelled "evolve" not "rework" because wedidnot want to change the entire configurationinfrastructure.Both for backwards-compatibility reasons and the amount of workthatwould be required to update all options. If our goal is to reworktheconfiguration option entirely, I might suggest to switch to JSON format with JSON schema and JSON validator. However,settingproperties in a CLI or web interface becomes more trickythemorenested structures are allowed. 2. Class-based Options: The current ConfigOption<T> classiscentered around Java classes where T is determined by the default value. The FLIP just makes this more explicit by offering an explicit `intType()` method etc. The current design of validatorscenteredaround Java classes makes it possible to have typicaldomainvalidators baked by generics as you suggested. If we introduce types such as "quantity with measure and unit" we still need to get a class out of this option at the end, so why changing a proven concept? 3. List Options: The `isList` prevents having arbitrary nesting. As Dawid mentioned, we kept human readability in mind. Foreveryatomic option like "key=12" can be represented by a list "keys=12;13". But we don't want to go further; esp. no nesting. A dedicated list option would start making this more complicated such as "ListOption(ObjectOption(ListOption(IntOption, ...), StringOption(...)))", do we want that? 4. Correlation: The correlation part is one of the suggestions that I like least in the document. We can also discuss removingitentirely, but I think it solves the use case of relating options with each other in a flexible way right next to the actual option. Instead of being hidden in some component initialization, we should put it close to the option to also perform validation eagerly insteadoffailing at runtime when the option is accessed the first time. Regards, Timo Am 18.08.19 um 23:32 schrieb Stephan Ewen:A "List Type" sounds like a good direction to me. The comment on the type system was a bit brief, I agree.Theidea is to see if something like that can ease validation. Especiallythecorrelationsystem seems quite complex (proxies to work around orderofinitialization). For example, let's assume we don't think primarily about "java types" but would define types as one of the following (justexamples,haven't thought all the details through): (a) category type: implies string, and a fix setofpossiblevalues.Those would be passes and naturally make it into the docs and validation. Maps to a String or Enum in Java. (b) numeric integer type: implies long (oroptionallyinteger,ifwe want to automatically check overflow / underflow). would taketypicaldomainvalidators, like non-negative, etc. (c) numeric real type: same as above (double or float) (d) numeric interval type: either defined as aninterval, orreferences other parameter by key. validation by valid interval. (e) quantity: a measure and a unit. separately parsable.Themeasure's type could be any of the numeric types above, with same validation rules. With a system like the above, would we still correlation validators? Are there still cases that we need to catch early (config loading)orare the remaining cases sufficiently rare and runtime or setupspecific,that it is fine to handle them in component initialization? On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz [hidden email] wrote:Hi Stephan, Thank you for your opinion. Actually list/composite types are the topics we spentthemost ofthetime. I understand that from a perspective of a fullblowntypesystem, a field like isList may look weird. Please let me elaborate a bitmoreon the reason behind it though. Maybe we weren't clear enough aboutitin the FLIP. The key feature of all the conifg optionsisthat they must have a string representation as they might come from aconfigurationfile. Moreover it must be a human readable format, so that thevaluesmight be manually adjusted. Having that in mind we didnotwant to add a support of an arbitrary nesting and we decided to allowforlistsonly(and flat objects - I think though in the current design there is a mistake around the Configurable interface). I thinkthoughyou haveapoint here and it would be better to have a ListConfigOption instead of this field. Does it make sense to you? As for the second part of your message. I am not sureif Iunderstoodit. The validators work with parse/deserialized valuesfromConfiguration that means they can be bound to thegenericparameterofConfiguration. You can have a RangeValidator<? extends Comparable/Number>. I don't think the type hierarchy intheConfigOption has anything to do with the validation logic. Could you elaborate a bit more what did you mean? Best, Dawid On 18/08/2019 16:42, Stephan Ewen wrote:I like the idea of enhancing the configuration and todoearlyvalidation.I feel that some of the ideas in the FLIP seem a bit ad hoc, though. For example, having a boolean "isList" is a clear indication ofnothaving thought through the type/category system. Also, having a more clear category system makesvalidationsimpler.For example, I have seen systems distinguishing betweennumericparameters(valid ranges), category parameters (set of possible values), quantities like duration and memory size (need measure and unit), which results inanelegant system for validation. On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <[hidden email].invalid>wrote:+1 to this, thanks Timo and Dawid for the design. This allows the currently cluttered configuration of various modules to be unified. This is also first step of one of the keys to makingnewunified TableEnvironment available for production. Previously, we did encounter complex configurations, such as specifying the skewed values of column in DDL. The skew may be a single field or a combination of multiple fields. So the configuration is very troublesome. We used JSON stringtoconfigure it. Best, Jingsong Lee------------------------------------------------------------------From:Jark Wu [hidden email] Send Time:2019年8月16日(星期五) 16:44 To:dev [hidden email] Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption andConfigurationThanks for starting this design Timo and Dawid, Improving ConfigOption has been hovering in my mindfor alongtime.We have seen the benefit when developing blink configurations andconnectorproperties in 1.9 release. Thanks for bringing it up and make such a detailed design. I will leave my thoughts and comments there. Cheers, Jark On Fri, 16 Aug 2019 at 22:30, Zili Chen <[hidden email]wrote:Hi Timo, It looks interesting. Thanks for preparing this FLIP! Client API enhancement benefit from this evolutionwhichhopefully provides a better view of configuration of Flink. In client API enhancement, we likely make thedeploymentof cluster and submission of job totally defined byconfiguration.Will take a look at the document in days. Best, tison. Timo Walther [hidden email] 于2019年8月16日周五 下午10:12写道:Hi everyone, Dawid and I are working on making parts of ExecutionConfig and TableConfig configurable via config options. This is necessary to make all properties also available in SQL. Additionally, withthenew SQLDDLbased on properties as well as more connectors and formats coming up, unified configuration becomes more important. We need more features around string-basedconfigurationin the future, which is why Dawid and I would like to propose FLIP-54 for evolvingtheConfigOption and Configuration classes:https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/editIn summary it adds: - documented types and validation - more common types such as memory size, duration,list- simple non-nested object types Looking forward to your feedback, Timo signature.asc (849 bytes) Download Attachment |
Hi Becket,
it is definitely API but I would rather consider configuration as second-level API. Usually, a connector or plugin should have some nice builder pattern with helpful JavaDocs instead of just exposing the pure ConfigOptions. I think there is some general misunderstanding what a configurable object should be: 1) When serializing a configurable object into a string representation, all attributes are put under a single top-level key. Your example of a plugin class should already spread multiple keys and should not use a configurable object. 2) A configurable object should not represent an entire configuration like ExecutionConfig. Because why should you put all 50 attributes under a single key? 3) "avoid letting the Configurations to host arbitrary objects beyond the primitive types": We need a way of expressing List<Tuple3<String, String, Boolean>> or List<Tuple2<String, Integer>>. There is no clean way of expressing such data structures by only using primitive types. Regards, Timo On 03.09.19 09:08, Dawid Wysakowicz wrote: > > Hi Becket, > > Regarding your example of the Configurable, this is not what we > envisioned. What we think of a Configurable is a way to store simple > POJOs that might be parameters for your some part of your system. The > example you described should be rather expressed in a following way: > > // Define the Plugin class ConfigOption. > ConfigOption<Class<SomePluginInterface>> option = > ConfigOptions.key("pluginConfigKey") > .classType(org.apache.flink.SomePluginInterface.class); > > class Host extends Configurable { > > private String address; > private Int port; > > private static final ConfigOption<String> addressOption > private static final ConfigOption<Int> portOption; > > > //let's assume the original solution for now > private void fromConfiguration(ConfigurationReader config) { > > this.address = config.get(addressOption); > this.port = config.get(portOption); > } > > } > > ConfigOption<Host> hostOption = > ConfigOptions.key("configKey1ForMyPluginClass") > .classType(org.apache.flink.SomePluginInterface.class); > > > // Instantiate the user configured plugin > Class<SomePluginInterface> pluginClass = configuration.get(pluginConfigKey); > Host host = configuration.get(hostOption); > pluginClass.newInstance(host); > > // Programmatically, users will do the following to set the plugin. > configurations.set(option, MyPluginClass.class); > configurations.set(hostOption, new Host()); // set the > configurations required by my plugin class. > > // Non-programmatically, users can do the following to set the plugin in a > config file, e.g. yaml > pluginConfigKey: PATH_TO_MyPluginClass > configKey1ForMyPluginClass: host: localhost, port: 1234 // see that all options of Configurable are stored in a single key > > "I'd imagine that in most cases, after a concrete Configurable (say > ExecutionConfig) instance is created from the Configuration instance, > we will just pass around the ExecutionConfig instead of the > Configuration object." > > That's not the case. It's the Configuration object that's sent around > in most of the cases. Itwould be a major rework of the Configuration > handling to actually change that. > > Best, > > Dawid > > > On 03/09/2019 05:59, Becket Qin wrote: >> Hi Timo, >> >> I think I might have misunderstood the scope or motivation of the FLIP a >> little bit. Please let me clarify a little bit. >> >> Regarding "great if we don't put this burden on users", we should >>> consider who is actually using this API. It is not first-level API but >>> mostly API for Flink contributors. Most of the users will use API >>> classes ike ExecutionConfig or TableConfig or other builders for >>> performing configuration. They will never use the ConfigOptions classes >>> directly. So enforcing a duplicate method does not sound like a burden >>> to me. >> I thought the configuration will be used by all the plugins and components >> such as connectors. If that is the case, the Configuration sounds a public >> API just like UDF. So one may implement a plugin and set that plugin class >> in the configuration. Flink will do something like following: >> >> // Define the Plugin ConfigOption. SomePluginInterface extends Configurable. >> ConfigOption<SomePluginInterface> option = >> ConfigOptions.key("pluginConfigKey") >> >> .ConfigurableType(org.apache.flink.SomePluginInterface.class); >> >> // Instantiate the user configured plugin >> SomePluginInterface plugin = >> configuration.getConfigurableInstance(pluginConfigKey); >> >> // Programmatically, users will do the following to set the plugin. >> // Set the plugin class, alternatively, we can always require a >> ConfigurableFactory class as value. >> configurations.setConfigurable("pluginConfigKey", MyPluginClass.class); >> configurations.setInt("configKey1ForMyPluginClass", 123); // set the >> configurations required by my plugin class. >> >> // Non-programmatically, users can do the following to set the plugin in a >> config file, e.g. yaml >> pluginConfigKey: PATH_TO_MyPluginClass // Or PATH_TO_MyPluginClassFactory >> configKey1ForMyPluginClass: 123 >> >> Internally, the configuration may discover the MyPluginClassFactory, call >> MyPluginClassFactory.create(Configuration) and pass in itself as the >> configuration argument. >> >> From user's perspective, the way to use Configurable is the following: >> 1. Set a class type of the Plugin in the configuration via Configuration >> interface. >> 2. Provide a factory class for the Plugin, either by config value or by >> service provider mechanism. >> 3. Set the configurations consumed by the plugin, via something like a yaml >> file, or programmatically via Configuration interface. >> >> How would the util that you are suggesting look like? It would always >>> need to serialize/deserialize an object into an immutable string. This >>> is not very efficient, given that the instance already exists and can be >>> made immutable by the implementer by not exposing setters. Furthermore, >>> we would loose the declarative approach and could not generate >>> documentation. The current approach specifies the static final >>> sub-ConfigOptions either in Configurable object (initial design) or in >>> the ConfigurableFactory (current design) such that the docs generator >>> can read them. >> I'd imagine that in most cases, after a concrete Configurable (say >> ExecutionConfig) instance is created from the Configuration instance, we >> will just pass around the ExecutionConfig instead of the Configuration >> object. If so, the serialization to / deserialization from String will only >> happen once per JVM, which seems fine. I am not sure why the doc generation >> would be impacted. As long as the ConfigOptions go into the Configurable or >> ConfigurableFactory, the docs generator can still read them, right? >> >> Regarding "Configurable may be created and configured directly without >>> reading settings from a Configuration instance", there seems to be a >>> misunderstanding. This is a very common case if not the most common. As >>> mentioned before, take ExecutionConfig. This configuration is currently >>> only used in a programmatic-way and needs a way to be expressed as >>> ConfigOptions. CachedFile for instance will be a Configurable object >>> that will binary serialized most of the time when sending it to the >>> cluster but due to the Configurable design it is possible to store it in >>> a string representation as well. >> Thanks for the explanation. I feel this creating object then serialize / >> deserialize using configuration is more of an internal use case. We are >> essentially using the configurations to pass some arbitrary string around. >> Technically speaking we can use this way to send and receive any object. >> But I am not sure if this is something that we want to generalize and >> impact more public use cases. >> Personally I feel that Configurable >> As for CachedFile, it seems we do not plan to use configuration to pass >> that on? It would be good to avoid letting the Configurations to host >> arbitrary objects beyond the primitive types. >> >> To summarize, I am thinking if we should consider the following: >> 1. Design the Config mechanism as a cross-board API for not only internal >> usage, but for broader use cases. >> 2. If writeToConfiguration is only for internal use cases, maybe we can >> avoid adding it to the configurable interface. We can add another interface >> such as ExtractableConfigurable for internal usage. >> >> What do you think? >> >> Thanks, >> >> Jiangjie (Becket) Qin >> >> On Mon, Sep 2, 2019 at 11:59 PM Timo Walther<[hidden email]> wrote: >> >>> @Becket: >>> >>> Regarding "great if we don't put this burden on users", we should >>> consider who is actually using this API. It is not first-level API but >>> mostly API for Flink contributors. Most of the users will use API >>> classes ike ExecutionConfig or TableConfig or other builders for >>> performing configuration. They will never use the ConfigOptions classes >>> directly. So enforcing a duplicate method does not sound like a burden >>> to me. >>> >>> How would the util that you are suggesting look like? It would always >>> need to serialize/deserialize an object into an immutable string. This >>> is not very efficient, given that the instance already exists and can be >>> made immutable by the implementer by not exposing setters. Furthermore, >>> we would loose the declarative approach and could not generate >>> documentation. The current approach specifies the static final >>> sub-ConfigOptions either in Configurable object (initial design) or in >>> the ConfigurableFactory (current design) such that the docs generator >>> can read them. >>> >>> Regarding "Configurable may be created and configured directly without >>> reading settings from a Configuration instance", there seems to be a >>> misunderstanding. This is a very common case if not the most common. As >>> mentioned before, take ExecutionConfig. This configuration is currently >>> only used in a programmatic-way and needs a way to be expressed as >>> ConfigOptions. CachedFile for instance will be a Configurable object >>> that will binary serialized most of the time when sending it to the >>> cluster but due to the Configurable design it is possible to store it in >>> a string representation as well. >>> >>> @Aljoscha: >>> >>> Yes, this approach would also work. We still would need to call >>> writeToConf/readFromConf for duplicate() and ensure immutable semantics, >>> if this is really an important use case. But IMHO all configuration is >>> currently mutable (all API classes like ExecutionConfig, >>> CheckpointConfig, Configuration itself), I don't understand why >>> immutability needs to be discussed here. >>> >>> Regards, >>> Timo >>> >>> >>> On 02.09.19 16:22, Aljoscha Krettek wrote: >>>> Hi, >>>> >>>> Regarding the factory and duplicate() and whatnot, wouldn’t it work to >>> have a factory like this: >>>> /** >>>> * Allows to read and write an instance from and to {@link >>> Configuration}. A configurable instance >>>> * operates in its own key space in {@link Configuration} and will be >>> (de)prefixed by the framework. It cannot access keys from other options. A >>> factory must have a default constructor. >>>> * >>>> */ >>>> public interface ConfigurableFactory<T extends Configurable> { >>>> >>>> /** >>>> * Creates an instance from the given configuration. >>>> */ >>>> T fromConfiguration(ConfigurationReader configuration); >>>> } >>>> >>>> with Configurable being: >>>> >>>> /** >>>> * Allows to read and write an instance from and to {@link >>> Configuration}. A configurable instance >>>> * operates in its own key space in {@link Configuration} and will be >>> (de)prefixed by the framework. It cannot access keys from other options. A >>> factory must have a default constructor. >>>> * >>>> */ >>>> public interface Configurable { >>>> >>>> /** >>>> * Writes this instance to the given configuration. >>>> */ >>>> void writeToConfiguration(ConfigurationWriter configuration); // >>> method name TBD >>>> } >>>> >>>> This would make the Configurable immutable and we wouldn’t need a >>> duplicate() method. >>>> Best, >>>> Aljoscha >>>> >>>>> On 2. Sep 2019, at 14:40, Becket Qin<[hidden email]> wrote: >>>>> >>>>> Hi Timo and Dawid, >>>>> >>>>> Thanks for the patient reply. I agree that both option a) and option b) >>> can >>>>> solve the mutability problem. >>>>> >>>>> For option a), is it a little intrusive to add a duplicate() method for >>> a >>>>> Configurable? It would be great if we don't put this burden on users if >>>>> possible. >>>>> >>>>> For option b), I actually feel it is slightly better than option a) from >>>>> API perspective as getFactory() seems a more understandable method of a >>>>> Configurable compared with duplicate(). And users do not need to >>> implement >>>>> much more logic. >>>>> >>>>> I am curious what is the downside of keeping the Configuration simple to >>>>> only have primitive types, and always create the Configurable using a >>> util >>>>> method? If Configurables are rare, do we need to put the instantiation / >>>>> bookkeeping logic in Configuration? >>>>> >>>>> @Becket for the toConfiguration this is required for shipping the >>>>>> Configuration to TaskManager, so that we do not have to use java >>>>>> serializability. >>>>> Do you mean a Configurable may be created and configured directly >>> without >>>>> reading settings from a Configuration instance? I thought a Configurable >>>>> will always be created via a ConfigurableFactory by extracting required >>>>> configs from a Configuration. Am I missing something? >>>>> >>>>> Thanks, >>>>> >>>>> Jiangjie (Becket) Qin >>>>> >>>>> On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz <[hidden email] >>>>> wrote: >>>>> >>>>>> Hi Timo, Becket >>>>>> >>>>>> From the options that Timo suggested for improving the mutability >>>>>> situation I would prefer option a) as this is the more explicit option >>>>>> and simpler option. Just as a remark, I think in general Configurable >>>>>> types for options will be rather very rare for some special use-cases, >>>>>> as the majority of options are rather simpler parameters of primitive >>>>>> types (+duration, memory) >>>>>> >>>>>> @Becket for the toConfiguration this is required for shipping the >>>>>> Configuration to TaskManager, so that we do not have to use java >>>>>> serializability. >>>>>> >>>>>> Best, >>>>>> >>>>>> Dawid >>>>>> >>>>>> >>>>>> On 02/09/2019 10:05, Timo Walther wrote: >>>>>>> Hi Becket, >>>>>>> >>>>>>> Re 1 & 3: "values in configurations should actually be immutable" >>>>>>> >>>>>>> I would also prefer immutability but most of our configuration is >>>>>>> mutable due to serialization/deserialization. Also maps and list could >>>>>>> be mutable in theory. It is difficult to really enforce that for >>>>>>> nested structures. I see two options: >>>>>>> >>>>>>> a) For the original design: How about we force implementers to add a >>>>>>> duplicate() method in a Configurable object? This would mean that the >>>>>>> object is still mutable but by duplicating the object both during >>>>>>> reading and writing we would avoid the problem you described. >>>>>>> >>>>>>> b) For the current design: We still use the factory approach but let a >>>>>>> Configurable object implement a getFactory() method such that we know >>>>>>> how to serialize the object. With the help of a factory we can also >>>>>>> duplicate the object easily during reading and writing and ensure >>>>>>> immutability. >>>>>>> >>>>>>> I would personally go for approach a) to not over-engineer this topic. >>>>>>> But I'm open for option b). >>>>>>> >>>>>>> Regards, >>>>>>> Timo >>>>>>> >>>>>>> >>>>>>> On 31.08.19 04:09, Becket Qin wrote: >>>>>>>> Hi Timo, >>>>>>>> >>>>>>>> Thanks for the reply. I am still a little concerned over the >>>>>>>> mutability of >>>>>>>> the Configurable which could be the value in Configuration. >>>>>>>> >>>>>>>> Re: 1 >>>>>>>> >>>>>>>>> But in general, people should not use any internal fields. >>>>>>>>> Configurable objects are meant for simple little helper POJOs, not >>>>>>>>> complex arbitrary nested data structures. >>>>>>>> This seems difficult to enforce... Ideally the values in >>> configurations >>>>>>>> should actually be immutable. The value can only be changed by >>>>>>>> explicitly >>>>>>>> calling setters in Configuration. Otherwise we may have weird >>> situation >>>>>>>> where the Configurable in the same configuration are different in two >>>>>>>> places because the configurable is modified in one places and not >>>>>>>> modified >>>>>>>> in another place. So I am a little concerned on putting a >>>>>>>> Configurable type >>>>>>>> in the Configuration map, because the value could be modified without >>>>>>>> explicitly setting the configuration. For example, can users do the >>>>>>>> following? >>>>>>>> >>>>>>>> Configurable configurable = >>>>>>>> configuration.getConfigurable(myConfigurableOption); >>>>>>>> configurable.setConfigA(123); // this already changes the >>> configurable >>>>>>>> object in the configuration. >>>>>>>> >>>>>>>> Re: 2 >>>>>>>> Thanks for confirming. As long as users will not have a situation >>> where >>>>>>>> they need to set two configurations with the same key but different >>>>>>>> descriptions, I think it is OK. >>>>>>>> >>>>>>>> Re: 3 >>>>>>>> This is actually kind of related to 1. i.e. Whether toConfiguration() >>>>>>>> guarantees the exact same object can be rebuilt from the >>>>>>>> configuration or >>>>>>>> not. I am still not quite sure about the use case of >>> toConfiguration() >>>>>>>> though. It seems indicating the Configurable is mutable, which might >>> be >>>>>>>> dangerous. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Jiangjie (Becket) Qin >>>>>>>> >>>>>>>> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther<[hidden email]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Becket, >>>>>>>>> >>>>>>>>> 1. First of all, you are totally right. The FLIP contains a bug due >>> to >>>>>>>>> the last minute changes that Dawid suggested: by having immutable >>>>>>>>> objects created by a factory we loose the serializability of the >>>>>>>>> Configuration because the factory itself is not stored in the >>>>>>>>> Configuration. I would propose to revert the last change and stick >>> to >>>>>>>>> the original design, which means that a object must implement the >>>>>>>>> Configurable interface and also implements >>>>>>>>> serialization/deserialization >>>>>>>>> methods such that also internal fields can be persisted as you >>>>>>>>> suggested. But in general, people should not use any internal >>> fields. >>>>>>>>> Configurable objects are meant for simple little helper POJOs, not >>>>>>>>> complex arbitrary nested data structures. >>>>>>>>> >>>>>>>>> It is Map<String, Object> because Configuration stores the raw >>> objects. >>>>>>>>> If you put a Boolean option into it, it remains Boolean. This makes >>> the >>>>>>>>> map very efficient for shipping to the cluster and accessing options >>>>>>>>> multiple times. The same for configurable objects. We put the pure >>>>>>>>> objects into the map without any serialization/deserialization. The >>>>>>>>> provided factory allows to convert the Object into a Configuration >>> and >>>>>>>>> we know how to serialize/deserializise a configuration because it is >>>>>>>>> just a key/value map. >>>>>>>>> >>>>>>>>> 2. Yes, this is what we had in mind. It should still be the same >>>>>>>>> configuration option. We would like to avoid specialized option keys >>>>>>>>> across components (exec.max-para and table.exec.max-para) if they >>> are >>>>>>>>> describing basically the same thing. But adding some more >>> description >>>>>>>>> like "TableOptions.MAX_PARALLELISM with description_1 + >>> description_2" >>>>>>>>> does not hurt. >>>>>>>>> >>>>>>>>> 3. They should restore the original object given that the >>>>>>>>> toConfiguration/fromConfiguration methods have been implemented >>>>>>>>> correctly. I will extend the example to make the logic clearer while >>>>>>>>> fixing the bug. >>>>>>>>> >>>>>>>>> Thanks for the healthy discussion, >>>>>>>>> Timo >>>>>>>>> >>>>>>>>> >>>>>>>>> On 30.08.19 15:29, Becket Qin wrote: >>>>>>>>>> Hi Timo, >>>>>>>>>> >>>>>>>>>> Thanks again for the clarification. Please see a few more questions >>>>>>>>> below. >>>>>>>>>> Re: 1 >>>>>>>>>> >>>>>>>>>>> Please also keep in mind that Configuration must not consist of >>> only >>>>>>>>>>> strings, it manages a Map<String, Object> for efficient access. >>> Every >>>>>>>>>>> map entry can have a string representation for persistence, but in >>>>>>>>>>> most >>>>>>>>>>> cases consists of unserialized objects. >>>>>>>>>> I'd like to understand this a bit more. The reason we have a >>>>>>>>>> Map<String, >>>>>>>>>> Object> in Configuration was because that Object could be either a >>>>>>>>> String, >>>>>>>>>> a List or a Map, right? But they eventually always boil down to >>>>>>>>>> Strings, >>>>>>>>> or >>>>>>>>>> maybe one of the predefined type that we know how to serialize. In >>> the >>>>>>>>>> current design, can the Object also be Configurable? >>>>>>>>>> If the value in the config Map<String, Object> can be Configurable >>>>>>>>> objects, >>>>>>>>>> how do we serialize them? Calling toConfiguration() seems not quite >>>>>>>>> working >>>>>>>>>> because there might be some other internal fields that are not >>> part of >>>>>>>>> the >>>>>>>>>> configuration. The modification to those fields will be lost if we >>>>>>>>>> simply >>>>>>>>>> use toConfiguration(). So the impact of modifying those >>> Configurable >>>>>>>>>> objects seems a little undefined. And it would be difficult to >>> prevent >>>>>>>>>> users from doing that. >>>>>>>>>> If the value in the config Map<String, Object> cannot be >>> Configurable >>>>>>>>>> objects, then it seems a little weird to have all the other >>> ConfigType >>>>>>>>>> stored in the ConfigMap in their own native type and accessed via >>>>>>>>>> getInteger() / getBoolean(), etc, while having ConfigurableType to >>> be >>>>>>>>>> different from others because one have to use ConfigurableFactory >>>>>>>>>> to get >>>>>>>>>> the configurations. >>>>>>>>>> >>>>>>>>>> Re: 2 >>>>>>>>>> >>>>>>>>>>> I think about the withExtendedDescription as a helper getter in a >>>>>>>>>>> different place, so that the option is easier to find in a >>> different >>>>>>>>>>> module from it was defined. >>>>>>>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>>>>>>>> equal >>>>>>>>> to: >>>>>>>>>>> public ConfigOption getMaxParallelismOption() { >>>>>>>>>>> return CoreOptions.MAX_PARALLELISM; >>>>>>>>>>> } >>>>>>>>>> Just to make sure I understand it correctly, does that mean users >>> will >>>>>>>>> see >>>>>>>>>> something like following? >>>>>>>>>> - CoreOptions.MAX_PARALLELISM with description_1; >>>>>>>>>> - TableOptions.MAX_PARALLELISM with description_1 + >>> description_2. >>>>>>>>>> - DataStreamOptions.MAX_PARALLELISM with description_1 + >>>>>>>>>> description_3. >>>>>>>>>> And users will only configure exactly one MAX_PARALLELISM cross the >>>>>>>>> board. >>>>>>>>>> So they won't be confused by setting two MAX_PARALLELISM config for >>>>>>>>>> two >>>>>>>>>> different modules, while they are actually the same config. If >>> that is >>>>>>>>> the >>>>>>>>>> case, I don't have further concern. >>>>>>>>>> >>>>>>>>>> Re: 3 >>>>>>>>>> Maybe I am thinking too much. I thought toBytes() / fromBytes() >>>>>>>>>> actually >>>>>>>>>> restore the original Object. But fromConfiguration() and >>>>>>>>> toConfiguration() >>>>>>>>>> does not do that, anything not in the configuration of the original >>>>>>>>> object >>>>>>>>>> will be lost. So it would be good to make that clear. Maybe a clear >>>>>>>>>> Java >>>>>>>>>> doc is sufficient. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>> >>>>>>>>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz >>>>>>>>>> <[hidden email] >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Ad. 1 >>>>>>>>>>> >>>>>>>>>>> The advantage of our approach is that you have the type definition >>>>>>>>>>> close >>>>>>>>>>> to the option definition. The only difference is that it enables >>>>>>>>>>> expressing simple pojos with the primitives like >>>>>>>>>>> ConfigOption<Integer>, >>>>>>>>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start >>> having >>>>>>>>>>> the parsing logic scattered everywhere in the code base as it is >>> now. >>>>>>>>>>> The string representation in our proposal is exactly the same as >>> you >>>>>>>>>>> explained for those three options. The only difference is that you >>>>>>>>>>> don't >>>>>>>>>>> have to parse the elements of a List, Map etc. afterwards. >>>>>>>>>>> >>>>>>>>>>> Ad. 2 >>>>>>>>>>> >>>>>>>>>>> I think about the withExtendedDescription as a helper getter in a >>>>>>>>>>> different place, so that the option is easier to find in a >>> different >>>>>>>>>>> module from it was defined. >>>>>>>>>>> >>>>>>>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>>>>>>>> equal >>>>>>>>> to: >>>>>>>>>>> public ConfigOption getMaxParallelismOption() { >>>>>>>>>>> >>>>>>>>>>> return CoreOptions.MAX_PARALLELISM; >>>>>>>>>>> >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> This allows to further clarify the description of the option in >>> the >>>>>>>>>>> context of a different module and end up in a seperate page in >>>>>>>>>>> documentation (but with a link to the original one). In the end >>> it is >>>>>>>>>>> exactly the same option. It has exactly same key, type, parsing >>>>>>>>>>> logic, >>>>>>>>>>> it is in the end forwarded to the same place. >>>>>>>>>>> >>>>>>>>>>> Ad. 3 >>>>>>>>>>> >>>>>>>>>>> Not sure if I understand your concerns here. As Timo said it is in >>>>>>>>>>> the >>>>>>>>>>> end sth similar to toBytes/fromBytes, but it puts itself to a >>>>>>>>>>> Configuration. Also just wanted to make sure we adjusted this part >>>>>>>>>>> slightly and now the ConfigOption takes ConfigurableFactory. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> >>>>>>>>>>> Dawid >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 30/08/2019 09:39, Timo Walther wrote: >>>>>>>>>>>> Hi Becket, >>>>>>>>>>>> >>>>>>>>>>>> thanks for the discussion. >>>>>>>>>>>> >>>>>>>>>>>> 1. ConfigOptions in their current design are bound to classes. >>>>>>>>>>>> Regarding, the option is "creating some Configurable objects >>> instead >>>>>>>>>>>> of defining the config to create >>>>>>>>>>>> those Configurable"? We just moved this logic to a factory, this >>>>>>>>>>>> factory can then also be used for other purposes. However, how >>> the >>>>>>>>>>>> option and objects are serialized to Configuration is still not >>> part >>>>>>>>>>>> of the option. The option is just pure declaration. >>>>>>>>>>>> >>>>>>>>>>>> If we would allow only List<String>, implementers would need to >>>>>>>>>>>> start >>>>>>>>>>>> implementing own parsing and validation logic all the time. We >>> would >>>>>>>>>>>> like to avoid that. >>>>>>>>>>>> >>>>>>>>>>>> Please also keep in mind that Configuration must not consist of >>> only >>>>>>>>>>>> strings, it manages a Map<String, Object> for efficient access. >>>>>>>>>>>> Every >>>>>>>>>>>> map entry can have a string representation for persistence, but >>> in >>>>>>>>>>>> most cases consists of unserialized objects. >>>>>>>>>>>> >>>>>>>>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite >>>>>>>>>>>> keys, types or default values. But different layers might want to >>>>>>>>>>>> add >>>>>>>>>>>> helpful information. In our concrete use case for FLIP-59, >>>>>>>>>>>> ExecutionConfig has 50 properties and many of them are not >>> relevant >>>>>>>>>>>> for the Table layer or have no effect at all. We would like to >>> list >>>>>>>>>>>> and mention the most important config options again in the Table >>>>>>>>>>>> Configuration section, so that users are not confused, but with a >>>>>>>>>>>> strong link to the core option. E.g.: registered kryo serializers >>>>>>>>>>>> are >>>>>>>>>>>> also important also for Table users, we would like to add the >>>>>>>>>>>> comment >>>>>>>>>>>> "This option allows to modify the serialization of the ANY SQL >>> data >>>>>>>>>>>> type.". I think we should not spam the core configuration page >>> with >>>>>>>>>>>> comments from all layers, connectors, or libraries but keep this >>> in >>>>>>>>>>>> the corresponding component documentation. >>>>>>>>>>>> >>>>>>>>>>>> 3. But it is something like fromBytes() and toBytes()? It >>> serializes >>>>>>>>>>>> and deserializes T from a configuration? >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Timo >>>>>>>>>>>> >>>>>>>>>>>> On 29.08.19 19:14, Becket Qin wrote: >>>>>>>>>>>>> Hi Timo and Stephan, >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for the detail explanation. >>>>>>>>>>>>> >>>>>>>>>>>>> 1. I agree that each config should be in a human readable >>>>>>>>>>>>> format. My >>>>>>>>>>>>> concern is that the current List<Configurable> looks going a >>> little >>>>>>>>>>>>> too far >>>>>>>>>>>>> from what the configuration is supposed to do. They are >>> essentially >>>>>>>>>>>>> creating some Configurable objects instead of defining the >>>>>>>>>>>>> config to >>>>>>>>>>>>> create >>>>>>>>>>>>> those Configurable. This mixes ConfigOption and the usage of it. >>>>>>>>>>>>> API >>>>>>>>>>>>> wise >>>>>>>>>>>>> it would be good to keep the configs and their usages (such as >>>>>>>>>>>>> how to >>>>>>>>>>>>> create objects using the ConfigOption) apart from each other. >>>>>>>>>>>>> I am wondering if we can just make List also only take string. >>> For >>>>>>>>>>>>> example, >>>>>>>>>>>>> is the following definition of map and list sufficient? >>>>>>>>>>>>> >>>>>>>>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can >>> be >>>>>>>>>>>>> defined >>>>>>>>>>>>> as: >>>>>>>>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ... >>>>>>>>>>>>> >>>>>>>>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be >>> defined >>>>>>>>> as: >>>>>>>>>>>>> list_config_name: v1, v2, v3, ... >>>>>>>>>>>>> >>>>>>>>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, >>>>>>>>>>>>> String>>. It >>>>>>>>>>>>> can >>>>>>>>>>>>> be defined as: >>>>>>>>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... >>>>>>>>>>>>> >>>>>>>>>>>>> All the key and values in the configuration are String. This >>> also >>>>>>>>>>>>> guarantees that the configuration is always serializable. >>>>>>>>>>>>> If we want to do one more step, we can allow the ConfigOption to >>>>>>>>>>>>> set >>>>>>>>> all >>>>>>>>>>>>> the primitive types and parse that for the users. So something >>> like >>>>>>>>>>>>> List<Integer>, List<Class<?>> seems fine. >>>>>>>>>>>>> >>>>>>>>>>>>> The configuration class could also have util methods to create a >>>>>>>>>>>>> list >>>>>>>>> of >>>>>>>>>>>>> configurable such as: >>>>>>>>>>>>> <T> List<T> >>>>>>>>> <Configuration#getConfigurableInstances(ListMapConfigOption, >>>>>>>>>>>>> Class<T> clazz). >>>>>>>>>>>>> But the configuration class will not take arbitrary >>> Configurable as >>>>>>>>> the >>>>>>>>>>>>> value of its config. >>>>>>>>>>>>> >>>>>>>>>>>>> 2. I might have misunderstood this. But my concern on the >>>>>>>>>>>>> description >>>>>>>>>>>>> extension is in the following example. >>>>>>>>>>>>> >>>>>>>>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM = >>>>>>>>>>>>> >>>>>>>>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( >>>>>>>>>>>>> "Note: That this property means that a table program has a >>>>>>>>>>>>> side-effect >>>>>>>>>>>>> XYZ."); >>>>>>>>>>>>> >>>>>>>>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One >>> is >>>>>>>>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I >>>>>>>>>>>>> suppose >>>>>>>>>>>>> users >>>>>>>>>>>>> will see both configurations. One with an extended description >>>>>>>>>>>>> and one >>>>>>>>>>>>> without. Let's say there is a third component which also users >>>>>>>>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM >>>>>>>>>>>>> ConfigOption? If >>>>>>>>>>>>> so, what would that ConfigOption's description look like? >>>>>>>>>>>>> >>>>>>>>>>>>> Ideally, we would want to have just one >>> CoreOptions.MAX_PARALLELISM >>>>>>>>>>>>> and the >>>>>>>>>>>>> description should clearly state all the usage of this >>>>>>>>>>>>> ConfigOption. >>>>>>>>>>>>> >>>>>>>>>>>>> 3. I see, in that case, how about we name it something like >>>>>>>>>>>>> extractConfiguration()? I am just trying to see if we can make >>> it >>>>>>>>> clear >>>>>>>>>>>>> this is not something like fromBytes() and toBytes(). >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther < >>> [hidden email]> >>>>>>>>>>> wrote: >>>>>>>>>>>>>> Hi Becket, >>>>>>>>>>>>>> >>>>>>>>>>>>>> let me try to clarify some of your questions: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. For every option, we also needed to think about how to >>>>>>>>>>>>>> represent >>>>>>>>> it >>>>>>>>>>>>>> in a human readable format. We do not want to allow arbitrary >>>>>>>>>>>>>> nesting >>>>>>>>>>>>>> because that would easily allow to bypass the flattened >>>>>>>>>>>>>> hierarchy of >>>>>>>>>>>>>> config options (`session.memory.min`). The current design >>>>>>>>>>>>>> allows to >>>>>>>>>>>>>> represent every option type as a list. E.g.: >>>>>>>>>>>>>> >>>>>>>>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12` >>>>>>>>>>>>>> `myObjectOption: field=12,other=true` can be >>> `myObjectListOption: >>>>>>>>>>>>>> field=12,other=true; field=12,other=true` >>>>>>>>>>>>>> `myPropertyOption: key=str0,other=str1` can be >>>>>>>>>>>>>> `myPropertyListOption: >>>>>>>>>>>>>> key=str0,other=str1;key=str0,other=str1` >>>>>>>>>>>>>> >>>>>>>>>>>>>> We need the atomic class for serialization/deserialization >>> both in >>>>>>>>>>>>>> binary and string format. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ConfigOption<List> is not present in the code base yet, but >>> this >>>>>>>>>>>>>> FLIP is >>>>>>>>>>>>>> a preparation of making ExecutionConfig configurable. If you >>> look >>>>>>>>> into >>>>>>>>>>>>>> this class or also in existing table connectors/formats, you >>>>>>>>>>>>>> will see >>>>>>>>>>>>>> that each proposed option type has its requirements. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2. Regarding extending the description of ConfigOptions, the >>>>>>>>>>>>>> semantic of >>>>>>>>>>>>>> one option should be a super set of the other option. E.g. in >>>>>>>>>>>>>> Table >>>>>>>>> API >>>>>>>>>>>>>> we might use general ExecutionConfig properties. But we would >>> like >>>>>>>>>>>>>> to a) >>>>>>>>>>>>>> make external options more prominent in the Table API config >>>>>>>>>>>>>> docs to >>>>>>>>>>>>>> link people to properties they should pay attention b) notice >>>>>>>>>>>>>> about >>>>>>>>>>>>>> side >>>>>>>>>>>>>> effects. The core semantic of a property should not change. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 3. The factory will not receive the entire configuration but >>> works >>>>>>>>> in a >>>>>>>>>>>>>> separate key space. For `myObjectOption` above, it would >>> receive a >>>>>>>>>>>>>> configuration that consists of `field: 12` and `other: true`. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I agree. I will convert the document into a Wiki page today. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Timo >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote: >>>>>>>>>>>>>>> @Becket One thing that may be non-obvious is that the >>>>>>>>>>>>>>> Configuration >>>>>>>>>>>>>>> class >>>>>>>>>>>>>>> also defines serialization / persistence logic at the moment. >>>>>>>>>>>>>>> So it >>>>>>>>>>>>>>> needs >>>>>>>>>>>>>>> to know the set of types it supports. That stands in the way >>>>>>>>>>>>>>> of an >>>>>>>>>>>>>>> arbitrary generic map type. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have >>> one >>>>>>>>>>>>>>> collection orthogonal to the type (List) and another one >>> bound to >>>>>>>>>>>>>> specific >>>>>>>>>>>>>>> types (Map). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin < >>> [hidden email] >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I >>>>>>>>>>>>>>>> have a >>>>>>>>>>>>>>>> few >>>>>>>>>>>>>>>> questions / comments. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption? >>>>>>>>>>>>>>>> Would it be enough to just check the atomicClass to see if it >>>>>>>>>>>>>>>> is a >>>>>>>>>>>>>>>> List >>>>>>>>>>>>>> or >>>>>>>>>>>>>>>> not? >>>>>>>>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always >>> assume >>>>>>>>>>>>>>>> both key >>>>>>>>>>>>>>>> and value types are String? Can we just apply the same to the >>>>>>>>>>>>>>>> ConfigOption<List>? >>>>>>>>>>>>>>>> BTW, I did a quick search in the codebase but did not find >>> any >>>>>>>>>>>>>>>> usage of >>>>>>>>>>>>>>>> ConfigOption<List>. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2. The same config name, but with two ConfigOption with >>>>>>>>>>>>>>>> different >>>>>>>>>>>>>> semantic >>>>>>>>>>>>>>>> in different component seems super confusing. For example, >>> when >>>>>>>>> users >>>>>>>>>>>>>> set >>>>>>>>>>>>>>>> both configs, they may have no idea one is overriding the >>> other. >>>>>>>>>>>>>>>> There >>>>>>>>>>>>>>>> might be two cases: >>>>>>>>>>>>>>>> - If it is just the same config used by different >>>>>>>>>>>>>>>> components to >>>>>>>>>>>>>>>> act >>>>>>>>>>>>>>>> accordingly, it might be better to just have one config, but >>>>>>>>> describe >>>>>>>>>>>>>>>> clearly on how that config will be used. >>>>>>>>>>>>>>>> - If it is actually two configurations that can be set >>>>>>>>>>>>>>>> differently, I >>>>>>>>>>>>>>>> think the config names should just be different. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3. Regarding the ConfigurableFactory, is the >>> toConfiguration() >>>>>>>>> method >>>>>>>>>>>>>>>> pretty much means getConfiguration()? The toConfiguration() >>>>>>>>>>>>>>>> method >>>>>>>>>>>>>> sounds >>>>>>>>>>>>>>>> like converting an object to a configuration, which only >>>>>>>>>>>>>>>> works if >>>>>>>>> the >>>>>>>>>>>>>>>> object does not contain any state / value. I am also >>>>>>>>>>>>>>>> wondering if >>>>>>>>>>>>>>>> there >>>>>>>>>>>>>> is >>>>>>>>>>>>>>>> a real use case of this method. Because supposedly the >>>>>>>>> configurations >>>>>>>>>>>>>> could >>>>>>>>>>>>>>>> just be passed around to caller of this method. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of >>>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>> doc before voting? The FLIP wiki allows track the >>> modification >>>>>>>>>>>>>>>> history >>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> has a more established structure to ensure nothing is missed. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther >>>>>>>>>>>>>>>> <[hidden email]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in >>> the >>>>>>>>> voting >>>>>>>>>>>>>>>>> thread. If there are no objections, I will start a new >>> voting >>>>>>>>> thread >>>>>>>>>>>>>>>>> tomorrow at 9am Berlin time. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote: >>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> thanks for all the feedback we have received online and >>>>>>>>>>>>>>>>>> offline. >>>>>>>>> It >>>>>>>>>>>>>>>>>> showed that many people support the idea of evolving the >>> Flink >>>>>>>>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP >>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>> not >>>>>>>>>>>>>>>>>> solve all issues but at least will improve the current >>> status. >>>>>>>>>>>>>>>>>> We've updated the document and replaced the Correlation >>> part >>>>>>>>>>>>>>>>>> with the >>>>>>>>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all >>> available >>>>>>>>>>>>>>>>>> options >>>>>>>>>>>>>>>>>> of a group plus custom group validators for eager >>> validation. >>>>>>>>>>>>>>>>>> For now, >>>>>>>>>>>>>>>>>> this eager group validation will only be used at certain >>>>>>>>>>>>>>>>>> locations in >>>>>>>>>>>>>>>>>> the Flink code but it prepares for maybe validating the >>> entire >>>>>>>>>>>>>>>>>> global >>>>>>>>>>>>>>>>>> configuration before submitting a job in the future. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Please take another look if you find time. I hope we can >>>>>>>>>>>>>>>>>> proceed >>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>> the voting process if there are no objections. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>>>>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> thanks for your suggestions. Let me give you some >>> background >>>>>>>>> about >>>>>>>>>>>>>>>>>>> the decisions made in this FLIP: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" >>>>>>>>>>>>>>>>>>> because we >>>>>>>>> did >>>>>>>>>>>>>>>>>>> not want to change the entire configuration >>> infrastructure. >>>>>>>>>>>>>>>>>>> Both for >>>>>>>>>>>>>>>>>>> backwards-compatibility reasons and the amount of work >>> that >>>>>>>>>>>>>>>>>>> would be >>>>>>>>>>>>>>>>>>> required to update all options. If our goal is to rework >>> the >>>>>>>>>>>>>>>>>>> configuration option entirely, I might suggest to switch >>>>>>>>>>>>>>>>>>> to JSON >>>>>>>>>>>>>>>>>>> format with JSON schema and JSON validator. However, >>> setting >>>>>>>>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky >>> the >>>>>>>>> more >>>>>>>>>>>>>>>>>>> nested structures are allowed. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class >>> is >>>>>>>>>>>>>>>>>>> centered >>>>>>>>>>>>>>>>>>> around Java classes where T is determined by the default >>>>>>>>>>>>>>>>>>> value. >>>>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit >>>>>>>>>>>>>>>>>>> `intType()` method etc. The current design of validators >>>>>>>>> centered >>>>>>>>>>>>>>>>>>> around Java classes makes it possible to have typical >>> domain >>>>>>>>>>>>>>>>>>> validators baked by generics as you suggested. If we >>>>>>>>>>>>>>>>>>> introduce >>>>>>>>>>>>>>>>>>> types >>>>>>>>>>>>>>>>>>> such as "quantity with measure and unit" we still need to >>>>>>>>>>>>>>>>>>> get a >>>>>>>>>>>>>>>>>>> class >>>>>>>>>>>>>>>>>>> out of this option at the end, so why changing a proven >>>>>>>>>>>>>>>>>>> concept? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary >>>>>>>>>>>>>>>>>>> nesting. As >>>>>>>>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For >>> every >>>>>>>>>>>>>>>>>>> atomic >>>>>>>>>>>>>>>>>>> option like "key=12" can be represented by a list >>>>>>>>>>>>>>>>>>> "keys=12;13". >>>>>>>>>>>>>>>>>>> But >>>>>>>>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated >>>>>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>>>>> option >>>>>>>>>>>>>>>>>>> would start making this more complicated such as >>>>>>>>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>>>>>>>>>>>>>>>> StringOption(...)))", do we want that? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 4. Correlation: The correlation part is one of the >>>>>>>>>>>>>>>>>>> suggestions >>>>>>>>>>>>>>>>>>> that I >>>>>>>>>>>>>>>>>>> like least in the document. We can also discuss removing >>> it >>>>>>>>>>>>>>>>>>> entirely, >>>>>>>>>>>>>>>>>>> but I think it solves the use case of relating options >>>>>>>>>>>>>>>>>>> with each >>>>>>>>>>>>>>>>>>> other in a flexible way right next to the actual option. >>>>>>>>>>>>>>>>>>> Instead of >>>>>>>>>>>>>>>>>>> being hidden in some component initialization, we should >>>>>>>>>>>>>>>>>>> put it >>>>>>>>>>>>>>>>>>> close >>>>>>>>>>>>>>>>>>> to the option to also perform validation eagerly instead >>> of >>>>>>>>>>>>>>>>>>> failing >>>>>>>>>>>>>>>>>>> at runtime when the option is accessed the first time. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>>>>>>>>>>>>>>>> A "List Type" sounds like a good direction to me. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. >>> The >>>>>>>>>>>>>>>>>>>> idea is >>>>>>>>>>>>>>>>>>>> to see >>>>>>>>>>>>>>>>>>>> if something like that can ease validation. Especially >>> the >>>>>>>>>>>>>>>> correlation >>>>>>>>>>>>>>>>>>>> system seems quite complex (proxies to work around order >>> of >>>>>>>>>>>>>>>>>>>> initialization). >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> For example, let's assume we don't think primarily about >>>>>>>>>>>>>>>>>>>> "java >>>>>>>>>>>>>>>>>>>> types" but >>>>>>>>>>>>>>>>>>>> would define types as one of the following (just >>> examples, >>>>>>>>>>>>>>>>>>>> haven't >>>>>>>>>>>>>>>>>>>> thought >>>>>>>>>>>>>>>>>>>> all the details through): >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> (a) category type: implies string, and a fix set >>> of >>>>>>>>> possible >>>>>>>>>>>>>>>> values. >>>>>>>>>>>>>>>>>>>> Those would be passes and naturally make it into the docs >>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>>>>>>> Maps to a String or Enum in Java. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> (b) numeric integer type: implies long (or >>> optionally >>>>>>>>>>>>>>>>>>>> integer, >>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>>>>> we want >>>>>>>>>>>>>>>>>>>> to automatically check overflow / underflow). would take >>>>>>>>> typical >>>>>>>>>>>>>>>> domain >>>>>>>>>>>>>>>>>>>> validators, like non-negative, etc. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> (c) numeric real type: same as above (double or >>>>>>>>>>>>>>>>>>>> float) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> (d) numeric interval type: either defined as an >>>>>>>>> interval, or >>>>>>>>>>>>>>>>>>>> references >>>>>>>>>>>>>>>>>>>> other parameter by key. validation by valid interval. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> (e) quantity: a measure and a unit. separately >>>>>>>>>>>>>>>>>>>> parsable. >>>>>>>>> The >>>>>>>>>>>>>>>>>>>> measure's >>>>>>>>>>>>>>>>>>>> type could be any of the numeric types above, with same >>>>>>>>>>>>>>>>>>>> validation >>>>>>>>>>>>>>>>>>>> rules. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> With a system like the above, would we still correlation >>>>>>>>>>>>>>>>>>>> validators? >>>>>>>>>>>>>>>>>>>> Are >>>>>>>>>>>>>>>>>>>> there still cases that we need to catch early (config >>>>>>>>>>>>>>>>>>>> loading) >>>>>>>>> or >>>>>>>>>>>>>>>>>>>> are the >>>>>>>>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup >>>>>>>>> specific, >>>>>>>>>>>>>>>>>>>> that it is >>>>>>>>>>>>>>>>>>>> fine to handle them in component initialization? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>>>>>>>>>>>>>>>> <[hidden email]> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thank you for your opinion. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Actually list/composite types are the topics we spent >>> the >>>>>>>>>>>>>>>>>>>>> most of >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>> time. I understand that from a perspective of a full >>> blown >>>>>>>>> type >>>>>>>>>>>>>>>>>>>>> system, >>>>>>>>>>>>>>>>>>>>> a field like isList may look weird. Please let me >>>>>>>>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>>>>>>>> bit >>>>>>>>>>>>>>>> more >>>>>>>>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear >>>>>>>>>>>>>>>>>>>>> enough >>>>>>>>>>>>>>>>>>>>> about >>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options >>> is >>>>>>>>>>>>>>>>>>>>> that they >>>>>>>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>>>>>>> have a string representation as they might come from a >>>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so >>>>>>>>>>>>>>>>>>>>> that the >>>>>>>>>>>>>>>> values >>>>>>>>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did >>> not >>>>>>>>>>>>>>>>>>>>> want to >>>>>>>>>>>>>>>>>>>>> add a >>>>>>>>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow >>> for >>>>>>>>>>>>>>>>>>>>> lists >>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>>>>> (and flat objects - I think though in the current design >>>>>>>>>>>>>>>>>>>>> there is a >>>>>>>>>>>>>>>>>>>>> mistake around the Configurable interface). I think >>> though >>>>>>>>>>>>>>>>>>>>> you have >>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>> point here and it would be better to have a >>>>>>>>>>>>>>>>>>>>> ListConfigOption >>>>>>>>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>>>>>>>> this field. Does it make sense to you? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> As for the second part of your message. I am not sure >>> if I >>>>>>>>>>>>>>>> understood >>>>>>>>>>>>>>>>>>>>> it. The validators work with parse/deserialized values >>> from >>>>>>>>>>>>>>>>>>>>> Configuration that means they can be bound to the >>> generic >>>>>>>>>>>>>>>>>>>>> parameter >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>>>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in >>> the >>>>>>>>>>>>>>>>>>>>> ConfigOption >>>>>>>>>>>>>>>>>>>>> has anything to do with the validation logic. Could you >>>>>>>>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>>>>>>>> bit >>>>>>>>>>>>>>>>>>>>> more what did you mean? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Dawid >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to >>> do >>>>>>>>> early >>>>>>>>>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad >>>>>>>>>>>>>>>>>>>>>> hoc, >>>>>>>>>>>>>>>>>>>>>> though. For >>>>>>>>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear >>>>>>>>>>>>>>>>>>>>>> indication of >>>>>>>>> not >>>>>>>>>>>>>>>>>>>>>> having >>>>>>>>>>>>>>>>>>>>>> thought through the type/category system. >>>>>>>>>>>>>>>>>>>>>> Also, having a more clear category system makes >>> validation >>>>>>>>>>>>>> simpler. >>>>>>>>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between >>>>>>>>> numeric >>>>>>>>>>>>>>>>>>>>> parameters >>>>>>>>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible >>>>>>>>>>>>>>>>>>>>>> values), >>>>>>>>>>>>>>>>>>>>>> quantities >>>>>>>>>>>>>>>>>>>>>> like duration and memory size (need measure and unit), >>>>>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>> results in >>>>>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>>>>> elegant system for validation. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >>>>>>>>>>>>>>>> [hidden email] >>>>>>>>>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of >>>>>>>>>>>>>>>>>>>>>>> various >>>>>>>>>>>>>>>>>>>>>>> modules to be unified. >>>>>>>>>>>>>>>>>>>>>>> This is also first step of one of the keys to making >>> new >>>>>>>>>>>>>>>>>>>>>>> unified >>>>>>>>>>>>>>>>>>>>>>> TableEnvironment available for production. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations, >>>>>>>>>>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The >>>>>>>>>>>>>>>>>>>>>>> skew may >>>>>>>>>>>>>>>>>>>>>>> be a single field or a combination of multiple >>>>>>>>>>>>>>>>>>>>>>> fields. >>>>>>>>>>>>>>>>>>>>>>> So the >>>>>>>>>>>>>>>>>>>>>>> configuration is very troublesome. We used JSON >>>>>>>>>>>>>>>>>>>>>>> string >>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> configure it. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>> ------------------------------------------------------------------ >>>>>>>>>>>>>>>>>>>>>>> From:Jark Wu<[hidden email]> >>>>>>>>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>>>>>>>>>>>>>>>> To:dev<[hidden email]> >>>>>>>>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >>>>>>>>>>>>>>>> Configuration >>>>>>>>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind >>> for a >>>>>>>>> long >>>>>>>>>>>>>>>> time. >>>>>>>>>>>>>>>>>>>>>>> We have seen the benefit when developing blink >>>>>>>>>>>>>>>>>>>>>>> configurations and >>>>>>>>>>>>>>>>>>>>> connector >>>>>>>>>>>>>>>>>>>>>>> properties in 1.9 release. >>>>>>>>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed >>>>>>>>>>>>>>>>>>>>>>> design. >>>>>>>>>>>>>>>>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < >>>>>>>>> [hidden email] >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution >>> which >>>>>>>>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of >>>>>>>>>>>>>>>>>>>>>>>> Flink. >>>>>>>>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the >>> deployment >>>>>>>>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by >>>>>>>>>>>>>>>> configuration. >>>>>>>>>>>>>>>>>>>>>>>> Will take a look at the document in days. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>> tison. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Timo Walther<[hidden email]> 于2019年8月16日周五 >>>>>>>>>>>>>>>>>>>>>>>> 下午10:12写道: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of >>>>>>>>>>>>>>>>>>>>>>>>> ExecutionConfig and >>>>>>>>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is >>>>>>>>>>>>>>>>>>>>>>>>> necessary >>>>>>>>>>>>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, >>>>>>>>>>>>>>>>>>>>>>>>> with >>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>> new SQL >>>>>>>>>>>>>>>>>>>>>>> DDL >>>>>>>>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and >>>>>>>>>>>>>>>>>>>>>>>>> formats >>>>>>>>>>>>>>>>>>>>>>>>> coming up, >>>>>>>>>>>>>>>>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> We need more features around string-based >>> configuration >>>>>>>>>>>>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>>>>>>>>>>> future, >>>>>>>>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose >>>>>>>>>>>>>>>>>>>>>>>>> FLIP-54 for >>>>>>>>>>>>>>>>>>>>>>>>> evolving >>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>>>>>>>>>>>>>>>>>>>>>>> In summary it adds: >>>>>>>>>>>>>>>>>>>>>>>>> - documented types and validation >>>>>>>>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, >>> list >>>>>>>>>>>>>>>>>>>>>>>>> - simple non-nested object types >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> |
In reply to this post by Timo Walther-2
> with the new SQL DDL
based on properties as well as more connectors and formats coming up, unified configuration becomes more important I Cann’t agree more, do you think we can unify the config options key format here for all the DDL properties ? Best, Danny Chan 在 2019年8月16日 +0800 PM10:12,[hidden email],写道: > > with the new SQL DDL > based on properties as well as more connectors and formats coming up, > unified configuration becomes more important |
Hi Danny,
yes, this FLIP covers all the building blocks we need also for unification of the DDL properties. Regards, Timo On 03.09.19 13:45, Danny Chan wrote: >> with the new SQL DDL > based on properties as well as more connectors and formats coming up, > unified configuration becomes more important > > I Cann’t agree more, do you think we can unify the config options key format here for all the DDL properties ? > > Best, > Danny Chan > 在 2019年8月16日 +0800 PM10:12,[hidden email],写道: >> with the new SQL DDL >> based on properties as well as more connectors and formats coming up, >> unified configuration becomes more important |
Hi,
I think it’s important to keep in mind the original goals of this FLIP and not let the scope grow indefinitely. As I recall it, the goals are: - Extend the ConfigOption system enough to allow the Table API to configure options that are right now only available on CheckpointingOptions, ExecutionConfig, and StreamExecutionEnvironment. We also want to do this without manually having to “forward” all the available configuration options by introducing equivalent setters in the Table API - Do the above while keeping in mind that eventually we want to allow users to configure everything from either the flink-conf.yaml, vie command line parameters, or via a Configuration. I think the FLIP achieves this, with the added side goals of making validation a part of ConfigOptions, making them type safe, and making the validation constraints documentable (via automatic doc generation.) All this without breaking backwards compatibility, if I’m not mistaken. I think we should first agree what the basic goals are so that we can quickly converge to consensus on this FLIP because it blocks other people/work. Among other things FLIP-59 depends on this. What are other opinions that people have? I know Becket at least has some thoughts about immutability and loading objects via the configuration but maybe they could be put into a follow-up FLIP if they are needed. Also, I had one thought on the interaction of this FLIP-54 and FLIP-59 when it comes to naming. I think eventually it makes sense to have a common interface for things that are configurable from a Configuration (FLIP-59 introduces the first batch of this). It seems natural to call this interface Configurable. That’s a problem for this FLIP-54 because we also introduce a Configurable. Maybe the thing that we introduce here should be called ConfigObject or ConfigStruct to highlight that it has a more narrow focus and is really only a POJO for holding a bunch of config options that have to go together. What do you think? Best, Aljoscha > On 3. Sep 2019, at 14:08, Timo Walther <[hidden email]> wrote: > > Hi Danny, > > yes, this FLIP covers all the building blocks we need also for unification of the DDL properties. > > Regards, > Timo > > > On 03.09.19 13:45, Danny Chan wrote: >>> with the new SQL DDL >> based on properties as well as more connectors and formats coming up, >> unified configuration becomes more important >> >> I Cann’t agree more, do you think we can unify the config options key format here for all the DDL properties ? >> >> Best, >> Danny Chan >> 在 2019年8月16日 +0800 PM10:12,[hidden email],写道: >>> with the new SQL DDL >>> based on properties as well as more connectors and formats coming up, >>> unified configuration becomes more important > > |
Hi Timo, Dawid and Aljoscha,
Thanks for clarifying the goals. It is very helpful to understand the motivation here. It would be great to add them to the FLIP wiki. I agree that the current FLIP design achieves the two goals it wants to achieve. But I am trying to see is if the current approach is the most reasonable approach. Please let me check if I understand this correctly. From end users' perspective, they will do the following when they want to configure their Flink Jobs. 1. Create a Configuration instance, and call setters of Configuration with the ConfigOptions defined in different components. 2. The Configuration created in step 1 will be passed around, and each component will just exact their own options from it. 3. ExecutionConfig, CheckpointConfig (and other Config classes) will become a Configurable, which is responsible for extracting the configuration values from the Configuration set by users in step 1. The confusion I had was that in step 1, how users are going to set the configs for the ExecutionConfig / CheckpointConfig? There may be two ways: a) Users will call setConfigurable(ExectionConfigConfigurableOption, "config1:v1,config2:v2,config3:v3"), i.e. the entire ExecutionConfig is exposed as a Configurable to the users. b) Users will call setInteger(MAX_PARALLELISM, 1), setInteger(LATENCY_TRACKING_INTERVAL, 1000), etc.. This means users will set individual ConfigOptions for the ExecutionConfig. And they do not see ExecutionConfig as a Configurable. I assume we are following b), then do we need to expose Configurable to the users in this FLIP? My concern is that the Configurable may be related to other mechanism such as plugin which we have not really thought through in this FLIP. I know Becket at least has some thoughts about immutability and loading > objects via the configuration but maybe they could be put into a follow-up > FLIP if they are needed. I am perfectly fine to leave something out of the scope of this FLIP to later FLIPs. But I think it is important to avoid introducing something in this FLIP that will be shortly changed by the follow-up FLIPs. Thanks, Jiangjie (Becket) Qin On Tue, Sep 3, 2019 at 8:47 PM Aljoscha Krettek <[hidden email]> wrote: > Hi, > > I think it’s important to keep in mind the original goals of this FLIP and > not let the scope grow indefinitely. As I recall it, the goals are: > > - Extend the ConfigOption system enough to allow the Table API to > configure options that are right now only available on > CheckpointingOptions, ExecutionConfig, and StreamExecutionEnvironment. We > also want to do this without manually having to “forward” all the available > configuration options by introducing equivalent setters in the Table API > > - Do the above while keeping in mind that eventually we want to allow > users to configure everything from either the flink-conf.yaml, vie command > line parameters, or via a Configuration. > > I think the FLIP achieves this, with the added side goals of making > validation a part of ConfigOptions, making them type safe, and making the > validation constraints documentable (via automatic doc generation.) All > this without breaking backwards compatibility, if I’m not mistaken. > > I think we should first agree what the basic goals are so that we can > quickly converge to consensus on this FLIP because it blocks other > people/work. Among other things FLIP-59 depends on this. What are other > opinions that people have? I know Becket at least has some thoughts about > immutability and loading objects via the configuration but maybe they could > be put into a follow-up FLIP if they are needed. > > Also, I had one thought on the interaction of this FLIP-54 and FLIP-59 > when it comes to naming. I think eventually it makes sense to have a common > interface for things that are configurable from a Configuration (FLIP-59 > introduces the first batch of this). It seems natural to call this > interface Configurable. That’s a problem for this FLIP-54 because we also > introduce a Configurable. Maybe the thing that we introduce here should be > called ConfigObject or ConfigStruct to highlight that it has a more narrow > focus and is really only a POJO for holding a bunch of config options that > have to go together. What do you think? > > Best, > Aljoscha > > > On 3. Sep 2019, at 14:08, Timo Walther <[hidden email]> wrote: > > > > Hi Danny, > > > > yes, this FLIP covers all the building blocks we need also for > unification of the DDL properties. > > > > Regards, > > Timo > > > > > > On 03.09.19 13:45, Danny Chan wrote: > >>> with the new SQL DDL > >> based on properties as well as more connectors and formats coming up, > >> unified configuration becomes more important > >> > >> I Cann’t agree more, do you think we can unify the config options key > format here for all the DDL properties ? > >> > >> Best, > >> Danny Chan > >> 在 2019年8月16日 +0800 PM10:12,[hidden email],写道: > >>> with the new SQL DDL > >>> based on properties as well as more connectors and formats coming up, > >>> unified configuration becomes more important > > > > > > |
Hi Timo and Dawid,
I discussed this offline a little bit with Jingsong and want to double check with you on the followings 2 questions. Can you please help confirm? 1. How will user set the Configurations? Users will have two ways to set an ExecutionConfig. Option 1: a) Instantiate ExecutionConfig and call the setters in it. b) Then create an ExecutionConfigFactory instance to dump the ExecutionConfig into a Configuration instance. Option 2: a) calling Configuration.set(ExecutionConfig.MAX_PARALLELISM, 1) and so on, to set configs in the Configuration instance directly. If we must have option 1, I agree with Aljoscha that having a Configurable.writeConfiguration() is better. But if we consider the future use case of Configurable for Plugins, I would prefer only support option 2. Usually users will not first create a plugin instance then dump the configurations. Also, option 2 is more aligned with file based configuration from something like flink-config.yaml. 2. In the FLIP, will Configuration class hold Configurable objects in its internal Map<String, Object>? To create an ExecutionConfig from the Configuration instance, one would do the following: a) one will call Configuration.get(ExecutionConfigOption), where ExecutionConfigOption is a ConfigOption<ExecutionConfig> which has an ExecutionConfigFactory as the atomicClass. b) Create the ExecutionConfigFactory instance and call ExecutionConfigFactory.fromConfiguration() to create an ExecutionConfig instance. Will Configuration.get() method create a new ExecutionConfig and put it into the *confData* map Map<String, Object>? If not, the Configuration instance will only contain primitive types, List<PrimitiveType> or Map<String, String>. Then I have no concern on this part, because from Configuration instance's perspective, it is immutable. Thanks, Jiangjie (Becket) Qin On Wed, Sep 4, 2019 at 9:37 AM Becket Qin <[hidden email]> wrote: > Hi Timo, Dawid and Aljoscha, > > Thanks for clarifying the goals. It is very helpful to understand the > motivation here. It would be great to add them to the FLIP wiki. > > I agree that the current FLIP design achieves the two goals it wants to > achieve. But I am trying to see is if the current approach is the most > reasonable approach. > > Please let me check if I understand this correctly. From end users' > perspective, they will do the following when they want to configure their > Flink Jobs. > 1. Create a Configuration instance, and call setters of Configuration with > the ConfigOptions defined in different components. > 2. The Configuration created in step 1 will be passed around, and each > component will just exact their own options from it. > 3. ExecutionConfig, CheckpointConfig (and other Config classes) will > become a Configurable, which is responsible for extracting the > configuration values from the Configuration set by users in step 1. > > The confusion I had was that in step 1, how users are going to set the > configs for the ExecutionConfig / CheckpointConfig? There may be two ways: > a) Users will call setConfigurable(ExectionConfigConfigurableOption, > "config1:v1,config2:v2,config3:v3"), i.e. the entire ExecutionConfig is > exposed as a Configurable to the users. > b) Users will call setInteger(MAX_PARALLELISM, 1), > setInteger(LATENCY_TRACKING_INTERVAL, 1000), etc.. This means users will > set individual ConfigOptions for the ExecutionConfig. And they do not see > ExecutionConfig as a Configurable. > > I assume we are following b), then do we need to expose Configurable to > the users in this FLIP? My concern is that the Configurable may be related > to other mechanism such as plugin which we have not really thought through > in this FLIP. > > I know Becket at least has some thoughts about immutability and loading >> objects via the configuration but maybe they could be put into a follow-up >> FLIP if they are needed. > > I am perfectly fine to leave something out of the scope of this FLIP to > later FLIPs. But I think it is important to avoid introducing something in > this FLIP that will be shortly changed by the follow-up FLIPs. > > Thanks, > > Jiangjie (Becket) Qin > > On Tue, Sep 3, 2019 at 8:47 PM Aljoscha Krettek <[hidden email]> > wrote: > >> Hi, >> >> I think it’s important to keep in mind the original goals of this FLIP >> and not let the scope grow indefinitely. As I recall it, the goals are: >> >> - Extend the ConfigOption system enough to allow the Table API to >> configure options that are right now only available on >> CheckpointingOptions, ExecutionConfig, and StreamExecutionEnvironment. We >> also want to do this without manually having to “forward” all the available >> configuration options by introducing equivalent setters in the Table API >> >> - Do the above while keeping in mind that eventually we want to allow >> users to configure everything from either the flink-conf.yaml, vie command >> line parameters, or via a Configuration. >> >> I think the FLIP achieves this, with the added side goals of making >> validation a part of ConfigOptions, making them type safe, and making the >> validation constraints documentable (via automatic doc generation.) All >> this without breaking backwards compatibility, if I’m not mistaken. >> >> I think we should first agree what the basic goals are so that we can >> quickly converge to consensus on this FLIP because it blocks other >> people/work. Among other things FLIP-59 depends on this. What are other >> opinions that people have? I know Becket at least has some thoughts about >> immutability and loading objects via the configuration but maybe they could >> be put into a follow-up FLIP if they are needed. >> >> Also, I had one thought on the interaction of this FLIP-54 and FLIP-59 >> when it comes to naming. I think eventually it makes sense to have a common >> interface for things that are configurable from a Configuration (FLIP-59 >> introduces the first batch of this). It seems natural to call this >> interface Configurable. That’s a problem for this FLIP-54 because we also >> introduce a Configurable. Maybe the thing that we introduce here should be >> called ConfigObject or ConfigStruct to highlight that it has a more narrow >> focus and is really only a POJO for holding a bunch of config options that >> have to go together. What do you think? >> >> Best, >> Aljoscha >> >> > On 3. Sep 2019, at 14:08, Timo Walther <[hidden email]> wrote: >> > >> > Hi Danny, >> > >> > yes, this FLIP covers all the building blocks we need also for >> unification of the DDL properties. >> > >> > Regards, >> > Timo >> > >> > >> > On 03.09.19 13:45, Danny Chan wrote: >> >>> with the new SQL DDL >> >> based on properties as well as more connectors and formats coming up, >> >> unified configuration becomes more important >> >> >> >> I Cann’t agree more, do you think we can unify the config options key >> format here for all the DDL properties ? >> >> >> >> Best, >> >> Danny Chan >> >> 在 2019年8月16日 +0800 PM10:12,[hidden email],写道: >> >>> with the new SQL DDL >> >>> based on properties as well as more connectors and formats coming up, >> >>> unified configuration becomes more important >> > >> > >> >> |
In reply to this post by Becket Qin
Hi Becket,
You are right, that what we had in mind for ExecutionConfig/CheckpointConfig etc. is the option b) from your email. In the context of the FLIP-54, those objects are not Configurable. What we understood as a Configurable by the FLIP-54 are a simple pojos, that are stored under a single key. Such as the examples either from the ML thread (Host) or from the design doc (CacheFile). So when configuring the host user can provide a host like this: connector.host: address:localhost, port:1234 rather than connector.host.address: localhost connector.host.port: 1234 This is important especially if one wants to configure lists of such objects: connector.hosts: address:localhost,port:1234;address:localhost,port:4567 The intention was definitely not to store whole complex objects, such as ExecutionConfig, CheckpointConfig etc. that contain multiple different options Maybe it makes sense to call it ConfigObject as Aljosha suggested? What do you think? Would that make it more understandable? For the initialization/configuration of objects such as ExecutionConfig, CheckpointConfig you may have a look at FLIP-59[1] where we suggest to add a configure method to those classes and we pretty much describe the process you outline in the last message. Best, Dawid [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object On 04/09/2019 03:37, Becket Qin wrote: > Hi Timo, Dawid and Aljoscha, > > Thanks for clarifying the goals. It is very helpful to understand the > motivation here. It would be great to add them to the FLIP wiki. > > I agree that the current FLIP design achieves the two goals it wants to > achieve. But I am trying to see is if the current approach is the most > reasonable approach. > > Please let me check if I understand this correctly. From end users' > perspective, they will do the following when they want to configure their > Flink Jobs. > 1. Create a Configuration instance, and call setters of Configuration with > the ConfigOptions defined in different components. > 2. The Configuration created in step 1 will be passed around, and each > component will just exact their own options from it. > 3. ExecutionConfig, CheckpointConfig (and other Config classes) will become > a Configurable, which is responsible for extracting the configuration > values from the Configuration set by users in step 1. > > The confusion I had was that in step 1, how users are going to set the > configs for the ExecutionConfig / CheckpointConfig? There may be two ways: > a) Users will call setConfigurable(ExectionConfigConfigurableOption, > "config1:v1,config2:v2,config3:v3"), i.e. the entire ExecutionConfig is > exposed as a Configurable to the users. > b) Users will call setInteger(MAX_PARALLELISM, 1), > setInteger(LATENCY_TRACKING_INTERVAL, 1000), etc.. This means users will > set individual ConfigOptions for the ExecutionConfig. And they do not see > ExecutionConfig as a Configurable. > > I assume we are following b), then do we need to expose Configurable to the > users in this FLIP? My concern is that the Configurable may be related to > other mechanism such as plugin which we have not really thought through in > this FLIP. > > I know Becket at least has some thoughts about immutability and loading >> objects via the configuration but maybe they could be put into a follow-up >> FLIP if they are needed. > I am perfectly fine to leave something out of the scope of this FLIP to > later FLIPs. But I think it is important to avoid introducing something in > this FLIP that will be shortly changed by the follow-up FLIPs. > > Thanks, > > Jiangjie (Becket) Qin > > On Tue, Sep 3, 2019 at 8:47 PM Aljoscha Krettek <[hidden email]> wrote: > >> Hi, >> >> I think it’s important to keep in mind the original goals of this FLIP and >> not let the scope grow indefinitely. As I recall it, the goals are: >> >> - Extend the ConfigOption system enough to allow the Table API to >> configure options that are right now only available on >> CheckpointingOptions, ExecutionConfig, and StreamExecutionEnvironment. We >> also want to do this without manually having to “forward” all the available >> configuration options by introducing equivalent setters in the Table API >> >> - Do the above while keeping in mind that eventually we want to allow >> users to configure everything from either the flink-conf.yaml, vie command >> line parameters, or via a Configuration. >> >> I think the FLIP achieves this, with the added side goals of making >> validation a part of ConfigOptions, making them type safe, and making the >> validation constraints documentable (via automatic doc generation.) All >> this without breaking backwards compatibility, if I’m not mistaken. >> >> I think we should first agree what the basic goals are so that we can >> quickly converge to consensus on this FLIP because it blocks other >> people/work. Among other things FLIP-59 depends on this. What are other >> opinions that people have? I know Becket at least has some thoughts about >> immutability and loading objects via the configuration but maybe they could >> be put into a follow-up FLIP if they are needed. >> >> Also, I had one thought on the interaction of this FLIP-54 and FLIP-59 >> when it comes to naming. I think eventually it makes sense to have a common >> interface for things that are configurable from a Configuration (FLIP-59 >> introduces the first batch of this). It seems natural to call this >> interface Configurable. That’s a problem for this FLIP-54 because we also >> introduce a Configurable. Maybe the thing that we introduce here should be >> called ConfigObject or ConfigStruct to highlight that it has a more narrow >> focus and is really only a POJO for holding a bunch of config options that >> have to go together. What do you think? >> >> Best, >> Aljoscha >> >>> On 3. Sep 2019, at 14:08, Timo Walther <[hidden email]> wrote: >>> >>> Hi Danny, >>> >>> yes, this FLIP covers all the building blocks we need also for >> unification of the DDL properties. >>> Regards, >>> Timo >>> >>> >>> On 03.09.19 13:45, Danny Chan wrote: >>>>> with the new SQL DDL >>>> based on properties as well as more connectors and formats coming up, >>>> unified configuration becomes more important >>>> >>>> I Cann’t agree more, do you think we can unify the config options key >> format here for all the DDL properties ? >>>> Best, >>>> Danny Chan >>>> 在 2019年8月16日 +0800 PM10:12,[hidden email],写道: >>>>> with the new SQL DDL >>>>> based on properties as well as more connectors and formats coming up, >>>>> unified configuration becomes more important >>> >> signature.asc (849 bytes) Download Attachment |
Hi Dawid,
Thanks a lot for the clarification. Got it now. A few more thoughts: 1. Naming. I agree that if the name of "Configurable" is a little misleading if we just want to use it to save POJOs. It would probably help to just name it something like "ConfigPojo". 2. Flat config map v.s. structured config map. From user's perspective, I personally find a flat config map is usually easier to understand than a structured config format. But it is just my personal opinion and up for debate. Taking the Host and CachedFile as examples, personally I think the following format is more concise and user friendly: Host: 192.168.0.1:1234 (single config) Hosts: 192.168.0.1:1234, 192.168.0.2:5678 (list of configs) CachedFile: path:file:flag (single config) CachedFile: path1:file1:flag1, path2:file2:flag2 (list config) Maybe for complicate POJOs the full K-V pair would be necessary, but it looks we are trying to avoid such complicated POJOs to begin with. Even if a full K-V is needed, a List<Map<String, String>> format would also be almost equivalent to the current design. 3. The necessity of the POJO class in Configuration / ConfigOption system. I can see the convenience of have a POJO (or ConfigObject) type supported in the Configuration / ConfigOption. However, one thing to notice is that API wise, the ConfigurableFactory can return arbitrary type of class instead of just POJO. This can easily be misused or abused in cases such as plugins. And the current API design will force such plugins to implement methods like toConfiguration() which is a little awkward. Given that 1) there will not be many such Pojo classes and 2) these POJO classes are defined by Flink, I am thinking that one alternative approach is to just have the constructors to take the configuration String (or list of string) and parse that. This will avoid a few complications in this FLIP. a) No need to have the ConfigurableFactory class b) No need to have the toConfiguration() implementation. So there is just one way to set values in the Configuration instance. c) The Configuration / ConfigOption does not have to also deal with the Object creation. Instead, they will simply focus on configuration itself. Thanks for the patient discussion. I don't want to block this FLIP further, so I am fine to go with the current design with changing the name of Configurable to something like ConfigPojo in order to avoid misuse as much as possible. Thanks, Jiangjie (Becket) Qin On Wed, Sep 4, 2019 at 5:50 PM Dawid Wysakowicz <[hidden email]> wrote: > Hi Becket, > > You are right, that what we had in mind for > ExecutionConfig/CheckpointConfig etc. is the option b) from your email. > In the context of the FLIP-54, those objects are not Configurable. What > we understood as a Configurable by the FLIP-54 are a simple pojos, that > are stored under a single key. Such as the examples either from the ML > thread (Host) or from the design doc (CacheFile). So when configuring > the host user can provide a host like this: > > connector.host: address:localhost, port:1234 > > rather than > > connector.host.address: localhost > > connector.host.port: 1234 > > This is important especially if one wants to configure lists of such > objects: > > connector.hosts: address:localhost,port:1234;address:localhost,port:4567 > > The intention was definitely not to store whole complex objects, such as > ExecutionConfig, CheckpointConfig etc. that contain multiple different > options Maybe it makes sense to call it ConfigObject as Aljosha > suggested? What do you think? Would that make it more understandable? > > For the initialization/configuration of objects such as ExecutionConfig, > CheckpointConfig you may have a look at FLIP-59[1] where we suggest to > add a configure method to those classes and we pretty much describe the > process you outline in the last message. > > Best, > > Dawid > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object > > On 04/09/2019 03:37, Becket Qin wrote: > > Hi Timo, Dawid and Aljoscha, > > > > Thanks for clarifying the goals. It is very helpful to understand the > > motivation here. It would be great to add them to the FLIP wiki. > > > > I agree that the current FLIP design achieves the two goals it wants to > > achieve. But I am trying to see is if the current approach is the most > > reasonable approach. > > > > Please let me check if I understand this correctly. From end users' > > perspective, they will do the following when they want to configure their > > Flink Jobs. > > 1. Create a Configuration instance, and call setters of Configuration > with > > the ConfigOptions defined in different components. > > 2. The Configuration created in step 1 will be passed around, and each > > component will just exact their own options from it. > > 3. ExecutionConfig, CheckpointConfig (and other Config classes) will > become > > a Configurable, which is responsible for extracting the configuration > > values from the Configuration set by users in step 1. > > > > The confusion I had was that in step 1, how users are going to set the > > configs for the ExecutionConfig / CheckpointConfig? There may be two > ways: > > a) Users will call setConfigurable(ExectionConfigConfigurableOption, > > "config1:v1,config2:v2,config3:v3"), i.e. the entire ExecutionConfig is > > exposed as a Configurable to the users. > > b) Users will call setInteger(MAX_PARALLELISM, 1), > > setInteger(LATENCY_TRACKING_INTERVAL, 1000), etc.. This means users will > > set individual ConfigOptions for the ExecutionConfig. And they do not see > > ExecutionConfig as a Configurable. > > > > I assume we are following b), then do we need to expose Configurable to > the > > users in this FLIP? My concern is that the Configurable may be related to > > other mechanism such as plugin which we have not really thought through > in > > this FLIP. > > > > I know Becket at least has some thoughts about immutability and loading > >> objects via the configuration but maybe they could be put into a > follow-up > >> FLIP if they are needed. > > I am perfectly fine to leave something out of the scope of this FLIP to > > later FLIPs. But I think it is important to avoid introducing something > in > > this FLIP that will be shortly changed by the follow-up FLIPs. > > > > Thanks, > > > > Jiangjie (Becket) Qin > > > > On Tue, Sep 3, 2019 at 8:47 PM Aljoscha Krettek <[hidden email]> > wrote: > > > >> Hi, > >> > >> I think it’s important to keep in mind the original goals of this FLIP > and > >> not let the scope grow indefinitely. As I recall it, the goals are: > >> > >> - Extend the ConfigOption system enough to allow the Table API to > >> configure options that are right now only available on > >> CheckpointingOptions, ExecutionConfig, and StreamExecutionEnvironment. > We > >> also want to do this without manually having to “forward” all the > available > >> configuration options by introducing equivalent setters in the Table API > >> > >> - Do the above while keeping in mind that eventually we want to allow > >> users to configure everything from either the flink-conf.yaml, vie > command > >> line parameters, or via a Configuration. > >> > >> I think the FLIP achieves this, with the added side goals of making > >> validation a part of ConfigOptions, making them type safe, and making > the > >> validation constraints documentable (via automatic doc generation.) All > >> this without breaking backwards compatibility, if I’m not mistaken. > >> > >> I think we should first agree what the basic goals are so that we can > >> quickly converge to consensus on this FLIP because it blocks other > >> people/work. Among other things FLIP-59 depends on this. What are other > >> opinions that people have? I know Becket at least has some thoughts > about > >> immutability and loading objects via the configuration but maybe they > could > >> be put into a follow-up FLIP if they are needed. > >> > >> Also, I had one thought on the interaction of this FLIP-54 and FLIP-59 > >> when it comes to naming. I think eventually it makes sense to have a > common > >> interface for things that are configurable from a Configuration (FLIP-59 > >> introduces the first batch of this). It seems natural to call this > >> interface Configurable. That’s a problem for this FLIP-54 because we > also > >> introduce a Configurable. Maybe the thing that we introduce here should > be > >> called ConfigObject or ConfigStruct to highlight that it has a more > narrow > >> focus and is really only a POJO for holding a bunch of config options > that > >> have to go together. What do you think? > >> > >> Best, > >> Aljoscha > >> > >>> On 3. Sep 2019, at 14:08, Timo Walther <[hidden email]> wrote: > >>> > >>> Hi Danny, > >>> > >>> yes, this FLIP covers all the building blocks we need also for > >> unification of the DDL properties. > >>> Regards, > >>> Timo > >>> > >>> > >>> On 03.09.19 13:45, Danny Chan wrote: > >>>>> with the new SQL DDL > >>>> based on properties as well as more connectors and formats coming up, > >>>> unified configuration becomes more important > >>>> > >>>> I Cann’t agree more, do you think we can unify the config options key > >> format here for all the DDL properties ? > >>>> Best, > >>>> Danny Chan > >>>> 在 2019年8月16日 +0800 PM10:12,[hidden email],写道: > >>>>> with the new SQL DDL > >>>>> based on properties as well as more connectors and formats coming up, > >>>>> unified configuration becomes more important > >>> > >> > > |
Hi all,
I would also like to give a +1 for supporting lists as config options with the delimeter being a parameter (if we cannot find a consensus). To some extent the current codebase has already solved the issue by already having lists as options, but the problem is that so far there was no principled way of doing it so everyone has his/her own way of encoding such lists. For example, we have: CoreOptions.TMP_DIRS: separated by ",", "|", or the system's path separator Whatever uses the NetUtils.getPortRangeFromString(), such as QueryableStateOptions.PROXY_PORT_RANGE and QueryableStateOptions.SERVER_PORT_RANGE : it can be a port: "9123", a range of ports: "50100-50200", or a list of ranges and ports: "50100-50200,50300-50400,51234" dynamicproperties in the FlinkYarnSessionCLI: which uses @@ as delimeter. This is not strictly a config option, but it would be nice to aim for unified user experience in CLI and config files. Cheers, Kostas On Thu, Sep 5, 2019 at 8:42 AM Becket Qin <[hidden email]> wrote: > > Hi Dawid, > > Thanks a lot for the clarification. Got it now. A few more thoughts: > > 1. Naming. > I agree that if the name of "Configurable" is a little misleading if we > just want to use it to save POJOs. It would probably help to just name it > something like "ConfigPojo". > > 2. Flat config map v.s. structured config map. > From user's perspective, I personally find a flat config map is usually > easier to understand than a structured config format. But it is just my > personal opinion and up for debate. > > Taking the Host and CachedFile as examples, personally I think the > following format is more concise and user friendly: > > Host: 192.168.0.1:1234 (single config) > Hosts: 192.168.0.1:1234, 192.168.0.2:5678 (list of configs) > > CachedFile: path:file:flag (single config) > CachedFile: path1:file1:flag1, path2:file2:flag2 (list config) > > Maybe for complicate POJOs the full K-V pair would be necessary, but it > looks we are trying to avoid such complicated POJOs to begin with. Even if > a full K-V is needed, a List<Map<String, String>> format would also be > almost equivalent to the current design. > > 3. The necessity of the POJO class in Configuration / ConfigOption system. > I can see the convenience of have a POJO (or ConfigObject) type supported > in the Configuration / ConfigOption. However, one thing to notice is that > API wise, the ConfigurableFactory can return arbitrary type of class > instead of just POJO. This can easily be misused or abused in cases such as > plugins. And the current API design will force such plugins to implement > methods like toConfiguration() which is a little awkward. > > Given that 1) there will not be many such Pojo classes and 2) these POJO > classes are defined by Flink, I am thinking that one alternative approach > is to just have the constructors to take the configuration String (or list > of string) and parse that. This will avoid a few complications in this FLIP. > a) No need to have the ConfigurableFactory class > b) No need to have the toConfiguration() implementation. So there is just > one way to set values in the Configuration instance. > c) The Configuration / ConfigOption does not have to also deal with the > Object creation. Instead, they will simply focus on configuration itself. > > Thanks for the patient discussion. I don't want to block this FLIP further, > so I am fine to go with the current design with changing the name of > Configurable to something like ConfigPojo in order to avoid misuse as much > as possible. > > Thanks, > > Jiangjie (Becket) Qin > > On Wed, Sep 4, 2019 at 5:50 PM Dawid Wysakowicz <[hidden email]> > wrote: > > > Hi Becket, > > > > You are right, that what we had in mind for > > ExecutionConfig/CheckpointConfig etc. is the option b) from your email. > > In the context of the FLIP-54, those objects are not Configurable. What > > we understood as a Configurable by the FLIP-54 are a simple pojos, that > > are stored under a single key. Such as the examples either from the ML > > thread (Host) or from the design doc (CacheFile). So when configuring > > the host user can provide a host like this: > > > > connector.host: address:localhost, port:1234 > > > > rather than > > > > connector.host.address: localhost > > > > connector.host.port: 1234 > > > > This is important especially if one wants to configure lists of such > > objects: > > > > connector.hosts: address:localhost,port:1234;address:localhost,port:4567 > > > > The intention was definitely not to store whole complex objects, such as > > ExecutionConfig, CheckpointConfig etc. that contain multiple different > > options Maybe it makes sense to call it ConfigObject as Aljosha > > suggested? What do you think? Would that make it more understandable? > > > > For the initialization/configuration of objects such as ExecutionConfig, > > CheckpointConfig you may have a look at FLIP-59[1] where we suggest to > > add a configure method to those classes and we pretty much describe the > > process you outline in the last message. > > > > Best, > > > > Dawid > > > > [1] > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object > > > > On 04/09/2019 03:37, Becket Qin wrote: > > > Hi Timo, Dawid and Aljoscha, > > > > > > Thanks for clarifying the goals. It is very helpful to understand the > > > motivation here. It would be great to add them to the FLIP wiki. > > > > > > I agree that the current FLIP design achieves the two goals it wants to > > > achieve. But I am trying to see is if the current approach is the most > > > reasonable approach. > > > > > > Please let me check if I understand this correctly. From end users' > > > perspective, they will do the following when they want to configure their > > > Flink Jobs. > > > 1. Create a Configuration instance, and call setters of Configuration > > with > > > the ConfigOptions defined in different components. > > > 2. The Configuration created in step 1 will be passed around, and each > > > component will just exact their own options from it. > > > 3. ExecutionConfig, CheckpointConfig (and other Config classes) will > > become > > > a Configurable, which is responsible for extracting the configuration > > > values from the Configuration set by users in step 1. > > > > > > The confusion I had was that in step 1, how users are going to set the > > > configs for the ExecutionConfig / CheckpointConfig? There may be two > > ways: > > > a) Users will call setConfigurable(ExectionConfigConfigurableOption, > > > "config1:v1,config2:v2,config3:v3"), i.e. the entire ExecutionConfig is > > > exposed as a Configurable to the users. > > > b) Users will call setInteger(MAX_PARALLELISM, 1), > > > setInteger(LATENCY_TRACKING_INTERVAL, 1000), etc.. This means users will > > > set individual ConfigOptions for the ExecutionConfig. And they do not see > > > ExecutionConfig as a Configurable. > > > > > > I assume we are following b), then do we need to expose Configurable to > > the > > > users in this FLIP? My concern is that the Configurable may be related to > > > other mechanism such as plugin which we have not really thought through > > in > > > this FLIP. > > > > > > I know Becket at least has some thoughts about immutability and loading > > >> objects via the configuration but maybe they could be put into a > > follow-up > > >> FLIP if they are needed. > > > I am perfectly fine to leave something out of the scope of this FLIP to > > > later FLIPs. But I think it is important to avoid introducing something > > in > > > this FLIP that will be shortly changed by the follow-up FLIPs. > > > > > > Thanks, > > > > > > Jiangjie (Becket) Qin > > > > > > On Tue, Sep 3, 2019 at 8:47 PM Aljoscha Krettek <[hidden email]> > > wrote: > > > > > >> Hi, > > >> > > >> I think it’s important to keep in mind the original goals of this FLIP > > and > > >> not let the scope grow indefinitely. As I recall it, the goals are: > > >> > > >> - Extend the ConfigOption system enough to allow the Table API to > > >> configure options that are right now only available on > > >> CheckpointingOptions, ExecutionConfig, and StreamExecutionEnvironment. > > We > > >> also want to do this without manually having to “forward” all the > > available > > >> configuration options by introducing equivalent setters in the Table API > > >> > > >> - Do the above while keeping in mind that eventually we want to allow > > >> users to configure everything from either the flink-conf.yaml, vie > > command > > >> line parameters, or via a Configuration. > > >> > > >> I think the FLIP achieves this, with the added side goals of making > > >> validation a part of ConfigOptions, making them type safe, and making > > the > > >> validation constraints documentable (via automatic doc generation.) All > > >> this without breaking backwards compatibility, if I’m not mistaken. > > >> > > >> I think we should first agree what the basic goals are so that we can > > >> quickly converge to consensus on this FLIP because it blocks other > > >> people/work. Among other things FLIP-59 depends on this. What are other > > >> opinions that people have? I know Becket at least has some thoughts > > about > > >> immutability and loading objects via the configuration but maybe they > > could > > >> be put into a follow-up FLIP if they are needed. > > >> > > >> Also, I had one thought on the interaction of this FLIP-54 and FLIP-59 > > >> when it comes to naming. I think eventually it makes sense to have a > > common > > >> interface for things that are configurable from a Configuration (FLIP-59 > > >> introduces the first batch of this). It seems natural to call this > > >> interface Configurable. That’s a problem for this FLIP-54 because we > > also > > >> introduce a Configurable. Maybe the thing that we introduce here should > > be > > >> called ConfigObject or ConfigStruct to highlight that it has a more > > narrow > > >> focus and is really only a POJO for holding a bunch of config options > > that > > >> have to go together. What do you think? > > >> > > >> Best, > > >> Aljoscha > > >> > > >>> On 3. Sep 2019, at 14:08, Timo Walther <[hidden email]> wrote: > > >>> > > >>> Hi Danny, > > >>> > > >>> yes, this FLIP covers all the building blocks we need also for > > >> unification of the DDL properties. > > >>> Regards, > > >>> Timo > > >>> > > >>> > > >>> On 03.09.19 13:45, Danny Chan wrote: > > >>>>> with the new SQL DDL > > >>>> based on properties as well as more connectors and formats coming up, > > >>>> unified configuration becomes more important > > >>>> > > >>>> I Cann’t agree more, do you think we can unify the config options key > > >> format here for all the DDL properties ? > > >>>> Best, > > >>>> Danny Chan > > >>>> 在 2019年8月16日 +0800 PM10:12,[hidden email],写道: > > >>>>> with the new SQL DDL > > >>>>> based on properties as well as more connectors and formats coming up, > > >>>>> unified configuration becomes more important > > >>> > > >> > > > > |
Hi all,
thanks for all the feedback we have received so far. Also from the offline feedback that I have received so far, it seems this is a very sensitive topic. I think we have reached consensus that we want to improve the configuration experience in Flink. We just need to figure out the "how". Maybe FLIP-54 contains too many features at the same time which made the discussion around this topic quite difficult to follow. In order to unblock the efforts, I would like to split FLIP-54 into 3 independent FLIPs: - One FLIP for adding a new way of declaring, reading and writing config options with a basic set of data types. - One FLIP for adding the concept of a "tuple/struct/object" (terminology is up for discussion) data type. - One FLIP that discusses validation for config options and the concept of config option sets. This allows us to have progress and unblock features that depend on parts of FLIP-54. What do you think? Thanks, Timo On 30.09.19 12:03, Kostas Kloudas wrote: > Hi all, > > I would also like to give a +1 for supporting lists as config options > with the delimeter being a parameter (if we cannot find a consensus). > To some extent the current codebase has already solved the issue by > already having lists as options, but the problem is that so far there > was no principled way of doing it so everyone has his/her own way of > encoding such lists. For example, we have: > > CoreOptions.TMP_DIRS: separated by ",", "|", or the system's path separator > Whatever uses the NetUtils.getPortRangeFromString(), such as > QueryableStateOptions.PROXY_PORT_RANGE and > QueryableStateOptions.SERVER_PORT_RANGE : it can be a port: > "9123", a range of ports: "50100-50200", or a list of ranges and > ports: "50100-50200,50300-50400,51234" > dynamicproperties in the FlinkYarnSessionCLI: which uses @@ as > delimeter. This is not strictly a config option, but it would be nice > to aim for unified user experience in CLI and config files. > > Cheers, > Kostas > > On Thu, Sep 5, 2019 at 8:42 AM Becket Qin <[hidden email]> wrote: >> Hi Dawid, >> >> Thanks a lot for the clarification. Got it now. A few more thoughts: >> >> 1. Naming. >> I agree that if the name of "Configurable" is a little misleading if we >> just want to use it to save POJOs. It would probably help to just name it >> something like "ConfigPojo". >> >> 2. Flat config map v.s. structured config map. >> From user's perspective, I personally find a flat config map is usually >> easier to understand than a structured config format. But it is just my >> personal opinion and up for debate. >> >> Taking the Host and CachedFile as examples, personally I think the >> following format is more concise and user friendly: >> >> Host: 192.168.0.1:1234 (single config) >> Hosts: 192.168.0.1:1234, 192.168.0.2:5678 (list of configs) >> >> CachedFile: path:file:flag (single config) >> CachedFile: path1:file1:flag1, path2:file2:flag2 (list config) >> >> Maybe for complicate POJOs the full K-V pair would be necessary, but it >> looks we are trying to avoid such complicated POJOs to begin with. Even if >> a full K-V is needed, a List<Map<String, String>> format would also be >> almost equivalent to the current design. >> >> 3. The necessity of the POJO class in Configuration / ConfigOption system. >> I can see the convenience of have a POJO (or ConfigObject) type supported >> in the Configuration / ConfigOption. However, one thing to notice is that >> API wise, the ConfigurableFactory can return arbitrary type of class >> instead of just POJO. This can easily be misused or abused in cases such as >> plugins. And the current API design will force such plugins to implement >> methods like toConfiguration() which is a little awkward. >> >> Given that 1) there will not be many such Pojo classes and 2) these POJO >> classes are defined by Flink, I am thinking that one alternative approach >> is to just have the constructors to take the configuration String (or list >> of string) and parse that. This will avoid a few complications in this FLIP. >> a) No need to have the ConfigurableFactory class >> b) No need to have the toConfiguration() implementation. So there is just >> one way to set values in the Configuration instance. >> c) The Configuration / ConfigOption does not have to also deal with the >> Object creation. Instead, they will simply focus on configuration itself. >> >> Thanks for the patient discussion. I don't want to block this FLIP further, >> so I am fine to go with the current design with changing the name of >> Configurable to something like ConfigPojo in order to avoid misuse as much >> as possible. >> >> Thanks, >> >> Jiangjie (Becket) Qin >> >> On Wed, Sep 4, 2019 at 5:50 PM Dawid Wysakowicz <[hidden email]> >> wrote: >> >>> Hi Becket, >>> >>> You are right, that what we had in mind for >>> ExecutionConfig/CheckpointConfig etc. is the option b) from your email. >>> In the context of the FLIP-54, those objects are not Configurable. What >>> we understood as a Configurable by the FLIP-54 are a simple pojos, that >>> are stored under a single key. Such as the examples either from the ML >>> thread (Host) or from the design doc (CacheFile). So when configuring >>> the host user can provide a host like this: >>> >>> connector.host: address:localhost, port:1234 >>> >>> rather than >>> >>> connector.host.address: localhost >>> >>> connector.host.port: 1234 >>> >>> This is important especially if one wants to configure lists of such >>> objects: >>> >>> connector.hosts: address:localhost,port:1234;address:localhost,port:4567 >>> >>> The intention was definitely not to store whole complex objects, such as >>> ExecutionConfig, CheckpointConfig etc. that contain multiple different >>> options Maybe it makes sense to call it ConfigObject as Aljosha >>> suggested? What do you think? Would that make it more understandable? >>> >>> For the initialization/configuration of objects such as ExecutionConfig, >>> CheckpointConfig you may have a look at FLIP-59[1] where we suggest to >>> add a configure method to those classes and we pretty much describe the >>> process you outline in the last message. >>> >>> Best, >>> >>> Dawid >>> >>> [1] >>> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object >>> >>> On 04/09/2019 03:37, Becket Qin wrote: >>>> Hi Timo, Dawid and Aljoscha, >>>> >>>> Thanks for clarifying the goals. It is very helpful to understand the >>>> motivation here. It would be great to add them to the FLIP wiki. >>>> >>>> I agree that the current FLIP design achieves the two goals it wants to >>>> achieve. But I am trying to see is if the current approach is the most >>>> reasonable approach. >>>> >>>> Please let me check if I understand this correctly. From end users' >>>> perspective, they will do the following when they want to configure their >>>> Flink Jobs. >>>> 1. Create a Configuration instance, and call setters of Configuration >>> with >>>> the ConfigOptions defined in different components. >>>> 2. The Configuration created in step 1 will be passed around, and each >>>> component will just exact their own options from it. >>>> 3. ExecutionConfig, CheckpointConfig (and other Config classes) will >>> become >>>> a Configurable, which is responsible for extracting the configuration >>>> values from the Configuration set by users in step 1. >>>> >>>> The confusion I had was that in step 1, how users are going to set the >>>> configs for the ExecutionConfig / CheckpointConfig? There may be two >>> ways: >>>> a) Users will call setConfigurable(ExectionConfigConfigurableOption, >>>> "config1:v1,config2:v2,config3:v3"), i.e. the entire ExecutionConfig is >>>> exposed as a Configurable to the users. >>>> b) Users will call setInteger(MAX_PARALLELISM, 1), >>>> setInteger(LATENCY_TRACKING_INTERVAL, 1000), etc.. This means users will >>>> set individual ConfigOptions for the ExecutionConfig. And they do not see >>>> ExecutionConfig as a Configurable. >>>> >>>> I assume we are following b), then do we need to expose Configurable to >>> the >>>> users in this FLIP? My concern is that the Configurable may be related to >>>> other mechanism such as plugin which we have not really thought through >>> in >>>> this FLIP. >>>> >>>> I know Becket at least has some thoughts about immutability and loading >>>>> objects via the configuration but maybe they could be put into a >>> follow-up >>>>> FLIP if they are needed. >>>> I am perfectly fine to leave something out of the scope of this FLIP to >>>> later FLIPs. But I think it is important to avoid introducing something >>> in >>>> this FLIP that will be shortly changed by the follow-up FLIPs. >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> On Tue, Sep 3, 2019 at 8:47 PM Aljoscha Krettek <[hidden email]> >>> wrote: >>>>> Hi, >>>>> >>>>> I think it’s important to keep in mind the original goals of this FLIP >>> and >>>>> not let the scope grow indefinitely. As I recall it, the goals are: >>>>> >>>>> - Extend the ConfigOption system enough to allow the Table API to >>>>> configure options that are right now only available on >>>>> CheckpointingOptions, ExecutionConfig, and StreamExecutionEnvironment. >>> We >>>>> also want to do this without manually having to “forward” all the >>> available >>>>> configuration options by introducing equivalent setters in the Table API >>>>> >>>>> - Do the above while keeping in mind that eventually we want to allow >>>>> users to configure everything from either the flink-conf.yaml, vie >>> command >>>>> line parameters, or via a Configuration. >>>>> >>>>> I think the FLIP achieves this, with the added side goals of making >>>>> validation a part of ConfigOptions, making them type safe, and making >>> the >>>>> validation constraints documentable (via automatic doc generation.) All >>>>> this without breaking backwards compatibility, if I’m not mistaken. >>>>> >>>>> I think we should first agree what the basic goals are so that we can >>>>> quickly converge to consensus on this FLIP because it blocks other >>>>> people/work. Among other things FLIP-59 depends on this. What are other >>>>> opinions that people have? I know Becket at least has some thoughts >>> about >>>>> immutability and loading objects via the configuration but maybe they >>> could >>>>> be put into a follow-up FLIP if they are needed. >>>>> >>>>> Also, I had one thought on the interaction of this FLIP-54 and FLIP-59 >>>>> when it comes to naming. I think eventually it makes sense to have a >>> common >>>>> interface for things that are configurable from a Configuration (FLIP-59 >>>>> introduces the first batch of this). It seems natural to call this >>>>> interface Configurable. That’s a problem for this FLIP-54 because we >>> also >>>>> introduce a Configurable. Maybe the thing that we introduce here should >>> be >>>>> called ConfigObject or ConfigStruct to highlight that it has a more >>> narrow >>>>> focus and is really only a POJO for holding a bunch of config options >>> that >>>>> have to go together. What do you think? >>>>> >>>>> Best, >>>>> Aljoscha >>>>> >>>>>> On 3. Sep 2019, at 14:08, Timo Walther <[hidden email]> wrote: >>>>>> >>>>>> Hi Danny, >>>>>> >>>>>> yes, this FLIP covers all the building blocks we need also for >>>>> unification of the DDL properties. >>>>>> Regards, >>>>>> Timo >>>>>> >>>>>> >>>>>> On 03.09.19 13:45, Danny Chan wrote: >>>>>>>> with the new SQL DDL >>>>>>> based on properties as well as more connectors and formats coming up, >>>>>>> unified configuration becomes more important >>>>>>> >>>>>>> I Cann’t agree more, do you think we can unify the config options key >>>>> format here for all the DDL properties ? >>>>>>> Best, >>>>>>> Danny Chan >>>>>>> 在 2019年8月16日 +0800 PM10:12,[hidden email],写道: >>>>>>>> with the new SQL DDL >>>>>>>> based on properties as well as more connectors and formats coming up, >>>>>>>> unified configuration becomes more important >>> |
Free forum by Nabble | Edit this page |