[DISCUSS] Hierarchies in ConfigOption

classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] Hierarchies in ConfigOption

Timo Walther-2
Hi everyone,

discussions around ConfigOption seem to be very popular recently. So I
would also like to get some opinions on a different topic.

How do we represent hierarchies in ConfigOption? In FLIP-122, we agreed
on the following DDL syntax:

CREATE TABLE fs_table (
  ...
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///path/to/whatever',
  'format' = 'csv',
  'format.allow-comments' = 'true',
  'format.ignore-parse-errors' = 'true'
);

Of course this is slightly different from regular Flink core
configuration but a connector still needs to be configured based on
these options.

However, I think this FLIP violates our code style guidelines because

'format' = 'json',
'format.fail-on-missing-field' = 'false'

is an invalid hierarchy. `format` cannot be a string and a top-level
object at the same time.

We have similar problems in our runtime configuration:

state.backend=
state.backend.incremental=
restart-strategy=
restart-strategy.fixed-delay.delay=
high-availability=
high-availability.cluster-id=

The code style guide states "Think of the configuration as nested
objects (JSON style)". So such hierarchies cannot be represented in a
nested JSON style.

Therefore, should we advocate instead:

'format.kind' = 'json',
'format.fail-on-missing-field' = 'false'

What do you think?

Thanks,
Timo

[1]
https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Chesnay Schepler-3
 > Therefore, should we advocate instead:
 >
 > 'format.kind' = 'json',
 > 'format.fail-on-missing-field' = 'false'

Yes. That's pretty much it.

This is reasonable important to nail down as with such violations I
believe we could not actually switch to a standard YAML parser.

On 29/04/2020 16:05, Timo Walther wrote:

> Hi everyone,
>
> discussions around ConfigOption seem to be very popular recently. So I
> would also like to get some opinions on a different topic.
>
> How do we represent hierarchies in ConfigOption? In FLIP-122, we
> agreed on the following DDL syntax:
>
> CREATE TABLE fs_table (
>  ...
> ) WITH (
>  'connector' = 'filesystem',
>  'path' = 'file:///path/to/whatever',
>  'format' = 'csv',
>  'format.allow-comments' = 'true',
>  'format.ignore-parse-errors' = 'true'
> );
>
> Of course this is slightly different from regular Flink core
> configuration but a connector still needs to be configured based on
> these options.
>
> However, I think this FLIP violates our code style guidelines because
>
> 'format' = 'json',
> 'format.fail-on-missing-field' = 'false'
>
> is an invalid hierarchy. `format` cannot be a string and a top-level
> object at the same time.
>
> We have similar problems in our runtime configuration:
>
> state.backend=
> state.backend.incremental=
> restart-strategy=
> restart-strategy.fixed-delay.delay=
> high-availability=
> high-availability.cluster-id=
>
> The code style guide states "Think of the configuration as nested
> objects (JSON style)". So such hierarchies cannot be represented in a
> nested JSON style.
>
> Therefore, should we advocate instead:
>
> 'format.kind' = 'json',
> 'format.fail-on-missing-field' = 'false'
>
> What do you think?
>
> Thanks,
> Timo
>
> [1]
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Jark Wu-2
From a user's perspective, I prefer the shorter one "format=json", because
it's more concise and straightforward. The "kind" is redundant for users.
Is there a real case requires to represent the configuration in JSON style?
As far as I can see, I don't see such requirement, and everything works
fine by now.

So I'm in favor of "format=json". But if the community insist to follow
code style on this, I'm also fine with the longer one.

Btw, I also CC user mailing list to listen more user's feedback. Because I
think this is relative to usability.

Best,
Jark

On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> wrote:

>  > Therefore, should we advocate instead:
>  >
>  > 'format.kind' = 'json',
>  > 'format.fail-on-missing-field' = 'false'
>
> Yes. That's pretty much it.
>
> This is reasonable important to nail down as with such violations I
> believe we could not actually switch to a standard YAML parser.
>
> On 29/04/2020 16:05, Timo Walther wrote:
> > Hi everyone,
> >
> > discussions around ConfigOption seem to be very popular recently. So I
> > would also like to get some opinions on a different topic.
> >
> > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > agreed on the following DDL syntax:
> >
> > CREATE TABLE fs_table (
> >  ...
> > ) WITH (
> >  'connector' = 'filesystem',
> >  'path' = 'file:///path/to/whatever',
> >  'format' = 'csv',
> >  'format.allow-comments' = 'true',
> >  'format.ignore-parse-errors' = 'true'
> > );
> >
> > Of course this is slightly different from regular Flink core
> > configuration but a connector still needs to be configured based on
> > these options.
> >
> > However, I think this FLIP violates our code style guidelines because
> >
> > 'format' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > is an invalid hierarchy. `format` cannot be a string and a top-level
> > object at the same time.
> >
> > We have similar problems in our runtime configuration:
> >
> > state.backend=
> > state.backend.incremental=
> > restart-strategy=
> > restart-strategy.fixed-delay.delay=
> > high-availability=
> > high-availability.cluster-id=
> >
> > The code style guide states "Think of the configuration as nested
> > objects (JSON style)". So such hierarchies cannot be represented in a
> > nested JSON style.
> >
> > Therefore, should we advocate instead:
> >
> > 'format.kind' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Flavio Pompermaier
Personally I don't have any preference here.  Compliance wih standard YAML
parser is probably more important

On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:

> From a user's perspective, I prefer the shorter one "format=json", because
> it's more concise and straightforward. The "kind" is redundant for users.
> Is there a real case requires to represent the configuration in JSON style?
> As far as I can see, I don't see such requirement, and everything works
> fine by now.
>
> So I'm in favor of "format=json". But if the community insist to follow
> code style on this, I'm also fine with the longer one.
>
> Btw, I also CC user mailing list to listen more user's feedback. Because I
> think this is relative to usability.
>
> Best,
> Jark
>
> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> wrote:
>
> >  > Therefore, should we advocate instead:
> >  >
> >  > 'format.kind' = 'json',
> >  > 'format.fail-on-missing-field' = 'false'
> >
> > Yes. That's pretty much it.
> >
> > This is reasonable important to nail down as with such violations I
> > believe we could not actually switch to a standard YAML parser.
> >
> > On 29/04/2020 16:05, Timo Walther wrote:
> > > Hi everyone,
> > >
> > > discussions around ConfigOption seem to be very popular recently. So I
> > > would also like to get some opinions on a different topic.
> > >
> > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > > agreed on the following DDL syntax:
> > >
> > > CREATE TABLE fs_table (
> > >  ...
> > > ) WITH (
> > >  'connector' = 'filesystem',
> > >  'path' = 'file:///path/to/whatever',
> > >  'format' = 'csv',
> > >  'format.allow-comments' = 'true',
> > >  'format.ignore-parse-errors' = 'true'
> > > );
> > >
> > > Of course this is slightly different from regular Flink core
> > > configuration but a connector still needs to be configured based on
> > > these options.
> > >
> > > However, I think this FLIP violates our code style guidelines because
> > >
> > > 'format' = 'json',
> > > 'format.fail-on-missing-field' = 'false'
> > >
> > > is an invalid hierarchy. `format` cannot be a string and a top-level
> > > object at the same time.
> > >
> > > We have similar problems in our runtime configuration:
> > >
> > > state.backend=
> > > state.backend.incremental=
> > > restart-strategy=
> > > restart-strategy.fixed-delay.delay=
> > > high-availability=
> > > high-availability.cluster-id=
> > >
> > > The code style guide states "Think of the configuration as nested
> > > objects (JSON style)". So such hierarchies cannot be represented in a
> > > nested JSON style.
> > >
> > > Therefore, should we advocate instead:
> > >
> > > 'format.kind' = 'json',
> > > 'format.fail-on-missing-field' = 'false'
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Timo
> > >
> > > [1]
> > >
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> > >
> >
> >
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

dwysakowicz

Hi all,

I also wanted to share my opinion.

When talking about a ConfigOption hierarchy we use for configuring Flink cluster I would be a strong advocate for keeping a yaml/hocon/json/... compatible style. Those options are primarily read from a file and thus should at least try to follow common practices for nested formats if we ever decide to switch to one.

Here the question is about the properties we use in SQL statements. The origin/destination of these usually will be external catalog, usually in a flattened(key/value) representation so I agree it is not as important as in the aforementioned case. Nevertheless having a yaml based catalog or being able to have e.g. yaml based snapshots of a catalog in my opinion is appealing. At the same time cost of being able to have a nice yaml/hocon/json representation is just adding a single suffix to a single(at most 2 key + value) property. The question is between `format` = `json` vs `format.kind` = `json`. That said I'd be slighty in favor of doing it.

Just to have a full picture. Both cases can be represented in yaml, but the difference is significant:

format: 'json'
format.option: 'value'

vs

format:
    kind: 'json'

    option: 'value'


Best,
Dawid

On 29/04/2020 17:13, Flavio Pompermaier wrote:
Personally I don't have any preference here.  Compliance wih standard YAML parser is probably more important

On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
From a user's perspective, I prefer the shorter one "format=json", because
it's more concise and straightforward. The "kind" is redundant for users.
Is there a real case requires to represent the configuration in JSON style?
As far as I can see, I don't see such requirement, and everything works
fine by now.

So I'm in favor of "format=json". But if the community insist to follow
code style on this, I'm also fine with the longer one.

Btw, I also CC user mailing list to listen more user's feedback. Because I
think this is relative to usability.

Best,
Jark

On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> wrote:

>  > Therefore, should we advocate instead:
>  >
>  > 'format.kind' = 'json',
>  > 'format.fail-on-missing-field' = 'false'
>
> Yes. That's pretty much it.
>
> This is reasonable important to nail down as with such violations I
> believe we could not actually switch to a standard YAML parser.
>
> On 29/04/2020 16:05, Timo Walther wrote:
> > Hi everyone,
> >
> > discussions around ConfigOption seem to be very popular recently. So I
> > would also like to get some opinions on a different topic.
> >
> > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > agreed on the following DDL syntax:
> >
> > CREATE TABLE fs_table (
> >  ...
> > ) WITH (
> >  'connector' = 'filesystem',
> >  'path' = 'file:///path/to/whatever',
> >  'format' = 'csv',
> >  'format.allow-comments' = 'true',
> >  'format.ignore-parse-errors' = 'true'
> > );
> >
> > Of course this is slightly different from regular Flink core
> > configuration but a connector still needs to be configured based on
> > these options.
> >
> > However, I think this FLIP violates our code style guidelines because
> >
> > 'format' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > is an invalid hierarchy. `format` cannot be a string and a top-level
> > object at the same time.
> >
> > We have similar problems in our runtime configuration:
> >
> > state.backend=
> > state.backend.incremental=
> > restart-strategy=
> > restart-strategy.fixed-delay.delay=
> > high-availability=
> > high-availability.cluster-id=
> >
> > The code style guide states "Think of the configuration as nested
> > objects (JSON style)". So such hierarchies cannot be represented in a
> > nested JSON style.
> >
> > Therefore, should we advocate instead:
> >
> > 'format.kind' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >
>
>

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Benchao Li
Thanks Timo for staring the discussion.

Generally I like the idea to keep the config align with a standard like
json/yaml.

From the user's perspective, I don't use table configs from a config file
like yaml or json for now,
And it's ok to change it to yaml like style. Actually we didn't know that
this could be a yaml like
configuration hierarchy. If it has a hierarchy, we maybe consider that in
the future to load the
config from a yaml/json file.

Regarding the name,
'format.kind' looks fine to me. However there is another name from the top
of my head:
'format.name', WDYT?

Dawid Wysakowicz <[hidden email]> 于2020年4月29日周三 下午11:56写道:

> Hi all,
>
> I also wanted to share my opinion.
>
> When talking about a ConfigOption hierarchy we use for configuring Flink
> cluster I would be a strong advocate for keeping a yaml/hocon/json/...
> compatible style. Those options are primarily read from a file and thus
> should at least try to follow common practices for nested formats if we
> ever decide to switch to one.
>
> Here the question is about the properties we use in SQL statements. The
> origin/destination of these usually will be external catalog, usually in a
> flattened(key/value) representation so I agree it is not as important as in
> the aforementioned case. Nevertheless having a yaml based catalog or being
> able to have e.g. yaml based snapshots of a catalog in my opinion is
> appealing. At the same time cost of being able to have a nice
> yaml/hocon/json representation is just adding a single suffix to a
> single(at most 2 key + value) property. The question is between `format` =
> `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
> doing it.
>
> Just to have a full picture. Both cases can be represented in yaml, but
> the difference is significant:
> format: 'json'
> format.option: 'value'
>
> vs
> format:
>     kind: 'json'
>
>     option: 'value'
>
> Best,
> Dawid
>
> On 29/04/2020 17:13, Flavio Pompermaier wrote:
>
> Personally I don't have any preference here.  Compliance wih standard YAML
> parser is probably more important
>
> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
>
>> From a user's perspective, I prefer the shorter one "format=json", because
>> it's more concise and straightforward. The "kind" is redundant for users.
>> Is there a real case requires to represent the configuration in JSON
>> style?
>> As far as I can see, I don't see such requirement, and everything works
>> fine by now.
>>
>> So I'm in favor of "format=json". But if the community insist to follow
>> code style on this, I'm also fine with the longer one.
>>
>> Btw, I also CC user mailing list to listen more user's feedback. Because I
>> think this is relative to usability.
>>
>> Best,
>> Jark
>>
>> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]>
>> wrote:
>>
>> >  > Therefore, should we advocate instead:
>> >  >
>> >  > 'format.kind' = 'json',
>> >  > 'format.fail-on-missing-field' = 'false'
>> >
>> > Yes. That's pretty much it.
>> >
>> > This is reasonable important to nail down as with such violations I
>> > believe we could not actually switch to a standard YAML parser.
>> >
>> > On 29/04/2020 16:05, Timo Walther wrote:
>> > > Hi everyone,
>> > >
>> > > discussions around ConfigOption seem to be very popular recently. So I
>> > > would also like to get some opinions on a different topic.
>> > >
>> > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
>> > > agreed on the following DDL syntax:
>> > >
>> > > CREATE TABLE fs_table (
>> > >  ...
>> > > ) WITH (
>> > >  'connector' = 'filesystem',
>> > >  'path' = 'file:///path/to/whatever',
>> > >  'format' = 'csv',
>> > >  'format.allow-comments' = 'true',
>> > >  'format.ignore-parse-errors' = 'true'
>> > > );
>> > >
>> > > Of course this is slightly different from regular Flink core
>> > > configuration but a connector still needs to be configured based on
>> > > these options.
>> > >
>> > > However, I think this FLIP violates our code style guidelines because
>> > >
>> > > 'format' = 'json',
>> > > 'format.fail-on-missing-field' = 'false'
>> > >
>> > > is an invalid hierarchy. `format` cannot be a string and a top-level
>> > > object at the same time.
>> > >
>> > > We have similar problems in our runtime configuration:
>> > >
>> > > state.backend=
>> > > state.backend.incremental=
>> > > restart-strategy=
>> > > restart-strategy.fixed-delay.delay=
>> > > high-availability=
>> > > high-availability.cluster-id=
>> > >
>> > > The code style guide states "Think of the configuration as nested
>> > > objects (JSON style)". So such hierarchies cannot be represented in a
>> > > nested JSON style.
>> > >
>> > > Therefore, should we advocate instead:
>> > >
>> > > 'format.kind' = 'json',
>> > > 'format.fail-on-missing-field' = 'false'
>> > >
>> > > What do you think?
>> > >
>> > > Thanks,
>> > > Timo
>> > >
>> > > [1]
>> > >
>> >
>> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
>> > >
>> >
>> >
>
>

--

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Jingsong Li
Thanks Timo for staring the discussion.

I am +1 for "format: 'json'".
Take a look to Dawid's yaml case:

connector: 'filesystem'
path: '...'
format: 'json'
format:
    option1: '...'
    option2: '...'
    option3: '...'

Is this work?
According to my understanding, 'format' key is the attribute of connector,
which can be separately configured outside. In the 'format' block, they are
the attribute of format.
So this json style block can only contain the properties exclude format
itself.

Best,
Jingsong Lee

On Thu, Apr 30, 2020 at 9:58 AM Benchao Li <[hidden email]> wrote:

> Thanks Timo for staring the discussion.
>
> Generally I like the idea to keep the config align with a standard like
> json/yaml.
>
> From the user's perspective, I don't use table configs from a config file
> like yaml or json for now,
> And it's ok to change it to yaml like style. Actually we didn't know that
> this could be a yaml like
> configuration hierarchy. If it has a hierarchy, we maybe consider that in
> the future to load the
> config from a yaml/json file.
>
> Regarding the name,
> 'format.kind' looks fine to me. However there is another name from the top
> of my head:
> 'format.name', WDYT?
>
> Dawid Wysakowicz <[hidden email]> 于2020年4月29日周三 下午11:56写道:
>
>> Hi all,
>>
>> I also wanted to share my opinion.
>>
>> When talking about a ConfigOption hierarchy we use for configuring Flink
>> cluster I would be a strong advocate for keeping a yaml/hocon/json/...
>> compatible style. Those options are primarily read from a file and thus
>> should at least try to follow common practices for nested formats if we
>> ever decide to switch to one.
>>
>> Here the question is about the properties we use in SQL statements. The
>> origin/destination of these usually will be external catalog, usually in a
>> flattened(key/value) representation so I agree it is not as important as in
>> the aforementioned case. Nevertheless having a yaml based catalog or being
>> able to have e.g. yaml based snapshots of a catalog in my opinion is
>> appealing. At the same time cost of being able to have a nice
>> yaml/hocon/json representation is just adding a single suffix to a
>> single(at most 2 key + value) property. The question is between `format` =
>> `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
>> doing it.
>>
>> Just to have a full picture. Both cases can be represented in yaml, but
>> the difference is significant:
>> format: 'json'
>> format.option: 'value'
>>
>> vs
>> format:
>>     kind: 'json'
>>
>>     option: 'value'
>>
>> Best,
>> Dawid
>>
>> On 29/04/2020 17:13, Flavio Pompermaier wrote:
>>
>> Personally I don't have any preference here.  Compliance wih standard
>> YAML parser is probably more important
>>
>> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
>>
>>> From a user's perspective, I prefer the shorter one "format=json",
>>> because
>>> it's more concise and straightforward. The "kind" is redundant for users.
>>> Is there a real case requires to represent the configuration in JSON
>>> style?
>>> As far as I can see, I don't see such requirement, and everything works
>>> fine by now.
>>>
>>> So I'm in favor of "format=json". But if the community insist to follow
>>> code style on this, I'm also fine with the longer one.
>>>
>>> Btw, I also CC user mailing list to listen more user's feedback. Because
>>> I
>>> think this is relative to usability.
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]>
>>> wrote:
>>>
>>> >  > Therefore, should we advocate instead:
>>> >  >
>>> >  > 'format.kind' = 'json',
>>> >  > 'format.fail-on-missing-field' = 'false'
>>> >
>>> > Yes. That's pretty much it.
>>> >
>>> > This is reasonable important to nail down as with such violations I
>>> > believe we could not actually switch to a standard YAML parser.
>>> >
>>> > On 29/04/2020 16:05, Timo Walther wrote:
>>> > > Hi everyone,
>>> > >
>>> > > discussions around ConfigOption seem to be very popular recently. So
>>> I
>>> > > would also like to get some opinions on a different topic.
>>> > >
>>> > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
>>> > > agreed on the following DDL syntax:
>>> > >
>>> > > CREATE TABLE fs_table (
>>> > >  ...
>>> > > ) WITH (
>>> > >  'connector' = 'filesystem',
>>> > >  'path' = 'file:///path/to/whatever',
>>> > >  'format' = 'csv',
>>> > >  'format.allow-comments' = 'true',
>>> > >  'format.ignore-parse-errors' = 'true'
>>> > > );
>>> > >
>>> > > Of course this is slightly different from regular Flink core
>>> > > configuration but a connector still needs to be configured based on
>>> > > these options.
>>> > >
>>> > > However, I think this FLIP violates our code style guidelines because
>>> > >
>>> > > 'format' = 'json',
>>> > > 'format.fail-on-missing-field' = 'false'
>>> > >
>>> > > is an invalid hierarchy. `format` cannot be a string and a top-level
>>> > > object at the same time.
>>> > >
>>> > > We have similar problems in our runtime configuration:
>>> > >
>>> > > state.backend=
>>> > > state.backend.incremental=
>>> > > restart-strategy=
>>> > > restart-strategy.fixed-delay.delay=
>>> > > high-availability=
>>> > > high-availability.cluster-id=
>>> > >
>>> > > The code style guide states "Think of the configuration as nested
>>> > > objects (JSON style)". So such hierarchies cannot be represented in a
>>> > > nested JSON style.
>>> > >
>>> > > Therefore, should we advocate instead:
>>> > >
>>> > > 'format.kind' = 'json',
>>> > > 'format.fail-on-missing-field' = 'false'
>>> > >
>>> > > What do you think?
>>> > >
>>> > > Thanks,
>>> > > Timo
>>> > >
>>> > > [1]
>>> > >
>>> >
>>> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
>>> > >
>>> >
>>> >
>>
>>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: [hidden email]; [hidden email]
>
>

--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Forward Xu
In reply to this post by Benchao Li
Here I have a little doubt. At present, our json only supports the
conventional json format. If we need to implement json with bson, json with
avro, etc., how should we express it?
Do you need like the following:

‘format.name' = 'json',

‘format.json.fail-on-missing-field' = 'false'


‘format.name' = 'bson',

‘format.bson.fail-on-missing-field' = ‘false'


Best,

Forward

Benchao Li <[hidden email]> 于2020年4月30日周四 上午9:58写道:

> Thanks Timo for staring the discussion.
>
> Generally I like the idea to keep the config align with a standard like
> json/yaml.
>
> From the user's perspective, I don't use table configs from a config file
> like yaml or json for now,
> And it's ok to change it to yaml like style. Actually we didn't know that
> this could be a yaml like
> configuration hierarchy. If it has a hierarchy, we maybe consider that in
> the future to load the
> config from a yaml/json file.
>
> Regarding the name,
> 'format.kind' looks fine to me. However there is another name from the top
> of my head:
> 'format.name', WDYT?
>
> Dawid Wysakowicz <[hidden email]> 于2020年4月29日周三 下午11:56写道:
>
> > Hi all,
> >
> > I also wanted to share my opinion.
> >
> > When talking about a ConfigOption hierarchy we use for configuring Flink
> > cluster I would be a strong advocate for keeping a yaml/hocon/json/...
> > compatible style. Those options are primarily read from a file and thus
> > should at least try to follow common practices for nested formats if we
> > ever decide to switch to one.
> >
> > Here the question is about the properties we use in SQL statements. The
> > origin/destination of these usually will be external catalog, usually in
> a
> > flattened(key/value) representation so I agree it is not as important as
> in
> > the aforementioned case. Nevertheless having a yaml based catalog or
> being
> > able to have e.g. yaml based snapshots of a catalog in my opinion is
> > appealing. At the same time cost of being able to have a nice
> > yaml/hocon/json representation is just adding a single suffix to a
> > single(at most 2 key + value) property. The question is between `format`
> =
> > `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
> > doing it.
> >
> > Just to have a full picture. Both cases can be represented in yaml, but
> > the difference is significant:
> > format: 'json'
> > format.option: 'value'
> >
> > vs
> > format:
> >     kind: 'json'
> >
> >     option: 'value'
> >
> > Best,
> > Dawid
> >
> > On 29/04/2020 17:13, Flavio Pompermaier wrote:
> >
> > Personally I don't have any preference here.  Compliance wih standard
> YAML
> > parser is probably more important
> >
> > On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
> >
> >> From a user's perspective, I prefer the shorter one "format=json",
> because
> >> it's more concise and straightforward. The "kind" is redundant for
> users.
> >> Is there a real case requires to represent the configuration in JSON
> >> style?
> >> As far as I can see, I don't see such requirement, and everything works
> >> fine by now.
> >>
> >> So I'm in favor of "format=json". But if the community insist to follow
> >> code style on this, I'm also fine with the longer one.
> >>
> >> Btw, I also CC user mailing list to listen more user's feedback.
> Because I
> >> think this is relative to usability.
> >>
> >> Best,
> >> Jark
> >>
> >> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]>
> >> wrote:
> >>
> >> >  > Therefore, should we advocate instead:
> >> >  >
> >> >  > 'format.kind' = 'json',
> >> >  > 'format.fail-on-missing-field' = 'false'
> >> >
> >> > Yes. That's pretty much it.
> >> >
> >> > This is reasonable important to nail down as with such violations I
> >> > believe we could not actually switch to a standard YAML parser.
> >> >
> >> > On 29/04/2020 16:05, Timo Walther wrote:
> >> > > Hi everyone,
> >> > >
> >> > > discussions around ConfigOption seem to be very popular recently.
> So I
> >> > > would also like to get some opinions on a different topic.
> >> > >
> >> > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> >> > > agreed on the following DDL syntax:
> >> > >
> >> > > CREATE TABLE fs_table (
> >> > >  ...
> >> > > ) WITH (
> >> > >  'connector' = 'filesystem',
> >> > >  'path' = 'file:///path/to/whatever',
> >> > >  'format' = 'csv',
> >> > >  'format.allow-comments' = 'true',
> >> > >  'format.ignore-parse-errors' = 'true'
> >> > > );
> >> > >
> >> > > Of course this is slightly different from regular Flink core
> >> > > configuration but a connector still needs to be configured based on
> >> > > these options.
> >> > >
> >> > > However, I think this FLIP violates our code style guidelines
> because
> >> > >
> >> > > 'format' = 'json',
> >> > > 'format.fail-on-missing-field' = 'false'
> >> > >
> >> > > is an invalid hierarchy. `format` cannot be a string and a top-level
> >> > > object at the same time.
> >> > >
> >> > > We have similar problems in our runtime configuration:
> >> > >
> >> > > state.backend=
> >> > > state.backend.incremental=
> >> > > restart-strategy=
> >> > > restart-strategy.fixed-delay.delay=
> >> > > high-availability=
> >> > > high-availability.cluster-id=
> >> > >
> >> > > The code style guide states "Think of the configuration as nested
> >> > > objects (JSON style)". So such hierarchies cannot be represented in
> a
> >> > > nested JSON style.
> >> > >
> >> > > Therefore, should we advocate instead:
> >> > >
> >> > > 'format.kind' = 'json',
> >> > > 'format.fail-on-missing-field' = 'false'
> >> > >
> >> > > What do you think?
> >> > >
> >> > > Thanks,
> >> > > Timo
> >> > >
> >> > > [1]
> >> > >
> >> >
> >>
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >> > >
> >> >
> >> >
> >
> >
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: [hidden email]; [hidden email]
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Kurt Young
In reply to this post by Jingsong Li
IIUC FLIP-122 already delegate the responsibility for designing and parsing
connector properties to connector developers.
So frankly speaking, no matter which style we choose, there is no strong
guarantee for either of these. So it's also possible
that developers can choose a totally different way to express properties,
such as:

'format' = 'csv',
'csv.allow-comments' = 'true',
'csv.ignore-parse-errors' = 'true'

which also seems quite straightforward and easy to use. So my opinion on
this would be since there is no guarantee for developers
to choose "format" as common prefix of all format related properties, there
is not much value to extend 'format' to 'format.kind'.


Best,
Kurt


On Thu, Apr 30, 2020 at 10:17 AM Jingsong Li <[hidden email]> wrote:

> Thanks Timo for staring the discussion.
>
> I am +1 for "format: 'json'".
> Take a look to Dawid's yaml case:
>
> connector: 'filesystem'
> path: '...'
> format: 'json'
> format:
>     option1: '...'
>     option2: '...'
>     option3: '...'
>
> Is this work?
> According to my understanding, 'format' key is the attribute of connector,
> which can be separately configured outside. In the 'format' block, they are
> the attribute of format.
> So this json style block can only contain the properties exclude format
> itself.
>
> Best,
> Jingsong Lee
>
> On Thu, Apr 30, 2020 at 9:58 AM Benchao Li <[hidden email]> wrote:
>
>> Thanks Timo for staring the discussion.
>>
>> Generally I like the idea to keep the config align with a standard like
>> json/yaml.
>>
>> From the user's perspective, I don't use table configs from a config file
>> like yaml or json for now,
>> And it's ok to change it to yaml like style. Actually we didn't know that
>> this could be a yaml like
>> configuration hierarchy. If it has a hierarchy, we maybe consider that in
>> the future to load the
>> config from a yaml/json file.
>>
>> Regarding the name,
>> 'format.kind' looks fine to me. However there is another name from the
>> top of my head:
>> 'format.name', WDYT?
>>
>> Dawid Wysakowicz <[hidden email]> 于2020年4月29日周三 下午11:56写道:
>>
>>> Hi all,
>>>
>>> I also wanted to share my opinion.
>>>
>>> When talking about a ConfigOption hierarchy we use for configuring Flink
>>> cluster I would be a strong advocate for keeping a yaml/hocon/json/...
>>> compatible style. Those options are primarily read from a file and thus
>>> should at least try to follow common practices for nested formats if we
>>> ever decide to switch to one.
>>>
>>> Here the question is about the properties we use in SQL statements. The
>>> origin/destination of these usually will be external catalog, usually in a
>>> flattened(key/value) representation so I agree it is not as important as in
>>> the aforementioned case. Nevertheless having a yaml based catalog or being
>>> able to have e.g. yaml based snapshots of a catalog in my opinion is
>>> appealing. At the same time cost of being able to have a nice
>>> yaml/hocon/json representation is just adding a single suffix to a
>>> single(at most 2 key + value) property. The question is between `format` =
>>> `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
>>> doing it.
>>>
>>> Just to have a full picture. Both cases can be represented in yaml, but
>>> the difference is significant:
>>> format: 'json'
>>> format.option: 'value'
>>>
>>> vs
>>> format:
>>>     kind: 'json'
>>>
>>>     option: 'value'
>>>
>>> Best,
>>> Dawid
>>>
>>> On 29/04/2020 17:13, Flavio Pompermaier wrote:
>>>
>>> Personally I don't have any preference here.  Compliance wih standard
>>> YAML parser is probably more important
>>>
>>> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
>>>
>>>> From a user's perspective, I prefer the shorter one "format=json",
>>>> because
>>>> it's more concise and straightforward. The "kind" is redundant for
>>>> users.
>>>> Is there a real case requires to represent the configuration in JSON
>>>> style?
>>>> As far as I can see, I don't see such requirement, and everything works
>>>> fine by now.
>>>>
>>>> So I'm in favor of "format=json". But if the community insist to follow
>>>> code style on this, I'm also fine with the longer one.
>>>>
>>>> Btw, I also CC user mailing list to listen more user's feedback.
>>>> Because I
>>>> think this is relative to usability.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]>
>>>> wrote:
>>>>
>>>> >  > Therefore, should we advocate instead:
>>>> >  >
>>>> >  > 'format.kind' = 'json',
>>>> >  > 'format.fail-on-missing-field' = 'false'
>>>> >
>>>> > Yes. That's pretty much it.
>>>> >
>>>> > This is reasonable important to nail down as with such violations I
>>>> > believe we could not actually switch to a standard YAML parser.
>>>> >
>>>> > On 29/04/2020 16:05, Timo Walther wrote:
>>>> > > Hi everyone,
>>>> > >
>>>> > > discussions around ConfigOption seem to be very popular recently.
>>>> So I
>>>> > > would also like to get some opinions on a different topic.
>>>> > >
>>>> > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
>>>> > > agreed on the following DDL syntax:
>>>> > >
>>>> > > CREATE TABLE fs_table (
>>>> > >  ...
>>>> > > ) WITH (
>>>> > >  'connector' = 'filesystem',
>>>> > >  'path' = 'file:///path/to/whatever',
>>>> > >  'format' = 'csv',
>>>> > >  'format.allow-comments' = 'true',
>>>> > >  'format.ignore-parse-errors' = 'true'
>>>> > > );
>>>> > >
>>>> > > Of course this is slightly different from regular Flink core
>>>> > > configuration but a connector still needs to be configured based on
>>>> > > these options.
>>>> > >
>>>> > > However, I think this FLIP violates our code style guidelines
>>>> because
>>>> > >
>>>> > > 'format' = 'json',
>>>> > > 'format.fail-on-missing-field' = 'false'
>>>> > >
>>>> > > is an invalid hierarchy. `format` cannot be a string and a top-level
>>>> > > object at the same time.
>>>> > >
>>>> > > We have similar problems in our runtime configuration:
>>>> > >
>>>> > > state.backend=
>>>> > > state.backend.incremental=
>>>> > > restart-strategy=
>>>> > > restart-strategy.fixed-delay.delay=
>>>> > > high-availability=
>>>> > > high-availability.cluster-id=
>>>> > >
>>>> > > The code style guide states "Think of the configuration as nested
>>>> > > objects (JSON style)". So such hierarchies cannot be represented in
>>>> a
>>>> > > nested JSON style.
>>>> > >
>>>> > > Therefore, should we advocate instead:
>>>> > >
>>>> > > 'format.kind' = 'json',
>>>> > > 'format.fail-on-missing-field' = 'false'
>>>> > >
>>>> > > What do you think?
>>>> > >
>>>> > > Thanks,
>>>> > > Timo
>>>> > >
>>>> > > [1]
>>>> > >
>>>> >
>>>> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
>>>> > >
>>>> >
>>>> >
>>>
>>>
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: [hidden email]; [hidden email]
>>
>>
>
> --
> Best, Jingsong Lee
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Jingsong Li
In reply to this post by Jingsong Li
Sorry for mistake,

I proposal:

connector: 'filesystem'
path: '...'
format: 'json'
format.option:
    option1: '...'
    option2: '...'
    option3: '...'

And I think most of cases, users just need configure 'format' key, we
should make it convenient for them. There is no big problem in making
format options more complex.

And for Kafka key and value, we can:

connector: 'kafka'
key.format: 'json'
key.format.option:
    option1: '...'
    option2: '...'
value.format: 'json'
value.format.option:
    option1: '...'
    option2: '...'

Best,
Jingsong Lee

On Thu, Apr 30, 2020 at 10:16 AM Jingsong Li <[hidden email]> wrote:

> Thanks Timo for staring the discussion.
>
> I am +1 for "format: 'json'".
> Take a look to Dawid's yaml case:
>
> connector: 'filesystem'
> path: '...'
> format: 'json'
> format:
>     option1: '...'
>     option2: '...'
>     option3: '...'
>
> Is this work?
> According to my understanding, 'format' key is the attribute of connector,
> which can be separately configured outside. In the 'format' block, they are
> the attribute of format.
> So this json style block can only contain the properties exclude format
> itself.
>
> Best,
> Jingsong Lee
>
> On Thu, Apr 30, 2020 at 9:58 AM Benchao Li <[hidden email]> wrote:
>
>> Thanks Timo for staring the discussion.
>>
>> Generally I like the idea to keep the config align with a standard like
>> json/yaml.
>>
>> From the user's perspective, I don't use table configs from a config file
>> like yaml or json for now,
>> And it's ok to change it to yaml like style. Actually we didn't know that
>> this could be a yaml like
>> configuration hierarchy. If it has a hierarchy, we maybe consider that in
>> the future to load the
>> config from a yaml/json file.
>>
>> Regarding the name,
>> 'format.kind' looks fine to me. However there is another name from the
>> top of my head:
>> 'format.name', WDYT?
>>
>> Dawid Wysakowicz <[hidden email]> 于2020年4月29日周三 下午11:56写道:
>>
>>> Hi all,
>>>
>>> I also wanted to share my opinion.
>>>
>>> When talking about a ConfigOption hierarchy we use for configuring Flink
>>> cluster I would be a strong advocate for keeping a yaml/hocon/json/...
>>> compatible style. Those options are primarily read from a file and thus
>>> should at least try to follow common practices for nested formats if we
>>> ever decide to switch to one.
>>>
>>> Here the question is about the properties we use in SQL statements. The
>>> origin/destination of these usually will be external catalog, usually in a
>>> flattened(key/value) representation so I agree it is not as important as in
>>> the aforementioned case. Nevertheless having a yaml based catalog or being
>>> able to have e.g. yaml based snapshots of a catalog in my opinion is
>>> appealing. At the same time cost of being able to have a nice
>>> yaml/hocon/json representation is just adding a single suffix to a
>>> single(at most 2 key + value) property. The question is between `format` =
>>> `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
>>> doing it.
>>>
>>> Just to have a full picture. Both cases can be represented in yaml, but
>>> the difference is significant:
>>> format: 'json'
>>> format.option: 'value'
>>>
>>> vs
>>> format:
>>>     kind: 'json'
>>>
>>>     option: 'value'
>>>
>>> Best,
>>> Dawid
>>>
>>> On 29/04/2020 17:13, Flavio Pompermaier wrote:
>>>
>>> Personally I don't have any preference here.  Compliance wih standard
>>> YAML parser is probably more important
>>>
>>> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> wrote:
>>>
>>>> From a user's perspective, I prefer the shorter one "format=json",
>>>> because
>>>> it's more concise and straightforward. The "kind" is redundant for
>>>> users.
>>>> Is there a real case requires to represent the configuration in JSON
>>>> style?
>>>> As far as I can see, I don't see such requirement, and everything works
>>>> fine by now.
>>>>
>>>> So I'm in favor of "format=json". But if the community insist to follow
>>>> code style on this, I'm also fine with the longer one.
>>>>
>>>> Btw, I also CC user mailing list to listen more user's feedback.
>>>> Because I
>>>> think this is relative to usability.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]>
>>>> wrote:
>>>>
>>>> >  > Therefore, should we advocate instead:
>>>> >  >
>>>> >  > 'format.kind' = 'json',
>>>> >  > 'format.fail-on-missing-field' = 'false'
>>>> >
>>>> > Yes. That's pretty much it.
>>>> >
>>>> > This is reasonable important to nail down as with such violations I
>>>> > believe we could not actually switch to a standard YAML parser.
>>>> >
>>>> > On 29/04/2020 16:05, Timo Walther wrote:
>>>> > > Hi everyone,
>>>> > >
>>>> > > discussions around ConfigOption seem to be very popular recently.
>>>> So I
>>>> > > would also like to get some opinions on a different topic.
>>>> > >
>>>> > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
>>>> > > agreed on the following DDL syntax:
>>>> > >
>>>> > > CREATE TABLE fs_table (
>>>> > >  ...
>>>> > > ) WITH (
>>>> > >  'connector' = 'filesystem',
>>>> > >  'path' = 'file:///path/to/whatever',
>>>> > >  'format' = 'csv',
>>>> > >  'format.allow-comments' = 'true',
>>>> > >  'format.ignore-parse-errors' = 'true'
>>>> > > );
>>>> > >
>>>> > > Of course this is slightly different from regular Flink core
>>>> > > configuration but a connector still needs to be configured based on
>>>> > > these options.
>>>> > >
>>>> > > However, I think this FLIP violates our code style guidelines
>>>> because
>>>> > >
>>>> > > 'format' = 'json',
>>>> > > 'format.fail-on-missing-field' = 'false'
>>>> > >
>>>> > > is an invalid hierarchy. `format` cannot be a string and a top-level
>>>> > > object at the same time.
>>>> > >
>>>> > > We have similar problems in our runtime configuration:
>>>> > >
>>>> > > state.backend=
>>>> > > state.backend.incremental=
>>>> > > restart-strategy=
>>>> > > restart-strategy.fixed-delay.delay=
>>>> > > high-availability=
>>>> > > high-availability.cluster-id=
>>>> > >
>>>> > > The code style guide states "Think of the configuration as nested
>>>> > > objects (JSON style)". So such hierarchies cannot be represented in
>>>> a
>>>> > > nested JSON style.
>>>> > >
>>>> > > Therefore, should we advocate instead:
>>>> > >
>>>> > > 'format.kind' = 'json',
>>>> > > 'format.fail-on-missing-field' = 'false'
>>>> > >
>>>> > > What do you think?
>>>> > >
>>>> > > Thanks,
>>>> > > Timo
>>>> > >
>>>> > > [1]
>>>> > >
>>>> >
>>>> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
>>>> > >
>>>> >
>>>> >
>>>
>>>
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: [hidden email]; [hidden email]
>>
>>
>
> --
> Best, Jingsong Lee
>


--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

dwysakowicz
In reply to this post by Forward Xu

Hi all,

I'd like to start with a comment that I am ok with the current state of the FLIP-122 if there is a strong preference for it. Nevertheless I still like the idea of adding `type` to the `format` to have it as `format.type` = `json`.

I wanted to clarify a few things though:

@Jingsong As far as I see it most of the users copy/paste the properties from the documentation to the SQL, so I don't think additional four characters are too cumbersome. Plus if you force the additional suffix onto all the options of a format you introduce way more boilerplate than if we added the `type/kind/name`

@Kurt I agree that we cannot force it, but I think it is more of a question to set standards/implicit contracts on the properties. What you described with

'format' = 'csv',
'csv.allow-comments' = 'true',
'csv.ignore-parse-errors' = 'true'

would not work though as the `format` prefix is mandatory in the sources as only the properties with format will be passed to the format factory in majority of cases. We already have some implicit contracts.

@Forward I did not necessarily get the example. Aren't json and bson two separate formats? Do you mean you can have those two at the same time? Why do you need to differentiate the options for each? The way I see it is:

‘format(.name)' = 'json',
‘format.fail-on-missing-field' = 'false'

or

‘format(.name)' = 'bson',
‘format.fail-on-missing-field' = 'false'

@Benchao I'd be fine with any of name, kind, type(this we already had in the past)

Best,
Dawid

On 30/04/2020 04:17, Forward Xu wrote:
Here I have a little doubt. At present, our json only supports the
conventional json format. If we need to implement json with bson, json with
avro, etc., how should we express it?
Do you need like the following:

‘format.name' = 'json',

‘format.json.fail-on-missing-field' = 'false'


‘format.name' = 'bson',

‘format.bson.fail-on-missing-field' = ‘false'


Best,

Forward

Benchao Li [hidden email] 于2020年4月30日周四 上午9:58写道:

Thanks Timo for staring the discussion.

Generally I like the idea to keep the config align with a standard like
json/yaml.

From the user's perspective, I don't use table configs from a config file
like yaml or json for now,
And it's ok to change it to yaml like style. Actually we didn't know that
this could be a yaml like
configuration hierarchy. If it has a hierarchy, we maybe consider that in
the future to load the
config from a yaml/json file.

Regarding the name,
'format.kind' looks fine to me. However there is another name from the top
of my head:
'format.name', WDYT?

Dawid Wysakowicz [hidden email] 于2020年4月29日周三 下午11:56写道:

Hi all,

I also wanted to share my opinion.

When talking about a ConfigOption hierarchy we use for configuring Flink
cluster I would be a strong advocate for keeping a yaml/hocon/json/...
compatible style. Those options are primarily read from a file and thus
should at least try to follow common practices for nested formats if we
ever decide to switch to one.

Here the question is about the properties we use in SQL statements. The
origin/destination of these usually will be external catalog, usually in
a
flattened(key/value) representation so I agree it is not as important as
in
the aforementioned case. Nevertheless having a yaml based catalog or
being
able to have e.g. yaml based snapshots of a catalog in my opinion is
appealing. At the same time cost of being able to have a nice
yaml/hocon/json representation is just adding a single suffix to a
single(at most 2 key + value) property. The question is between `format`
=
`json` vs `format.kind` = `json`. That said I'd be slighty in favor of
doing it.

Just to have a full picture. Both cases can be represented in yaml, but
the difference is significant:
format: 'json'
format.option: 'value'

vs
format:
    kind: 'json'

    option: 'value'

Best,
Dawid

On 29/04/2020 17:13, Flavio Pompermaier wrote:

Personally I don't have any preference here.  Compliance wih standard
YAML
parser is probably more important

On Wed, Apr 29, 2020 at 5:10 PM Jark Wu [hidden email] wrote:

From a user's perspective, I prefer the shorter one "format=json",
because
it's more concise and straightforward. The "kind" is redundant for
users.
Is there a real case requires to represent the configuration in JSON
style?
As far as I can see, I don't see such requirement, and everything works
fine by now.

So I'm in favor of "format=json". But if the community insist to follow
code style on this, I'm also fine with the longer one.

Btw, I also CC user mailing list to listen more user's feedback.
Because I
think this is relative to usability.

Best,
Jark

On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler [hidden email]
wrote:

 > Therefore, should we advocate instead:
 >
 > 'format.kind' = 'json',
 > 'format.fail-on-missing-field' = 'false'

Yes. That's pretty much it.

This is reasonable important to nail down as with such violations I
believe we could not actually switch to a standard YAML parser.

On 29/04/2020 16:05, Timo Walther wrote:
Hi everyone,

discussions around ConfigOption seem to be very popular recently.
So I
would also like to get some opinions on a different topic.

How do we represent hierarchies in ConfigOption? In FLIP-122, we
agreed on the following DDL syntax:

CREATE TABLE fs_table (
 ...
) WITH (
 'connector' = 'filesystem',
 'path' = 'file:///path/to/whatever',
 'format' = 'csv',
 'format.allow-comments' = 'true',
 'format.ignore-parse-errors' = 'true'
);

Of course this is slightly different from regular Flink core
configuration but a connector still needs to be configured based on
these options.

However, I think this FLIP violates our code style guidelines
because
'format' = 'json',
'format.fail-on-missing-field' = 'false'

is an invalid hierarchy. `format` cannot be a string and a top-level
object at the same time.

We have similar problems in our runtime configuration:

state.backend=
state.backend.incremental=
restart-strategy=
restart-strategy.fixed-delay.delay=
high-availability=
high-availability.cluster-id=

The code style guide states "Think of the configuration as nested
objects (JSON style)". So such hierarchies cannot be represented in
a
nested JSON style.

Therefore, should we advocate instead:

'format.kind' = 'json',
'format.fail-on-missing-field' = 'false'

What do you think?

Thanks,
Timo

[1]


            

          
https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes

              


--

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
<a class="moz-txt-link-freetext" href="Tel:+86-15650713730">Tel:+86-15650713730
Email: [hidden email]; [hidden email]


    

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Jark Wu-2
Hi Dawid,

I just want to mention one of your response,

> What you described with
> 'format' = 'csv',
> 'csv.allow-comments' = 'true',
> 'csv.ignore-parse-errors' = 'true'
> would not work though as the `format` prefix is mandatory in the sources
as only the properties with format
>  will be passed to the format factory in majority of cases. We already
have some implicit contracts.

IIUC, in FLIP-95 and FLIP-122, the property key style are totally decided
by connectors, not the framework.
So I custom connector can define above properties, and extract the value of
'format', i.e. 'csv', to find the format factory.
And extract the properties with `csv.` prefix and remove the prefix, and
pass the properties (e.g. 'allow-comments' = 'true')
into the format factory to create format.

So there is no a strict guarantee to have a "nested JSON style" properties.
Users can still develop a custom connector with this
un-hierarchy properties and works well.

'format' = 'json',
'format.fail-on-missing-field' = 'false'

Best,
Jark


On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <[hidden email]>
wrote:

> Hi all,
>
> I'd like to start with a comment that I am ok with the current state of
> the FLIP-122 if there is a strong preference for it. Nevertheless I still
> like the idea of adding `type` to the `format` to have it as `format.type`
> = `json`.
>
> I wanted to clarify a few things though:
>
> @Jingsong As far as I see it most of the users copy/paste the properties
> from the documentation to the SQL, so I don't think additional four
> characters are too cumbersome. Plus if you force the additional suffix onto
> all the options of a format you introduce way more boilerplate than if we
> added the `type/kind/name`
>
> @Kurt I agree that we cannot force it, but I think it is more of a
> question to set standards/implicit contracts on the properties. What you
> described with
> 'format' = 'csv',
> 'csv.allow-comments' = 'true',
> 'csv.ignore-parse-errors' = 'true'
>
> would not work though as the `format` prefix is mandatory in the sources
> as only the properties with format will be passed to the format factory in
> majority of cases. We already have some implicit contracts.
>
> @Forward I did not necessarily get the example. Aren't json and bson two
> separate formats? Do you mean you can have those two at the same time? Why
> do you need to differentiate the options for each? The way I see it is:
>
> ‘format(.name)' = 'json',
> ‘format.fail-on-missing-field' = 'false'
>
> or
>
> ‘format(.name)' = 'bson',
> ‘format.fail-on-missing-field' = 'false'
>
> @Benchao I'd be fine with any of name, kind, type(this we already had in
> the past)
>
> Best,
> Dawid
>
> On 30/04/2020 04:17, Forward Xu wrote:
>
> Here I have a little doubt. At present, our json only supports the
> conventional json format. If we need to implement json with bson, json with
> avro, etc., how should we express it?
> Do you need like the following:
>
> ‘format.name' = 'json',
>
> ‘format.json.fail-on-missing-field' = 'false'
>
>
> ‘format.name' = 'bson',
>
> ‘format.bson.fail-on-missing-field' = ‘false'
>
>
> Best,
>
> Forward
>
> Benchao Li <[hidden email]> <[hidden email]> 于2020年4月30日周四 上午9:58写道:
>
>
> Thanks Timo for staring the discussion.
>
> Generally I like the idea to keep the config align with a standard like
> json/yaml.
>
> From the user's perspective, I don't use table configs from a config file
> like yaml or json for now,
> And it's ok to change it to yaml like style. Actually we didn't know that
> this could be a yaml like
> configuration hierarchy. If it has a hierarchy, we maybe consider that in
> the future to load the
> config from a yaml/json file.
>
> Regarding the name,
> 'format.kind' looks fine to me. However there is another name from the top
> of my head:
> 'format.name', WDYT?
>
> Dawid Wysakowicz <[hidden email]> <[hidden email]> 于2020年4月29日周三 下午11:56写道:
>
>
> Hi all,
>
> I also wanted to share my opinion.
>
> When talking about a ConfigOption hierarchy we use for configuring Flink
> cluster I would be a strong advocate for keeping a yaml/hocon/json/...
> compatible style. Those options are primarily read from a file and thus
> should at least try to follow common practices for nested formats if we
> ever decide to switch to one.
>
> Here the question is about the properties we use in SQL statements. The
> origin/destination of these usually will be external catalog, usually in
>
> a
>
> flattened(key/value) representation so I agree it is not as important as
>
> in
>
> the aforementioned case. Nevertheless having a yaml based catalog or
>
> being
>
> able to have e.g. yaml based snapshots of a catalog in my opinion is
> appealing. At the same time cost of being able to have a nice
> yaml/hocon/json representation is just adding a single suffix to a
> single(at most 2 key + value) property. The question is between `format`
>
> =
>
> `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
> doing it.
>
> Just to have a full picture. Both cases can be represented in yaml, but
> the difference is significant:
> format: 'json'
> format.option: 'value'
>
> vs
> format:
>     kind: 'json'
>
>     option: 'value'
>
> Best,
> Dawid
>
> On 29/04/2020 17:13, Flavio Pompermaier wrote:
>
> Personally I don't have any preference here.  Compliance wih standard
>
> YAML
>
> parser is probably more important
>
> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> <[hidden email]> wrote:
>
>
> From a user's perspective, I prefer the shorter one "format=json",
>
> because
>
> it's more concise and straightforward. The "kind" is redundant for
>
> users.
>
> Is there a real case requires to represent the configuration in JSON
> style?
> As far as I can see, I don't see such requirement, and everything works
> fine by now.
>
> So I'm in favor of "format=json". But if the community insist to follow
> code style on this, I'm also fine with the longer one.
>
> Btw, I also CC user mailing list to listen more user's feedback.
>
> Because I
>
> think this is relative to usability.
>
> Best,
> Jark
>
> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> <[hidden email]>
> wrote:
>
>
>  > Therefore, should we advocate instead:
>  >
>  > 'format.kind' = 'json',
>  > 'format.fail-on-missing-field' = 'false'
>
> Yes. That's pretty much it.
>
> This is reasonable important to nail down as with such violations I
> believe we could not actually switch to a standard YAML parser.
>
> On 29/04/2020 16:05, Timo Walther wrote:
>
> Hi everyone,
>
> discussions around ConfigOption seem to be very popular recently.
>
> So I
>
> would also like to get some opinions on a different topic.
>
> How do we represent hierarchies in ConfigOption? In FLIP-122, we
> agreed on the following DDL syntax:
>
> CREATE TABLE fs_table (
>  ...
> ) WITH (
>  'connector' = 'filesystem',
>  'path' = 'file:///path/to/whatever',
>  'format' = 'csv',
>  'format.allow-comments' = 'true',
>  'format.ignore-parse-errors' = 'true'
> );
>
> Of course this is slightly different from regular Flink core
> configuration but a connector still needs to be configured based on
> these options.
>
> However, I think this FLIP violates our code style guidelines
>
> because
>
> 'format' = 'json',
> 'format.fail-on-missing-field' = 'false'
>
> is an invalid hierarchy. `format` cannot be a string and a top-level
> object at the same time.
>
> We have similar problems in our runtime configuration:
>
> state.backend=
> state.backend.incremental=
> restart-strategy=
> restart-strategy.fixed-delay.delay=
> high-availability=
> high-availability.cluster-id=
>
> The code style guide states "Think of the configuration as nested
> objects (JSON style)". So such hierarchies cannot be represented in
>
> a
>
> nested JSON style.
>
> Therefore, should we advocate instead:
>
> 'format.kind' = 'json',
> 'format.fail-on-missing-field' = 'false'
>
> What do you think?
>
> Thanks,
> Timo
>
> [1]
>
>
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking UniversityTel:+86-15650713730
> Email: [hidden email]; [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Timo Walther-2
Hi Jark,

yes, in theory every connector can design options as they like. But for
user experience and good coding style we should be consistent in Flink
connectors and configuration. Because implementers of new connectors
will copy the design of existing ones.

Furthermore, I could image that people in the DataStream API would also
like to configure their connector based on options in the near future.
It might be the case that Flink DataStream API connectors will reuse the
ConfigOptions from Table API for consistency.

I'm favoring either:

format.kind = json
format.fail-on-missing-field: true

Or:

format = json
json.fail-on-missing-field: true

Both are valid hierarchies.

Regards,
Timo


On 30.04.20 17:57, Jark Wu wrote:

> Hi Dawid,
>
> I just want to mention one of your response,
>
>> What you described with
>> 'format' = 'csv',
>> 'csv.allow-comments' = 'true',
>> 'csv.ignore-parse-errors' = 'true'
>> would not work though as the `format` prefix is mandatory in the sources
> as only the properties with format
>>   will be passed to the format factory in majority of cases. We already
> have some implicit contracts.
>
> IIUC, in FLIP-95 and FLIP-122, the property key style are totally decided
> by connectors, not the framework.
> So I custom connector can define above properties, and extract the value of
> 'format', i.e. 'csv', to find the format factory.
> And extract the properties with `csv.` prefix and remove the prefix, and
> pass the properties (e.g. 'allow-comments' = 'true')
> into the format factory to create format.
>
> So there is no a strict guarantee to have a "nested JSON style" properties.
> Users can still develop a custom connector with this
> un-hierarchy properties and works well.
>
> 'format' = 'json',
> 'format.fail-on-missing-field' = 'false'
>
> Best,
> Jark
>
>
> On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <[hidden email]>
> wrote:
>
>> Hi all,
>>
>> I'd like to start with a comment that I am ok with the current state of
>> the FLIP-122 if there is a strong preference for it. Nevertheless I still
>> like the idea of adding `type` to the `format` to have it as `format.type`
>> = `json`.
>>
>> I wanted to clarify a few things though:
>>
>> @Jingsong As far as I see it most of the users copy/paste the properties
>> from the documentation to the SQL, so I don't think additional four
>> characters are too cumbersome. Plus if you force the additional suffix onto
>> all the options of a format you introduce way more boilerplate than if we
>> added the `type/kind/name`
>>
>> @Kurt I agree that we cannot force it, but I think it is more of a
>> question to set standards/implicit contracts on the properties. What you
>> described with
>> 'format' = 'csv',
>> 'csv.allow-comments' = 'true',
>> 'csv.ignore-parse-errors' = 'true'
>>
>> would not work though as the `format` prefix is mandatory in the sources
>> as only the properties with format will be passed to the format factory in
>> majority of cases. We already have some implicit contracts.
>>
>> @Forward I did not necessarily get the example. Aren't json and bson two
>> separate formats? Do you mean you can have those two at the same time? Why
>> do you need to differentiate the options for each? The way I see it is:
>>
>> ‘format(.name)' = 'json',
>> ‘format.fail-on-missing-field' = 'false'
>>
>> or
>>
>> ‘format(.name)' = 'bson',
>> ‘format.fail-on-missing-field' = 'false'
>>
>> @Benchao I'd be fine with any of name, kind, type(this we already had in
>> the past)
>>
>> Best,
>> Dawid
>>
>> On 30/04/2020 04:17, Forward Xu wrote:
>>
>> Here I have a little doubt. At present, our json only supports the
>> conventional json format. If we need to implement json with bson, json with
>> avro, etc., how should we express it?
>> Do you need like the following:
>>
>> ‘format.name' = 'json',
>>
>> ‘format.json.fail-on-missing-field' = 'false'
>>
>>
>> ‘format.name' = 'bson',
>>
>> ‘format.bson.fail-on-missing-field' = ‘false'
>>
>>
>> Best,
>>
>> Forward
>>
>> Benchao Li <[hidden email]> <[hidden email]> 于2020年4月30日周四 上午9:58写道:
>>
>>
>> Thanks Timo for staring the discussion.
>>
>> Generally I like the idea to keep the config align with a standard like
>> json/yaml.
>>
>>  From the user's perspective, I don't use table configs from a config file
>> like yaml or json for now,
>> And it's ok to change it to yaml like style. Actually we didn't know that
>> this could be a yaml like
>> configuration hierarchy. If it has a hierarchy, we maybe consider that in
>> the future to load the
>> config from a yaml/json file.
>>
>> Regarding the name,
>> 'format.kind' looks fine to me. However there is another name from the top
>> of my head:
>> 'format.name', WDYT?
>>
>> Dawid Wysakowicz <[hidden email]> <[hidden email]> 于2020年4月29日周三 下午11:56写道:
>>
>>
>> Hi all,
>>
>> I also wanted to share my opinion.
>>
>> When talking about a ConfigOption hierarchy we use for configuring Flink
>> cluster I would be a strong advocate for keeping a yaml/hocon/json/...
>> compatible style. Those options are primarily read from a file and thus
>> should at least try to follow common practices for nested formats if we
>> ever decide to switch to one.
>>
>> Here the question is about the properties we use in SQL statements. The
>> origin/destination of these usually will be external catalog, usually in
>>
>> a
>>
>> flattened(key/value) representation so I agree it is not as important as
>>
>> in
>>
>> the aforementioned case. Nevertheless having a yaml based catalog or
>>
>> being
>>
>> able to have e.g. yaml based snapshots of a catalog in my opinion is
>> appealing. At the same time cost of being able to have a nice
>> yaml/hocon/json representation is just adding a single suffix to a
>> single(at most 2 key + value) property. The question is between `format`
>>
>> =
>>
>> `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
>> doing it.
>>
>> Just to have a full picture. Both cases can be represented in yaml, but
>> the difference is significant:
>> format: 'json'
>> format.option: 'value'
>>
>> vs
>> format:
>>      kind: 'json'
>>
>>      option: 'value'
>>
>> Best,
>> Dawid
>>
>> On 29/04/2020 17:13, Flavio Pompermaier wrote:
>>
>> Personally I don't have any preference here.  Compliance wih standard
>>
>> YAML
>>
>> parser is probably more important
>>
>> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> <[hidden email]> wrote:
>>
>>
>>  From a user's perspective, I prefer the shorter one "format=json",
>>
>> because
>>
>> it's more concise and straightforward. The "kind" is redundant for
>>
>> users.
>>
>> Is there a real case requires to represent the configuration in JSON
>> style?
>> As far as I can see, I don't see such requirement, and everything works
>> fine by now.
>>
>> So I'm in favor of "format=json". But if the community insist to follow
>> code style on this, I'm also fine with the longer one.
>>
>> Btw, I also CC user mailing list to listen more user's feedback.
>>
>> Because I
>>
>> think this is relative to usability.
>>
>> Best,
>> Jark
>>
>> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> <[hidden email]>
>> wrote:
>>
>>
>>   > Therefore, should we advocate instead:
>>   >
>>   > 'format.kind' = 'json',
>>   > 'format.fail-on-missing-field' = 'false'
>>
>> Yes. That's pretty much it.
>>
>> This is reasonable important to nail down as with such violations I
>> believe we could not actually switch to a standard YAML parser.
>>
>> On 29/04/2020 16:05, Timo Walther wrote:
>>
>> Hi everyone,
>>
>> discussions around ConfigOption seem to be very popular recently.
>>
>> So I
>>
>> would also like to get some opinions on a different topic.
>>
>> How do we represent hierarchies in ConfigOption? In FLIP-122, we
>> agreed on the following DDL syntax:
>>
>> CREATE TABLE fs_table (
>>   ...
>> ) WITH (
>>   'connector' = 'filesystem',
>>   'path' = 'file:///path/to/whatever',
>>   'format' = 'csv',
>>   'format.allow-comments' = 'true',
>>   'format.ignore-parse-errors' = 'true'
>> );
>>
>> Of course this is slightly different from regular Flink core
>> configuration but a connector still needs to be configured based on
>> these options.
>>
>> However, I think this FLIP violates our code style guidelines
>>
>> because
>>
>> 'format' = 'json',
>> 'format.fail-on-missing-field' = 'false'
>>
>> is an invalid hierarchy. `format` cannot be a string and a top-level
>> object at the same time.
>>
>> We have similar problems in our runtime configuration:
>>
>> state.backend=
>> state.backend.incremental=
>> restart-strategy=
>> restart-strategy.fixed-delay.delay=
>> high-availability=
>> high-availability.cluster-id=
>>
>> The code style guide states "Think of the configuration as nested
>> objects (JSON style)". So such hierarchies cannot be represented in
>>
>> a
>>
>> nested JSON style.
>>
>> Therefore, should we advocate instead:
>>
>> 'format.kind' = 'json',
>> 'format.fail-on-missing-field' = 'false'
>>
>> What do you think?
>>
>> Thanks,
>> Timo
>>
>> [1]
>>
>>
>> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking UniversityTel:+86-15650713730
>> Email: [hidden email]; [hidden email]
>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Till Rohrmann
Hi everyone,

I like Timo's proposal to organize our configuration more hierarchical
since this is what the coding guide specifies. The benefit I see is that
config options belonging to the same concept will be found in the same
nested object. Moreover, it will be possible to split the configuration
into unrelated parts which are fed to the respective components. That way
one has a much better separation of concern since component A cannot read
the configuration of component B.

Concerning Timo's last two proposals:

If fail-on-missing-field is a common configuration shared by all formats,
then I would go with the first option:

format.kind: json
format.fail-on-missing-field: true

If fail-on-missing-field is specific for json, then one could go with

format: json
json.fail-on-missing-field: true

or

format.kind: json
format.json.fail-on-missing-field: true

Cheers,
Till


On Fri, May 1, 2020 at 11:55 AM Timo Walther <[hidden email]> wrote:

> Hi Jark,
>
> yes, in theory every connector can design options as they like. But for
> user experience and good coding style we should be consistent in Flink
> connectors and configuration. Because implementers of new connectors
> will copy the design of existing ones.
>
> Furthermore, I could image that people in the DataStream API would also
> like to configure their connector based on options in the near future.
> It might be the case that Flink DataStream API connectors will reuse the
> ConfigOptions from Table API for consistency.
>
> I'm favoring either:
>
> format.kind = json
> format.fail-on-missing-field: true
>
> Or:
>
> format = json
> json.fail-on-missing-field: true
>
> Both are valid hierarchies.
>
> Regards,
> Timo
>
>
> On 30.04.20 17:57, Jark Wu wrote:
> > Hi Dawid,
> >
> > I just want to mention one of your response,
> >
> >> What you described with
> >> 'format' = 'csv',
> >> 'csv.allow-comments' = 'true',
> >> 'csv.ignore-parse-errors' = 'true'
> >> would not work though as the `format` prefix is mandatory in the sources
> > as only the properties with format
> >>   will be passed to the format factory in majority of cases. We already
> > have some implicit contracts.
> >
> > IIUC, in FLIP-95 and FLIP-122, the property key style are totally decided
> > by connectors, not the framework.
> > So I custom connector can define above properties, and extract the value
> of
> > 'format', i.e. 'csv', to find the format factory.
> > And extract the properties with `csv.` prefix and remove the prefix, and
> > pass the properties (e.g. 'allow-comments' = 'true')
> > into the format factory to create format.
> >
> > So there is no a strict guarantee to have a "nested JSON style"
> properties.
> > Users can still develop a custom connector with this
> > un-hierarchy properties and works well.
> >
> > 'format' = 'json',
> > 'format.fail-on-missing-field' = 'false'
> >
> > Best,
> > Jark
> >
> >
> > On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <[hidden email]>
> > wrote:
> >
> >> Hi all,
> >>
> >> I'd like to start with a comment that I am ok with the current state of
> >> the FLIP-122 if there is a strong preference for it. Nevertheless I
> still
> >> like the idea of adding `type` to the `format` to have it as
> `format.type`
> >> = `json`.
> >>
> >> I wanted to clarify a few things though:
> >>
> >> @Jingsong As far as I see it most of the users copy/paste the properties
> >> from the documentation to the SQL, so I don't think additional four
> >> characters are too cumbersome. Plus if you force the additional suffix
> onto
> >> all the options of a format you introduce way more boilerplate than if
> we
> >> added the `type/kind/name`
> >>
> >> @Kurt I agree that we cannot force it, but I think it is more of a
> >> question to set standards/implicit contracts on the properties. What you
> >> described with
> >> 'format' = 'csv',
> >> 'csv.allow-comments' = 'true',
> >> 'csv.ignore-parse-errors' = 'true'
> >>
> >> would not work though as the `format` prefix is mandatory in the sources
> >> as only the properties with format will be passed to the format factory
> in
> >> majority of cases. We already have some implicit contracts.
> >>
> >> @Forward I did not necessarily get the example. Aren't json and bson two
> >> separate formats? Do you mean you can have those two at the same time?
> Why
> >> do you need to differentiate the options for each? The way I see it is:
> >>
> >> ‘format(.name)' = 'json',
> >> ‘format.fail-on-missing-field' = 'false'
> >>
> >> or
> >>
> >> ‘format(.name)' = 'bson',
> >> ‘format.fail-on-missing-field' = 'false'
> >>
> >> @Benchao I'd be fine with any of name, kind, type(this we already had in
> >> the past)
> >>
> >> Best,
> >> Dawid
> >>
> >> On 30/04/2020 04:17, Forward Xu wrote:
> >>
> >> Here I have a little doubt. At present, our json only supports the
> >> conventional json format. If we need to implement json with bson, json
> with
> >> avro, etc., how should we express it?
> >> Do you need like the following:
> >>
> >> ‘format.name' = 'json',
> >>
> >> ‘format.json.fail-on-missing-field' = 'false'
> >>
> >>
> >> ‘format.name' = 'bson',
> >>
> >> ‘format.bson.fail-on-missing-field' = ‘false'
> >>
> >>
> >> Best,
> >>
> >> Forward
> >>
> >> Benchao Li <[hidden email]> <[hidden email]> 于2020年4月30日周四
> 上午9:58写道:
> >>
> >>
> >> Thanks Timo for staring the discussion.
> >>
> >> Generally I like the idea to keep the config align with a standard like
> >> json/yaml.
> >>
> >>  From the user's perspective, I don't use table configs from a config
> file
> >> like yaml or json for now,
> >> And it's ok to change it to yaml like style. Actually we didn't know
> that
> >> this could be a yaml like
> >> configuration hierarchy. If it has a hierarchy, we maybe consider that
> in
> >> the future to load the
> >> config from a yaml/json file.
> >>
> >> Regarding the name,
> >> 'format.kind' looks fine to me. However there is another name from the
> top
> >> of my head:
> >> 'format.name', WDYT?
> >>
> >> Dawid Wysakowicz <[hidden email]> <[hidden email]>
> 于2020年4月29日周三 下午11:56写道:
> >>
> >>
> >> Hi all,
> >>
> >> I also wanted to share my opinion.
> >>
> >> When talking about a ConfigOption hierarchy we use for configuring Flink
> >> cluster I would be a strong advocate for keeping a yaml/hocon/json/...
> >> compatible style. Those options are primarily read from a file and thus
> >> should at least try to follow common practices for nested formats if we
> >> ever decide to switch to one.
> >>
> >> Here the question is about the properties we use in SQL statements. The
> >> origin/destination of these usually will be external catalog, usually in
> >>
> >> a
> >>
> >> flattened(key/value) representation so I agree it is not as important as
> >>
> >> in
> >>
> >> the aforementioned case. Nevertheless having a yaml based catalog or
> >>
> >> being
> >>
> >> able to have e.g. yaml based snapshots of a catalog in my opinion is
> >> appealing. At the same time cost of being able to have a nice
> >> yaml/hocon/json representation is just adding a single suffix to a
> >> single(at most 2 key + value) property. The question is between `format`
> >>
> >> =
> >>
> >> `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
> >> doing it.
> >>
> >> Just to have a full picture. Both cases can be represented in yaml, but
> >> the difference is significant:
> >> format: 'json'
> >> format.option: 'value'
> >>
> >> vs
> >> format:
> >>      kind: 'json'
> >>
> >>      option: 'value'
> >>
> >> Best,
> >> Dawid
> >>
> >> On 29/04/2020 17:13, Flavio Pompermaier wrote:
> >>
> >> Personally I don't have any preference here.  Compliance wih standard
> >>
> >> YAML
> >>
> >> parser is probably more important
> >>
> >> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> <
> [hidden email]> wrote:
> >>
> >>
> >>  From a user's perspective, I prefer the shorter one "format=json",
> >>
> >> because
> >>
> >> it's more concise and straightforward. The "kind" is redundant for
> >>
> >> users.
> >>
> >> Is there a real case requires to represent the configuration in JSON
> >> style?
> >> As far as I can see, I don't see such requirement, and everything works
> >> fine by now.
> >>
> >> So I'm in favor of "format=json". But if the community insist to follow
> >> code style on this, I'm also fine with the longer one.
> >>
> >> Btw, I also CC user mailing list to listen more user's feedback.
> >>
> >> Because I
> >>
> >> think this is relative to usability.
> >>
> >> Best,
> >> Jark
> >>
> >> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> <
> [hidden email]>
> >> wrote:
> >>
> >>
> >>   > Therefore, should we advocate instead:
> >>   >
> >>   > 'format.kind' = 'json',
> >>   > 'format.fail-on-missing-field' = 'false'
> >>
> >> Yes. That's pretty much it.
> >>
> >> This is reasonable important to nail down as with such violations I
> >> believe we could not actually switch to a standard YAML parser.
> >>
> >> On 29/04/2020 16:05, Timo Walther wrote:
> >>
> >> Hi everyone,
> >>
> >> discussions around ConfigOption seem to be very popular recently.
> >>
> >> So I
> >>
> >> would also like to get some opinions on a different topic.
> >>
> >> How do we represent hierarchies in ConfigOption? In FLIP-122, we
> >> agreed on the following DDL syntax:
> >>
> >> CREATE TABLE fs_table (
> >>   ...
> >> ) WITH (
> >>   'connector' = 'filesystem',
> >>   'path' = 'file:///path/to/whatever',
> >>   'format' = 'csv',
> >>   'format.allow-comments' = 'true',
> >>   'format.ignore-parse-errors' = 'true'
> >> );
> >>
> >> Of course this is slightly different from regular Flink core
> >> configuration but a connector still needs to be configured based on
> >> these options.
> >>
> >> However, I think this FLIP violates our code style guidelines
> >>
> >> because
> >>
> >> 'format' = 'json',
> >> 'format.fail-on-missing-field' = 'false'
> >>
> >> is an invalid hierarchy. `format` cannot be a string and a top-level
> >> object at the same time.
> >>
> >> We have similar problems in our runtime configuration:
> >>
> >> state.backend=
> >> state.backend.incremental=
> >> restart-strategy=
> >> restart-strategy.fixed-delay.delay=
> >> high-availability=
> >> high-availability.cluster-id=
> >>
> >> The code style guide states "Think of the configuration as nested
> >> objects (JSON style)". So such hierarchies cannot be represented in
> >>
> >> a
> >>
> >> nested JSON style.
> >>
> >> Therefore, should we advocate instead:
> >>
> >> 'format.kind' = 'json',
> >> 'format.fail-on-missing-field' = 'false'
> >>
> >> What do you think?
> >>
> >> Thanks,
> >> Timo
> >>
> >> [1]
> >>
> >>
> >>
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >>
> >> --
> >>
> >> Benchao Li
> >> School of Electronics Engineering and Computer Science, Peking
> UniversityTel:+86-15650713730
> >> Email: [hidden email]; [hidden email]
> >>
> >>
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Danny Chan
Hi, everyone ~

Allows me to share some thoughts here.

Personally i think for SQL, "format" is obviously better than "format.name", it is more concise and straight-forward, similar with Presto FORMAT[2] and KSQL VALUE_FORMAT[1]; i think we move from "connector.type" to "connector" for the same reason, the "type" or "name" suffix is implicit, SQL syntax like the DDL is a top-level user API, so from my side keeping good user-friendly syntax is more important.

@Timo I'm big +1 for the a good code style guide, but that does not mean we should go for a json-style table options in the DDL, the DDL could have its own contract. Can we move "represent these config options in YAML" to another topic ? Otherwise, how should we handle the "connector" key, should we prefix all the table options with "connector" ? The original inention of FLIP-122 is to remove some redundant prefix/suffix of the table options because they are obviously implicit there, and the "connector." prefix and the ".type" or ".name" suffix are the ones we most want to delete.

@Dawid Although ".type" is just another 4 characters, but we force the SQL users to do the thing that is obvious reduadant, i know serialize catalog table to YAML or use the options in DataStream has similar keys request, but they are different use cases that i believe many SQL user would not encounter, that means we force many users to obey rules for cases they would never have.


[1] https://docs.ksqldb.io/en/latest/developer-guide/create-a-table/
[2] https://prestodb.io/docs/current/sql/create-table.html

Best,
Danny Chan
在 2020年5月4日 +0800 PM11:34,Till Rohrmann <[hidden email]>,写道:

> Hi everyone,
>
> I like Timo's proposal to organize our configuration more hierarchical
> since this is what the coding guide specifies. The benefit I see is that
> config options belonging to the same concept will be found in the same
> nested object. Moreover, it will be possible to split the configuration
> into unrelated parts which are fed to the respective components. That way
> one has a much better separation of concern since component A cannot read
> the configuration of component B.
>
> Concerning Timo's last two proposals:
>
> If fail-on-missing-field is a common configuration shared by all formats,
> then I would go with the first option:
>
> format.kind: json
> format.fail-on-missing-field: true
>
> If fail-on-missing-field is specific for json, then one could go with
>
> format: json
> json.fail-on-missing-field: true
>
> or
>
> format.kind: json
> format.json.fail-on-missing-field: true
>
> Cheers,
> Till
>
>
> On Fri, May 1, 2020 at 11:55 AM Timo Walther <[hidden email]> wrote:
>
> > Hi Jark,
> >
> > yes, in theory every connector can design options as they like. But for
> > user experience and good coding style we should be consistent in Flink
> > connectors and configuration. Because implementers of new connectors
> > will copy the design of existing ones.
> >
> > Furthermore, I could image that people in the DataStream API would also
> > like to configure their connector based on options in the near future.
> > It might be the case that Flink DataStream API connectors will reuse the
> > ConfigOptions from Table API for consistency.
> >
> > I'm favoring either:
> >
> > format.kind = json
> > format.fail-on-missing-field: true
> >
> > Or:
> >
> > format = json
> > json.fail-on-missing-field: true
> >
> > Both are valid hierarchies.
> >
> > Regards,
> > Timo
> >
> >
> > On 30.04.20 17:57, Jark Wu wrote:
> > > Hi Dawid,
> > >
> > > I just want to mention one of your response,
> > >
> > > > What you described with
> > > > 'format' = 'csv',
> > > > 'csv.allow-comments' = 'true',
> > > > 'csv.ignore-parse-errors' = 'true'
> > > > would not work though as the `format` prefix is mandatory in the sources
> > > as only the properties with format
> > > > will be passed to the format factory in majority of cases. We already
> > > have some implicit contracts.
> > >
> > > IIUC, in FLIP-95 and FLIP-122, the property key style are totally decided
> > > by connectors, not the framework.
> > > So I custom connector can define above properties, and extract the value
> > of
> > > 'format', i.e. 'csv', to find the format factory.
> > > And extract the properties with `csv.` prefix and remove the prefix, and
> > > pass the properties (e.g. 'allow-comments' = 'true')
> > > into the format factory to create format.
> > >
> > > So there is no a strict guarantee to have a "nested JSON style"
> > properties.
> > > Users can still develop a custom connector with this
> > > un-hierarchy properties and works well.
> > >
> > > 'format' = 'json',
> > > 'format.fail-on-missing-field' = 'false'
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <[hidden email]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to start with a comment that I am ok with the current state of
> > > > the FLIP-122 if there is a strong preference for it. Nevertheless I
> > still
> > > > like the idea of adding `type` to the `format` to have it as
> > `format.type`
> > > > = `json`.
> > > >
> > > > I wanted to clarify a few things though:
> > > >
> > > > @Jingsong As far as I see it most of the users copy/paste the properties
> > > > from the documentation to the SQL, so I don't think additional four
> > > > characters are too cumbersome. Plus if you force the additional suffix
> > onto
> > > > all the options of a format you introduce way more boilerplate than if
> > we
> > > > added the `type/kind/name`
> > > >
> > > > @Kurt I agree that we cannot force it, but I think it is more of a
> > > > question to set standards/implicit contracts on the properties. What you
> > > > described with
> > > > 'format' = 'csv',
> > > > 'csv.allow-comments' = 'true',
> > > > 'csv.ignore-parse-errors' = 'true'
> > > >
> > > > would not work though as the `format` prefix is mandatory in the sources
> > > > as only the properties with format will be passed to the format factory
> > in
> > > > majority of cases. We already have some implicit contracts.
> > > >
> > > > @Forward I did not necessarily get the example. Aren't json and bson two
> > > > separate formats? Do you mean you can have those two at the same time?
> > Why
> > > > do you need to differentiate the options for each? The way I see it is:
> > > >
> > > > ‘format(.name)' = 'json',
> > > > ‘format.fail-on-missing-field' = 'false'
> > > >
> > > > or
> > > >
> > > > ‘format(.name)' = 'bson',
> > > > ‘format.fail-on-missing-field' = 'false'
> > > >
> > > > @Benchao I'd be fine with any of name, kind, type(this we already had in
> > > > the past)
> > > >
> > > > Best,
> > > > Dawid
> > > >
> > > > On 30/04/2020 04:17, Forward Xu wrote:
> > > >
> > > > Here I have a little doubt. At present, our json only supports the
> > > > conventional json format. If we need to implement json with bson, json
> > with
> > > > avro, etc., how should we express it?
> > > > Do you need like the following:
> > > >
> > > > ‘format.name' = 'json',
> > > >
> > > > ‘format.json.fail-on-missing-field' = 'false'
> > > >
> > > >
> > > > ‘format.name' = 'bson',
> > > >
> > > > ‘format.bson.fail-on-missing-field' = ‘false'
> > > >
> > > >
> > > > Best,
> > > >
> > > > Forward
> > > >
> > > > Benchao Li <[hidden email]> <[hidden email]> 于2020年4月30日周四
> > 上午9:58写道:
> > > >
> > > >
> > > > Thanks Timo for staring the discussion.
> > > >
> > > > Generally I like the idea to keep the config align with a standard like
> > > > json/yaml.
> > > >
> > > > From the user's perspective, I don't use table configs from a config
> > file
> > > > like yaml or json for now,
> > > > And it's ok to change it to yaml like style. Actually we didn't know
> > that
> > > > this could be a yaml like
> > > > configuration hierarchy. If it has a hierarchy, we maybe consider that
> > in
> > > > the future to load the
> > > > config from a yaml/json file.
> > > >
> > > > Regarding the name,
> > > > 'format.kind' looks fine to me. However there is another name from the
> > top
> > > > of my head:
> > > > 'format.name', WDYT?
> > > >
> > > > Dawid Wysakowicz <[hidden email]> <[hidden email]>
> > 于2020年4月29日周三 下午11:56写道:
> > > >
> > > >
> > > > Hi all,
> > > >
> > > > I also wanted to share my opinion.
> > > >
> > > > When talking about a ConfigOption hierarchy we use for configuring Flink
> > > > cluster I would be a strong advocate for keeping a yaml/hocon/json/...
> > > > compatible style. Those options are primarily read from a file and thus
> > > > should at least try to follow common practices for nested formats if we
> > > > ever decide to switch to one.
> > > >
> > > > Here the question is about the properties we use in SQL statements. The
> > > > origin/destination of these usually will be external catalog, usually in
> > > >
> > > > a
> > > >
> > > > flattened(key/value) representation so I agree it is not as important as
> > > >
> > > > in
> > > >
> > > > the aforementioned case. Nevertheless having a yaml based catalog or
> > > >
> > > > being
> > > >
> > > > able to have e.g. yaml based snapshots of a catalog in my opinion is
> > > > appealing. At the same time cost of being able to have a nice
> > > > yaml/hocon/json representation is just adding a single suffix to a
> > > > single(at most 2 key + value) property. The question is between `format`
> > > >
> > > > =
> > > >
> > > > `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
> > > > doing it.
> > > >
> > > > Just to have a full picture. Both cases can be represented in yaml, but
> > > > the difference is significant:
> > > > format: 'json'
> > > > format.option: 'value'
> > > >
> > > > vs
> > > > format:
> > > > kind: 'json'
> > > >
> > > > option: 'value'
> > > >
> > > > Best,
> > > > Dawid
> > > >
> > > > On 29/04/2020 17:13, Flavio Pompermaier wrote:
> > > >
> > > > Personally I don't have any preference here. Compliance wih standard
> > > >
> > > > YAML
> > > >
> > > > parser is probably more important
> > > >
> > > > On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> <
> > [hidden email]> wrote:
> > > >
> > > >
> > > > From a user's perspective, I prefer the shorter one "format=json",
> > > >
> > > > because
> > > >
> > > > it's more concise and straightforward. The "kind" is redundant for
> > > >
> > > > users.
> > > >
> > > > Is there a real case requires to represent the configuration in JSON
> > > > style?
> > > > As far as I can see, I don't see such requirement, and everything works
> > > > fine by now.
> > > >
> > > > So I'm in favor of "format=json". But if the community insist to follow
> > > > code style on this, I'm also fine with the longer one.
> > > >
> > > > Btw, I also CC user mailing list to listen more user's feedback.
> > > >
> > > > Because I
> > > >
> > > > think this is relative to usability.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]> <
> > [hidden email]>
> > > > wrote:
> > > >
> > > >
> > > > > Therefore, should we advocate instead:
> > > > >
> > > > > 'format.kind' = 'json',
> > > > > 'format.fail-on-missing-field' = 'false'
> > > >
> > > > Yes. That's pretty much it.
> > > >
> > > > This is reasonable important to nail down as with such violations I
> > > > believe we could not actually switch to a standard YAML parser.
> > > >
> > > > On 29/04/2020 16:05, Timo Walther wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > discussions around ConfigOption seem to be very popular recently.
> > > >
> > > > So I
> > > >
> > > > would also like to get some opinions on a different topic.
> > > >
> > > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > > > agreed on the following DDL syntax:
> > > >
> > > > CREATE TABLE fs_table (
> > > > ...
> > > > ) WITH (
> > > > 'connector' = 'filesystem',
> > > > 'path' = 'file:///path/to/whatever',
> > > > 'format' = 'csv',
> > > > 'format.allow-comments' = 'true',
> > > > 'format.ignore-parse-errors' = 'true'
> > > > );
> > > >
> > > > Of course this is slightly different from regular Flink core
> > > > configuration but a connector still needs to be configured based on
> > > > these options.
> > > >
> > > > However, I think this FLIP violates our code style guidelines
> > > >
> > > > because
> > > >
> > > > 'format' = 'json',
> > > > 'format.fail-on-missing-field' = 'false'
> > > >
> > > > is an invalid hierarchy. `format` cannot be a string and a top-level
> > > > object at the same time.
> > > >
> > > > We have similar problems in our runtime configuration:
> > > >
> > > > state.backend=
> > > > state.backend.incremental=
> > > > restart-strategy=
> > > > restart-strategy.fixed-delay.delay=
> > > > high-availability=
> > > > high-availability.cluster-id=
> > > >
> > > > The code style guide states "Think of the configuration as nested
> > > > objects (JSON style)". So such hierarchies cannot be represented in
> > > >
> > > > a
> > > >
> > > > nested JSON style.
> > > >
> > > > Therefore, should we advocate instead:
> > > >
> > > > 'format.kind' = 'json',
> > > > 'format.fail-on-missing-field' = 'false'
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > > Timo
> > > >
> > > > [1]
> > > >
> > > >
> > > >
> > https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> > > >
> > > > --
> > > >
> > > > Benchao Li
> > > > School of Electronics Engineering and Computer Science, Peking
> > UniversityTel:+86-15650713730
> > > > Email: [hidden email]; [hidden email]
> > > >
> > > >
> > >
> >
> >
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Jark Wu-2
Hi,

I think Timo proposed a good idea to make both side happy. That is:

format = json
json.fail-on-missing-field = true
json.ignore-parse-error = true

value.format = json
value.json.fail-on-missing-field = true
value.json.ignore-parse-error = true

This is a valid hierarchies. Besides, it's more clear that the option
belongs to a specific component (i.e. json).
This will be more readable when we introducing more formats, e.g. parquet.

format = parquet
parquet.compression = ...
parquet.block.size = ...
parquet.page.size = ...

is more readable than current style:

format = parquet
format.compression = ...
format.block.size = ...
format.page.size = ...

To sum up, I'm +1 to use "format = json",  "json.fail-on-missing-field =
true".

Best,
Jark

On Wed, 6 May 2020 at 17:12, Danny Chan <[hidden email]> wrote:

> Hi, everyone ~
>
> Allows me to share some thoughts here.
>
> Personally i think for SQL, "format" is obviously better than "format.name",
> it is more concise and straight-forward, similar with Presto FORMAT[2] and
> KSQL VALUE_FORMAT[1]; i think we move from "connector.type" to "connector"
> for the same reason, the "type" or "name" suffix is implicit, SQL syntax
> like the DDL is a top-level user API, so from my side keeping good
> user-friendly syntax is more important.
>
> @Timo I'm big +1 for the a good code style guide, but that does not mean
> we should go for a json-style table options in the DDL, the DDL could have
> its own contract. Can we move "represent these config options in YAML" to
> another topic ? Otherwise, how should we handle the "connector" key, should
> we prefix all the table options with "connector" ? The original inention of
> FLIP-122 is to remove some redundant prefix/suffix of the table options
> because they are obviously implicit there, and the "connector." prefix and
> the ".type" or ".name" suffix are the ones we most want to delete.
>
> @Dawid Although ".type" is just another 4 characters, but we force the SQL
> users to do the thing that is obvious reduadant, i know serialize catalog
> table to YAML or use the options in DataStream has similar keys request,
> but they are different use cases that i believe many SQL user would not
> encounter, that means we force many users to obey rules for cases they
> would never have.
>
>
> [1] https://docs.ksqldb.io/en/latest/developer-guide/create-a-table/
> [2] https://prestodb.io/docs/current/sql/create-table.html
>
> Best,
> Danny Chan
> 在 2020年5月4日 +0800 PM11:34,Till Rohrmann <[hidden email]>,写道:
> > Hi everyone,
> >
> > I like Timo's proposal to organize our configuration more hierarchical
> > since this is what the coding guide specifies. The benefit I see is that
> > config options belonging to the same concept will be found in the same
> > nested object. Moreover, it will be possible to split the configuration
> > into unrelated parts which are fed to the respective components. That way
> > one has a much better separation of concern since component A cannot read
> > the configuration of component B.
> >
> > Concerning Timo's last two proposals:
> >
> > If fail-on-missing-field is a common configuration shared by all formats,
> > then I would go with the first option:
> >
> > format.kind: json
> > format.fail-on-missing-field: true
> >
> > If fail-on-missing-field is specific for json, then one could go with
> >
> > format: json
> > json.fail-on-missing-field: true
> >
> > or
> >
> > format.kind: json
> > format.json.fail-on-missing-field: true
> >
> > Cheers,
> > Till
> >
> >
> > On Fri, May 1, 2020 at 11:55 AM Timo Walther <[hidden email]> wrote:
> >
> > > Hi Jark,
> > >
> > > yes, in theory every connector can design options as they like. But for
> > > user experience and good coding style we should be consistent in Flink
> > > connectors and configuration. Because implementers of new connectors
> > > will copy the design of existing ones.
> > >
> > > Furthermore, I could image that people in the DataStream API would also
> > > like to configure their connector based on options in the near future.
> > > It might be the case that Flink DataStream API connectors will reuse
> the
> > > ConfigOptions from Table API for consistency.
> > >
> > > I'm favoring either:
> > >
> > > format.kind = json
> > > format.fail-on-missing-field: true
> > >
> > > Or:
> > >
> > > format = json
> > > json.fail-on-missing-field: true
> > >
> > > Both are valid hierarchies.
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > On 30.04.20 17:57, Jark Wu wrote:
> > > > Hi Dawid,
> > > >
> > > > I just want to mention one of your response,
> > > >
> > > > > What you described with
> > > > > 'format' = 'csv',
> > > > > 'csv.allow-comments' = 'true',
> > > > > 'csv.ignore-parse-errors' = 'true'
> > > > > would not work though as the `format` prefix is mandatory in the
> sources
> > > > as only the properties with format
> > > > > will be passed to the format factory in majority of cases. We
> already
> > > > have some implicit contracts.
> > > >
> > > > IIUC, in FLIP-95 and FLIP-122, the property key style are totally
> decided
> > > > by connectors, not the framework.
> > > > So I custom connector can define above properties, and extract the
> value
> > > of
> > > > 'format', i.e. 'csv', to find the format factory.
> > > > And extract the properties with `csv.` prefix and remove the prefix,
> and
> > > > pass the properties (e.g. 'allow-comments' = 'true')
> > > > into the format factory to create format.
> > > >
> > > > So there is no a strict guarantee to have a "nested JSON style"
> > > properties.
> > > > Users can still develop a custom connector with this
> > > > un-hierarchy properties and works well.
> > > >
> > > > 'format' = 'json',
> > > > 'format.fail-on-missing-field' = 'false'
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > > On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <
> [hidden email]>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start with a comment that I am ok with the current
> state of
> > > > > the FLIP-122 if there is a strong preference for it. Nevertheless I
> > > still
> > > > > like the idea of adding `type` to the `format` to have it as
> > > `format.type`
> > > > > = `json`.
> > > > >
> > > > > I wanted to clarify a few things though:
> > > > >
> > > > > @Jingsong As far as I see it most of the users copy/paste the
> properties
> > > > > from the documentation to the SQL, so I don't think additional four
> > > > > characters are too cumbersome. Plus if you force the additional
> suffix
> > > onto
> > > > > all the options of a format you introduce way more boilerplate
> than if
> > > we
> > > > > added the `type/kind/name`
> > > > >
> > > > > @Kurt I agree that we cannot force it, but I think it is more of a
> > > > > question to set standards/implicit contracts on the properties.
> What you
> > > > > described with
> > > > > 'format' = 'csv',
> > > > > 'csv.allow-comments' = 'true',
> > > > > 'csv.ignore-parse-errors' = 'true'
> > > > >
> > > > > would not work though as the `format` prefix is mandatory in the
> sources
> > > > > as only the properties with format will be passed to the format
> factory
> > > in
> > > > > majority of cases. We already have some implicit contracts.
> > > > >
> > > > > @Forward I did not necessarily get the example. Aren't json and
> bson two
> > > > > separate formats? Do you mean you can have those two at the same
> time?
> > > Why
> > > > > do you need to differentiate the options for each? The way I see
> it is:
> > > > >
> > > > > ‘format(.name)' = 'json',
> > > > > ‘format.fail-on-missing-field' = 'false'
> > > > >
> > > > > or
> > > > >
> > > > > ‘format(.name)' = 'bson',
> > > > > ‘format.fail-on-missing-field' = 'false'
> > > > >
> > > > > @Benchao I'd be fine with any of name, kind, type(this we already
> had in
> > > > > the past)
> > > > >
> > > > > Best,
> > > > > Dawid
> > > > >
> > > > > On 30/04/2020 04:17, Forward Xu wrote:
> > > > >
> > > > > Here I have a little doubt. At present, our json only supports the
> > > > > conventional json format. If we need to implement json with bson,
> json
> > > with
> > > > > avro, etc., how should we express it?
> > > > > Do you need like the following:
> > > > >
> > > > > ‘format.name' = 'json',
> > > > >
> > > > > ‘format.json.fail-on-missing-field' = 'false'
> > > > >
> > > > >
> > > > > ‘format.name' = 'bson',
> > > > >
> > > > > ‘format.bson.fail-on-missing-field' = ‘false'
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Forward
> > > > >
> > > > > Benchao Li <[hidden email]> <[hidden email]>
> 于2020年4月30日周四
> > > 上午9:58写道:
> > > > >
> > > > >
> > > > > Thanks Timo for staring the discussion.
> > > > >
> > > > > Generally I like the idea to keep the config align with a standard
> like
> > > > > json/yaml.
> > > > >
> > > > > From the user's perspective, I don't use table configs from a
> config
> > > file
> > > > > like yaml or json for now,
> > > > > And it's ok to change it to yaml like style. Actually we didn't
> know
> > > that
> > > > > this could be a yaml like
> > > > > configuration hierarchy. If it has a hierarchy, we maybe consider
> that
> > > in
> > > > > the future to load the
> > > > > config from a yaml/json file.
> > > > >
> > > > > Regarding the name,
> > > > > 'format.kind' looks fine to me. However there is another name from
> the
> > > top
> > > > > of my head:
> > > > > 'format.name', WDYT?
> > > > >
> > > > > Dawid Wysakowicz <[hidden email]> <[hidden email]>
> > > 于2020年4月29日周三 下午11:56写道:
> > > > >
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I also wanted to share my opinion.
> > > > >
> > > > > When talking about a ConfigOption hierarchy we use for configuring
> Flink
> > > > > cluster I would be a strong advocate for keeping a
> yaml/hocon/json/...
> > > > > compatible style. Those options are primarily read from a file and
> thus
> > > > > should at least try to follow common practices for nested formats
> if we
> > > > > ever decide to switch to one.
> > > > >
> > > > > Here the question is about the properties we use in SQL
> statements. The
> > > > > origin/destination of these usually will be external catalog,
> usually in
> > > > >
> > > > > a
> > > > >
> > > > > flattened(key/value) representation so I agree it is not as
> important as
> > > > >
> > > > > in
> > > > >
> > > > > the aforementioned case. Nevertheless having a yaml based catalog
> or
> > > > >
> > > > > being
> > > > >
> > > > > able to have e.g. yaml based snapshots of a catalog in my opinion
> is
> > > > > appealing. At the same time cost of being able to have a nice
> > > > > yaml/hocon/json representation is just adding a single suffix to a
> > > > > single(at most 2 key + value) property. The question is between
> `format`
> > > > >
> > > > > =
> > > > >
> > > > > `json` vs `format.kind` = `json`. That said I'd be slighty in
> favor of
> > > > > doing it.
> > > > >
> > > > > Just to have a full picture. Both cases can be represented in
> yaml, but
> > > > > the difference is significant:
> > > > > format: 'json'
> > > > > format.option: 'value'
> > > > >
> > > > > vs
> > > > > format:
> > > > > kind: 'json'
> > > > >
> > > > > option: 'value'
> > > > >
> > > > > Best,
> > > > > Dawid
> > > > >
> > > > > On 29/04/2020 17:13, Flavio Pompermaier wrote:
> > > > >
> > > > > Personally I don't have any preference here. Compliance wih
> standard
> > > > >
> > > > > YAML
> > > > >
> > > > > parser is probably more important
> > > > >
> > > > > On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> <
> > > [hidden email]> wrote:
> > > > >
> > > > >
> > > > > From a user's perspective, I prefer the shorter one "format=json",
> > > > >
> > > > > because
> > > > >
> > > > > it's more concise and straightforward. The "kind" is redundant for
> > > > >
> > > > > users.
> > > > >
> > > > > Is there a real case requires to represent the configuration in
> JSON
> > > > > style?
> > > > > As far as I can see, I don't see such requirement, and everything
> works
> > > > > fine by now.
> > > > >
> > > > > So I'm in favor of "format=json". But if the community insist to
> follow
> > > > > code style on this, I'm also fine with the longer one.
> > > > >
> > > > > Btw, I also CC user mailing list to listen more user's feedback.
> > > > >
> > > > > Because I
> > > > >
> > > > > think this is relative to usability.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <[hidden email]>
> <
> > > [hidden email]>
> > > > > wrote:
> > > > >
> > > > >
> > > > > > Therefore, should we advocate instead:
> > > > > >
> > > > > > 'format.kind' = 'json',
> > > > > > 'format.fail-on-missing-field' = 'false'
> > > > >
> > > > > Yes. That's pretty much it.
> > > > >
> > > > > This is reasonable important to nail down as with such violations I
> > > > > believe we could not actually switch to a standard YAML parser.
> > > > >
> > > > > On 29/04/2020 16:05, Timo Walther wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > discussions around ConfigOption seem to be very popular recently.
> > > > >
> > > > > So I
> > > > >
> > > > > would also like to get some opinions on a different topic.
> > > > >
> > > > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > > > > agreed on the following DDL syntax:
> > > > >
> > > > > CREATE TABLE fs_table (
> > > > > ...
> > > > > ) WITH (
> > > > > 'connector' = 'filesystem',
> > > > > 'path' = 'file:///path/to/whatever',
> > > > > 'format' = 'csv',
> > > > > 'format.allow-comments' = 'true',
> > > > > 'format.ignore-parse-errors' = 'true'
> > > > > );
> > > > >
> > > > > Of course this is slightly different from regular Flink core
> > > > > configuration but a connector still needs to be configured based on
> > > > > these options.
> > > > >
> > > > > However, I think this FLIP violates our code style guidelines
> > > > >
> > > > > because
> > > > >
> > > > > 'format' = 'json',
> > > > > 'format.fail-on-missing-field' = 'false'
> > > > >
> > > > > is an invalid hierarchy. `format` cannot be a string and a
> top-level
> > > > > object at the same time.
> > > > >
> > > > > We have similar problems in our runtime configuration:
> > > > >
> > > > > state.backend=
> > > > > state.backend.incremental=
> > > > > restart-strategy=
> > > > > restart-strategy.fixed-delay.delay=
> > > > > high-availability=
> > > > > high-availability.cluster-id=
> > > > >
> > > > > The code style guide states "Think of the configuration as nested
> > > > > objects (JSON style)". So such hierarchies cannot be represented in
> > > > >
> > > > > a
> > > > >
> > > > > nested JSON style.
> > > > >
> > > > > Therefore, should we advocate instead:
> > > > >
> > > > > 'format.kind' = 'json',
> > > > > 'format.fail-on-missing-field' = 'false'
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thanks,
> > > > > Timo
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > > >
> > >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> > > > >
> > > > > --
> > > > >
> > > > > Benchao Li
> > > > > School of Electronics Engineering and Computer Science, Peking
> > > UniversityTel:+86-15650713730
> > > > > Email: [hidden email]; [hidden email]
> > > > >
> > > > >
> > > >
> > >
> > >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Jingsong Li
Hi,

+1 to:
format = parquet
parquet.compression = ...
parquet.block.size = ...
parquet.page.size = ...

For the formats like parquet and orc,
Not just Flink itself, but this way also let Flink keys compatible with the
property keys of Hadoop / Hive / Spark.

And like Jark said, this way works for Kafka key value too.

Best,
Jingsong Lee

On Wed, May 6, 2020 at 6:19 PM Jark Wu <[hidden email]> wrote:

> Hi,
>
> I think Timo proposed a good idea to make both side happy. That is:
>
> format = json
> json.fail-on-missing-field = true
> json.ignore-parse-error = true
>
> value.format = json
> value.json.fail-on-missing-field = true
> value.json.ignore-parse-error = true
>
> This is a valid hierarchies. Besides, it's more clear that the option
> belongs to a specific component (i.e. json).
> This will be more readable when we introducing more formats, e.g. parquet.
>
> format = parquet
> parquet.compression = ...
> parquet.block.size = ...
> parquet.page.size = ...
>
> is more readable than current style:
>
> format = parquet
> format.compression = ...
> format.block.size = ...
> format.page.size = ...
>
> To sum up, I'm +1 to use "format = json",  "json.fail-on-missing-field =
> true".
>
> Best,
> Jark
>
> On Wed, 6 May 2020 at 17:12, Danny Chan <[hidden email]> wrote:
>
> > Hi, everyone ~
> >
> > Allows me to share some thoughts here.
> >
> > Personally i think for SQL, "format" is obviously better than "
> format.name",
> > it is more concise and straight-forward, similar with Presto FORMAT[2]
> and
> > KSQL VALUE_FORMAT[1]; i think we move from "connector.type" to
> "connector"
> > for the same reason, the "type" or "name" suffix is implicit, SQL syntax
> > like the DDL is a top-level user API, so from my side keeping good
> > user-friendly syntax is more important.
> >
> > @Timo I'm big +1 for the a good code style guide, but that does not mean
> > we should go for a json-style table options in the DDL, the DDL could
> have
> > its own contract. Can we move "represent these config options in YAML" to
> > another topic ? Otherwise, how should we handle the "connector" key,
> should
> > we prefix all the table options with "connector" ? The original inention
> of
> > FLIP-122 is to remove some redundant prefix/suffix of the table options
> > because they are obviously implicit there, and the "connector." prefix
> and
> > the ".type" or ".name" suffix are the ones we most want to delete.
> >
> > @Dawid Although ".type" is just another 4 characters, but we force the
> SQL
> > users to do the thing that is obvious reduadant, i know serialize catalog
> > table to YAML or use the options in DataStream has similar keys request,
> > but they are different use cases that i believe many SQL user would not
> > encounter, that means we force many users to obey rules for cases they
> > would never have.
> >
> >
> > [1] https://docs.ksqldb.io/en/latest/developer-guide/create-a-table/
> > [2] https://prestodb.io/docs/current/sql/create-table.html
> >
> > Best,
> > Danny Chan
> > 在 2020年5月4日 +0800 PM11:34,Till Rohrmann <[hidden email]>,写道:
> > > Hi everyone,
> > >
> > > I like Timo's proposal to organize our configuration more hierarchical
> > > since this is what the coding guide specifies. The benefit I see is
> that
> > > config options belonging to the same concept will be found in the same
> > > nested object. Moreover, it will be possible to split the configuration
> > > into unrelated parts which are fed to the respective components. That
> way
> > > one has a much better separation of concern since component A cannot
> read
> > > the configuration of component B.
> > >
> > > Concerning Timo's last two proposals:
> > >
> > > If fail-on-missing-field is a common configuration shared by all
> formats,
> > > then I would go with the first option:
> > >
> > > format.kind: json
> > > format.fail-on-missing-field: true
> > >
> > > If fail-on-missing-field is specific for json, then one could go with
> > >
> > > format: json
> > > json.fail-on-missing-field: true
> > >
> > > or
> > >
> > > format.kind: json
> > > format.json.fail-on-missing-field: true
> > >
> > > Cheers,
> > > Till
> > >
> > >
> > > On Fri, May 1, 2020 at 11:55 AM Timo Walther <[hidden email]>
> wrote:
> > >
> > > > Hi Jark,
> > > >
> > > > yes, in theory every connector can design options as they like. But
> for
> > > > user experience and good coding style we should be consistent in
> Flink
> > > > connectors and configuration. Because implementers of new connectors
> > > > will copy the design of existing ones.
> > > >
> > > > Furthermore, I could image that people in the DataStream API would
> also
> > > > like to configure their connector based on options in the near
> future.
> > > > It might be the case that Flink DataStream API connectors will reuse
> > the
> > > > ConfigOptions from Table API for consistency.
> > > >
> > > > I'm favoring either:
> > > >
> > > > format.kind = json
> > > > format.fail-on-missing-field: true
> > > >
> > > > Or:
> > > >
> > > > format = json
> > > > json.fail-on-missing-field: true
> > > >
> > > > Both are valid hierarchies.
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > >
> > > > On 30.04.20 17:57, Jark Wu wrote:
> > > > > Hi Dawid,
> > > > >
> > > > > I just want to mention one of your response,
> > > > >
> > > > > > What you described with
> > > > > > 'format' = 'csv',
> > > > > > 'csv.allow-comments' = 'true',
> > > > > > 'csv.ignore-parse-errors' = 'true'
> > > > > > would not work though as the `format` prefix is mandatory in the
> > sources
> > > > > as only the properties with format
> > > > > > will be passed to the format factory in majority of cases. We
> > already
> > > > > have some implicit contracts.
> > > > >
> > > > > IIUC, in FLIP-95 and FLIP-122, the property key style are totally
> > decided
> > > > > by connectors, not the framework.
> > > > > So I custom connector can define above properties, and extract the
> > value
> > > > of
> > > > > 'format', i.e. 'csv', to find the format factory.
> > > > > And extract the properties with `csv.` prefix and remove the
> prefix,
> > and
> > > > > pass the properties (e.g. 'allow-comments' = 'true')
> > > > > into the format factory to create format.
> > > > >
> > > > > So there is no a strict guarantee to have a "nested JSON style"
> > > > properties.
> > > > > Users can still develop a custom connector with this
> > > > > un-hierarchy properties and works well.
> > > > >
> > > > > 'format' = 'json',
> > > > > 'format.fail-on-missing-field' = 'false'
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > >
> > > > > On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <
> > [hidden email]>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to start with a comment that I am ok with the current
> > state of
> > > > > > the FLIP-122 if there is a strong preference for it.
> Nevertheless I
> > > > still
> > > > > > like the idea of adding `type` to the `format` to have it as
> > > > `format.type`
> > > > > > = `json`.
> > > > > >
> > > > > > I wanted to clarify a few things though:
> > > > > >
> > > > > > @Jingsong As far as I see it most of the users copy/paste the
> > properties
> > > > > > from the documentation to the SQL, so I don't think additional
> four
> > > > > > characters are too cumbersome. Plus if you force the additional
> > suffix
> > > > onto
> > > > > > all the options of a format you introduce way more boilerplate
> > than if
> > > > we
> > > > > > added the `type/kind/name`
> > > > > >
> > > > > > @Kurt I agree that we cannot force it, but I think it is more of
> a
> > > > > > question to set standards/implicit contracts on the properties.
> > What you
> > > > > > described with
> > > > > > 'format' = 'csv',
> > > > > > 'csv.allow-comments' = 'true',
> > > > > > 'csv.ignore-parse-errors' = 'true'
> > > > > >
> > > > > > would not work though as the `format` prefix is mandatory in the
> > sources
> > > > > > as only the properties with format will be passed to the format
> > factory
> > > > in
> > > > > > majority of cases. We already have some implicit contracts.
> > > > > >
> > > > > > @Forward I did not necessarily get the example. Aren't json and
> > bson two
> > > > > > separate formats? Do you mean you can have those two at the same
> > time?
> > > > Why
> > > > > > do you need to differentiate the options for each? The way I see
> > it is:
> > > > > >
> > > > > > ‘format(.name)' = 'json',
> > > > > > ‘format.fail-on-missing-field' = 'false'
> > > > > >
> > > > > > or
> > > > > >
> > > > > > ‘format(.name)' = 'bson',
> > > > > > ‘format.fail-on-missing-field' = 'false'
> > > > > >
> > > > > > @Benchao I'd be fine with any of name, kind, type(this we already
> > had in
> > > > > > the past)
> > > > > >
> > > > > > Best,
> > > > > > Dawid
> > > > > >
> > > > > > On 30/04/2020 04:17, Forward Xu wrote:
> > > > > >
> > > > > > Here I have a little doubt. At present, our json only supports
> the
> > > > > > conventional json format. If we need to implement json with bson,
> > json
> > > > with
> > > > > > avro, etc., how should we express it?
> > > > > > Do you need like the following:
> > > > > >
> > > > > > ‘format.name' = 'json',
> > > > > >
> > > > > > ‘format.json.fail-on-missing-field' = 'false'
> > > > > >
> > > > > >
> > > > > > ‘format.name' = 'bson',
> > > > > >
> > > > > > ‘format.bson.fail-on-missing-field' = ‘false'
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Forward
> > > > > >
> > > > > > Benchao Li <[hidden email]> <[hidden email]>
> > 于2020年4月30日周四
> > > > 上午9:58写道:
> > > > > >
> > > > > >
> > > > > > Thanks Timo for staring the discussion.
> > > > > >
> > > > > > Generally I like the idea to keep the config align with a
> standard
> > like
> > > > > > json/yaml.
> > > > > >
> > > > > > From the user's perspective, I don't use table configs from a
> > config
> > > > file
> > > > > > like yaml or json for now,
> > > > > > And it's ok to change it to yaml like style. Actually we didn't
> > know
> > > > that
> > > > > > this could be a yaml like
> > > > > > configuration hierarchy. If it has a hierarchy, we maybe consider
> > that
> > > > in
> > > > > > the future to load the
> > > > > > config from a yaml/json file.
> > > > > >
> > > > > > Regarding the name,
> > > > > > 'format.kind' looks fine to me. However there is another name
> from
> > the
> > > > top
> > > > > > of my head:
> > > > > > 'format.name', WDYT?
> > > > > >
> > > > > > Dawid Wysakowicz <[hidden email]> <
> [hidden email]>
> > > > 于2020年4月29日周三 下午11:56写道:
> > > > > >
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I also wanted to share my opinion.
> > > > > >
> > > > > > When talking about a ConfigOption hierarchy we use for
> configuring
> > Flink
> > > > > > cluster I would be a strong advocate for keeping a
> > yaml/hocon/json/...
> > > > > > compatible style. Those options are primarily read from a file
> and
> > thus
> > > > > > should at least try to follow common practices for nested formats
> > if we
> > > > > > ever decide to switch to one.
> > > > > >
> > > > > > Here the question is about the properties we use in SQL
> > statements. The
> > > > > > origin/destination of these usually will be external catalog,
> > usually in
> > > > > >
> > > > > > a
> > > > > >
> > > > > > flattened(key/value) representation so I agree it is not as
> > important as
> > > > > >
> > > > > > in
> > > > > >
> > > > > > the aforementioned case. Nevertheless having a yaml based catalog
> > or
> > > > > >
> > > > > > being
> > > > > >
> > > > > > able to have e.g. yaml based snapshots of a catalog in my opinion
> > is
> > > > > > appealing. At the same time cost of being able to have a nice
> > > > > > yaml/hocon/json representation is just adding a single suffix to
> a
> > > > > > single(at most 2 key + value) property. The question is between
> > `format`
> > > > > >
> > > > > > =
> > > > > >
> > > > > > `json` vs `format.kind` = `json`. That said I'd be slighty in
> > favor of
> > > > > > doing it.
> > > > > >
> > > > > > Just to have a full picture. Both cases can be represented in
> > yaml, but
> > > > > > the difference is significant:
> > > > > > format: 'json'
> > > > > > format.option: 'value'
> > > > > >
> > > > > > vs
> > > > > > format:
> > > > > > kind: 'json'
> > > > > >
> > > > > > option: 'value'
> > > > > >
> > > > > > Best,
> > > > > > Dawid
> > > > > >
> > > > > > On 29/04/2020 17:13, Flavio Pompermaier wrote:
> > > > > >
> > > > > > Personally I don't have any preference here. Compliance wih
> > standard
> > > > > >
> > > > > > YAML
> > > > > >
> > > > > > parser is probably more important
> > > > > >
> > > > > > On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> <
> > > > [hidden email]> wrote:
> > > > > >
> > > > > >
> > > > > > From a user's perspective, I prefer the shorter one
> "format=json",
> > > > > >
> > > > > > because
> > > > > >
> > > > > > it's more concise and straightforward. The "kind" is redundant
> for
> > > > > >
> > > > > > users.
> > > > > >
> > > > > > Is there a real case requires to represent the configuration in
> > JSON
> > > > > > style?
> > > > > > As far as I can see, I don't see such requirement, and everything
> > works
> > > > > > fine by now.
> > > > > >
> > > > > > So I'm in favor of "format=json". But if the community insist to
> > follow
> > > > > > code style on this, I'm also fine with the longer one.
> > > > > >
> > > > > > Btw, I also CC user mailing list to listen more user's feedback.
> > > > > >
> > > > > > Because I
> > > > > >
> > > > > > think this is relative to usability.
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > > On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <
> [hidden email]>
> > <
> > > > [hidden email]>
> > > > > > wrote:
> > > > > >
> > > > > >
> > > > > > > Therefore, should we advocate instead:
> > > > > > >
> > > > > > > 'format.kind' = 'json',
> > > > > > > 'format.fail-on-missing-field' = 'false'
> > > > > >
> > > > > > Yes. That's pretty much it.
> > > > > >
> > > > > > This is reasonable important to nail down as with such
> violations I
> > > > > > believe we could not actually switch to a standard YAML parser.
> > > > > >
> > > > > > On 29/04/2020 16:05, Timo Walther wrote:
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > discussions around ConfigOption seem to be very popular recently.
> > > > > >
> > > > > > So I
> > > > > >
> > > > > > would also like to get some opinions on a different topic.
> > > > > >
> > > > > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > > > > > agreed on the following DDL syntax:
> > > > > >
> > > > > > CREATE TABLE fs_table (
> > > > > > ...
> > > > > > ) WITH (
> > > > > > 'connector' = 'filesystem',
> > > > > > 'path' = 'file:///path/to/whatever',
> > > > > > 'format' = 'csv',
> > > > > > 'format.allow-comments' = 'true',
> > > > > > 'format.ignore-parse-errors' = 'true'
> > > > > > );
> > > > > >
> > > > > > Of course this is slightly different from regular Flink core
> > > > > > configuration but a connector still needs to be configured based
> on
> > > > > > these options.
> > > > > >
> > > > > > However, I think this FLIP violates our code style guidelines
> > > > > >
> > > > > > because
> > > > > >
> > > > > > 'format' = 'json',
> > > > > > 'format.fail-on-missing-field' = 'false'
> > > > > >
> > > > > > is an invalid hierarchy. `format` cannot be a string and a
> > top-level
> > > > > > object at the same time.
> > > > > >
> > > > > > We have similar problems in our runtime configuration:
> > > > > >
> > > > > > state.backend=
> > > > > > state.backend.incremental=
> > > > > > restart-strategy=
> > > > > > restart-strategy.fixed-delay.delay=
> > > > > > high-availability=
> > > > > > high-availability.cluster-id=
> > > > > >
> > > > > > The code style guide states "Think of the configuration as nested
> > > > > > objects (JSON style)". So such hierarchies cannot be represented
> in
> > > > > >
> > > > > > a
> > > > > >
> > > > > > nested JSON style.
> > > > > >
> > > > > > Therefore, should we advocate instead:
> > > > > >
> > > > > > 'format.kind' = 'json',
> > > > > > 'format.fail-on-missing-field' = 'false'
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Thanks,
> > > > > > Timo
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > > >
> > > >
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Benchao Li
> > > > > > School of Electronics Engineering and Computer Science, Peking
> > > > UniversityTel:+86-15650713730
> > > > > > Email: [hidden email]; [hidden email]
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> >
>


--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Timo Walther-2
Cool, so let's go with:

format = json
json.fail-on-missing-field = true
json.ignore-parse-error = true

value.format = json
value.json.fail-on-missing-field = true
value.json.ignore-parse-error = true

Regards,
Timo


On 06.05.20 12:39, Jingsong Li wrote:

> Hi,
>
> +1 to:
> format = parquet
> parquet.compression = ...
> parquet.block.size = ...
> parquet.page.size = ...
>
> For the formats like parquet and orc,
> Not just Flink itself, but this way also let Flink keys compatible with the
> property keys of Hadoop / Hive / Spark.
>
> And like Jark said, this way works for Kafka key value too.
>
> Best,
> Jingsong Lee
>
> On Wed, May 6, 2020 at 6:19 PM Jark Wu <[hidden email]> wrote:
>
>> Hi,
>>
>> I think Timo proposed a good idea to make both side happy. That is:
>>
>> format = json
>> json.fail-on-missing-field = true
>> json.ignore-parse-error = true
>>
>> value.format = json
>> value.json.fail-on-missing-field = true
>> value.json.ignore-parse-error = true
>>
>> This is a valid hierarchies. Besides, it's more clear that the option
>> belongs to a specific component (i.e. json).
>> This will be more readable when we introducing more formats, e.g. parquet.
>>
>> format = parquet
>> parquet.compression = ...
>> parquet.block.size = ...
>> parquet.page.size = ...
>>
>> is more readable than current style:
>>
>> format = parquet
>> format.compression = ...
>> format.block.size = ...
>> format.page.size = ...
>>
>> To sum up, I'm +1 to use "format = json",  "json.fail-on-missing-field =
>> true".
>>
>> Best,
>> Jark
>>
>> On Wed, 6 May 2020 at 17:12, Danny Chan <[hidden email]> wrote:
>>
>>> Hi, everyone ~
>>>
>>> Allows me to share some thoughts here.
>>>
>>> Personally i think for SQL, "format" is obviously better than "
>> format.name",
>>> it is more concise and straight-forward, similar with Presto FORMAT[2]
>> and
>>> KSQL VALUE_FORMAT[1]; i think we move from "connector.type" to
>> "connector"
>>> for the same reason, the "type" or "name" suffix is implicit, SQL syntax
>>> like the DDL is a top-level user API, so from my side keeping good
>>> user-friendly syntax is more important.
>>>
>>> @Timo I'm big +1 for the a good code style guide, but that does not mean
>>> we should go for a json-style table options in the DDL, the DDL could
>> have
>>> its own contract. Can we move "represent these config options in YAML" to
>>> another topic ? Otherwise, how should we handle the "connector" key,
>> should
>>> we prefix all the table options with "connector" ? The original inention
>> of
>>> FLIP-122 is to remove some redundant prefix/suffix of the table options
>>> because they are obviously implicit there, and the "connector." prefix
>> and
>>> the ".type" or ".name" suffix are the ones we most want to delete.
>>>
>>> @Dawid Although ".type" is just another 4 characters, but we force the
>> SQL
>>> users to do the thing that is obvious reduadant, i know serialize catalog
>>> table to YAML or use the options in DataStream has similar keys request,
>>> but they are different use cases that i believe many SQL user would not
>>> encounter, that means we force many users to obey rules for cases they
>>> would never have.
>>>
>>>
>>> [1] https://docs.ksqldb.io/en/latest/developer-guide/create-a-table/
>>> [2] https://prestodb.io/docs/current/sql/create-table.html
>>>
>>> Best,
>>> Danny Chan
>>> 在 2020年5月4日 +0800 PM11:34,Till Rohrmann <[hidden email]>,写道:
>>>> Hi everyone,
>>>>
>>>> I like Timo's proposal to organize our configuration more hierarchical
>>>> since this is what the coding guide specifies. The benefit I see is
>> that
>>>> config options belonging to the same concept will be found in the same
>>>> nested object. Moreover, it will be possible to split the configuration
>>>> into unrelated parts which are fed to the respective components. That
>> way
>>>> one has a much better separation of concern since component A cannot
>> read
>>>> the configuration of component B.
>>>>
>>>> Concerning Timo's last two proposals:
>>>>
>>>> If fail-on-missing-field is a common configuration shared by all
>> formats,
>>>> then I would go with the first option:
>>>>
>>>> format.kind: json
>>>> format.fail-on-missing-field: true
>>>>
>>>> If fail-on-missing-field is specific for json, then one could go with
>>>>
>>>> format: json
>>>> json.fail-on-missing-field: true
>>>>
>>>> or
>>>>
>>>> format.kind: json
>>>> format.json.fail-on-missing-field: true
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>>
>>>> On Fri, May 1, 2020 at 11:55 AM Timo Walther <[hidden email]>
>> wrote:
>>>>
>>>>> Hi Jark,
>>>>>
>>>>> yes, in theory every connector can design options as they like. But
>> for
>>>>> user experience and good coding style we should be consistent in
>> Flink
>>>>> connectors and configuration. Because implementers of new connectors
>>>>> will copy the design of existing ones.
>>>>>
>>>>> Furthermore, I could image that people in the DataStream API would
>> also
>>>>> like to configure their connector based on options in the near
>> future.
>>>>> It might be the case that Flink DataStream API connectors will reuse
>>> the
>>>>> ConfigOptions from Table API for consistency.
>>>>>
>>>>> I'm favoring either:
>>>>>
>>>>> format.kind = json
>>>>> format.fail-on-missing-field: true
>>>>>
>>>>> Or:
>>>>>
>>>>> format = json
>>>>> json.fail-on-missing-field: true
>>>>>
>>>>> Both are valid hierarchies.
>>>>>
>>>>> Regards,
>>>>> Timo
>>>>>
>>>>>
>>>>> On 30.04.20 17:57, Jark Wu wrote:
>>>>>> Hi Dawid,
>>>>>>
>>>>>> I just want to mention one of your response,
>>>>>>
>>>>>>> What you described with
>>>>>>> 'format' = 'csv',
>>>>>>> 'csv.allow-comments' = 'true',
>>>>>>> 'csv.ignore-parse-errors' = 'true'
>>>>>>> would not work though as the `format` prefix is mandatory in the
>>> sources
>>>>>> as only the properties with format
>>>>>>> will be passed to the format factory in majority of cases. We
>>> already
>>>>>> have some implicit contracts.
>>>>>>
>>>>>> IIUC, in FLIP-95 and FLIP-122, the property key style are totally
>>> decided
>>>>>> by connectors, not the framework.
>>>>>> So I custom connector can define above properties, and extract the
>>> value
>>>>> of
>>>>>> 'format', i.e. 'csv', to find the format factory.
>>>>>> And extract the properties with `csv.` prefix and remove the
>> prefix,
>>> and
>>>>>> pass the properties (e.g. 'allow-comments' = 'true')
>>>>>> into the format factory to create format.
>>>>>>
>>>>>> So there is no a strict guarantee to have a "nested JSON style"
>>>>> properties.
>>>>>> Users can still develop a custom connector with this
>>>>>> un-hierarchy properties and works well.
>>>>>>
>>>>>> 'format' = 'json',
>>>>>> 'format.fail-on-missing-field' = 'false'
>>>>>>
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>>
>>>>>> On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <
>>> [hidden email]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'd like to start with a comment that I am ok with the current
>>> state of
>>>>>>> the FLIP-122 if there is a strong preference for it.
>> Nevertheless I
>>>>> still
>>>>>>> like the idea of adding `type` to the `format` to have it as
>>>>> `format.type`
>>>>>>> = `json`.
>>>>>>>
>>>>>>> I wanted to clarify a few things though:
>>>>>>>
>>>>>>> @Jingsong As far as I see it most of the users copy/paste the
>>> properties
>>>>>>> from the documentation to the SQL, so I don't think additional
>> four
>>>>>>> characters are too cumbersome. Plus if you force the additional
>>> suffix
>>>>> onto
>>>>>>> all the options of a format you introduce way more boilerplate
>>> than if
>>>>> we
>>>>>>> added the `type/kind/name`
>>>>>>>
>>>>>>> @Kurt I agree that we cannot force it, but I think it is more of
>> a
>>>>>>> question to set standards/implicit contracts on the properties.
>>> What you
>>>>>>> described with
>>>>>>> 'format' = 'csv',
>>>>>>> 'csv.allow-comments' = 'true',
>>>>>>> 'csv.ignore-parse-errors' = 'true'
>>>>>>>
>>>>>>> would not work though as the `format` prefix is mandatory in the
>>> sources
>>>>>>> as only the properties with format will be passed to the format
>>> factory
>>>>> in
>>>>>>> majority of cases. We already have some implicit contracts.
>>>>>>>
>>>>>>> @Forward I did not necessarily get the example. Aren't json and
>>> bson two
>>>>>>> separate formats? Do you mean you can have those two at the same
>>> time?
>>>>> Why
>>>>>>> do you need to differentiate the options for each? The way I see
>>> it is:
>>>>>>>
>>>>>>> ‘format(.name)' = 'json',
>>>>>>> ‘format.fail-on-missing-field' = 'false'
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>> ‘format(.name)' = 'bson',
>>>>>>> ‘format.fail-on-missing-field' = 'false'
>>>>>>>
>>>>>>> @Benchao I'd be fine with any of name, kind, type(this we already
>>> had in
>>>>>>> the past)
>>>>>>>
>>>>>>> Best,
>>>>>>> Dawid
>>>>>>>
>>>>>>> On 30/04/2020 04:17, Forward Xu wrote:
>>>>>>>
>>>>>>> Here I have a little doubt. At present, our json only supports
>> the
>>>>>>> conventional json format. If we need to implement json with bson,
>>> json
>>>>> with
>>>>>>> avro, etc., how should we express it?
>>>>>>> Do you need like the following:
>>>>>>>
>>>>>>> ‘format.name' = 'json',
>>>>>>>
>>>>>>> ‘format.json.fail-on-missing-field' = 'false'
>>>>>>>
>>>>>>>
>>>>>>> ‘format.name' = 'bson',
>>>>>>>
>>>>>>> ‘format.bson.fail-on-missing-field' = ‘false'
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Forward
>>>>>>>
>>>>>>> Benchao Li <[hidden email]> <[hidden email]>
>>> 于2020年4月30日周四
>>>>> 上午9:58写道:
>>>>>>>
>>>>>>>
>>>>>>> Thanks Timo for staring the discussion.
>>>>>>>
>>>>>>> Generally I like the idea to keep the config align with a
>> standard
>>> like
>>>>>>> json/yaml.
>>>>>>>
>>>>>>>  From the user's perspective, I don't use table configs from a
>>> config
>>>>> file
>>>>>>> like yaml or json for now,
>>>>>>> And it's ok to change it to yaml like style. Actually we didn't
>>> know
>>>>> that
>>>>>>> this could be a yaml like
>>>>>>> configuration hierarchy. If it has a hierarchy, we maybe consider
>>> that
>>>>> in
>>>>>>> the future to load the
>>>>>>> config from a yaml/json file.
>>>>>>>
>>>>>>> Regarding the name,
>>>>>>> 'format.kind' looks fine to me. However there is another name
>> from
>>> the
>>>>> top
>>>>>>> of my head:
>>>>>>> 'format.name', WDYT?
>>>>>>>
>>>>>>> Dawid Wysakowicz <[hidden email]> <
>> [hidden email]>
>>>>> 于2020年4月29日周三 下午11:56写道:
>>>>>>>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I also wanted to share my opinion.
>>>>>>>
>>>>>>> When talking about a ConfigOption hierarchy we use for
>> configuring
>>> Flink
>>>>>>> cluster I would be a strong advocate for keeping a
>>> yaml/hocon/json/...
>>>>>>> compatible style. Those options are primarily read from a file
>> and
>>> thus
>>>>>>> should at least try to follow common practices for nested formats
>>> if we
>>>>>>> ever decide to switch to one.
>>>>>>>
>>>>>>> Here the question is about the properties we use in SQL
>>> statements. The
>>>>>>> origin/destination of these usually will be external catalog,
>>> usually in
>>>>>>>
>>>>>>> a
>>>>>>>
>>>>>>> flattened(key/value) representation so I agree it is not as
>>> important as
>>>>>>>
>>>>>>> in
>>>>>>>
>>>>>>> the aforementioned case. Nevertheless having a yaml based catalog
>>> or
>>>>>>>
>>>>>>> being
>>>>>>>
>>>>>>> able to have e.g. yaml based snapshots of a catalog in my opinion
>>> is
>>>>>>> appealing. At the same time cost of being able to have a nice
>>>>>>> yaml/hocon/json representation is just adding a single suffix to
>> a
>>>>>>> single(at most 2 key + value) property. The question is between
>>> `format`
>>>>>>>
>>>>>>> =
>>>>>>>
>>>>>>> `json` vs `format.kind` = `json`. That said I'd be slighty in
>>> favor of
>>>>>>> doing it.
>>>>>>>
>>>>>>> Just to have a full picture. Both cases can be represented in
>>> yaml, but
>>>>>>> the difference is significant:
>>>>>>> format: 'json'
>>>>>>> format.option: 'value'
>>>>>>>
>>>>>>> vs
>>>>>>> format:
>>>>>>> kind: 'json'
>>>>>>>
>>>>>>> option: 'value'
>>>>>>>
>>>>>>> Best,
>>>>>>> Dawid
>>>>>>>
>>>>>>> On 29/04/2020 17:13, Flavio Pompermaier wrote:
>>>>>>>
>>>>>>> Personally I don't have any preference here. Compliance wih
>>> standard
>>>>>>>
>>>>>>> YAML
>>>>>>>
>>>>>>> parser is probably more important
>>>>>>>
>>>>>>> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> <
>>>>> [hidden email]> wrote:
>>>>>>>
>>>>>>>
>>>>>>>  From a user's perspective, I prefer the shorter one
>> "format=json",
>>>>>>>
>>>>>>> because
>>>>>>>
>>>>>>> it's more concise and straightforward. The "kind" is redundant
>> for
>>>>>>>
>>>>>>> users.
>>>>>>>
>>>>>>> Is there a real case requires to represent the configuration in
>>> JSON
>>>>>>> style?
>>>>>>> As far as I can see, I don't see such requirement, and everything
>>> works
>>>>>>> fine by now.
>>>>>>>
>>>>>>> So I'm in favor of "format=json". But if the community insist to
>>> follow
>>>>>>> code style on this, I'm also fine with the longer one.
>>>>>>>
>>>>>>> Btw, I also CC user mailing list to listen more user's feedback.
>>>>>>>
>>>>>>> Because I
>>>>>>>
>>>>>>> think this is relative to usability.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <
>> [hidden email]>
>>> <
>>>>> [hidden email]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Therefore, should we advocate instead:
>>>>>>>>
>>>>>>>> 'format.kind' = 'json',
>>>>>>>> 'format.fail-on-missing-field' = 'false'
>>>>>>>
>>>>>>> Yes. That's pretty much it.
>>>>>>>
>>>>>>> This is reasonable important to nail down as with such
>> violations I
>>>>>>> believe we could not actually switch to a standard YAML parser.
>>>>>>>
>>>>>>> On 29/04/2020 16:05, Timo Walther wrote:
>>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> discussions around ConfigOption seem to be very popular recently.
>>>>>>>
>>>>>>> So I
>>>>>>>
>>>>>>> would also like to get some opinions on a different topic.
>>>>>>>
>>>>>>> How do we represent hierarchies in ConfigOption? In FLIP-122, we
>>>>>>> agreed on the following DDL syntax:
>>>>>>>
>>>>>>> CREATE TABLE fs_table (
>>>>>>> ...
>>>>>>> ) WITH (
>>>>>>> 'connector' = 'filesystem',
>>>>>>> 'path' = 'file:///path/to/whatever',
>>>>>>> 'format' = 'csv',
>>>>>>> 'format.allow-comments' = 'true',
>>>>>>> 'format.ignore-parse-errors' = 'true'
>>>>>>> );
>>>>>>>
>>>>>>> Of course this is slightly different from regular Flink core
>>>>>>> configuration but a connector still needs to be configured based
>> on
>>>>>>> these options.
>>>>>>>
>>>>>>> However, I think this FLIP violates our code style guidelines
>>>>>>>
>>>>>>> because
>>>>>>>
>>>>>>> 'format' = 'json',
>>>>>>> 'format.fail-on-missing-field' = 'false'
>>>>>>>
>>>>>>> is an invalid hierarchy. `format` cannot be a string and a
>>> top-level
>>>>>>> object at the same time.
>>>>>>>
>>>>>>> We have similar problems in our runtime configuration:
>>>>>>>
>>>>>>> state.backend=
>>>>>>> state.backend.incremental=
>>>>>>> restart-strategy=
>>>>>>> restart-strategy.fixed-delay.delay=
>>>>>>> high-availability=
>>>>>>> high-availability.cluster-id=
>>>>>>>
>>>>>>> The code style guide states "Think of the configuration as nested
>>>>>>> objects (JSON style)". So such hierarchies cannot be represented
>> in
>>>>>>>
>>>>>>> a
>>>>>>>
>>>>>>> nested JSON style.
>>>>>>>
>>>>>>> Therefore, should we advocate instead:
>>>>>>>
>>>>>>> 'format.kind' = 'json',
>>>>>>> 'format.fail-on-missing-field' = 'false'
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Timo
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
>> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Benchao Li
>>>>>>> School of Electronics Engineering and Computer Science, Peking
>>>>> UniversityTel:+86-15650713730
>>>>>>> Email: [hidden email]; [hidden email]
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Hierarchies in ConfigOption

Jark Wu-2
Thanks all for the discussion, I have updated FLIP-105 and FLIP-122 to use
the new format option keys.

FLIP-105:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427289
FLIP-122:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory

Best,
Jark

On Wed, 6 May 2020 at 20:37, Timo Walther <[hidden email]> wrote:

> Cool, so let's go with:
>
> format = json
> json.fail-on-missing-field = true
> json.ignore-parse-error = true
>
> value.format = json
> value.json.fail-on-missing-field = true
> value.json.ignore-parse-error = true
>
> Regards,
> Timo
>
>
> On 06.05.20 12:39, Jingsong Li wrote:
> > Hi,
> >
> > +1 to:
> > format = parquet
> > parquet.compression = ...
> > parquet.block.size = ...
> > parquet.page.size = ...
> >
> > For the formats like parquet and orc,
> > Not just Flink itself, but this way also let Flink keys compatible with
> the
> > property keys of Hadoop / Hive / Spark.
> >
> > And like Jark said, this way works for Kafka key value too.
> >
> > Best,
> > Jingsong Lee
> >
> > On Wed, May 6, 2020 at 6:19 PM Jark Wu <[hidden email]> wrote:
> >
> >> Hi,
> >>
> >> I think Timo proposed a good idea to make both side happy. That is:
> >>
> >> format = json
> >> json.fail-on-missing-field = true
> >> json.ignore-parse-error = true
> >>
> >> value.format = json
> >> value.json.fail-on-missing-field = true
> >> value.json.ignore-parse-error = true
> >>
> >> This is a valid hierarchies. Besides, it's more clear that the option
> >> belongs to a specific component (i.e. json).
> >> This will be more readable when we introducing more formats, e.g.
> parquet.
> >>
> >> format = parquet
> >> parquet.compression = ...
> >> parquet.block.size = ...
> >> parquet.page.size = ...
> >>
> >> is more readable than current style:
> >>
> >> format = parquet
> >> format.compression = ...
> >> format.block.size = ...
> >> format.page.size = ...
> >>
> >> To sum up, I'm +1 to use "format = json",  "json.fail-on-missing-field =
> >> true".
> >>
> >> Best,
> >> Jark
> >>
> >> On Wed, 6 May 2020 at 17:12, Danny Chan <[hidden email]> wrote:
> >>
> >>> Hi, everyone ~
> >>>
> >>> Allows me to share some thoughts here.
> >>>
> >>> Personally i think for SQL, "format" is obviously better than "
> >> format.name",
> >>> it is more concise and straight-forward, similar with Presto FORMAT[2]
> >> and
> >>> KSQL VALUE_FORMAT[1]; i think we move from "connector.type" to
> >> "connector"
> >>> for the same reason, the "type" or "name" suffix is implicit, SQL
> syntax
> >>> like the DDL is a top-level user API, so from my side keeping good
> >>> user-friendly syntax is more important.
> >>>
> >>> @Timo I'm big +1 for the a good code style guide, but that does not
> mean
> >>> we should go for a json-style table options in the DDL, the DDL could
> >> have
> >>> its own contract. Can we move "represent these config options in YAML"
> to
> >>> another topic ? Otherwise, how should we handle the "connector" key,
> >> should
> >>> we prefix all the table options with "connector" ? The original
> inention
> >> of
> >>> FLIP-122 is to remove some redundant prefix/suffix of the table options
> >>> because they are obviously implicit there, and the "connector." prefix
> >> and
> >>> the ".type" or ".name" suffix are the ones we most want to delete.
> >>>
> >>> @Dawid Although ".type" is just another 4 characters, but we force the
> >> SQL
> >>> users to do the thing that is obvious reduadant, i know serialize
> catalog
> >>> table to YAML or use the options in DataStream has similar keys
> request,
> >>> but they are different use cases that i believe many SQL user would not
> >>> encounter, that means we force many users to obey rules for cases they
> >>> would never have.
> >>>
> >>>
> >>> [1] https://docs.ksqldb.io/en/latest/developer-guide/create-a-table/
> >>> [2] https://prestodb.io/docs/current/sql/create-table.html
> >>>
> >>> Best,
> >>> Danny Chan
> >>> 在 2020年5月4日 +0800 PM11:34,Till Rohrmann <[hidden email]>,写道:
> >>>> Hi everyone,
> >>>>
> >>>> I like Timo's proposal to organize our configuration more hierarchical
> >>>> since this is what the coding guide specifies. The benefit I see is
> >> that
> >>>> config options belonging to the same concept will be found in the same
> >>>> nested object. Moreover, it will be possible to split the
> configuration
> >>>> into unrelated parts which are fed to the respective components. That
> >> way
> >>>> one has a much better separation of concern since component A cannot
> >> read
> >>>> the configuration of component B.
> >>>>
> >>>> Concerning Timo's last two proposals:
> >>>>
> >>>> If fail-on-missing-field is a common configuration shared by all
> >> formats,
> >>>> then I would go with the first option:
> >>>>
> >>>> format.kind: json
> >>>> format.fail-on-missing-field: true
> >>>>
> >>>> If fail-on-missing-field is specific for json, then one could go with
> >>>>
> >>>> format: json
> >>>> json.fail-on-missing-field: true
> >>>>
> >>>> or
> >>>>
> >>>> format.kind: json
> >>>> format.json.fail-on-missing-field: true
> >>>>
> >>>> Cheers,
> >>>> Till
> >>>>
> >>>>
> >>>> On Fri, May 1, 2020 at 11:55 AM Timo Walther <[hidden email]>
> >> wrote:
> >>>>
> >>>>> Hi Jark,
> >>>>>
> >>>>> yes, in theory every connector can design options as they like. But
> >> for
> >>>>> user experience and good coding style we should be consistent in
> >> Flink
> >>>>> connectors and configuration. Because implementers of new connectors
> >>>>> will copy the design of existing ones.
> >>>>>
> >>>>> Furthermore, I could image that people in the DataStream API would
> >> also
> >>>>> like to configure their connector based on options in the near
> >> future.
> >>>>> It might be the case that Flink DataStream API connectors will reuse
> >>> the
> >>>>> ConfigOptions from Table API for consistency.
> >>>>>
> >>>>> I'm favoring either:
> >>>>>
> >>>>> format.kind = json
> >>>>> format.fail-on-missing-field: true
> >>>>>
> >>>>> Or:
> >>>>>
> >>>>> format = json
> >>>>> json.fail-on-missing-field: true
> >>>>>
> >>>>> Both are valid hierarchies.
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>>
> >>>>> On 30.04.20 17:57, Jark Wu wrote:
> >>>>>> Hi Dawid,
> >>>>>>
> >>>>>> I just want to mention one of your response,
> >>>>>>
> >>>>>>> What you described with
> >>>>>>> 'format' = 'csv',
> >>>>>>> 'csv.allow-comments' = 'true',
> >>>>>>> 'csv.ignore-parse-errors' = 'true'
> >>>>>>> would not work though as the `format` prefix is mandatory in the
> >>> sources
> >>>>>> as only the properties with format
> >>>>>>> will be passed to the format factory in majority of cases. We
> >>> already
> >>>>>> have some implicit contracts.
> >>>>>>
> >>>>>> IIUC, in FLIP-95 and FLIP-122, the property key style are totally
> >>> decided
> >>>>>> by connectors, not the framework.
> >>>>>> So I custom connector can define above properties, and extract the
> >>> value
> >>>>> of
> >>>>>> 'format', i.e. 'csv', to find the format factory.
> >>>>>> And extract the properties with `csv.` prefix and remove the
> >> prefix,
> >>> and
> >>>>>> pass the properties (e.g. 'allow-comments' = 'true')
> >>>>>> into the format factory to create format.
> >>>>>>
> >>>>>> So there is no a strict guarantee to have a "nested JSON style"
> >>>>> properties.
> >>>>>> Users can still develop a custom connector with this
> >>>>>> un-hierarchy properties and works well.
> >>>>>>
> >>>>>> 'format' = 'json',
> >>>>>> 'format.fail-on-missing-field' = 'false'
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>>
> >>>>>> On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <
> >>> [hidden email]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I'd like to start with a comment that I am ok with the current
> >>> state of
> >>>>>>> the FLIP-122 if there is a strong preference for it.
> >> Nevertheless I
> >>>>> still
> >>>>>>> like the idea of adding `type` to the `format` to have it as
> >>>>> `format.type`
> >>>>>>> = `json`.
> >>>>>>>
> >>>>>>> I wanted to clarify a few things though:
> >>>>>>>
> >>>>>>> @Jingsong As far as I see it most of the users copy/paste the
> >>> properties
> >>>>>>> from the documentation to the SQL, so I don't think additional
> >> four
> >>>>>>> characters are too cumbersome. Plus if you force the additional
> >>> suffix
> >>>>> onto
> >>>>>>> all the options of a format you introduce way more boilerplate
> >>> than if
> >>>>> we
> >>>>>>> added the `type/kind/name`
> >>>>>>>
> >>>>>>> @Kurt I agree that we cannot force it, but I think it is more of
> >> a
> >>>>>>> question to set standards/implicit contracts on the properties.
> >>> What you
> >>>>>>> described with
> >>>>>>> 'format' = 'csv',
> >>>>>>> 'csv.allow-comments' = 'true',
> >>>>>>> 'csv.ignore-parse-errors' = 'true'
> >>>>>>>
> >>>>>>> would not work though as the `format` prefix is mandatory in the
> >>> sources
> >>>>>>> as only the properties with format will be passed to the format
> >>> factory
> >>>>> in
> >>>>>>> majority of cases. We already have some implicit contracts.
> >>>>>>>
> >>>>>>> @Forward I did not necessarily get the example. Aren't json and
> >>> bson two
> >>>>>>> separate formats? Do you mean you can have those two at the same
> >>> time?
> >>>>> Why
> >>>>>>> do you need to differentiate the options for each? The way I see
> >>> it is:
> >>>>>>>
> >>>>>>> ‘format(.name)' = 'json',
> >>>>>>> ‘format.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>> or
> >>>>>>>
> >>>>>>> ‘format(.name)' = 'bson',
> >>>>>>> ‘format.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>> @Benchao I'd be fine with any of name, kind, type(this we already
> >>> had in
> >>>>>>> the past)
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Dawid
> >>>>>>>
> >>>>>>> On 30/04/2020 04:17, Forward Xu wrote:
> >>>>>>>
> >>>>>>> Here I have a little doubt. At present, our json only supports
> >> the
> >>>>>>> conventional json format. If we need to implement json with bson,
> >>> json
> >>>>> with
> >>>>>>> avro, etc., how should we express it?
> >>>>>>> Do you need like the following:
> >>>>>>>
> >>>>>>> ‘format.name' = 'json',
> >>>>>>>
> >>>>>>> ‘format.json.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>>
> >>>>>>> ‘format.name' = 'bson',
> >>>>>>>
> >>>>>>> ‘format.bson.fail-on-missing-field' = ‘false'
> >>>>>>>
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Forward
> >>>>>>>
> >>>>>>> Benchao Li <[hidden email]> <[hidden email]>
> >>> 于2020年4月30日周四
> >>>>> 上午9:58写道:
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks Timo for staring the discussion.
> >>>>>>>
> >>>>>>> Generally I like the idea to keep the config align with a
> >> standard
> >>> like
> >>>>>>> json/yaml.
> >>>>>>>
> >>>>>>>  From the user's perspective, I don't use table configs from a
> >>> config
> >>>>> file
> >>>>>>> like yaml or json for now,
> >>>>>>> And it's ok to change it to yaml like style. Actually we didn't
> >>> know
> >>>>> that
> >>>>>>> this could be a yaml like
> >>>>>>> configuration hierarchy. If it has a hierarchy, we maybe consider
> >>> that
> >>>>> in
> >>>>>>> the future to load the
> >>>>>>> config from a yaml/json file.
> >>>>>>>
> >>>>>>> Regarding the name,
> >>>>>>> 'format.kind' looks fine to me. However there is another name
> >> from
> >>> the
> >>>>> top
> >>>>>>> of my head:
> >>>>>>> 'format.name', WDYT?
> >>>>>>>
> >>>>>>> Dawid Wysakowicz <[hidden email]> <
> >> [hidden email]>
> >>>>> 于2020年4月29日周三 下午11:56写道:
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I also wanted to share my opinion.
> >>>>>>>
> >>>>>>> When talking about a ConfigOption hierarchy we use for
> >> configuring
> >>> Flink
> >>>>>>> cluster I would be a strong advocate for keeping a
> >>> yaml/hocon/json/...
> >>>>>>> compatible style. Those options are primarily read from a file
> >> and
> >>> thus
> >>>>>>> should at least try to follow common practices for nested formats
> >>> if we
> >>>>>>> ever decide to switch to one.
> >>>>>>>
> >>>>>>> Here the question is about the properties we use in SQL
> >>> statements. The
> >>>>>>> origin/destination of these usually will be external catalog,
> >>> usually in
> >>>>>>>
> >>>>>>> a
> >>>>>>>
> >>>>>>> flattened(key/value) representation so I agree it is not as
> >>> important as
> >>>>>>>
> >>>>>>> in
> >>>>>>>
> >>>>>>> the aforementioned case. Nevertheless having a yaml based catalog
> >>> or
> >>>>>>>
> >>>>>>> being
> >>>>>>>
> >>>>>>> able to have e.g. yaml based snapshots of a catalog in my opinion
> >>> is
> >>>>>>> appealing. At the same time cost of being able to have a nice
> >>>>>>> yaml/hocon/json representation is just adding a single suffix to
> >> a
> >>>>>>> single(at most 2 key + value) property. The question is between
> >>> `format`
> >>>>>>>
> >>>>>>> =
> >>>>>>>
> >>>>>>> `json` vs `format.kind` = `json`. That said I'd be slighty in
> >>> favor of
> >>>>>>> doing it.
> >>>>>>>
> >>>>>>> Just to have a full picture. Both cases can be represented in
> >>> yaml, but
> >>>>>>> the difference is significant:
> >>>>>>> format: 'json'
> >>>>>>> format.option: 'value'
> >>>>>>>
> >>>>>>> vs
> >>>>>>> format:
> >>>>>>> kind: 'json'
> >>>>>>>
> >>>>>>> option: 'value'
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Dawid
> >>>>>>>
> >>>>>>> On 29/04/2020 17:13, Flavio Pompermaier wrote:
> >>>>>>>
> >>>>>>> Personally I don't have any preference here. Compliance wih
> >>> standard
> >>>>>>>
> >>>>>>> YAML
> >>>>>>>
> >>>>>>> parser is probably more important
> >>>>>>>
> >>>>>>> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <[hidden email]> <
> >>>>> [hidden email]> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>  From a user's perspective, I prefer the shorter one
> >> "format=json",
> >>>>>>>
> >>>>>>> because
> >>>>>>>
> >>>>>>> it's more concise and straightforward. The "kind" is redundant
> >> for
> >>>>>>>
> >>>>>>> users.
> >>>>>>>
> >>>>>>> Is there a real case requires to represent the configuration in
> >>> JSON
> >>>>>>> style?
> >>>>>>> As far as I can see, I don't see such requirement, and everything
> >>> works
> >>>>>>> fine by now.
> >>>>>>>
> >>>>>>> So I'm in favor of "format=json". But if the community insist to
> >>> follow
> >>>>>>> code style on this, I'm also fine with the longer one.
> >>>>>>>
> >>>>>>> Btw, I also CC user mailing list to listen more user's feedback.
> >>>>>>>
> >>>>>>> Because I
> >>>>>>>
> >>>>>>> think this is relative to usability.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jark
> >>>>>>>
> >>>>>>> On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <
> >> [hidden email]>
> >>> <
> >>>>> [hidden email]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>> Therefore, should we advocate instead:
> >>>>>>>>
> >>>>>>>> 'format.kind' = 'json',
> >>>>>>>> 'format.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>> Yes. That's pretty much it.
> >>>>>>>
> >>>>>>> This is reasonable important to nail down as with such
> >> violations I
> >>>>>>> believe we could not actually switch to a standard YAML parser.
> >>>>>>>
> >>>>>>> On 29/04/2020 16:05, Timo Walther wrote:
> >>>>>>>
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> discussions around ConfigOption seem to be very popular recently.
> >>>>>>>
> >>>>>>> So I
> >>>>>>>
> >>>>>>> would also like to get some opinions on a different topic.
> >>>>>>>
> >>>>>>> How do we represent hierarchies in ConfigOption? In FLIP-122, we
> >>>>>>> agreed on the following DDL syntax:
> >>>>>>>
> >>>>>>> CREATE TABLE fs_table (
> >>>>>>> ...
> >>>>>>> ) WITH (
> >>>>>>> 'connector' = 'filesystem',
> >>>>>>> 'path' = 'file:///path/to/whatever',
> >>>>>>> 'format' = 'csv',
> >>>>>>> 'format.allow-comments' = 'true',
> >>>>>>> 'format.ignore-parse-errors' = 'true'
> >>>>>>> );
> >>>>>>>
> >>>>>>> Of course this is slightly different from regular Flink core
> >>>>>>> configuration but a connector still needs to be configured based
> >> on
> >>>>>>> these options.
> >>>>>>>
> >>>>>>> However, I think this FLIP violates our code style guidelines
> >>>>>>>
> >>>>>>> because
> >>>>>>>
> >>>>>>> 'format' = 'json',
> >>>>>>> 'format.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>> is an invalid hierarchy. `format` cannot be a string and a
> >>> top-level
> >>>>>>> object at the same time.
> >>>>>>>
> >>>>>>> We have similar problems in our runtime configuration:
> >>>>>>>
> >>>>>>> state.backend=
> >>>>>>> state.backend.incremental=
> >>>>>>> restart-strategy=
> >>>>>>> restart-strategy.fixed-delay.delay=
> >>>>>>> high-availability=
> >>>>>>> high-availability.cluster-id=
> >>>>>>>
> >>>>>>> The code style guide states "Think of the configuration as nested
> >>>>>>> objects (JSON style)". So such hierarchies cannot be represented
> >> in
> >>>>>>>
> >>>>>>> a
> >>>>>>>
> >>>>>>> nested JSON style.
> >>>>>>>
> >>>>>>> Therefore, should we advocate instead:
> >>>>>>>
> >>>>>>> 'format.kind' = 'json',
> >>>>>>> 'format.fail-on-missing-field' = 'false'
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>> [1]
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Benchao Li
> >>>>>>> School of Electronics Engineering and Computer Science, Peking
> >>>>> UniversityTel:+86-15650713730
> >>>>>>> Email: [hidden email]; [hidden email]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>
> >
> >
>
>