Hi everyone,
SQL DDL support has been a long-time ask from the community. Currently, Flink SQL supports only DML (e.g. SELECT and INSERT statements). In its current form, Flink SQL users still need to define/create table sources and sinks programmatically in Java/Scala. Also, without DDL support, the current SQL Client implementation does not allow dynamic creation of tables, types, or functions with SQL, which adds friction to its adoption.

I drafted a design doc [1] with a few other community members that proposes the design and implementation for adding DDL support in Flink. The initial design considers DDL for table, view, type, library, and function. It would be great to get feedback on the design from the community and to align with the latest efforts on the unified SQL connector API [2] and the Flink-Hive integration [3].

Any feedback is highly appreciated.

Thanks
Shuyi Chen

[1] https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing
[2] https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
[3] https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing

--
"So you have to trust that the dots will somehow connect in your future."
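For a rough picture of the kind of statements the proposal targets (table, view, and function DDL), here is a purely illustrative sketch: the table/view/function names, property keys, and UDF class below are placeholders, and the actual syntax is exactly what the design doc [1] and this thread aim to pin down.

CREATE TABLE clicks (
    user_id BIGINT,
    url VARCHAR,
    click_time TIMESTAMP
) WITH (
    connector.type = 'kafka',   -- placeholder properties
    format.type = 'json'
);

CREATE VIEW clicks_per_user AS
    SELECT user_id, COUNT(*) FROM clicks GROUP BY user_id;

CREATE FUNCTION extract_domain AS 'com.example.udf.ExtractDomain';   -- hypothetical UDF class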
Thanks Shuyi!
I left some comments there. I think the design of SQL DDL and the Flink-Hive integration/external catalog enhancements will work closely with each other. I hope we are well aligned on the directions of the two designs, and I look forward to working with you guys on both!

Bowen
+1. Thanks for putting the proposal together, Shuyi.
DDL has been brought up a couple of times previously [1,2]. Utilizing DDL will definitely be a great extension to the current Flink SQL to systematically support some of the previously requested features, such as [3]. It will also be beneficial to see the document closely aligned with the previous discussion on the unified SQL connector API [4].

I also left a few comments on the doc. Looking forward to the alignment with the other couple of efforts and to contributing to them!

Best,
Rong

[1] http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
[2] http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
[3] https://issues.apache.org/jira/browse/FLINK-8003
[4] http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3C6676cb66-6f31-23e1-eff5-2e9c19f88483@...%3E
+1, thanks for the proposal.
I guess this is a long-awaited change. It can vastly increase the functionality of the SQL Client, as it will become possible to use complex extensions such as those provided by Apache Bahir [1].

Best Regards,
Dom.

[1] https://github.com/apache/bahir-flink
Hi Shuyi, thanks for the proposal.
I have two concerns about the table DDL:

1. How about removing the source/sink mark from the DDL? It is not necessary, because the framework can determine whether a referenced table is a source or a sink from the context of the query using it. That makes it more convenient to define a table that can act as both a source and a sink, and more convenient for the catalog to persist and manage the meta information.

2. How about just keeping one pure string map as the parameters for a table, like:

create table Kafka10SourceTable (
    intField INTEGER,
    stringField VARCHAR(128),
    longField BIGINT,
    rowTimeField TIMESTAMP
) with (
    connector.type = 'kafka',
    connector.property-version = '1',
    connector.version = '0.10',
    connector.properties.topic = 'test-kafka-topic',
    connector.properties.startup-mode = 'latest-offset',
    connector.properties.specific-offset = 'offset',
    format.type = 'json',
    format.properties.version = '1',
    format.derive-schema = 'true'
);

Because:
1. In TableFactory, what users work with is a string map of properties, so defining parameters as a string map is the closest way to map how users actually use the parameters.
2. The table descriptor can be extended by users, as is done for Kafka and JSON, which means the parameter keys in the connector or format scope can differ between implementations. We cannot restrict the keys to a fixed set, so we would need a map in the connector scope and a map in the connector.properties scope. Why not just give users a single map and let them put parameters in whatever form they like? That is also the simplest way to implement the DDL parser.
3. Whether we can define a format clause or not depends on the implementation of the connector. Using a separate clause in the DDL may create the misunderstanding that we can combine connectors with arbitrary formats, which may not actually work.
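As a concrete illustration of wenlong's first point, with no SOURCE/SINK marker the role of a table would simply follow from how a query uses it. This is only a sketch of the described behavior, reusing the table defined above; someOtherTable is a placeholder:

-- Kafka10SourceTable acts as a source here
SELECT intField, stringField FROM Kafka10SourceTable;

-- and as a sink here
INSERT INTO Kafka10SourceTable
SELECT intField, stringField, longField, rowTimeField FROM someOtherTable;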
Hi Shuyi,
I'm wondering if you folks still have the bandwidth to work on this.

We have some dedicated resources and would like to move this forward. We can collaborate.

Thanks,

Xuefu
Hi Wenlong, thanks a lot for the comments.
1) I agree we can infer the table type from the queries if the Flink job is static. However, for SQL Client cases, the query is ad hoc, dynamic, and not known beforehand. In such cases, we might want to enforce the table open mode at startup time, so users won't accidentally write to a Kafka topic that is supposed to be written to only by producers outside of the Flink world.

2) As in [1], format and connector are currently first-class concepts in Flink Table, and they are required by most table creations, so I think adding specific keywords for them makes the DDL more organized and readable. But I do agree that flattened key-value pairs are simpler for the parser and easier to extend. So maybe something like the following makes more sense:

CREATE SOURCE TABLE Kafka10SourceTable (
    intField INTEGER,
    stringField VARCHAR(128) COMMENT 'User IP address',
    longField BIGINT,
    rowTimeField TIMESTAMP
        TIMESTAMPS FROM 'longField'
        WATERMARKS PERIODIC-BOUNDED WITH DELAY '60'
)
COMMENT 'Kafka Source Table of topic user_ip_address'
CONNECTOR (
    type = 'kafka',
    property-version = '1',
    version = '0.10',
    properties.topic = 'test-kafka-topic',
    properties.startup-mode = 'latest-offset',
    properties.specific-offset = 'offset'
)
FORMAT (
    format.type = 'json',
    format.properties.version = '1',
    format.derive-schema = 'true'
)

Shuyi

[1] https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit#heading=h.41fd6rs7b3cf
Thanks a lot, Xuefu. I was busy with some other stuff for the last 2 weeks, but we are definitely interested in moving this forward. I think once the unified connector API design [1] is done, we can finalize the DDL design as well and start creating concrete subtasks to collaborate on the implementation with the community.

Shuyi

[1] https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
Thanks for offering your help here, Xuefu. It would be great to move these efforts forward. I agree that the DDL is somewhat related to the unified connector API design, but we can also start with the basic functionality now and evolve the DDL during this release and the next releases.

For example, we could identify an MVP DDL syntax that skips defining key constraints and maybe even time attributes. This DDL could be used for batch use cases, ETL, and materializing SQL queries (no time operations like windows).

The unified connector API is high on our priority list for the 1.8 release. I will try to update the document by the middle of next week.

Regards,

Timo
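To make the MVP idea concrete, a minimal DDL along these lines would already cover the batch/ETL cases Timo mentions. This is only a sketch under the MVP assumption: key constraints and time attributes are deliberately left out, and the table name and property keys are placeholders rather than agreed syntax.

CREATE TABLE csv_orders (
    order_id BIGINT,
    product VARCHAR(64),
    amount DOUBLE
) WITH (
    connector.type = 'filesystem',   -- placeholder properties
    connector.path = '/tmp/orders',
    format.type = 'csv'
);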
Yeah! I agree with Timo that the DDL work can actually proceed without being blocked by the connector API. We can leave the unknowns out while defining the basic syntax.
@Shuyi

As commented in the doc, I think we can probably stick with a simple syntax with general properties, without extending the syntax so much that it mimics the descriptor API.

Part of our effort on the Flink-Hive integration is also to make the DDL syntax compatible with Hive's. The one in the current proposal seems to make that effort more challenging.

We can help and collaborate. At this moment, I think we can finalize the proposal and then divide the tasks for better collaboration.

Please let me know if there are any questions or suggestions.

Thanks,
Xuefu
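For reference, Hive's DDL attaches generic key-value pairs via TBLPROPERTIES, so a property-map-based Flink DDL would map onto it fairly naturally. The statement below is only an illustrative Hive-style example for comparison; the table name, location, and property values are placeholders:

CREATE EXTERNAL TABLE orders (
    order_id BIGINT,
    product STRING
)
STORED AS ORC
LOCATION '/warehouse/orders'
TBLPROPERTIES ('orc.compress' = 'ZLIB');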
Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design doc first and start the implementation without the unified connector API being ready, by skipping some features.

Xuefu, I like the idea of turning Flink-specific properties into generic key-value pairs, so that integration with Hive DDL (or others, e.g. Beam DDL) becomes easier.

I'll run a final pass over the design doc and finalize the design in the next few days. Then we can start creating tasks and collaborate on the implementation. Thanks a lot for all the comments and input.

Cheers!
Shuyi
how about remove the source/sink mark from the ddl, because it is not > >> necessary, the framework determine the table referred is a source or a > sink > >> according to the context of the query using the table. it will be more > >> convenient for use defining a table which can be both a source and sink, > >> and more convenient for catalog to persistent and manage the meta infos. > >> > >> 2. how about just keeping one pure string map as parameters for table, > like > >> create tabe Kafka10SourceTable ( > >> intField INTEGER, > >> stringField VARCHAR(128), > >> longField BIGINT, > >> rowTimeField TIMESTAMP > >> ) with ( > >> connector.type = ’kafka’, > >> connector.property-version = ’1’, > >> connector.version = ’0.10’, > >> connector.properties.topic = ‘test-kafka-topic’, > >> connector.properties.startup-mode = ‘latest-offset’, > >> connector.properties.specific-offset = ‘offset’, > >> format.type = 'json' > >> format.prperties.version=’1’, > >> format.derive-schema = 'true' > >> ); > >> Because: > >> 1. in TableFactory, what user use is a string map properties, defining > >> parameters by string-map can be the closest way to mapping how user use > the > >> parameters. > >> 2. The table descriptor can be extended by user, like what is done in > Kafka > >> and Json, it means that the parameter keys in connector or format can be > >> different in different implementation, we can not restrict the key in a > >> specified set, so we need a map in connector scope and a map in > >> connector.properties scope. why not just give user a single map, let > them > >> put parameters in a format they like, which is also the simplest way to > >> implement DDL parser. > >> 3. whether we can define a format clause or not, depends on the > >> implementation of the connector, using different clause in DDL may make > a > >> misunderstanding that we can combine the connectors with arbitrary > formats, > >> which may not work actually. > >> > >> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński <[hidden email]> wrote: > >> > >>> +1, Thanks for the proposal. > >>> > >>> I guess this is a long-awaited change. This can vastly increase the > >>> functionalities of the SQL Client as it will be possible to use complex > >>> extensions like for example those provided by Apache Bahir[1]. > >>> > >>> Best Regards, > >>> Dom. > >>> > >>> [1] > >>> https://github.com/apache/bahir-flink > >>> > >>> sob., 3 lis 2018 o 17:17 Rong Rong <[hidden email]> napisał(a): > >>> > >>>> +1. Thanks for putting the proposal together Shuyi. > >>>> > >>>> DDL has been brought up in a couple of times previously [1,2]. > >> Utilizing > >>>> DDL will definitely be a great extension to the current Flink SQL to > >>>> systematically support some of the previously brought up features such > >> as > >>>> [3]. And it will also be beneficial to see the document closely > aligned > >>>> with the previous discussion for unified SQL connector API [4]. > >>>> > >>>> I also left a few comments on the doc. Looking forward to the > alignment > >>>> with the other couple of efforts and contributing to them! 
> >>>> > >>>> Best, > >>>> Rong > >>>> > >>>> [1] > >>>> > >>>> > >> > http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E > >>>> [2] > >>>> > >>>> > >> > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E > >>>> [3] https://issues.apache.org/jira/browse/FLINK-8003 > >>>> [4] > >>>> > >>>> > >> > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3C6676cb66-6f31-23e1-eff5-2e9c19f88483@...%3E > >>>> > >>>> On Fri, Nov 2, 2018 at 10:22 AM Bowen Li <[hidden email]> wrote: > >>>> > >>>>> Thanks Shuyi! > >>>>> > >>>>> I left some comments there. I think the design of SQL DDL and > >>> Flink-Hive > >>>>> integration/External catalog enhancements will work closely with each > >>>>> other. Hope we are well aligned on the directions of the two designs, > >>>> and I > >>>>> look forward to working with you guys on both! > >>>>> > >>>>> Bowen > >>>>> > >>>>> > >>>>> On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen <[hidden email]> > >> wrote: > >>>>>> Hi everyone, > >>>>>> > >>>>>> SQL DDL support has been a long-time ask from the community. > >> Current > >>>>> Flink > >>>>>> SQL support only DML (e.g. SELECT and INSERT statements). In its > >>>> current > >>>>>> form, Flink SQL users still need to define/create table sources and > >>>> sinks > >>>>>> programmatically in Java/Scala. Also, in SQL Client, without DDL > >>>> support, > >>>>>> the current implementation does not allow dynamical creation of > >>> table, > >>>>> type > >>>>>> or functions with SQL, this adds friction for its adoption. > >>>>>> > >>>>>> I drafted a design doc [1] with a few other community members that > >>>>> proposes > >>>>>> the design and implementation for adding DDL support in Flink. The > >>>>> initial > >>>>>> design considers DDL for table, view, type, library and function. > >> It > >>>> will > >>>>>> be great to get feedback on the design from the community, and > >> align > >>>> with > >>>>>> latest effort in unified SQL connector API [2] and Flink Hive > >>>>> integration > >>>>>> [3]. > >>>>>> > >>>>>> Any feedback is highly appreciated. > >>>>>> > >>>>>> Thanks > >>>>>> Shuyi Chen > >>>>>> > >>>>>> [1] > >>>>>> > >>>>>> > >> > https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing > >>>>>> [2] > >>>>>> > >>>>>> > >> > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > >>>>>> [3] > >>>>>> > >>>>>> > >> > https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing > >>>>>> -- > >>>>>> "So you have to trust that the dots will somehow connect in your > >>>> future." > >> -- "So you have to trust that the dots will somehow connect in your future." |
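To make the flat key-value style discussed above concrete, here is a minimal sketch; the table name and property keys below are purely illustrative assumptions, not a finalized property set:

  CREATE TABLE OrderEvents (
    orderId BIGINT,
    userId BIGINT,
    amount DECIMAL,
    eventTime TIMESTAMP
  ) WITH (
    connector.type = 'kafka',
    connector.version = '0.10',
    connector.properties.topic = 'orders',
    format.type = 'json',
    format.derive-schema = 'true'
  );

  -- With no explicit source/sink marker, the same table could act as a source
  -- when it appears in a FROM clause and as a sink when it is the target of an
  -- INSERT INTO, which is the behavior suggested in the discussion above.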
@Shuyi
Thanks for the proposal! We have a simple DDL implementation (extends
Calcite's parser) which has been running for almost two years in production and
works well. I think the most valuable lesson we've learned is to keep it simple
and standard-compliant.
Here's the approximate grammar, FYI:

CREATE TABLE

CREATE TABLE tableName(
  columnDefinition [, columnDefinition]*
  [ computedColumnDefinition [, computedColumnDefinition]* ]
  [ tableConstraint [, tableConstraint]* ]
  [ tableIndex [, tableIndex]* ]
  [ PERIOD FOR SYSTEM_TIME ]
  [ WATERMARK watermarkName FOR rowTimeColumn AS
    withOffset(rowTimeColumn, offset) ]
) [ WITH ( tableOption [ , tableOption]* ) ] [ ; ]

columnDefinition ::=
  columnName dataType [ NOT NULL ]

dataType ::=
  {
    [ VARCHAR ]
  | [ BOOLEAN ]
  | [ TINYINT ]
  | [ SMALLINT ]
  | [ INT ]
  | [ BIGINT ]
  | [ FLOAT ]
  | [ DECIMAL ]
  | [ DOUBLE ]
  | [ DATE ]
  | [ TIME ]
  | [ TIMESTAMP ]
  | [ VARBINARY ]
  }

computedColumnDefinition ::=
  columnName AS computedColumnExpression

tableConstraint ::=
  { PRIMARY KEY | UNIQUE }
  (columnName [, columnName]* )

tableIndex ::=
  [ UNIQUE ] INDEX indexName
  (columnName [, columnName]* )

rowTimeColumn ::=
  columnName

tableOption ::=
  property=value

offset ::=
  positive integer (unit: ms)

CREATE VIEW

CREATE VIEW viewName
  [ ( columnName [, columnName]* ) ]
AS queryStatement;

CREATE FUNCTION

CREATE FUNCTION functionName
  AS 'className';

className ::=
  fully qualified name

Shuyi Chen <[hidden email]> 于2018年11月28日周三 上午3:28写道:
> Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design doc
> first and start implementation w/o the unified connector API ready by
> skipping some featue.
>
> Xuefu, I like the idea of making Flink specific properties into generic
> key-value pairs, so that it will make integration with Hive DDL (or others,
> e.g. Beam DDL) easier.
>
> I'll run a final pass over the design doc and finalize the design in the
> next few days. And we can start creating tasks and collaborate on the
> implementation. Thanks a lot for all the comments and inputs.
>
> Cheers!
> Shuyi
>
> On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu <[hidden email]>
> wrote:
>
> > Yeah! I agree with Timo that DDL can actually proceed w/o being blocked
> by
> > connector API. We can leave the unknown out while defining the basic
> syntax.
> >
> > @Shuyi
> >
> > As commented in the doc, I think we can probably stick with simple syntax
> > with general properties, without extending the syntax too much that it
> > mimics the descriptor API.
> >
> > Part of our effort on Flink-Hive integration is also to make DDL syntax
> > compatible with Hive's. The one in the current proposal seems making our
> > effort more challenging.
> >
> > We can help and collaborate. At this moment, I think we can finalize on
> > the proposal and then we can divide the tasks for better collaboration.
> >
> > Please let me know if there are any questions or suggestions.
> >
> > Thanks,
> > Xuefu
> >
> >
> >
> >
> > ------------------------------------------------------------------
> > Sender:Timo Walther <[hidden email]>
> > Sent at:2018 Nov 27 (Tue) 16:21
> > Recipient:dev <[hidden email]>
> > Subject:Re: [DISCUSS] Flink SQL DDL Design
> >
> > Thanks for offering your help here, Xuefu. It would be great to move
> > these efforts forward. I agree that the DDL is somehow releated to the
> > unified connector API design but we can also start with the basic
> > functionality now and evolve the DDL during this release and next
> releases.
> > > > For example, we could identify the MVP DDL syntax that skips defining > > key constraints and maybe even time attributes. This DDL could be used > > for batch usecases, ETL, and materializing SQL queries (no time > > operations like windows). > > > > The unified connector API is high on our priority list for the 1.8 > > release. I will try to update the document until mid of next week. > > > > > > Regards, > > > > Timo > > > > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen: > > > Thanks a lot, Xuefu. I was busy for some other stuff for the last 2 > > weeks, > > > but we are definitely interested in moving this forward. I think once > the > > > unified connector API design [1] is done, we can finalize the DDL > design > > as > > > well and start creating concrete subtasks to collaborate on the > > > implementation with the community. > > > > > > Shuyi > > > > > > [1] > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu <[hidden email]> > > > wrote: > > > > > >> Hi Shuyi, > > >> > > >> I'm wondering if you folks still have the bandwidth working on this. > > >> > > >> We have some dedicated resource and like to move this forward. We can > > >> collaborate. > > >> > > >> Thanks, > > >> > > >> Xuefu > > >> > > >> > > >> ------------------------------------------------------------------ > > >> 发件人:wenlong.lwl<[hidden email]> > > >> 日 期:2018年11月05日 11:15:35 > > >> 收件人:<[hidden email]> > > >> 主 题:Re: [DISCUSS] Flink SQL DDL Design > > >> > > >> Hi, Shuyi, thanks for the proposal. > > >> > > >> I have two concerns about the table ddl: > > >> > > >> 1. how about remove the source/sink mark from the ddl, because it is > not > > >> necessary, the framework determine the table referred is a source or a > > sink > > >> according to the context of the query using the table. it will be more > > >> convenient for use defining a table which can be both a source and > sink, > > >> and more convenient for catalog to persistent and manage the meta > infos. > > >> > > >> 2. how about just keeping one pure string map as parameters for table, > > like > > >> create tabe Kafka10SourceTable ( > > >> intField INTEGER, > > >> stringField VARCHAR(128), > > >> longField BIGINT, > > >> rowTimeField TIMESTAMP > > >> ) with ( > > >> connector.type = ’kafka’, > > >> connector.property-version = ’1’, > > >> connector.version = ’0.10’, > > >> connector.properties.topic = ‘test-kafka-topic’, > > >> connector.properties.startup-mode = ‘latest-offset’, > > >> connector.properties.specific-offset = ‘offset’, > > >> format.type = 'json' > > >> format.prperties.version=’1’, > > >> format.derive-schema = 'true' > > >> ); > > >> Because: > > >> 1. in TableFactory, what user use is a string map properties, defining > > >> parameters by string-map can be the closest way to mapping how user > use > > the > > >> parameters. > > >> 2. The table descriptor can be extended by user, like what is done in > > Kafka > > >> and Json, it means that the parameter keys in connector or format can > be > > >> different in different implementation, we can not restrict the key in > a > > >> specified set, so we need a map in connector scope and a map in > > >> connector.properties scope. why not just give user a single map, let > > them > > >> put parameters in a format they like, which is also the simplest way > to > > >> implement DDL parser. > > >> 3. 
whether we can define a format clause or not, depends on the > > >> implementation of the connector, using different clause in DDL may > make > > a > > >> misunderstanding that we can combine the connectors with arbitrary > > formats, > > >> which may not work actually. > > >> > > >> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński <[hidden email]> > wrote: > > >> > > >>> +1, Thanks for the proposal. > > >>> > > >>> I guess this is a long-awaited change. This can vastly increase the > > >>> functionalities of the SQL Client as it will be possible to use > complex > > >>> extensions like for example those provided by Apache Bahir[1]. > > >>> > > >>> Best Regards, > > >>> Dom. > > >>> > > >>> [1] > > >>> https://github.com/apache/bahir-flink > > >>> > > >>> sob., 3 lis 2018 o 17:17 Rong Rong <[hidden email]> napisał(a): > > >>> > > >>>> +1. Thanks for putting the proposal together Shuyi. > > >>>> > > >>>> DDL has been brought up in a couple of times previously [1,2]. > > >> Utilizing > > >>>> DDL will definitely be a great extension to the current Flink SQL to > > >>>> systematically support some of the previously brought up features > such > > >> as > > >>>> [3]. And it will also be beneficial to see the document closely > > aligned > > >>>> with the previous discussion for unified SQL connector API [4]. > > >>>> > > >>>> I also left a few comments on the doc. Looking forward to the > > alignment > > >>>> with the other couple of efforts and contributing to them! > > >>>> > > >>>> Best, > > >>>> Rong > > >>>> > > >>>> [1] > > >>>> > > >>>> > > >> > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E > > >>>> [2] > > >>>> > > >>>> > > >> > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E > > >>>> [3] https://issues.apache.org/jira/browse/FLINK-8003 > > >>>> [4] > > >>>> > > >>>> > > >> > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3C6676cb66-6f31-23e1-eff5-2e9c19f88483@...%3E > > >>>> > > >>>> On Fri, Nov 2, 2018 at 10:22 AM Bowen Li <[hidden email]> > wrote: > > >>>> > > >>>>> Thanks Shuyi! > > >>>>> > > >>>>> I left some comments there. I think the design of SQL DDL and > > >>> Flink-Hive > > >>>>> integration/External catalog enhancements will work closely with > each > > >>>>> other. Hope we are well aligned on the directions of the two > designs, > > >>>> and I > > >>>>> look forward to working with you guys on both! > > >>>>> > > >>>>> Bowen > > >>>>> > > >>>>> > > >>>>> On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen <[hidden email]> > > >> wrote: > > >>>>>> Hi everyone, > > >>>>>> > > >>>>>> SQL DDL support has been a long-time ask from the community. > > >> Current > > >>>>> Flink > > >>>>>> SQL support only DML (e.g. SELECT and INSERT statements). In its > > >>>> current > > >>>>>> form, Flink SQL users still need to define/create table sources > and > > >>>> sinks > > >>>>>> programmatically in Java/Scala. Also, in SQL Client, without DDL > > >>>> support, > > >>>>>> the current implementation does not allow dynamical creation of > > >>> table, > > >>>>> type > > >>>>>> or functions with SQL, this adds friction for its adoption. > > >>>>>> > > >>>>>> I drafted a design doc [1] with a few other community members that > > >>>>> proposes > > >>>>>> the design and implementation for adding DDL support in Flink. The > > >>>>> initial > > >>>>>> design considers DDL for table, view, type, library and function. 
> > >> It > > >>>> will > > >>>>>> be great to get feedback on the design from the community, and > > >> align > > >>>> with > > >>>>>> latest effort in unified SQL connector API [2] and Flink Hive > > >>>>> integration > > >>>>>> [3]. > > >>>>>> > > >>>>>> Any feedback is highly appreciated. > > >>>>>> > > >>>>>> Thanks > > >>>>>> Shuyi Chen > > >>>>>> > > >>>>>> [1] > > >>>>>> > > >>>>>> > > >> > > > https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing > > >>>>>> [2] > > >>>>>> > > >>>>>> > > >> > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > >>>>>> [3] > > >>>>>> > > >>>>>> > > >> > > > https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing > > >>>>>> -- > > >>>>>> "So you have to trust that the dots will somehow connect in your > > >>>> future." > > >> > > > > -- > "So you have to trust that the dots will somehow connect in your future." > |
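For reference, a hypothetical table, view, and function declared with the grammar sketched above might look like the following; all identifiers, the computed-column expression, and the option values are made up for illustration:

  CREATE TABLE PageViews (
    userId BIGINT NOT NULL,
    pageId VARCHAR,
    viewTime TIMESTAMP,
    viewDay AS dateOf(viewTime),   -- computed column; dateOf is a hypothetical UDF
    PRIMARY KEY (userId, pageId),
    WATERMARK wm1 FOR viewTime AS withOffset(viewTime, 5000)   -- offset in ms
  ) WITH (
    connector.type='kafka',
    format.type='json'
  );

  CREATE VIEW DailyViews (viewDay, viewCount) AS
    SELECT viewDay, COUNT(*) FROM PageViews GROUP BY viewDay;

  CREATE FUNCTION dateOf AS 'com.example.udf.DateOf';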
Hi Shuyi,
Thanks for bringing up this discussion and the awesome work! I have left
some comments in the doc.

I want to share something more about the watermark definition learned from
Alibaba.

1. A table should be able to accept multiple watermark definitions.
   This is because a table may have more than one rowtime field. For example,
   one rowtime field comes from an existing field but is missing in some
   records, while another is the Kafka ingestion timestamp, which is not very
   accurate. In this case, a user may define two rowtime fields with
   watermarks in the table and choose between them depending on the situation.

2. A watermark strategy always works together with a rowtime field.

Based on the two points mentioned above, I think we should combine the
watermark strategy and the rowtime field selection (i.e. which existing field
is used to generate the watermark) in one clause, so that we can define
multiple watermarks in one table.

Here I will share the watermark syntax used in Alibaba (simply modified):

watermarkDefinition:
  WATERMARK [watermarkName] FOR <rowtime_field> AS wm_strategy

wm_strategy:
    BOUNDED WITH OFFSET 'string' timeUnit
  | ASCENDING

The “WATERMARK” keyword starts a watermark definition. The “FOR” keyword
defines which existing field is used to generate the watermark; this field
should already exist in the schema (we can use a computed column to derive it
from other fields). The “AS” keyword defines the watermark strategy, such as
BOUNDED WITH OFFSET (which covers almost all the requirements) and ASCENDING.

When the expected rowtime field does not exist in the schema, we can use the
computed-column syntax to derive it from other existing fields using built-in
functions or user-defined functions. So the rowtime/watermark definition
doesn’t need to care about a “field-change” strategy (replace/add/from-field).
The proctime field can also be defined as a computed column, such as pt AS
PROCTIME(), which defines a proctime field named “pt” in the schema.

Looking forward to working with you guys!

Best,
Jark Wu

Lin Li <[hidden email]> 于2018年11月28日周三 下午6:33写道:
> @Shuyi
> Thanks for the proposal! We have a simple DDL implementation (extends
> Calcite's parser) which been running for almost two years on production and
> works well.
> I think the most valued things we'd learned is keeping simplicity and
> standard compliance.
> Here's the approximate grammar, FYI > CREATE TABLE > > CREATE TABLE tableName( > columnDefinition [, columnDefinition]* > [ computedColumnDefinition [, computedColumnDefinition]* ] > [ tableConstraint [, tableConstraint]* ] > [ tableIndex [, tableIndex]* ] > [ PERIOD FOR SYSTEM_TIME ] > [ WATERMARK watermarkName FOR rowTimeColumn AS > withOffset(rowTimeColumn, offset) ] ) [ WITH ( tableOption [ , > tableOption]* ) ] [ ; ] > > columnDefinition ::= > columnName dataType [ NOT NULL ] > > dataType ::= > { > [ VARCHAR ] > | [ BOOLEAN ] > | [ TINYINT ] > | [ SMALLINT ] > | [ INT ] > | [ BIGINT ] > | [ FLOAT ] > | [ DECIMAL ] > | [ DOUBLE ] > | [ DATE ] > | [ TIME ] > | [ TIMESTAMP ] > | [ VARBINARY ] > } > > computedColumnDefinition ::= > columnName AS computedColumnExpression > > tableConstraint ::= > { PRIMARY KEY | UNIQUE } > (columnName [, columnName]* ) > > tableIndex ::= > [ UNIQUE ] INDEX indexName > (columnName [, columnName]* ) > > rowTimeColumn ::= > columnName > > tableOption ::= > property=value > offset ::= > positive integer (unit: ms) > > CREATE VIEW > > CREATE VIEW viewName > [ > ( columnName [, columnName]* ) > ] > AS queryStatement; > > CREATE FUNCTION > > CREATE FUNCTION functionName > AS 'className'; > > className ::= > fully qualified name > > > Shuyi Chen <[hidden email]> 于2018年11月28日周三 上午3:28写道: > > > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design doc > > first and start implementation w/o the unified connector API ready by > > skipping some featue. > > > > Xuefu, I like the idea of making Flink specific properties into generic > > key-value pairs, so that it will make integration with Hive DDL (or > others, > > e.g. Beam DDL) easier. > > > > I'll run a final pass over the design doc and finalize the design in the > > next few days. And we can start creating tasks and collaborate on the > > implementation. Thanks a lot for all the comments and inputs. > > > > Cheers! > > Shuyi > > > > On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu <[hidden email]> > > wrote: > > > > > Yeah! I agree with Timo that DDL can actually proceed w/o being blocked > > by > > > connector API. We can leave the unknown out while defining the basic > > syntax. > > > > > > @Shuyi > > > > > > As commented in the doc, I think we can probably stick with simple > syntax > > > with general properties, without extending the syntax too much that it > > > mimics the descriptor API. > > > > > > Part of our effort on Flink-Hive integration is also to make DDL syntax > > > compatible with Hive's. The one in the current proposal seems making > our > > > effort more challenging. > > > > > > We can help and collaborate. At this moment, I think we can finalize on > > > the proposal and then we can divide the tasks for better collaboration. > > > > > > Please let me know if there are any questions or suggestions. > > > > > > Thanks, > > > Xuefu > > > > > > > > > > > > > > > ------------------------------------------------------------------ > > > Sender:Timo Walther <[hidden email]> > > > Sent at:2018 Nov 27 (Tue) 16:21 > > > Recipient:dev <[hidden email]> > > > Subject:Re: [DISCUSS] Flink SQL DDL Design > > > > > > Thanks for offering your help here, Xuefu. It would be great to move > > > these efforts forward. I agree that the DDL is somehow releated to the > > > unified connector API design but we can also start with the basic > > > functionality now and evolve the DDL during this release and next > > releases. 
> > > > > > For example, we could identify the MVP DDL syntax that skips defining > > > key constraints and maybe even time attributes. This DDL could be used > > > for batch usecases, ETL, and materializing SQL queries (no time > > > operations like windows). > > > > > > The unified connector API is high on our priority list for the 1.8 > > > release. I will try to update the document until mid of next week. > > > > > > > > > Regards, > > > > > > Timo > > > > > > > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen: > > > > Thanks a lot, Xuefu. I was busy for some other stuff for the last 2 > > > weeks, > > > > but we are definitely interested in moving this forward. I think once > > the > > > > unified connector API design [1] is done, we can finalize the DDL > > design > > > as > > > > well and start creating concrete subtasks to collaborate on the > > > > implementation with the community. > > > > > > > > Shuyi > > > > > > > > [1] > > > > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > > > > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu < > [hidden email]> > > > > wrote: > > > > > > > >> Hi Shuyi, > > > >> > > > >> I'm wondering if you folks still have the bandwidth working on this. > > > >> > > > >> We have some dedicated resource and like to move this forward. We > can > > > >> collaborate. > > > >> > > > >> Thanks, > > > >> > > > >> Xuefu > > > >> > > > >> > > > >> ------------------------------------------------------------------ > > > >> 发件人:wenlong.lwl<[hidden email]> > > > >> 日 期:2018年11月05日 11:15:35 > > > >> 收件人:<[hidden email]> > > > >> 主 题:Re: [DISCUSS] Flink SQL DDL Design > > > >> > > > >> Hi, Shuyi, thanks for the proposal. > > > >> > > > >> I have two concerns about the table ddl: > > > >> > > > >> 1. how about remove the source/sink mark from the ddl, because it is > > not > > > >> necessary, the framework determine the table referred is a source > or a > > > sink > > > >> according to the context of the query using the table. it will be > more > > > >> convenient for use defining a table which can be both a source and > > sink, > > > >> and more convenient for catalog to persistent and manage the meta > > infos. > > > >> > > > >> 2. how about just keeping one pure string map as parameters for > table, > > > like > > > >> create tabe Kafka10SourceTable ( > > > >> intField INTEGER, > > > >> stringField VARCHAR(128), > > > >> longField BIGINT, > > > >> rowTimeField TIMESTAMP > > > >> ) with ( > > > >> connector.type = ’kafka’, > > > >> connector.property-version = ’1’, > > > >> connector.version = ’0.10’, > > > >> connector.properties.topic = ‘test-kafka-topic’, > > > >> connector.properties.startup-mode = ‘latest-offset’, > > > >> connector.properties.specific-offset = ‘offset’, > > > >> format.type = 'json' > > > >> format.prperties.version=’1’, > > > >> format.derive-schema = 'true' > > > >> ); > > > >> Because: > > > >> 1. in TableFactory, what user use is a string map properties, > defining > > > >> parameters by string-map can be the closest way to mapping how user > > use > > > the > > > >> parameters. > > > >> 2. The table descriptor can be extended by user, like what is done > in > > > Kafka > > > >> and Json, it means that the parameter keys in connector or format > can > > be > > > >> different in different implementation, we can not restrict the key > in > > a > > > >> specified set, so we need a map in connector scope and a map in > > > >> connector.properties scope. 
why not just give user a single map, let > > > them > > > >> put parameters in a format they like, which is also the simplest way > > to > > > >> implement DDL parser. > > > >> 3. whether we can define a format clause or not, depends on the > > > >> implementation of the connector, using different clause in DDL may > > make > > > a > > > >> misunderstanding that we can combine the connectors with arbitrary > > > formats, > > > >> which may not work actually. > > > >> > > > >> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński <[hidden email]> > > wrote: > > > >> > > > >>> +1, Thanks for the proposal. > > > >>> > > > >>> I guess this is a long-awaited change. This can vastly increase the > > > >>> functionalities of the SQL Client as it will be possible to use > > complex > > > >>> extensions like for example those provided by Apache Bahir[1]. > > > >>> > > > >>> Best Regards, > > > >>> Dom. > > > >>> > > > >>> [1] > > > >>> https://github.com/apache/bahir-flink > > > >>> > > > >>> sob., 3 lis 2018 o 17:17 Rong Rong <[hidden email]> > napisał(a): > > > >>> > > > >>>> +1. Thanks for putting the proposal together Shuyi. > > > >>>> > > > >>>> DDL has been brought up in a couple of times previously [1,2]. > > > >> Utilizing > > > >>>> DDL will definitely be a great extension to the current Flink SQL > to > > > >>>> systematically support some of the previously brought up features > > such > > > >> as > > > >>>> [3]. And it will also be beneficial to see the document closely > > > aligned > > > >>>> with the previous discussion for unified SQL connector API [4]. > > > >>>> > > > >>>> I also left a few comments on the doc. Looking forward to the > > > alignment > > > >>>> with the other couple of efforts and contributing to them! > > > >>>> > > > >>>> Best, > > > >>>> Rong > > > >>>> > > > >>>> [1] > > > >>>> > > > >>>> > > > >> > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E > > > >>>> [2] > > > >>>> > > > >>>> > > > >> > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E > > > >>>> [3] https://issues.apache.org/jira/browse/FLINK-8003 > > > >>>> [4] > > > >>>> > > > >>>> > > > >> > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3C6676cb66-6f31-23e1-eff5-2e9c19f88483@...%3E > > > >>>> > > > >>>> On Fri, Nov 2, 2018 at 10:22 AM Bowen Li <[hidden email]> > > wrote: > > > >>>> > > > >>>>> Thanks Shuyi! > > > >>>>> > > > >>>>> I left some comments there. I think the design of SQL DDL and > > > >>> Flink-Hive > > > >>>>> integration/External catalog enhancements will work closely with > > each > > > >>>>> other. Hope we are well aligned on the directions of the two > > designs, > > > >>>> and I > > > >>>>> look forward to working with you guys on both! > > > >>>>> > > > >>>>> Bowen > > > >>>>> > > > >>>>> > > > >>>>> On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen <[hidden email]> > > > >> wrote: > > > >>>>>> Hi everyone, > > > >>>>>> > > > >>>>>> SQL DDL support has been a long-time ask from the community. > > > >> Current > > > >>>>> Flink > > > >>>>>> SQL support only DML (e.g. SELECT and INSERT statements). In its > > > >>>> current > > > >>>>>> form, Flink SQL users still need to define/create table sources > > and > > > >>>> sinks > > > >>>>>> programmatically in Java/Scala. 
Also, in SQL Client, without DDL > > > >>>> support, > > > >>>>>> the current implementation does not allow dynamical creation of > > > >>> table, > > > >>>>> type > > > >>>>>> or functions with SQL, this adds friction for its adoption. > > > >>>>>> > > > >>>>>> I drafted a design doc [1] with a few other community members > that > > > >>>>> proposes > > > >>>>>> the design and implementation for adding DDL support in Flink. > The > > > >>>>> initial > > > >>>>>> design considers DDL for table, view, type, library and > function. > > > >> It > > > >>>> will > > > >>>>>> be great to get feedback on the design from the community, and > > > >> align > > > >>>> with > > > >>>>>> latest effort in unified SQL connector API [2] and Flink Hive > > > >>>>> integration > > > >>>>>> [3]. > > > >>>>>> > > > >>>>>> Any feedback is highly appreciated. > > > >>>>>> > > > >>>>>> Thanks > > > >>>>>> Shuyi Chen > > > >>>>>> > > > >>>>>> [1] > > > >>>>>> > > > >>>>>> > > > >> > > > > > > https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing > > > >>>>>> [2] > > > >>>>>> > > > >>>>>> > > > >> > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > >>>>>> [3] > > > >>>>>> > > > >>>>>> > > > >> > > > > > > https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing > > > >>>>>> -- > > > >>>>>> "So you have to trust that the dots will somehow connect in your > > > >>>> future." > > > >> > > > > > > > > -- > > "So you have to trust that the dots will somehow connect in your future." > > > |
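As a sketch of how the watermark syntax described above could be embedded in a table definition; the column names and the TO_TIMESTAMP derivation are illustrative assumptions, not finalized syntax:

  CREATE TABLE Clicks (
    clickId BIGINT,
    tsMillis BIGINT,
    kafkaIngestionTime TIMESTAMP,
    rowTime AS TO_TIMESTAMP(tsMillis),   -- rowtime derived from an existing field via a computed column
    pt AS PROCTIME(),                    -- proctime defined as a computed column
    WATERMARK wk1 FOR rowTime AS BOUNDED WITH OFFSET '5' SECOND,
    WATERMARK wk2 FOR kafkaIngestionTime AS ASCENDING   -- a second watermark on another rowtime field
  ) WITH (
    connector.type = 'kafka'
  );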
Hi Lin and Jark,
Thanks for sharing those details. Can you please consider summarizing your DDL design into a google doc. We can still continue the discussions on Shuyi's proposal. But having a separate google doc will be easy for the DEV to understand/comment/discuss on your proposed DDL implementation. Regards, Shaoxuan On Wed, Nov 28, 2018 at 7:39 PM Jark Wu <[hidden email]> wrote: > Hi Shuyi, > > Thanks for bringing up this discussion and the awesome work! I have left > some comments in the doc. > > I want to share something more about the watermark definition learned from > Alibaba. > > 1. > > Table should be able to accept multiple watermark definition. > > Because a table may have more than one rowtime field. For example, one > rowtime field is from existing field but missing in some records, > another > is the ingestion timestamp in Kafka but not very accurate. In this case, > user may define two rowtime fields with watermarks in the Table and > choose > one in different situation. > 2. > > Watermark stragety always work with rowtime field together. > > Based on the two points metioned above, I think we should combine the > watermark strategy and rowtime field selection (i.e. which existing field > used to generate watermark) in one clause, so that we can define multiple > watermarks in one Table. > > Here I will share the watermark syntax used in Alibaba (simply modified): > > watermarkDefinition: > WATERMARK [watermarkName] FOR <rowtime_field> AS wm_strategy > > wm_strategy: > BOUNDED WITH OFFSET 'string' timeUnit > | > ASCENDING > > The “WATERMARK” keyword starts a watermark definition. The “FOR” keyword > defines which existing field used to generate watermark, this field should > already exist in the schema (we can use computed-column to derive from > other fields). The “AS” keyword defines watermark strategy, such as BOUNDED > WITH OFFSET (covers almost all the requirements) and ASCENDING. > > When the expected rowtime field does not exist in the schema, we can use > computed-column syntax to derive it from other existing fields using > built-in functions or user defined functions. So the rowtime/watermark > definition doesn’t need to care about “field-change” strategy > (replace/add/from-field). And the proctime field definition can also be > defined using computed-column. Such as pt as PROCTIME() which defines a > proctime field named “pt” in the schema. > > Looking forward to working with you guys! > > Best, > Jark Wu > > > Lin Li <[hidden email]> 于2018年11月28日周三 下午6:33写道: > > > @Shuyi > > Thanks for the proposal! We have a simple DDL implementation (extends > > Calcite's parser) which been running for almost two years on production > and > > works well. > > I think the most valued things we'd learned is keeping simplicity and > > standard compliance. 
> > Here's the approximate grammar, FYI > > CREATE TABLE > > > > CREATE TABLE tableName( > > columnDefinition [, columnDefinition]* > > [ computedColumnDefinition [, computedColumnDefinition]* ] > > [ tableConstraint [, tableConstraint]* ] > > [ tableIndex [, tableIndex]* ] > > [ PERIOD FOR SYSTEM_TIME ] > > [ WATERMARK watermarkName FOR rowTimeColumn AS > > withOffset(rowTimeColumn, offset) ] ) [ WITH ( tableOption [ , > > tableOption]* ) ] [ ; ] > > > > columnDefinition ::= > > columnName dataType [ NOT NULL ] > > > > dataType ::= > > { > > [ VARCHAR ] > > | [ BOOLEAN ] > > | [ TINYINT ] > > | [ SMALLINT ] > > | [ INT ] > > | [ BIGINT ] > > | [ FLOAT ] > > | [ DECIMAL ] > > | [ DOUBLE ] > > | [ DATE ] > > | [ TIME ] > > | [ TIMESTAMP ] > > | [ VARBINARY ] > > } > > > > computedColumnDefinition ::= > > columnName AS computedColumnExpression > > > > tableConstraint ::= > > { PRIMARY KEY | UNIQUE } > > (columnName [, columnName]* ) > > > > tableIndex ::= > > [ UNIQUE ] INDEX indexName > > (columnName [, columnName]* ) > > > > rowTimeColumn ::= > > columnName > > > > tableOption ::= > > property=value > > offset ::= > > positive integer (unit: ms) > > > > CREATE VIEW > > > > CREATE VIEW viewName > > [ > > ( columnName [, columnName]* ) > > ] > > AS queryStatement; > > > > CREATE FUNCTION > > > > CREATE FUNCTION functionName > > AS 'className'; > > > > className ::= > > fully qualified name > > > > > > Shuyi Chen <[hidden email]> 于2018年11月28日周三 上午3:28写道: > > > > > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design > doc > > > first and start implementation w/o the unified connector API ready by > > > skipping some featue. > > > > > > Xuefu, I like the idea of making Flink specific properties into generic > > > key-value pairs, so that it will make integration with Hive DDL (or > > others, > > > e.g. Beam DDL) easier. > > > > > > I'll run a final pass over the design doc and finalize the design in > the > > > next few days. And we can start creating tasks and collaborate on the > > > implementation. Thanks a lot for all the comments and inputs. > > > > > > Cheers! > > > Shuyi > > > > > > On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu <[hidden email]> > > > wrote: > > > > > > > Yeah! I agree with Timo that DDL can actually proceed w/o being > blocked > > > by > > > > connector API. We can leave the unknown out while defining the basic > > > syntax. > > > > > > > > @Shuyi > > > > > > > > As commented in the doc, I think we can probably stick with simple > > syntax > > > > with general properties, without extending the syntax too much that > it > > > > mimics the descriptor API. > > > > > > > > Part of our effort on Flink-Hive integration is also to make DDL > syntax > > > > compatible with Hive's. The one in the current proposal seems making > > our > > > > effort more challenging. > > > > > > > > We can help and collaborate. At this moment, I think we can finalize > on > > > > the proposal and then we can divide the tasks for better > collaboration. > > > > > > > > Please let me know if there are any questions or suggestions. > > > > > > > > Thanks, > > > > Xuefu > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------ > > > > Sender:Timo Walther <[hidden email]> > > > > Sent at:2018 Nov 27 (Tue) 16:21 > > > > Recipient:dev <[hidden email]> > > > > Subject:Re: [DISCUSS] Flink SQL DDL Design > > > > > > > > Thanks for offering your help here, Xuefu. It would be great to move > > > > these efforts forward. 
I agree that the DDL is somehow releated to > the > > > > unified connector API design but we can also start with the basic > > > > functionality now and evolve the DDL during this release and next > > > releases. > > > > > > > > For example, we could identify the MVP DDL syntax that skips defining > > > > key constraints and maybe even time attributes. This DDL could be > used > > > > for batch usecases, ETL, and materializing SQL queries (no time > > > > operations like windows). > > > > > > > > The unified connector API is high on our priority list for the 1.8 > > > > release. I will try to update the document until mid of next week. > > > > > > > > > > > > Regards, > > > > > > > > Timo > > > > > > > > > > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen: > > > > > Thanks a lot, Xuefu. I was busy for some other stuff for the last 2 > > > > weeks, > > > > > but we are definitely interested in moving this forward. I think > once > > > the > > > > > unified connector API design [1] is done, we can finalize the DDL > > > design > > > > as > > > > > well and start creating concrete subtasks to collaborate on the > > > > > implementation with the community. > > > > > > > > > > Shuyi > > > > > > > > > > [1] > > > > > > > > > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > > > > > > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu < > > [hidden email]> > > > > > wrote: > > > > > > > > > >> Hi Shuyi, > > > > >> > > > > >> I'm wondering if you folks still have the bandwidth working on > this. > > > > >> > > > > >> We have some dedicated resource and like to move this forward. We > > can > > > > >> collaborate. > > > > >> > > > > >> Thanks, > > > > >> > > > > >> Xuefu > > > > >> > > > > >> > > > > >> ------------------------------------------------------------------ > > > > >> 发件人:wenlong.lwl<[hidden email]> > > > > >> 日 期:2018年11月05日 11:15:35 > > > > >> 收件人:<[hidden email]> > > > > >> 主 题:Re: [DISCUSS] Flink SQL DDL Design > > > > >> > > > > >> Hi, Shuyi, thanks for the proposal. > > > > >> > > > > >> I have two concerns about the table ddl: > > > > >> > > > > >> 1. how about remove the source/sink mark from the ddl, because it > is > > > not > > > > >> necessary, the framework determine the table referred is a source > > or a > > > > sink > > > > >> according to the context of the query using the table. it will be > > more > > > > >> convenient for use defining a table which can be both a source and > > > sink, > > > > >> and more convenient for catalog to persistent and manage the meta > > > infos. > > > > >> > > > > >> 2. how about just keeping one pure string map as parameters for > > table, > > > > like > > > > >> create tabe Kafka10SourceTable ( > > > > >> intField INTEGER, > > > > >> stringField VARCHAR(128), > > > > >> longField BIGINT, > > > > >> rowTimeField TIMESTAMP > > > > >> ) with ( > > > > >> connector.type = ’kafka’, > > > > >> connector.property-version = ’1’, > > > > >> connector.version = ’0.10’, > > > > >> connector.properties.topic = ‘test-kafka-topic’, > > > > >> connector.properties.startup-mode = ‘latest-offset’, > > > > >> connector.properties.specific-offset = ‘offset’, > > > > >> format.type = 'json' > > > > >> format.prperties.version=’1’, > > > > >> format.derive-schema = 'true' > > > > >> ); > > > > >> Because: > > > > >> 1. 
in TableFactory, what user use is a string map properties, > > defining > > > > >> parameters by string-map can be the closest way to mapping how > user > > > use > > > > the > > > > >> parameters. > > > > >> 2. The table descriptor can be extended by user, like what is done > > in > > > > Kafka > > > > >> and Json, it means that the parameter keys in connector or format > > can > > > be > > > > >> different in different implementation, we can not restrict the key > > in > > > a > > > > >> specified set, so we need a map in connector scope and a map in > > > > >> connector.properties scope. why not just give user a single map, > let > > > > them > > > > >> put parameters in a format they like, which is also the simplest > way > > > to > > > > >> implement DDL parser. > > > > >> 3. whether we can define a format clause or not, depends on the > > > > >> implementation of the connector, using different clause in DDL may > > > make > > > > a > > > > >> misunderstanding that we can combine the connectors with arbitrary > > > > formats, > > > > >> which may not work actually. > > > > >> > > > > >> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński <[hidden email]> > > > wrote: > > > > >> > > > > >>> +1, Thanks for the proposal. > > > > >>> > > > > >>> I guess this is a long-awaited change. This can vastly increase > the > > > > >>> functionalities of the SQL Client as it will be possible to use > > > complex > > > > >>> extensions like for example those provided by Apache Bahir[1]. > > > > >>> > > > > >>> Best Regards, > > > > >>> Dom. > > > > >>> > > > > >>> [1] > > > > >>> https://github.com/apache/bahir-flink > > > > >>> > > > > >>> sob., 3 lis 2018 o 17:17 Rong Rong <[hidden email]> > > napisał(a): > > > > >>> > > > > >>>> +1. Thanks for putting the proposal together Shuyi. > > > > >>>> > > > > >>>> DDL has been brought up in a couple of times previously [1,2]. > > > > >> Utilizing > > > > >>>> DDL will definitely be a great extension to the current Flink > SQL > > to > > > > >>>> systematically support some of the previously brought up > features > > > such > > > > >> as > > > > >>>> [3]. And it will also be beneficial to see the document closely > > > > aligned > > > > >>>> with the previous discussion for unified SQL connector API [4]. > > > > >>>> > > > > >>>> I also left a few comments on the doc. Looking forward to the > > > > alignment > > > > >>>> with the other couple of efforts and contributing to them! > > > > >>>> > > > > >>>> Best, > > > > >>>> Rong > > > > >>>> > > > > >>>> [1] > > > > >>>> > > > > >>>> > > > > >> > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E > > > > >>>> [2] > > > > >>>> > > > > >>>> > > > > >> > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E > > > > >>>> [3] https://issues.apache.org/jira/browse/FLINK-8003 > > > > >>>> [4] > > > > >>>> > > > > >>>> > > > > >> > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3C6676cb66-6f31-23e1-eff5-2e9c19f88483@...%3E > > > > >>>> > > > > >>>> On Fri, Nov 2, 2018 at 10:22 AM Bowen Li <[hidden email]> > > > wrote: > > > > >>>> > > > > >>>>> Thanks Shuyi! > > > > >>>>> > > > > >>>>> I left some comments there. I think the design of SQL DDL and > > > > >>> Flink-Hive > > > > >>>>> integration/External catalog enhancements will work closely > with > > > each > > > > >>>>> other. 
Hope we are well aligned on the directions of the two > > > designs, > > > > >>>> and I > > > > >>>>> look forward to working with you guys on both! > > > > >>>>> > > > > >>>>> Bowen > > > > >>>>> > > > > >>>>> > > > > >>>>> On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen <[hidden email] > > > > > > >> wrote: > > > > >>>>>> Hi everyone, > > > > >>>>>> > > > > >>>>>> SQL DDL support has been a long-time ask from the community. > > > > >> Current > > > > >>>>> Flink > > > > >>>>>> SQL support only DML (e.g. SELECT and INSERT statements). In > its > > > > >>>> current > > > > >>>>>> form, Flink SQL users still need to define/create table > sources > > > and > > > > >>>> sinks > > > > >>>>>> programmatically in Java/Scala. Also, in SQL Client, without > DDL > > > > >>>> support, > > > > >>>>>> the current implementation does not allow dynamical creation > of > > > > >>> table, > > > > >>>>> type > > > > >>>>>> or functions with SQL, this adds friction for its adoption. > > > > >>>>>> > > > > >>>>>> I drafted a design doc [1] with a few other community members > > that > > > > >>>>> proposes > > > > >>>>>> the design and implementation for adding DDL support in Flink. > > The > > > > >>>>> initial > > > > >>>>>> design considers DDL for table, view, type, library and > > function. > > > > >> It > > > > >>>> will > > > > >>>>>> be great to get feedback on the design from the community, and > > > > >> align > > > > >>>> with > > > > >>>>>> latest effort in unified SQL connector API [2] and Flink Hive > > > > >>>>> integration > > > > >>>>>> [3]. > > > > >>>>>> > > > > >>>>>> Any feedback is highly appreciated. > > > > >>>>>> > > > > >>>>>> Thanks > > > > >>>>>> Shuyi Chen > > > > >>>>>> > > > > >>>>>> [1] > > > > >>>>>> > > > > >>>>>> > > > > >> > > > > > > > > > > https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing > > > > >>>>>> [2] > > > > >>>>>> > > > > >>>>>> > > > > >> > > > > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > > >>>>>> [3] > > > > >>>>>> > > > > >>>>>> > > > > >> > > > > > > > > > > https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing > > > > >>>>>> -- > > > > >>>>>> "So you have to trust that the dots will somehow connect in > your > > > > >>>> future." > > > > >> > > > > > > > > > > > > -- > > > "So you have to trust that the dots will somehow connect in your > future." > > > > > > |
Hi Shaoxuan,
I think summarizing it into a google doc is a good idea. We will prepare it in the next few days. Thanks, Jark Shaoxuan Wang <[hidden email]> 于2018年11月28日周三 下午9:17写道: > Hi Lin and Jark, > Thanks for sharing those details. Can you please consider summarizing your > DDL design into a google doc. > We can still continue the discussions on Shuyi's proposal. But having a > separate google doc will be easy for the DEV to understand/comment/discuss > on your proposed DDL implementation. > > Regards, > Shaoxuan > > > On Wed, Nov 28, 2018 at 7:39 PM Jark Wu <[hidden email]> wrote: > > > Hi Shuyi, > > > > Thanks for bringing up this discussion and the awesome work! I have left > > some comments in the doc. > > > > I want to share something more about the watermark definition learned > from > > Alibaba. > > > > 1. > > > > Table should be able to accept multiple watermark definition. > > > > Because a table may have more than one rowtime field. For example, one > > rowtime field is from existing field but missing in some records, > > another > > is the ingestion timestamp in Kafka but not very accurate. In this > case, > > user may define two rowtime fields with watermarks in the Table and > > choose > > one in different situation. > > 2. > > > > Watermark stragety always work with rowtime field together. > > > > Based on the two points metioned above, I think we should combine the > > watermark strategy and rowtime field selection (i.e. which existing field > > used to generate watermark) in one clause, so that we can define multiple > > watermarks in one Table. > > > > Here I will share the watermark syntax used in Alibaba (simply modified): > > > > watermarkDefinition: > > WATERMARK [watermarkName] FOR <rowtime_field> AS wm_strategy > > > > wm_strategy: > > BOUNDED WITH OFFSET 'string' timeUnit > > | > > ASCENDING > > > > The “WATERMARK” keyword starts a watermark definition. The “FOR” keyword > > defines which existing field used to generate watermark, this field > should > > already exist in the schema (we can use computed-column to derive from > > other fields). The “AS” keyword defines watermark strategy, such as > BOUNDED > > WITH OFFSET (covers almost all the requirements) and ASCENDING. > > > > When the expected rowtime field does not exist in the schema, we can use > > computed-column syntax to derive it from other existing fields using > > built-in functions or user defined functions. So the rowtime/watermark > > definition doesn’t need to care about “field-change” strategy > > (replace/add/from-field). And the proctime field definition can also be > > defined using computed-column. Such as pt as PROCTIME() which defines a > > proctime field named “pt” in the schema. > > > > Looking forward to working with you guys! > > > > Best, > > Jark Wu > > > > > > Lin Li <[hidden email]> 于2018年11月28日周三 下午6:33写道: > > > > > @Shuyi > > > Thanks for the proposal! We have a simple DDL implementation (extends > > > Calcite's parser) which been running for almost two years on production > > and > > > works well. > > > I think the most valued things we'd learned is keeping simplicity and > > > standard compliance. 
> > > Here's the approximate grammar, FYI > > > CREATE TABLE > > > > > > CREATE TABLE tableName( > > > columnDefinition [, columnDefinition]* > > > [ computedColumnDefinition [, computedColumnDefinition]* ] > > > [ tableConstraint [, tableConstraint]* ] > > > [ tableIndex [, tableIndex]* ] > > > [ PERIOD FOR SYSTEM_TIME ] > > > [ WATERMARK watermarkName FOR rowTimeColumn AS > > > withOffset(rowTimeColumn, offset) ] ) [ WITH ( tableOption [ , > > > tableOption]* ) ] [ ; ] > > > > > > columnDefinition ::= > > > columnName dataType [ NOT NULL ] > > > > > > dataType ::= > > > { > > > [ VARCHAR ] > > > | [ BOOLEAN ] > > > | [ TINYINT ] > > > | [ SMALLINT ] > > > | [ INT ] > > > | [ BIGINT ] > > > | [ FLOAT ] > > > | [ DECIMAL ] > > > | [ DOUBLE ] > > > | [ DATE ] > > > | [ TIME ] > > > | [ TIMESTAMP ] > > > | [ VARBINARY ] > > > } > > > > > > computedColumnDefinition ::= > > > columnName AS computedColumnExpression > > > > > > tableConstraint ::= > > > { PRIMARY KEY | UNIQUE } > > > (columnName [, columnName]* ) > > > > > > tableIndex ::= > > > [ UNIQUE ] INDEX indexName > > > (columnName [, columnName]* ) > > > > > > rowTimeColumn ::= > > > columnName > > > > > > tableOption ::= > > > property=value > > > offset ::= > > > positive integer (unit: ms) > > > > > > CREATE VIEW > > > > > > CREATE VIEW viewName > > > [ > > > ( columnName [, columnName]* ) > > > ] > > > AS queryStatement; > > > > > > CREATE FUNCTION > > > > > > CREATE FUNCTION functionName > > > AS 'className'; > > > > > > className ::= > > > fully qualified name > > > > > > > > > Shuyi Chen <[hidden email]> 于2018年11月28日周三 上午3:28写道: > > > > > > > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design > > doc > > > > first and start implementation w/o the unified connector API ready by > > > > skipping some featue. > > > > > > > > Xuefu, I like the idea of making Flink specific properties into > generic > > > > key-value pairs, so that it will make integration with Hive DDL (or > > > others, > > > > e.g. Beam DDL) easier. > > > > > > > > I'll run a final pass over the design doc and finalize the design in > > the > > > > next few days. And we can start creating tasks and collaborate on the > > > > implementation. Thanks a lot for all the comments and inputs. > > > > > > > > Cheers! > > > > Shuyi > > > > > > > > On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu < > [hidden email]> > > > > wrote: > > > > > > > > > Yeah! I agree with Timo that DDL can actually proceed w/o being > > blocked > > > > by > > > > > connector API. We can leave the unknown out while defining the > basic > > > > syntax. > > > > > > > > > > @Shuyi > > > > > > > > > > As commented in the doc, I think we can probably stick with simple > > > syntax > > > > > with general properties, without extending the syntax too much that > > it > > > > > mimics the descriptor API. > > > > > > > > > > Part of our effort on Flink-Hive integration is also to make DDL > > syntax > > > > > compatible with Hive's. The one in the current proposal seems > making > > > our > > > > > effort more challenging. > > > > > > > > > > We can help and collaborate. At this moment, I think we can > finalize > > on > > > > > the proposal and then we can divide the tasks for better > > collaboration. > > > > > > > > > > Please let me know if there are any questions or suggestions. 
> > > > > > > > > > Thanks, > > > > > Xuefu > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------ > > > > > Sender:Timo Walther <[hidden email]> > > > > > Sent at:2018 Nov 27 (Tue) 16:21 > > > > > Recipient:dev <[hidden email]> > > > > > Subject:Re: [DISCUSS] Flink SQL DDL Design > > > > > > > > > > Thanks for offering your help here, Xuefu. It would be great to > move > > > > > these efforts forward. I agree that the DDL is somehow releated to > > the > > > > > unified connector API design but we can also start with the basic > > > > > functionality now and evolve the DDL during this release and next > > > > releases. > > > > > > > > > > For example, we could identify the MVP DDL syntax that skips > defining > > > > > key constraints and maybe even time attributes. This DDL could be > > used > > > > > for batch usecases, ETL, and materializing SQL queries (no time > > > > > operations like windows). > > > > > > > > > > The unified connector API is high on our priority list for the 1.8 > > > > > release. I will try to update the document until mid of next week. > > > > > > > > > > > > > > > Regards, > > > > > > > > > > Timo > > > > > > > > > > > > > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen: > > > > > > Thanks a lot, Xuefu. I was busy for some other stuff for the > last 2 > > > > > weeks, > > > > > > but we are definitely interested in moving this forward. I think > > once > > > > the > > > > > > unified connector API design [1] is done, we can finalize the DDL > > > > design > > > > > as > > > > > > well and start creating concrete subtasks to collaborate on the > > > > > > implementation with the community. > > > > > > > > > > > > Shuyi > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > > > > > > > > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu < > > > [hidden email]> > > > > > > wrote: > > > > > > > > > > > >> Hi Shuyi, > > > > > >> > > > > > >> I'm wondering if you folks still have the bandwidth working on > > this. > > > > > >> > > > > > >> We have some dedicated resource and like to move this forward. > We > > > can > > > > > >> collaborate. > > > > > >> > > > > > >> Thanks, > > > > > >> > > > > > >> Xuefu > > > > > >> > > > > > >> > > > > > >> > ------------------------------------------------------------------ > > > > > >> 发件人:wenlong.lwl<[hidden email]> > > > > > >> 日 期:2018年11月05日 11:15:35 > > > > > >> 收件人:<[hidden email]> > > > > > >> 主 题:Re: [DISCUSS] Flink SQL DDL Design > > > > > >> > > > > > >> Hi, Shuyi, thanks for the proposal. > > > > > >> > > > > > >> I have two concerns about the table ddl: > > > > > >> > > > > > >> 1. how about remove the source/sink mark from the ddl, because > it > > is > > > > not > > > > > >> necessary, the framework determine the table referred is a > source > > > or a > > > > > sink > > > > > >> according to the context of the query using the table. it will > be > > > more > > > > > >> convenient for use defining a table which can be both a source > and > > > > sink, > > > > > >> and more convenient for catalog to persistent and manage the > meta > > > > infos. > > > > > >> > > > > > >> 2. 
how about just keeping one pure string map as parameters for > > > table, > > > > > like > > > > > >> create tabe Kafka10SourceTable ( > > > > > >> intField INTEGER, > > > > > >> stringField VARCHAR(128), > > > > > >> longField BIGINT, > > > > > >> rowTimeField TIMESTAMP > > > > > >> ) with ( > > > > > >> connector.type = ’kafka’, > > > > > >> connector.property-version = ’1’, > > > > > >> connector.version = ’0.10’, > > > > > >> connector.properties.topic = ‘test-kafka-topic’, > > > > > >> connector.properties.startup-mode = ‘latest-offset’, > > > > > >> connector.properties.specific-offset = ‘offset’, > > > > > >> format.type = 'json' > > > > > >> format.prperties.version=’1’, > > > > > >> format.derive-schema = 'true' > > > > > >> ); > > > > > >> Because: > > > > > >> 1. in TableFactory, what user use is a string map properties, > > > defining > > > > > >> parameters by string-map can be the closest way to mapping how > > user > > > > use > > > > > the > > > > > >> parameters. > > > > > >> 2. The table descriptor can be extended by user, like what is > done > > > in > > > > > Kafka > > > > > >> and Json, it means that the parameter keys in connector or > format > > > can > > > > be > > > > > >> different in different implementation, we can not restrict the > key > > > in > > > > a > > > > > >> specified set, so we need a map in connector scope and a map in > > > > > >> connector.properties scope. why not just give user a single map, > > let > > > > > them > > > > > >> put parameters in a format they like, which is also the simplest > > way > > > > to > > > > > >> implement DDL parser. > > > > > >> 3. whether we can define a format clause or not, depends on the > > > > > >> implementation of the connector, using different clause in DDL > may > > > > make > > > > > a > > > > > >> misunderstanding that we can combine the connectors with > arbitrary > > > > > formats, > > > > > >> which may not work actually. > > > > > >> > > > > > >> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński <[hidden email] > > > > > > wrote: > > > > > >> > > > > > >>> +1, Thanks for the proposal. > > > > > >>> > > > > > >>> I guess this is a long-awaited change. This can vastly increase > > the > > > > > >>> functionalities of the SQL Client as it will be possible to use > > > > complex > > > > > >>> extensions like for example those provided by Apache Bahir[1]. > > > > > >>> > > > > > >>> Best Regards, > > > > > >>> Dom. > > > > > >>> > > > > > >>> [1] > > > > > >>> https://github.com/apache/bahir-flink > > > > > >>> > > > > > >>> sob., 3 lis 2018 o 17:17 Rong Rong <[hidden email]> > > > napisał(a): > > > > > >>> > > > > > >>>> +1. Thanks for putting the proposal together Shuyi. > > > > > >>>> > > > > > >>>> DDL has been brought up in a couple of times previously [1,2]. > > > > > >> Utilizing > > > > > >>>> DDL will definitely be a great extension to the current Flink > > SQL > > > to > > > > > >>>> systematically support some of the previously brought up > > features > > > > such > > > > > >> as > > > > > >>>> [3]. And it will also be beneficial to see the document > closely > > > > > aligned > > > > > >>>> with the previous discussion for unified SQL connector API > [4]. > > > > > >>>> > > > > > >>>> I also left a few comments on the doc. Looking forward to the > > > > > alignment > > > > > >>>> with the other couple of efforts and contributing to them! 
> > > > > >>>> > > > > > >>>> Best, > > > > > >>>> Rong > > > > > >>>> > > > > > >>>> [1] > > > > > >>>> > > > > > >>>> > > > > > >> > > > > > > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E > > > > > >>>> [2] > > > > > >>>> > > > > > >>>> > > > > > >> > > > > > > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E > > > > > >>>> [3] https://issues.apache.org/jira/browse/FLINK-8003 > > > > > >>>> [4] > > > > > >>>> > > > > > >>>> > > > > > >> > > > > > > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3C6676cb66-6f31-23e1-eff5-2e9c19f88483@...%3E > > > > > >>>> > > > > > >>>> On Fri, Nov 2, 2018 at 10:22 AM Bowen Li <[hidden email] > > > > > > wrote: > > > > > >>>> > > > > > >>>>> Thanks Shuyi! > > > > > >>>>> > > > > > >>>>> I left some comments there. I think the design of SQL DDL and > > > > > >>> Flink-Hive > > > > > >>>>> integration/External catalog enhancements will work closely > > with > > > > each > > > > > >>>>> other. Hope we are well aligned on the directions of the two > > > > designs, > > > > > >>>> and I > > > > > >>>>> look forward to working with you guys on both! > > > > > >>>>> > > > > > >>>>> Bowen > > > > > >>>>> > > > > > >>>>> > > > > > >>>>> On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen < > [hidden email] > > > > > > > > >> wrote: > > > > > >>>>>> Hi everyone, > > > > > >>>>>> > > > > > >>>>>> SQL DDL support has been a long-time ask from the community. > > > > > >> Current > > > > > >>>>> Flink > > > > > >>>>>> SQL support only DML (e.g. SELECT and INSERT statements). In > > its > > > > > >>>> current > > > > > >>>>>> form, Flink SQL users still need to define/create table > > sources > > > > and > > > > > >>>> sinks > > > > > >>>>>> programmatically in Java/Scala. Also, in SQL Client, without > > DDL > > > > > >>>> support, > > > > > >>>>>> the current implementation does not allow dynamical creation > > of > > > > > >>> table, > > > > > >>>>> type > > > > > >>>>>> or functions with SQL, this adds friction for its adoption. > > > > > >>>>>> > > > > > >>>>>> I drafted a design doc [1] with a few other community > members > > > that > > > > > >>>>> proposes > > > > > >>>>>> the design and implementation for adding DDL support in > Flink. > > > The > > > > > >>>>> initial > > > > > >>>>>> design considers DDL for table, view, type, library and > > > function. > > > > > >> It > > > > > >>>> will > > > > > >>>>>> be great to get feedback on the design from the community, > and > > > > > >> align > > > > > >>>> with > > > > > >>>>>> latest effort in unified SQL connector API [2] and Flink > Hive > > > > > >>>>> integration > > > > > >>>>>> [3]. > > > > > >>>>>> > > > > > >>>>>> Any feedback is highly appreciated. 
> > > > > >>>>>> > > > > > >>>>>> Thanks > > > > > >>>>>> Shuyi Chen > > > > > >>>>>> > > > > > >>>>>> [1] > > > > > >>>>>> > > > > > >>>>>> > > > > > >> > > > > > > > > > > > > > > > https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing > > > > > >>>>>> [2] > > > > > >>>>>> > > > > > >>>>>> > > > > > >> > > > > > > > > > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > > > >>>>>> [3] > > > > > >>>>>> > > > > > >>>>>> > > > > > >> > > > > > > > > > > > > > > > https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing > > > > > >>>>>> -- > > > > > >>>>>> "So you have to trust that the dots will somehow connect in > > your > > > > > >>>> future." > > > > > >> > > > > > > > > > > > > > > > > -- > > > > "So you have to trust that the dots will somehow connect in your > > future." > > > > > > > > > > |
Thanks a lot, Shaoxuan, Jark and Lin. We should definitely collaborate
here, we have also our own DDL implementation running in production for almost 2 years at Uber. With the joint experience from both companies, we can definitely make the Flink SQL DDL better. As @shaoxuan suggest, Jark can come up with a doc that talks about the current DDL design in Alibaba, and we can discuss and merge them into one, make it as a FLIP, and plan the tasks for implementation. Also, we should take into account the new external catalog effort in the design. What do you guys think? Shuyi On Wed, Nov 28, 2018 at 6:45 AM Jark Wu <[hidden email]> wrote: > Hi Shaoxuan, > > I think summarizing it into a google doc is a good idea. We will prepare it > in the next few days. > > Thanks, > Jark > > Shaoxuan Wang <[hidden email]> 于2018年11月28日周三 下午9:17写道: > > > Hi Lin and Jark, > > Thanks for sharing those details. Can you please consider summarizing > your > > DDL design into a google doc. > > We can still continue the discussions on Shuyi's proposal. But having a > > separate google doc will be easy for the DEV to > understand/comment/discuss > > on your proposed DDL implementation. > > > > Regards, > > Shaoxuan > > > > > > On Wed, Nov 28, 2018 at 7:39 PM Jark Wu <[hidden email]> wrote: > > > > > Hi Shuyi, > > > > > > Thanks for bringing up this discussion and the awesome work! I have > left > > > some comments in the doc. > > > > > > I want to share something more about the watermark definition learned > > from > > > Alibaba. > > > > > > 1. > > > > > > Table should be able to accept multiple watermark definition. > > > > > > Because a table may have more than one rowtime field. For example, > one > > > rowtime field is from existing field but missing in some records, > > > another > > > is the ingestion timestamp in Kafka but not very accurate. In this > > case, > > > user may define two rowtime fields with watermarks in the Table and > > > choose > > > one in different situation. > > > 2. > > > > > > Watermark stragety always work with rowtime field together. > > > > > > Based on the two points metioned above, I think we should combine the > > > watermark strategy and rowtime field selection (i.e. which existing > field > > > used to generate watermark) in one clause, so that we can define > multiple > > > watermarks in one Table. > > > > > > Here I will share the watermark syntax used in Alibaba (simply > modified): > > > > > > watermarkDefinition: > > > WATERMARK [watermarkName] FOR <rowtime_field> AS wm_strategy > > > > > > wm_strategy: > > > BOUNDED WITH OFFSET 'string' timeUnit > > > | > > > ASCENDING > > > > > > The “WATERMARK” keyword starts a watermark definition. The “FOR” > keyword > > > defines which existing field used to generate watermark, this field > > should > > > already exist in the schema (we can use computed-column to derive from > > > other fields). The “AS” keyword defines watermark strategy, such as > > BOUNDED > > > WITH OFFSET (covers almost all the requirements) and ASCENDING. > > > > > > When the expected rowtime field does not exist in the schema, we can > use > > > computed-column syntax to derive it from other existing fields using > > > built-in functions or user defined functions. So the rowtime/watermark > > > definition doesn’t need to care about “field-change” strategy > > > (replace/add/from-field). And the proctime field definition can also be > > > defined using computed-column. Such as pt as PROCTIME() which defines a > > > proctime field named “pt” in the schema. > > > > > > Looking forward to working with you guys! 
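For concreteness, a table using the watermark clause sketched above might look like the following. This is only an illustration: the table name, columns, connector option and offset value are invented, the placement of the WATERMARK clause inside the column list and the exact time-unit keyword are assumptions based on the description above, and nothing here is finalized syntax.

CREATE TABLE Orders (
    orderId BIGINT,
    price DOUBLE,
    orderTime TIMESTAMP,
    pt AS PROCTIME(),  -- proctime attribute defined via a computed column, as described above
    WATERMARK wmOrderTime FOR orderTime AS BOUNDED WITH OFFSET '5000' MILLISECOND
) WITH (
    connector.type = 'kafka'
);

Here orderTime is an existing field in the schema that serves as the rowtime, and the watermark is allowed to lag behind it by the given bounded offset.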
> > > > > > Best, > > > Jark Wu > > > > > > > > > Lin Li <[hidden email]> 于2018年11月28日周三 下午6:33写道: > > > > > > > @Shuyi > > > > Thanks for the proposal! We have a simple DDL implementation > (extends > > > > Calcite's parser) which been running for almost two years on > production > > > and > > > > works well. > > > > I think the most valued things we'd learned is keeping simplicity and > > > > standard compliance. > > > > Here's the approximate grammar, FYI > > > > CREATE TABLE > > > > > > > > CREATE TABLE tableName( > > > > columnDefinition [, columnDefinition]* > > > > [ computedColumnDefinition [, computedColumnDefinition]* ] > > > > [ tableConstraint [, tableConstraint]* ] > > > > [ tableIndex [, tableIndex]* ] > > > > [ PERIOD FOR SYSTEM_TIME ] > > > > [ WATERMARK watermarkName FOR rowTimeColumn AS > > > > withOffset(rowTimeColumn, offset) ] ) [ WITH ( tableOption [ , > > > > tableOption]* ) ] [ ; ] > > > > > > > > columnDefinition ::= > > > > columnName dataType [ NOT NULL ] > > > > > > > > dataType ::= > > > > { > > > > [ VARCHAR ] > > > > | [ BOOLEAN ] > > > > | [ TINYINT ] > > > > | [ SMALLINT ] > > > > | [ INT ] > > > > | [ BIGINT ] > > > > | [ FLOAT ] > > > > | [ DECIMAL ] > > > > | [ DOUBLE ] > > > > | [ DATE ] > > > > | [ TIME ] > > > > | [ TIMESTAMP ] > > > > | [ VARBINARY ] > > > > } > > > > > > > > computedColumnDefinition ::= > > > > columnName AS computedColumnExpression > > > > > > > > tableConstraint ::= > > > > { PRIMARY KEY | UNIQUE } > > > > (columnName [, columnName]* ) > > > > > > > > tableIndex ::= > > > > [ UNIQUE ] INDEX indexName > > > > (columnName [, columnName]* ) > > > > > > > > rowTimeColumn ::= > > > > columnName > > > > > > > > tableOption ::= > > > > property=value > > > > offset ::= > > > > positive integer (unit: ms) > > > > > > > > CREATE VIEW > > > > > > > > CREATE VIEW viewName > > > > [ > > > > ( columnName [, columnName]* ) > > > > ] > > > > AS queryStatement; > > > > > > > > CREATE FUNCTION > > > > > > > > CREATE FUNCTION functionName > > > > AS 'className'; > > > > > > > > className ::= > > > > fully qualified name > > > > > > > > > > > > Shuyi Chen <[hidden email]> 于2018年11月28日周三 上午3:28写道: > > > > > > > > > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the > design > > > doc > > > > > first and start implementation w/o the unified connector API ready > by > > > > > skipping some featue. > > > > > > > > > > Xuefu, I like the idea of making Flink specific properties into > > generic > > > > > key-value pairs, so that it will make integration with Hive DDL (or > > > > others, > > > > > e.g. Beam DDL) easier. > > > > > > > > > > I'll run a final pass over the design doc and finalize the design > in > > > the > > > > > next few days. And we can start creating tasks and collaborate on > the > > > > > implementation. Thanks a lot for all the comments and inputs. > > > > > > > > > > Cheers! > > > > > Shuyi > > > > > > > > > > On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu < > > [hidden email]> > > > > > wrote: > > > > > > > > > > > Yeah! I agree with Timo that DDL can actually proceed w/o being > > > blocked > > > > > by > > > > > > connector API. We can leave the unknown out while defining the > > basic > > > > > syntax. > > > > > > > > > > > > @Shuyi > > > > > > > > > > > > As commented in the doc, I think we can probably stick with > simple > > > > syntax > > > > > > with general properties, without extending the syntax too much > that > > > it > > > > > > mimics the descriptor API. 
> > > > > > > > > > > > Part of our effort on Flink-Hive integration is also to make DDL > > > syntax > > > > > > compatible with Hive's. The one in the current proposal seems > > making > > > > our > > > > > > effort more challenging. > > > > > > > > > > > > We can help and collaborate. At this moment, I think we can > > finalize > > > on > > > > > > the proposal and then we can divide the tasks for better > > > collaboration. > > > > > > > > > > > > Please let me know if there are any questions or suggestions. > > > > > > > > > > > > Thanks, > > > > > > Xuefu > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------ > > > > > > Sender:Timo Walther <[hidden email]> > > > > > > Sent at:2018 Nov 27 (Tue) 16:21 > > > > > > Recipient:dev <[hidden email]> > > > > > > Subject:Re: [DISCUSS] Flink SQL DDL Design > > > > > > > > > > > > Thanks for offering your help here, Xuefu. It would be great to > > move > > > > > > these efforts forward. I agree that the DDL is somehow releated > to > > > the > > > > > > unified connector API design but we can also start with the basic > > > > > > functionality now and evolve the DDL during this release and next > > > > > releases. > > > > > > > > > > > > For example, we could identify the MVP DDL syntax that skips > > defining > > > > > > key constraints and maybe even time attributes. This DDL could be > > > used > > > > > > for batch usecases, ETL, and materializing SQL queries (no time > > > > > > operations like windows). > > > > > > > > > > > > The unified connector API is high on our priority list for the > 1.8 > > > > > > release. I will try to update the document until mid of next > week. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > Timo > > > > > > > > > > > > > > > > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen: > > > > > > > Thanks a lot, Xuefu. I was busy for some other stuff for the > > last 2 > > > > > > weeks, > > > > > > > but we are definitely interested in moving this forward. I > think > > > once > > > > > the > > > > > > > unified connector API design [1] is done, we can finalize the > DDL > > > > > design > > > > > > as > > > > > > > well and start creating concrete subtasks to collaborate on the > > > > > > > implementation with the community. > > > > > > > > > > > > > > Shuyi > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > > > > > > > > > > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu < > > > > [hidden email]> > > > > > > > wrote: > > > > > > > > > > > > > >> Hi Shuyi, > > > > > > >> > > > > > > >> I'm wondering if you folks still have the bandwidth working on > > > this. > > > > > > >> > > > > > > >> We have some dedicated resource and like to move this forward. > > We > > > > can > > > > > > >> collaborate. > > > > > > >> > > > > > > >> Thanks, > > > > > > >> > > > > > > >> Xuefu > > > > > > >> > > > > > > >> > > > > > > >> > > ------------------------------------------------------------------ > > > > > > >> 发件人:wenlong.lwl<[hidden email]> > > > > > > >> 日 期:2018年11月05日 11:15:35 > > > > > > >> 收件人:<[hidden email]> > > > > > > >> 主 题:Re: [DISCUSS] Flink SQL DDL Design > > > > > > >> > > > > > > >> Hi, Shuyi, thanks for the proposal. > > > > > > >> > > > > > > >> I have two concerns about the table ddl: > > > > > > >> > > > > > > >> 1. 
how about remove the source/sink mark from the ddl, because > > it > > > is > > > > > not > > > > > > >> necessary, the framework determine the table referred is a > > source > > > > or a > > > > > > sink > > > > > > >> according to the context of the query using the table. it will > > be > > > > more > > > > > > >> convenient for use defining a table which can be both a source > > and > > > > > sink, > > > > > > >> and more convenient for catalog to persistent and manage the > > meta > > > > > infos. > > > > > > >> > > > > > > >> 2. how about just keeping one pure string map as parameters > for > > > > table, > > > > > > like > > > > > > >> create tabe Kafka10SourceTable ( > > > > > > >> intField INTEGER, > > > > > > >> stringField VARCHAR(128), > > > > > > >> longField BIGINT, > > > > > > >> rowTimeField TIMESTAMP > > > > > > >> ) with ( > > > > > > >> connector.type = ’kafka’, > > > > > > >> connector.property-version = ’1’, > > > > > > >> connector.version = ’0.10’, > > > > > > >> connector.properties.topic = ‘test-kafka-topic’, > > > > > > >> connector.properties.startup-mode = ‘latest-offset’, > > > > > > >> connector.properties.specific-offset = ‘offset’, > > > > > > >> format.type = 'json' > > > > > > >> format.prperties.version=’1’, > > > > > > >> format.derive-schema = 'true' > > > > > > >> ); > > > > > > >> Because: > > > > > > >> 1. in TableFactory, what user use is a string map properties, > > > > defining > > > > > > >> parameters by string-map can be the closest way to mapping how > > > user > > > > > use > > > > > > the > > > > > > >> parameters. > > > > > > >> 2. The table descriptor can be extended by user, like what is > > done > > > > in > > > > > > Kafka > > > > > > >> and Json, it means that the parameter keys in connector or > > format > > > > can > > > > > be > > > > > > >> different in different implementation, we can not restrict the > > key > > > > in > > > > > a > > > > > > >> specified set, so we need a map in connector scope and a map > in > > > > > > >> connector.properties scope. why not just give user a single > map, > > > let > > > > > > them > > > > > > >> put parameters in a format they like, which is also the > simplest > > > way > > > > > to > > > > > > >> implement DDL parser. > > > > > > >> 3. whether we can define a format clause or not, depends on > the > > > > > > >> implementation of the connector, using different clause in DDL > > may > > > > > make > > > > > > a > > > > > > >> misunderstanding that we can combine the connectors with > > arbitrary > > > > > > formats, > > > > > > >> which may not work actually. > > > > > > >> > > > > > > >> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński < > [hidden email] > > > > > > > > wrote: > > > > > > >> > > > > > > >>> +1, Thanks for the proposal. > > > > > > >>> > > > > > > >>> I guess this is a long-awaited change. This can vastly > increase > > > the > > > > > > >>> functionalities of the SQL Client as it will be possible to > use > > > > > complex > > > > > > >>> extensions like for example those provided by Apache > Bahir[1]. > > > > > > >>> > > > > > > >>> Best Regards, > > > > > > >>> Dom. > > > > > > >>> > > > > > > >>> [1] > > > > > > >>> https://github.com/apache/bahir-flink > > > > > > >>> > > > > > > >>> sob., 3 lis 2018 o 17:17 Rong Rong <[hidden email]> > > > > napisał(a): > > > > > > >>> > > > > > > >>>> +1. Thanks for putting the proposal together Shuyi. > > > > > > >>>> > > > > > > >>>> DDL has been brought up in a couple of times previously > [1,2]. 
> > > > > > >> Utilizing > > > > > > >>>> DDL will definitely be a great extension to the current > Flink > > > SQL > > > > to > > > > > > >>>> systematically support some of the previously brought up > > > features > > > > > such > > > > > > >> as > > > > > > >>>> [3]. And it will also be beneficial to see the document > > closely > > > > > > aligned > > > > > > >>>> with the previous discussion for unified SQL connector API > > [4]. > > > > > > >>>> > > > > > > >>>> I also left a few comments on the doc. Looking forward to > the > > > > > > alignment > > > > > > >>>> with the other couple of efforts and contributing to them! > > > > > > >>>> > > > > > > >>>> Best, > > > > > > >>>> Rong > > > > > > >>>> > > > > > > >>>> [1] > > > > > > >>>> > > > > > > >>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E > > > > > > >>>> [2] > > > > > > >>>> > > > > > > >>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E > > > > > > >>>> [3] https://issues.apache.org/jira/browse/FLINK-8003 > > > > > > >>>> [4] > > > > > > >>>> > > > > > > >>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3C6676cb66-6f31-23e1-eff5-2e9c19f88483@...%3E > > > > > > >>>> > > > > > > >>>> On Fri, Nov 2, 2018 at 10:22 AM Bowen Li < > [hidden email] > > > > > > > > wrote: > > > > > > >>>> > > > > > > >>>>> Thanks Shuyi! > > > > > > >>>>> > > > > > > >>>>> I left some comments there. I think the design of SQL DDL > and > > > > > > >>> Flink-Hive > > > > > > >>>>> integration/External catalog enhancements will work closely > > > with > > > > > each > > > > > > >>>>> other. Hope we are well aligned on the directions of the > two > > > > > designs, > > > > > > >>>> and I > > > > > > >>>>> look forward to working with you guys on both! > > > > > > >>>>> > > > > > > >>>>> Bowen > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>> On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen < > > [hidden email] > > > > > > > > > > >> wrote: > > > > > > >>>>>> Hi everyone, > > > > > > >>>>>> > > > > > > >>>>>> SQL DDL support has been a long-time ask from the > community. > > > > > > >> Current > > > > > > >>>>> Flink > > > > > > >>>>>> SQL support only DML (e.g. SELECT and INSERT statements). > In > > > its > > > > > > >>>> current > > > > > > >>>>>> form, Flink SQL users still need to define/create table > > > sources > > > > > and > > > > > > >>>> sinks > > > > > > >>>>>> programmatically in Java/Scala. Also, in SQL Client, > without > > > DDL > > > > > > >>>> support, > > > > > > >>>>>> the current implementation does not allow dynamical > creation > > > of > > > > > > >>> table, > > > > > > >>>>> type > > > > > > >>>>>> or functions with SQL, this adds friction for its > adoption. > > > > > > >>>>>> > > > > > > >>>>>> I drafted a design doc [1] with a few other community > > members > > > > that > > > > > > >>>>> proposes > > > > > > >>>>>> the design and implementation for adding DDL support in > > Flink. > > > > The > > > > > > >>>>> initial > > > > > > >>>>>> design considers DDL for table, view, type, library and > > > > function. 
> > > > > > >> It > > > > > > >>>> will > > > > > > >>>>>> be great to get feedback on the design from the community, > > and > > > > > > >> align > > > > > > >>>> with > > > > > > >>>>>> latest effort in unified SQL connector API [2] and Flink > > Hive > > > > > > >>>>> integration > > > > > > >>>>>> [3]. > > > > > > >>>>>> > > > > > > >>>>>> Any feedback is highly appreciated. > > > > > > >>>>>> > > > > > > >>>>>> Thanks > > > > > > >>>>>> Shuyi Chen > > > > > > >>>>>> > > > > > > >>>>>> [1] > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing > > > > > > >>>>>> [2] > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > > > > >>>>>> [3] > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing > > > > > > >>>>>> -- > > > > > > >>>>>> "So you have to trust that the dots will somehow connect > in > > > your > > > > > > >>>> future." > > > > > > >> > > > > > > > > > > > > > > > > > > > > -- > > > > > "So you have to trust that the dots will somehow connect in your > > > future." > > > > > > > > > > > > > > > -- "So you have to trust that the dots will somehow connect in your future." |
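To make the grammar Lin shared above easier to follow, here is one possible end-to-end instantiation. It is illustrative only: the table, view, function and option names, the UDF class and the offset value are all invented, and the statements follow the grammar sketch rather than any finalized syntax.

CREATE FUNCTION toDate AS 'com.example.udf.ToDate';

CREATE TABLE PageViews (
    userId BIGINT NOT NULL,
    page VARCHAR,
    viewTime TIMESTAMP,
    viewDate AS toDate(viewTime),  -- computed column using the UDF registered above
    PRIMARY KEY (userId),
    WATERMARK wm FOR viewTime AS withOffset(viewTime, 1000)
) WITH (
    connector.type='kafka'
);

CREATE VIEW RecentViews (userId, page) AS
SELECT userId, page FROM PageViews;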
+1 Sounds great!
------------------------------------------------------------------ Sender:Shuyi Chen <[hidden email]> Sent at:2018 Nov 29 (Thu) 06:56 Recipient:dev <[hidden email]> Subject:Re: [DISCUSS] Flink SQL DDL Design Thanks a lot, Shaoxuan, Jack and Lin. We should definitely collaborate here, we have also our own DDL implementation running in production for almost 2 years at Uber. With the joint experience from both companies, we can definitely make the Flink SQL DDL better. As @shaoxuan suggest, Jark can come up with a doc that talks about the current DDL design in Alibaba, and we can discuss and merge them into one, make it as a FLIP, and plan the tasks for implementation. Also, we should take into account the new external catalog effort in the design. What do you guys think? Shuyi On Wed, Nov 28, 2018 at 6:45 AM Jark Wu <[hidden email]> wrote: > Hi Shaoxuan, > > I think summarizing it into a google doc is a good idea. We will prepare it > in the next few days. > > Thanks, > Jark > > Shaoxuan Wang <[hidden email]> 于2018年11月28日周三 下午9:17写道: > > > Hi Lin and Jark, > > Thanks for sharing those details. Can you please consider summarizing > your > > DDL design into a google doc. > > We can still continue the discussions on Shuyi's proposal. But having a > > separate google doc will be easy for the DEV to > understand/comment/discuss > > on your proposed DDL implementation. > > > > Regards, > > Shaoxuan > > > > > > On Wed, Nov 28, 2018 at 7:39 PM Jark Wu <[hidden email]> wrote: > > > > > Hi Shuyi, > > > > > > Thanks for bringing up this discussion and the awesome work! I have > left > > > some comments in the doc. > > > > > > I want to share something more about the watermark definition learned > > from > > > Alibaba. > > > > > > 1. > > > > > > Table should be able to accept multiple watermark definition. > > > > > > Because a table may have more than one rowtime field. For example, > one > > > rowtime field is from existing field but missing in some records, > > > another > > > is the ingestion timestamp in Kafka but not very accurate. In this > > case, > > > user may define two rowtime fields with watermarks in the Table and > > > choose > > > one in different situation. > > > 2. > > > > > > Watermark stragety always work with rowtime field together. > > > > > > Based on the two points metioned above, I think we should combine the > > > watermark strategy and rowtime field selection (i.e. which existing > field > > > used to generate watermark) in one clause, so that we can define > multiple > > > watermarks in one Table. > > > > > > Here I will share the watermark syntax used in Alibaba (simply > modified): > > > > > > watermarkDefinition: > > > WATERMARK [watermarkName] FOR <rowtime_field> AS wm_strategy > > > > > > wm_strategy: > > > BOUNDED WITH OFFSET 'string' timeUnit > > > | > > > ASCENDING > > > > > > The “WATERMARK” keyword starts a watermark definition. The “FOR” > keyword > > > defines which existing field used to generate watermark, this field > > should > > > already exist in the schema (we can use computed-column to derive from > > > other fields). The “AS” keyword defines watermark strategy, such as > > BOUNDED > > > WITH OFFSET (covers almost all the requirements) and ASCENDING. > > > > > > When the expected rowtime field does not exist in the schema, we can > use > > > computed-column syntax to derive it from other existing fields using > > > built-in functions or user defined functions. 
So the rowtime/watermark > > > definition doesn’t need to care about “field-change” strategy > > > (replace/add/from-field). And the proctime field definition can also be > > > defined using computed-column. Such as pt as PROCTIME() which defines a > > > proctime field named “pt” in the schema. > > > > > > Looking forward to working with you guys! > > > > > > Best, > > > Jark Wu > > > > > > > > > Lin Li <[hidden email]> 于2018年11月28日周三 下午6:33写道: > > > > > > > @Shuyi > > > > Thanks for the proposal! We have a simple DDL implementation > (extends > > > > Calcite's parser) which been running for almost two years on > production > > > and > > > > works well. > > > > I think the most valued things we'd learned is keeping simplicity and > > > > standard compliance. > > > > Here's the approximate grammar, FYI > > > > CREATE TABLE > > > > > > > > CREATE TABLE tableName( > > > > columnDefinition [, columnDefinition]* > > > > [ computedColumnDefinition [, computedColumnDefinition]* ] > > > > [ tableConstraint [, tableConstraint]* ] > > > > [ tableIndex [, tableIndex]* ] > > > > [ PERIOD FOR SYSTEM_TIME ] > > > > [ WATERMARK watermarkName FOR rowTimeColumn AS > > > > withOffset(rowTimeColumn, offset) ] ) [ WITH ( tableOption [ , > > > > tableOption]* ) ] [ ; ] > > > > > > > > columnDefinition ::= > > > > columnName dataType [ NOT NULL ] > > > > > > > > dataType ::= > > > > { > > > > [ VARCHAR ] > > > > | [ BOOLEAN ] > > > > | [ TINYINT ] > > > > | [ SMALLINT ] > > > > | [ INT ] > > > > | [ BIGINT ] > > > > | [ FLOAT ] > > > > | [ DECIMAL ] > > > > | [ DOUBLE ] > > > > | [ DATE ] > > > > | [ TIME ] > > > > | [ TIMESTAMP ] > > > > | [ VARBINARY ] > > > > } > > > > > > > > computedColumnDefinition ::= > > > > columnName AS computedColumnExpression > > > > > > > > tableConstraint ::= > > > > { PRIMARY KEY | UNIQUE } > > > > (columnName [, columnName]* ) > > > > > > > > tableIndex ::= > > > > [ UNIQUE ] INDEX indexName > > > > (columnName [, columnName]* ) > > > > > > > > rowTimeColumn ::= > > > > columnName > > > > > > > > tableOption ::= > > > > property=value > > > > offset ::= > > > > positive integer (unit: ms) > > > > > > > > CREATE VIEW > > > > > > > > CREATE VIEW viewName > > > > [ > > > > ( columnName [, columnName]* ) > > > > ] > > > > AS queryStatement; > > > > > > > > CREATE FUNCTION > > > > > > > > CREATE FUNCTION functionName > > > > AS 'className'; > > > > > > > > className ::= > > > > fully qualified name > > > > > > > > > > > > Shuyi Chen <[hidden email]> 于2018年11月28日周三 上午3:28写道: > > > > > > > > > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the > design > > > doc > > > > > first and start implementation w/o the unified connector API ready > by > > > > > skipping some featue. > > > > > > > > > > Xuefu, I like the idea of making Flink specific properties into > > generic > > > > > key-value pairs, so that it will make integration with Hive DDL (or > > > > others, > > > > > e.g. Beam DDL) easier. > > > > > > > > > > I'll run a final pass over the design doc and finalize the design > in > > > the > > > > > next few days. And we can start creating tasks and collaborate on > the > > > > > implementation. Thanks a lot for all the comments and inputs. > > > > > > > > > > Cheers! > > > > > Shuyi > > > > > > > > > > On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu < > > [hidden email]> > > > > > wrote: > > > > > > > > > > > Yeah! I agree with Timo that DDL can actually proceed w/o being > > > blocked > > > > > by > > > > > > connector API. 
We can leave the unknown out while defining the > > basic > > > > > syntax. > > > > > > > > > > > > @Shuyi > > > > > > > > > > > > As commented in the doc, I think we can probably stick with > simple > > > > syntax > > > > > > with general properties, without extending the syntax too much > that > > > it > > > > > > mimics the descriptor API. > > > > > > > > > > > > Part of our effort on Flink-Hive integration is also to make DDL > > > syntax > > > > > > compatible with Hive's. The one in the current proposal seems > > making > > > > our > > > > > > effort more challenging. > > > > > > > > > > > > We can help and collaborate. At this moment, I think we can > > finalize > > > on > > > > > > the proposal and then we can divide the tasks for better > > > collaboration. > > > > > > > > > > > > Please let me know if there are any questions or suggestions. > > > > > > > > > > > > Thanks, > > > > > > Xuefu > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------ > > > > > > Sender:Timo Walther <[hidden email]> > > > > > > Sent at:2018 Nov 27 (Tue) 16:21 > > > > > > Recipient:dev <[hidden email]> > > > > > > Subject:Re: [DISCUSS] Flink SQL DDL Design > > > > > > > > > > > > Thanks for offering your help here, Xuefu. It would be great to > > move > > > > > > these efforts forward. I agree that the DDL is somehow releated > to > > > the > > > > > > unified connector API design but we can also start with the basic > > > > > > functionality now and evolve the DDL during this release and next > > > > > releases. > > > > > > > > > > > > For example, we could identify the MVP DDL syntax that skips > > defining > > > > > > key constraints and maybe even time attributes. This DDL could be > > > used > > > > > > for batch usecases, ETL, and materializing SQL queries (no time > > > > > > operations like windows). > > > > > > > > > > > > The unified connector API is high on our priority list for the > 1.8 > > > > > > release. I will try to update the document until mid of next > week. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > Timo > > > > > > > > > > > > > > > > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen: > > > > > > > Thanks a lot, Xuefu. I was busy for some other stuff for the > > last 2 > > > > > > weeks, > > > > > > > but we are definitely interested in moving this forward. I > think > > > once > > > > > the > > > > > > > unified connector API design [1] is done, we can finalize the > DDL > > > > > design > > > > > > as > > > > > > > well and start creating concrete subtasks to collaborate on the > > > > > > > implementation with the community. > > > > > > > > > > > > > > Shuyi > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > > > > > > > > > > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu < > > > > [hidden email]> > > > > > > > wrote: > > > > > > > > > > > > > >> Hi Shuyi, > > > > > > >> > > > > > > >> I'm wondering if you folks still have the bandwidth working on > > > this. > > > > > > >> > > > > > > >> We have some dedicated resource and like to move this forward. > > We > > > > can > > > > > > >> collaborate. 
> > > > > > >> > > > > > > >> Thanks, > > > > > > >> > > > > > > >> Xuefu > > > > > > >> > > > > > > >> > > > > > > >> > > ------------------------------------------------------------------ > > > > > > >> 发件人:wenlong.lwl<[hidden email]> > > > > > > >> 日 期:2018年11月05日 11:15:35 > > > > > > >> 收件人:<[hidden email]> > > > > > > >> 主 题:Re: [DISCUSS] Flink SQL DDL Design > > > > > > >> > > > > > > >> Hi, Shuyi, thanks for the proposal. > > > > > > >> > > > > > > >> I have two concerns about the table ddl: > > > > > > >> > > > > > > >> 1. how about remove the source/sink mark from the ddl, because > > it > > > is > > > > > not > > > > > > >> necessary, the framework determine the table referred is a > > source > > > > or a > > > > > > sink > > > > > > >> according to the context of the query using the table. it will > > be > > > > more > > > > > > >> convenient for use defining a table which can be both a source > > and > > > > > sink, > > > > > > >> and more convenient for catalog to persistent and manage the > > meta > > > > > infos. > > > > > > >> > > > > > > >> 2. how about just keeping one pure string map as parameters > for > > > > table, > > > > > > like > > > > > > >> create tabe Kafka10SourceTable ( > > > > > > >> intField INTEGER, > > > > > > >> stringField VARCHAR(128), > > > > > > >> longField BIGINT, > > > > > > >> rowTimeField TIMESTAMP > > > > > > >> ) with ( > > > > > > >> connector.type = ’kafka’, > > > > > > >> connector.property-version = ’1’, > > > > > > >> connector.version = ’0.10’, > > > > > > >> connector.properties.topic = ‘test-kafka-topic’, > > > > > > >> connector.properties.startup-mode = ‘latest-offset’, > > > > > > >> connector.properties.specific-offset = ‘offset’, > > > > > > >> format.type = 'json' > > > > > > >> format.prperties.version=’1’, > > > > > > >> format.derive-schema = 'true' > > > > > > >> ); > > > > > > >> Because: > > > > > > >> 1. in TableFactory, what user use is a string map properties, > > > > defining > > > > > > >> parameters by string-map can be the closest way to mapping how > > > user > > > > > use > > > > > > the > > > > > > >> parameters. > > > > > > >> 2. The table descriptor can be extended by user, like what is > > done > > > > in > > > > > > Kafka > > > > > > >> and Json, it means that the parameter keys in connector or > > format > > > > can > > > > > be > > > > > > >> different in different implementation, we can not restrict the > > key > > > > in > > > > > a > > > > > > >> specified set, so we need a map in connector scope and a map > in > > > > > > >> connector.properties scope. why not just give user a single > map, > > > let > > > > > > them > > > > > > >> put parameters in a format they like, which is also the > simplest > > > way > > > > > to > > > > > > >> implement DDL parser. > > > > > > >> 3. whether we can define a format clause or not, depends on > the > > > > > > >> implementation of the connector, using different clause in DDL > > may > > > > > make > > > > > > a > > > > > > >> misunderstanding that we can combine the connectors with > > arbitrary > > > > > > formats, > > > > > > >> which may not work actually. > > > > > > >> > > > > > > >> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński < > [hidden email] > > > > > > > > wrote: > > > > > > >> > > > > > > >>> +1, Thanks for the proposal. > > > > > > >>> > > > > > > >>> I guess this is a long-awaited change. 
This can vastly > increase > > > the > > > > > > >>> functionalities of the SQL Client as it will be possible to > use > > > > > complex > > > > > > >>> extensions like for example those provided by Apache > Bahir[1]. > > > > > > >>> > > > > > > >>> Best Regards, > > > > > > >>> Dom. > > > > > > >>> > > > > > > >>> [1] > > > > > > >>> https://github.com/apache/bahir-flink > > > > > > >>> > > > > > > >>> sob., 3 lis 2018 o 17:17 Rong Rong <[hidden email]> > > > > napisał(a): > > > > > > >>> > > > > > > >>>> +1. Thanks for putting the proposal together Shuyi. > > > > > > >>>> > > > > > > >>>> DDL has been brought up in a couple of times previously > [1,2]. > > > > > > >> Utilizing > > > > > > >>>> DDL will definitely be a great extension to the current > Flink > > > SQL > > > > to > > > > > > >>>> systematically support some of the previously brought up > > > features > > > > > such > > > > > > >> as > > > > > > >>>> [3]. And it will also be beneficial to see the document > > closely > > > > > > aligned > > > > > > >>>> with the previous discussion for unified SQL connector API > > [4]. > > > > > > >>>> > > > > > > >>>> I also left a few comments on the doc. Looking forward to > the > > > > > > alignment > > > > > > >>>> with the other couple of efforts and contributing to them! > > > > > > >>>> > > > > > > >>>> Best, > > > > > > >>>> Rong > > > > > > >>>> > > > > > > >>>> [1] > > > > > > >>>> > > > > > > >>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E > > > > > > >>>> [2] > > > > > > >>>> > > > > > > >>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E > > > > > > >>>> [3] https://issues.apache.org/jira/browse/FLINK-8003 > > > > > > >>>> [4] > > > > > > >>>> > > > > > > >>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3C6676cb66-6f31-23e1-eff5-2e9c19f88483@...%3E > > > > > > >>>> > > > > > > >>>> On Fri, Nov 2, 2018 at 10:22 AM Bowen Li < > [hidden email] > > > > > > > > wrote: > > > > > > >>>> > > > > > > >>>>> Thanks Shuyi! > > > > > > >>>>> > > > > > > >>>>> I left some comments there. I think the design of SQL DDL > and > > > > > > >>> Flink-Hive > > > > > > >>>>> integration/External catalog enhancements will work closely > > > with > > > > > each > > > > > > >>>>> other. Hope we are well aligned on the directions of the > two > > > > > designs, > > > > > > >>>> and I > > > > > > >>>>> look forward to working with you guys on both! > > > > > > >>>>> > > > > > > >>>>> Bowen > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>> On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen < > > [hidden email] > > > > > > > > > > >> wrote: > > > > > > >>>>>> Hi everyone, > > > > > > >>>>>> > > > > > > >>>>>> SQL DDL support has been a long-time ask from the > community. > > > > > > >> Current > > > > > > >>>>> Flink > > > > > > >>>>>> SQL support only DML (e.g. SELECT and INSERT statements). > In > > > its > > > > > > >>>> current > > > > > > >>>>>> form, Flink SQL users still need to define/create table > > > sources > > > > > and > > > > > > >>>> sinks > > > > > > >>>>>> programmatically in Java/Scala. 
Also, in SQL Client, > without > > > DDL > > > > > > >>>> support, > > > > > > >>>>>> the current implementation does not allow dynamical > creation > > > of > > > > > > >>> table, > > > > > > >>>>> type > > > > > > >>>>>> or functions with SQL, this adds friction for its > adoption. > > > > > > >>>>>> > > > > > > >>>>>> I drafted a design doc [1] with a few other community > > members > > > > that > > > > > > >>>>> proposes > > > > > > >>>>>> the design and implementation for adding DDL support in > > Flink. > > > > The > > > > > > >>>>> initial > > > > > > >>>>>> design considers DDL for table, view, type, library and > > > > function. > > > > > > >> It > > > > > > >>>> will > > > > > > >>>>>> be great to get feedback on the design from the community, > > and > > > > > > >> align > > > > > > >>>> with > > > > > > >>>>>> latest effort in unified SQL connector API [2] and Flink > > Hive > > > > > > >>>>> integration > > > > > > >>>>>> [3]. > > > > > > >>>>>> > > > > > > >>>>>> Any feedback is highly appreciated. > > > > > > >>>>>> > > > > > > >>>>>> Thanks > > > > > > >>>>>> Shuyi Chen > > > > > > >>>>>> > > > > > > >>>>>> [1] > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing > > > > > > >>>>>> [2] > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing > > > > > > >>>>>> [3] > > > > > > >>>>>> > > > > > > >>>>>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing > > > > > > >>>>>> -- > > > > > > >>>>>> "So you have to trust that the dots will somehow connect > in > > > your > > > > > > >>>> future." > > > > > > >> > > > > > > > > > > > > > > > > > > > > -- > > > > > "So you have to trust that the dots will somehow connect in your > > > future." > > > > > > > > > > > > > > > -- "So you have to trust that the dots will somehow connect in your future." |
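One way to picture the suggestion above of keeping Flink-specific settings as generic key-value pairs is to compare it with how Hive already carries opaque metadata. Both statements below are only sketches: the Flink WITH clause follows the property-map style discussed in this thread (not a finalized syntax), and the property keys and table are invented.

-- property-map style discussed in this thread (hypothetical syntax)
CREATE TABLE Clicks (
    userId BIGINT,
    url VARCHAR
) WITH (
    connector.type = 'kafka',
    format.type = 'json'
);

-- standard Hive DDL carrying engine-specific settings the same way, via TBLPROPERTIES
CREATE TABLE Clicks (
    userId BIGINT,
    url STRING
)
STORED AS TEXTFILE
TBLPROPERTIES (
    'flink.connector.type' = 'kafka',
    'flink.format.type' = 'json'
);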
Hi Shuyi,
re: we should take into account the new external catalog effort in the design Definitely. We've had a much better understanding of catalogs over the past few weeks, and will be more than happy to contribute to that part. Thanks, Bowen On Wed, Nov 28, 2018 at 4:12 PM Zhang, Xuefu <[hidden email]> wrote: > +1 Sounds great! > > > ------------------------------------------------------------------ > Sender:Shuyi Chen <[hidden email]> > Sent at:2018 Nov 29 (Thu) 06:56 > Recipient:dev <[hidden email]> > Subject:Re: [DISCUSS] Flink SQL DDL Design > > Thanks a lot, Shaoxuan, Jack and Lin. We should definitely collaborate > here, we have also our own DDL implementation running in production for > almost 2 years at Uber. With the joint experience from both companies, we > can definitely make the Flink SQL DDL better. > > As @shaoxuan suggest, Jark can come up with a doc that talks about the > current DDL design in Alibaba, and we can discuss and merge them into one, > make it as a FLIP, and plan the tasks for implementation. Also, we should > take into account the new external catalog effort in the design. What do > you guys think? > > Shuyi > > On Wed, Nov 28, 2018 at 6:45 AM Jark Wu <[hidden email]> wrote: > > > Hi Shaoxuan, > > > > I think summarizing it into a google doc is a good idea. We will prepare > it > > in the next few days. > > > > Thanks, > > Jark > > > > Shaoxuan Wang <[hidden email]> 于2018年11月28日周三 下午9:17写道: > > > > > Hi Lin and Jark, > > > Thanks for sharing those details. Can you please consider summarizing > > your > > > DDL design into a google doc. > > > We can still continue the discussions on Shuyi's proposal. But having a > > > separate google doc will be easy for the DEV to > > understand/comment/discuss > > > on your proposed DDL implementation. > > > > > > Regards, > > > Shaoxuan > > > > > > > > > On Wed, Nov 28, 2018 at 7:39 PM Jark Wu <[hidden email]> wrote: > > > > > > > Hi Shuyi, > > > > > > > > Thanks for bringing up this discussion and the awesome work! I have > > left > > > > some comments in the doc. > > > > > > > > I want to share something more about the watermark definition learned > > > from > > > > Alibaba. > > > > > > > > 1. > > > > > > > > Table should be able to accept multiple watermark definition. > > > > > > > > Because a table may have more than one rowtime field. For example, > > one > > > > rowtime field is from existing field but missing in some records, > > > > another > > > > is the ingestion timestamp in Kafka but not very accurate. In this > > > case, > > > > user may define two rowtime fields with watermarks in the Table > and > > > > choose > > > > one in different situation. > > > > 2. > > > > > > > > Watermark stragety always work with rowtime field together. > > > > > > > > Based on the two points metioned above, I think we should combine the > > > > watermark strategy and rowtime field selection (i.e. which existing > > field > > > > used to generate watermark) in one clause, so that we can define > > multiple > > > > watermarks in one Table. > > > > > > > > Here I will share the watermark syntax used in Alibaba (simply > > modified): > > > > > > > > watermarkDefinition: > > > > WATERMARK [watermarkName] FOR <rowtime_field> AS wm_strategy > > > > > > > > wm_strategy: > > > > BOUNDED WITH OFFSET 'string' timeUnit > > > > | > > > > ASCENDING > > > > > > > > The “WATERMARK” keyword starts a watermark definition. 
The “FOR” > > keyword > > > > defines which existing field used to generate watermark, this field > > > should > > > > already exist in the schema (we can use computed-column to derive > from > > > > other fields). The “AS” keyword defines watermark strategy, such as > > > BOUNDED > > > > WITH OFFSET (covers almost all the requirements) and ASCENDING. > > > > > > > > When the expected rowtime field does not exist in the schema, we can > > use > > > > computed-column syntax to derive it from other existing fields using > > > > built-in functions or user defined functions. So the > rowtime/watermark > > > > definition doesn’t need to care about “field-change” strategy > > > > (replace/add/from-field). And the proctime field definition can also > be > > > > defined using computed-column. Such as pt as PROCTIME() which > defines a > > > > proctime field named “pt” in the schema. > > > > > > > > Looking forward to working with you guys! > > > > > > > > Best, > > > > Jark Wu > > > > > > > > > > > > Lin Li <[hidden email]> 于2018年11月28日周三 下午6:33写道: > > > > > > > > > @Shuyi > > > > > Thanks for the proposal! We have a simple DDL implementation > > (extends > > > > > Calcite's parser) which been running for almost two years on > > production > > > > and > > > > > works well. > > > > > I think the most valued things we'd learned is keeping simplicity > and > > > > > standard compliance. > > > > > Here's the approximate grammar, FYI > > > > > CREATE TABLE > > > > > > > > > > CREATE TABLE tableName( > > > > > columnDefinition [, columnDefinition]* > > > > > [ computedColumnDefinition [, computedColumnDefinition]* ] > > > > > [ tableConstraint [, tableConstraint]* ] > > > > > [ tableIndex [, tableIndex]* ] > > > > > [ PERIOD FOR SYSTEM_TIME ] > > > > > [ WATERMARK watermarkName FOR rowTimeColumn AS > > > > > withOffset(rowTimeColumn, offset) ] ) [ WITH ( tableOption [ , > > > > > tableOption]* ) ] [ ; ] > > > > > > > > > > columnDefinition ::= > > > > > columnName dataType [ NOT NULL ] > > > > > > > > > > dataType ::= > > > > > { > > > > > [ VARCHAR ] > > > > > | [ BOOLEAN ] > > > > > | [ TINYINT ] > > > > > | [ SMALLINT ] > > > > > | [ INT ] > > > > > | [ BIGINT ] > > > > > | [ FLOAT ] > > > > > | [ DECIMAL ] > > > > > | [ DOUBLE ] > > > > > | [ DATE ] > > > > > | [ TIME ] > > > > > | [ TIMESTAMP ] > > > > > | [ VARBINARY ] > > > > > } > > > > > > > > > > computedColumnDefinition ::= > > > > > columnName AS computedColumnExpression > > > > > > > > > > tableConstraint ::= > > > > > { PRIMARY KEY | UNIQUE } > > > > > (columnName [, columnName]* ) > > > > > > > > > > tableIndex ::= > > > > > [ UNIQUE ] INDEX indexName > > > > > (columnName [, columnName]* ) > > > > > > > > > > rowTimeColumn ::= > > > > > columnName > > > > > > > > > > tableOption ::= > > > > > property=value > > > > > offset ::= > > > > > positive integer (unit: ms) > > > > > > > > > > CREATE VIEW > > > > > > > > > > CREATE VIEW viewName > > > > > [ > > > > > ( columnName [, columnName]* ) > > > > > ] > > > > > AS queryStatement; > > > > > > > > > > CREATE FUNCTION > > > > > > > > > > CREATE FUNCTION functionName > > > > > AS 'className'; > > > > > > > > > > className ::= > > > > > fully qualified name > > > > > > > > > > > > > > > Shuyi Chen <[hidden email]> 于2018年11月28日周三 上午3:28写道: > > > > > > > > > > > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the > > design > > > > doc > > > > > > first and start implementation w/o the unified connector API > ready > > by > > > > > > skipping some featue. 
> > > > > > > > > > > > Xuefu, I like the idea of making Flink specific properties into > > > generic > > > > > > key-value pairs, so that it will make integration with Hive DDL > (or > > > > > others, > > > > > > e.g. Beam DDL) easier. > > > > > > > > > > > > I'll run a final pass over the design doc and finalize the design > > in > > > > the > > > > > > next few days. And we can start creating tasks and collaborate on > > the > > > > > > implementation. Thanks a lot for all the comments and inputs. > > > > > > > > > > > > Cheers! > > > > > > Shuyi > > > > > > > > > > > > On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu < > > > [hidden email]> > > > > > > wrote: > > > > > > > > > > > > > Yeah! I agree with Timo that DDL can actually proceed w/o being > > > > blocked > > > > > > by > > > > > > > connector API. We can leave the unknown out while defining the > > > basic > > > > > > syntax. > > > > > > > > > > > > > > @Shuyi > > > > > > > > > > > > > > As commented in the doc, I think we can probably stick with > > simple > > > > > syntax > > > > > > > with general properties, without extending the syntax too much > > that > > > > it > > > > > > > mimics the descriptor API. > > > > > > > > > > > > > > Part of our effort on Flink-Hive integration is also to make > DDL > > > > syntax > > > > > > > compatible with Hive's. The one in the current proposal seems > > > making > > > > > our > > > > > > > effort more challenging. > > > > > > > > > > > > > > We can help and collaborate. At this moment, I think we can > > > finalize > > > > on > > > > > > > the proposal and then we can divide the tasks for better > > > > collaboration. > > > > > > > > > > > > > > Please let me know if there are any questions or suggestions. > > > > > > > > > > > > > > Thanks, > > > > > > > Xuefu > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------ > > > > > > > Sender:Timo Walther <[hidden email]> > > > > > > > Sent at:2018 Nov 27 (Tue) 16:21 > > > > > > > Recipient:dev <[hidden email]> > > > > > > > Subject:Re: [DISCUSS] Flink SQL DDL Design > > > > > > > > > > > > > > Thanks for offering your help here, Xuefu. It would be great to > > > move > > > > > > > these efforts forward. I agree that the DDL is somehow releated > > to > > > > the > > > > > > > unified connector API design but we can also start with the > basic > > > > > > > functionality now and evolve the DDL during this release and > next > > > > > > releases. > > > > > > > > > > > > > > For example, we could identify the MVP DDL syntax that skips > > > defining > > > > > > > key constraints and maybe even time attributes. This DDL could > be > > > > used > > > > > > > for batch usecases, ETL, and materializing SQL queries (no time > > > > > > > operations like windows). > > > > > > > > > > > > > > The unified connector API is high on our priority list for the > > 1.8 > > > > > > > release. I will try to update the document until mid of next > > week. > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > Timo > > > > > > > > > > > > > > > > > > > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen: > > > > > > > > Thanks a lot, Xuefu. I was busy for some other stuff for the > > > last 2 > > > > > > > weeks, > > > > > > > > but we are definitely interested in moving this forward. 
I think once the unified connector API design [1] is done, we can finalize the DDL design as well and start creating concrete subtasks to collaborate on the implementation with the community.
> > > > > > > >
> > > > > > > > Shuyi
> > > > > > > >
> > > > > > > > [1]
> > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
>
> --
> "So you have to trust that the dots will somehow connect in your future."
|
Hi Shuyi and Shaoxuan,
It's great to have a collaboration with you guys! Jark and I drafted a doc [1], mainly about the DDL syntax part, which summarizes the current version in Alibaba. We can continue discussing the DDL design and API changes based on your doc. Let's move this forward!

[1] https://docs.google.com/document/d/1o16jC-AxnZoxMfHQptkKQkSC6ZDDBRhKg6gm8VGnY-k/edit?usp=sharing

Thanks
Lin

Shuyi Chen <[hidden email]> wrote on Thu, Nov 29, 2018 at 6:56 AM:

> Thanks a lot, Shaoxuan, Jark and Lin. We should definitely collaborate here; we also have our own DDL implementation running in production for almost 2 years at Uber. With the joint experience from both companies, we can definitely make the Flink SQL DDL better.
>
> As @shaoxuan suggests, Jark can come up with a doc that talks about the current DDL design in Alibaba, and we can discuss and merge them into one, make it a FLIP, and plan the tasks for implementation. Also, we should take into account the new external catalog effort in the design. What do you guys think?
>
> Shuyi
>
> On Wed, Nov 28, 2018 at 6:45 AM Jark Wu <[hidden email]> wrote:
>
> > Hi Shaoxuan,
> >
> > I think summarizing it into a google doc is a good idea. We will prepare it in the next few days.
> >
> > Thanks,
> > Jark
> >
> > Shaoxuan Wang <[hidden email]> wrote on Wed, Nov 28, 2018 at 9:17 PM:
> >
> > > Hi Lin and Jark,
> > > Thanks for sharing those details. Can you please consider summarizing your DDL design into a google doc?
> > > We can still continue the discussions on Shuyi's proposal, but having a separate google doc will make it easier for the dev community to understand/comment/discuss your proposed DDL implementation.
> > >
> > > Regards,
> > > Shaoxuan
> > >
> > > On Wed, Nov 28, 2018 at 7:39 PM Jark Wu <[hidden email]> wrote:
> > >
> > > > Hi Shuyi,
> > > >
> > > > Thanks for bringing up this discussion and the awesome work! I have left some comments in the doc.
> > > >
> > > > I want to share something more about the watermark definition learned from Alibaba.
> > > >
> > > > 1. A table should be able to accept multiple watermark definitions.
> > > >
> > > > This is because a table may have more than one rowtime field. For example, one rowtime field comes from an existing field but is missing in some records, while another is the ingestion timestamp in Kafka but is not very accurate. In this case, the user may define two rowtime fields with watermarks in the Table and choose one depending on the situation.
> > > >
> > > > 2. A watermark strategy always works together with a rowtime field.
> > > >
> > > > Based on the two points mentioned above, I think we should combine the watermark strategy and the rowtime field selection (i.e. which existing field is used to generate the watermark) in one clause, so that we can define multiple watermarks in one Table.
> > > >
> > > > Here I will share the watermark syntax used in Alibaba (simply modified):
> > > >
> > > > watermarkDefinition:
> > > >   WATERMARK [watermarkName] FOR <rowtime_field> AS wm_strategy
> > > >
> > > > wm_strategy:
> > > >     BOUNDED WITH OFFSET 'string' timeUnit
> > > >   | ASCENDING
> > > >
> > > > The “WATERMARK” keyword starts a watermark definition. The “FOR” keyword defines which existing field is used to generate the watermark; this field should already exist in the schema (we can use a computed column to derive it from other fields).
The “AS” keyword defines watermark strategy, such as > > > BOUNDED > > > > WITH OFFSET (covers almost all the requirements) and ASCENDING. > > > > > > > > When the expected rowtime field does not exist in the schema, we can > > use > > > > computed-column syntax to derive it from other existing fields using > > > > built-in functions or user defined functions. So the > rowtime/watermark > > > > definition doesn’t need to care about “field-change” strategy > > > > (replace/add/from-field). And the proctime field definition can also > be > > > > defined using computed-column. Such as pt as PROCTIME() which > defines a > > > > proctime field named “pt” in the schema. > > > > > > > > Looking forward to working with you guys! > > > > > > > > Best, > > > > Jark Wu > > > > > > > > > > > > Lin Li <[hidden email]> 于2018年11月28日周三 下午6:33写道: > > > > > > > > > @Shuyi > > > > > Thanks for the proposal! We have a simple DDL implementation > > (extends > > > > > Calcite's parser) which been running for almost two years on > > production > > > > and > > > > > works well. > > > > > I think the most valued things we'd learned is keeping simplicity > and > > > > > standard compliance. > > > > > Here's the approximate grammar, FYI > > > > > CREATE TABLE > > > > > > > > > > CREATE TABLE tableName( > > > > > columnDefinition [, columnDefinition]* > > > > > [ computedColumnDefinition [, computedColumnDefinition]* ] > > > > > [ tableConstraint [, tableConstraint]* ] > > > > > [ tableIndex [, tableIndex]* ] > > > > > [ PERIOD FOR SYSTEM_TIME ] > > > > > [ WATERMARK watermarkName FOR rowTimeColumn AS > > > > > withOffset(rowTimeColumn, offset) ] ) [ WITH ( tableOption [ , > > > > > tableOption]* ) ] [ ; ] > > > > > > > > > > columnDefinition ::= > > > > > columnName dataType [ NOT NULL ] > > > > > > > > > > dataType ::= > > > > > { > > > > > [ VARCHAR ] > > > > > | [ BOOLEAN ] > > > > > | [ TINYINT ] > > > > > | [ SMALLINT ] > > > > > | [ INT ] > > > > > | [ BIGINT ] > > > > > | [ FLOAT ] > > > > > | [ DECIMAL ] > > > > > | [ DOUBLE ] > > > > > | [ DATE ] > > > > > | [ TIME ] > > > > > | [ TIMESTAMP ] > > > > > | [ VARBINARY ] > > > > > } > > > > > > > > > > computedColumnDefinition ::= > > > > > columnName AS computedColumnExpression > > > > > > > > > > tableConstraint ::= > > > > > { PRIMARY KEY | UNIQUE } > > > > > (columnName [, columnName]* ) > > > > > > > > > > tableIndex ::= > > > > > [ UNIQUE ] INDEX indexName > > > > > (columnName [, columnName]* ) > > > > > > > > > > rowTimeColumn ::= > > > > > columnName > > > > > > > > > > tableOption ::= > > > > > property=value > > > > > offset ::= > > > > > positive integer (unit: ms) > > > > > > > > > > CREATE VIEW > > > > > > > > > > CREATE VIEW viewName > > > > > [ > > > > > ( columnName [, columnName]* ) > > > > > ] > > > > > AS queryStatement; > > > > > > > > > > CREATE FUNCTION > > > > > > > > > > CREATE FUNCTION functionName > > > > > AS 'className'; > > > > > > > > > > className ::= > > > > > fully qualified name > > > > > > > > > > > > > > > Shuyi Chen <[hidden email]> 于2018年11月28日周三 上午3:28写道: > > > > > > > > > > > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the > > design > > > > doc > > > > > > first and start implementation w/o the unified connector API > ready > > by > > > > > > skipping some featue. > > > > > > > > > > > > Xuefu, I like the idea of making Flink specific properties into > > > generic > > > > > > key-value pairs, so that it will make integration with Hive DDL > (or > > > > > others, > > > > > > e.g. Beam DDL) easier. 
> > > > > > > > > > > > I'll run a final pass over the design doc and finalize the design > > in > > > > the > > > > > > next few days. And we can start creating tasks and collaborate on > > the > > > > > > implementation. Thanks a lot for all the comments and inputs. > > > > > > > > > > > > Cheers! > > > > > > Shuyi > > > > > > > > > > > > On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu < > > > [hidden email]> > > > > > > wrote: > > > > > > > > > > > > > Yeah! I agree with Timo that DDL can actually proceed w/o being > > > > blocked > > > > > > by > > > > > > > connector API. We can leave the unknown out while defining the > > > basic > > > > > > syntax. > > > > > > > > > > > > > > @Shuyi > > > > > > > > > > > > > > As commented in the doc, I think we can probably stick with > > simple > > > > > syntax > > > > > > > with general properties, without extending the syntax too much > > that > > > > it > > > > > > > mimics the descriptor API. > > > > > > > > > > > > > > Part of our effort on Flink-Hive integration is also to make > DDL > > > > syntax > > > > > > > compatible with Hive's. The one in the current proposal seems > > > making > > > > > our > > > > > > > effort more challenging. > > > > > > > > > > > > > > We can help and collaborate. At this moment, I think we can > > > finalize > > > > on > > > > > > > the proposal and then we can divide the tasks for better > > > > collaboration. > > > > > > > > > > > > > > Please let me know if there are any questions or suggestions. > > > > > > > > > > > > > > Thanks, > > > > > > > Xuefu > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------ > > > > > > > Sender:Timo Walther <[hidden email]> > > > > > > > Sent at:2018 Nov 27 (Tue) 16:21 > > > > > > > Recipient:dev <[hidden email]> > > > > > > > Subject:Re: [DISCUSS] Flink SQL DDL Design > > > > > > > > > > > > > > Thanks for offering your help here, Xuefu. It would be great to > > > move > > > > > > > these efforts forward. I agree that the DDL is somehow releated > > to > > > > the > > > > > > > unified connector API design but we can also start with the > basic > > > > > > > functionality now and evolve the DDL during this release and > next > > > > > > releases. > > > > > > > > > > > > > > For example, we could identify the MVP DDL syntax that skips > > > defining > > > > > > > key constraints and maybe even time attributes. This DDL could > be > > > > used > > > > > > > for batch usecases, ETL, and materializing SQL queries (no time > > > > > > > operations like windows). > > > > > > > > > > > > > > The unified connector API is high on our priority list for the > > 1.8 > > > > > > > release. I will try to update the document until mid of next > > week. > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > Timo > > > > > > > > > > > > > > > > > > > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen: > > > > > > > > Thanks a lot, Xuefu. I was busy for some other stuff for the > > > last 2 > > > > > > > weeks, > > > > > > > > but we are definitely interested in moving this forward. I > > think > > > > once > > > > > > the > > > > > > > > unified connector API design [1] is done, we can finalize the > > DDL > > > > > > design > > > > > > > as > > > > > > > > well and start creating concrete subtasks to collaborate on > the > > > > > > > > implementation with the community. 
> > > > > > > >
> > > > > > > > Shuyi
> > > > > > > >
> > > > > > > > [1]
> > > > > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > > > > > > >
> > > > > > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu <[hidden email]> wrote:
> > > > > > > >
> > > > > > > >> Hi Shuyi,
> > > > > > > >>
> > > > > > > >> I'm wondering if you folks still have the bandwidth to work on this.
> > > > > > > >>
> > > > > > > >> We have some dedicated resources and would like to move this forward. We can collaborate.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Xuefu
> > > > > > > >>
> > > > > > > >> ------------------------------------------------------------------
> > > > > > > >> From: wenlong.lwl <[hidden email]>
> > > > > > > >> Date: 2018-11-05 11:15:35
> > > > > > > >> To: <[hidden email]>
> > > > > > > >> Subject: Re: [DISCUSS] Flink SQL DDL Design
> > > > > > > >>
> > > > > > > >> Hi, Shuyi, thanks for the proposal.
> > > > > > > >>
> > > > > > > >> I have two concerns about the table DDL:
> > > > > > > >>
> > > > > > > >> 1. How about removing the source/sink mark from the DDL? It is not necessary, because the framework can determine whether the referred table is a source or a sink from the context of the query using the table. That would be more convenient for users defining a table which can be both a source and a sink, and more convenient for the catalog to persist and manage the meta info.
> > > > > > > >>
> > > > > > > >> 2. How about just keeping one pure string map as the parameters for a table, like
> > > > > > > >> create table Kafka10SourceTable (
> > > > > > > >>   intField INTEGER,
> > > > > > > >>   stringField VARCHAR(128),
> > > > > > > >>   longField BIGINT,
> > > > > > > >>   rowTimeField TIMESTAMP
> > > > > > > >> ) with (
> > > > > > > >>   connector.type = 'kafka',
> > > > > > > >>   connector.property-version = '1',
> > > > > > > >>   connector.version = '0.10',
> > > > > > > >>   connector.properties.topic = 'test-kafka-topic',
> > > > > > > >>   connector.properties.startup-mode = 'latest-offset',
> > > > > > > >>   connector.properties.specific-offset = 'offset',
> > > > > > > >>   format.type = 'json',
> > > > > > > >>   format.properties.version = '1',
> > > > > > > >>   format.derive-schema = 'true'
> > > > > > > >> );
> > > > > > > >> Because:
> > > > > > > >> 1. In TableFactory, what the user works with is a string map of properties, so defining the parameters as a string map is the closest match to how the user actually uses them.
> > > > > > > >> 2. The table descriptor can be extended by the user, as is done for Kafka and Json, which means the parameter keys in the connector or format scope can differ between implementations. We cannot restrict the keys to a specified set, so we would need a map in the connector scope and a map in the connector.properties scope. Why not just give the user a single map and let them put parameters in a format they like, which is also the simplest way to implement the DDL parser.
> > > > > > > >> 3. Whether we can define a format clause or not depends on the implementation of the connector; using a separate clause in the DDL may create the misunderstanding that we can combine connectors with arbitrary formats, which may not actually work.
> > > > > > > >>
> > > > > > > >> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński <[hidden email]> wrote:
> > > > > > > >>
> > > > > > > >>> +1, Thanks for the proposal.
> > > > > > > >>>
> > > > > > > >>> I guess this is a long-awaited change. This can vastly increase the functionality of the SQL Client, as it will be possible to use complex extensions like, for example, those provided by Apache Bahir [1].
> > > > > > > >>>
> > > > > > > >>> Best Regards,
> > > > > > > >>> Dom.
> > > > > > > >>>
> > > > > > > >>> [1]
> > > > > > > >>> https://github.com/apache/bahir-flink
>
> --
> "So you have to trust that the dots will somehow connect in your future."
|
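To make the syntax discussed in the quoted thread above more concrete, here is a minimal sketch of a CREATE TABLE statement that combines a computed processing-time column (via PROCTIME()) with a named, bounded-offset watermark on an event-time field, in the style of the Alibaba syntax Jark describes. The table name, field names and connector properties are purely illustrative assumptions, and the grammar was still under discussion at this point in the thread, so this should be read as a sketch of the proposal rather than final Flink syntax:

-- Hypothetical table definition illustrating the proposed watermark and
-- computed-column clauses; names and properties are made up for the example.
CREATE TABLE user_clicks (
  user_id BIGINT,
  page VARCHAR,
  click_time TIMESTAMP,
  -- computed column deriving a processing-time attribute
  pt AS PROCTIME(),
  -- named watermark bound to an existing event-time field,
  -- using the BOUNDED WITH OFFSET strategy (offset of 5 seconds)
  WATERMARK wm1 FOR click_time AS BOUNDED WITH OFFSET '5000' MILLISECOND
) WITH (
  -- flat string properties, in the spirit of wenlong's earlier example
  connector.type = 'kafka',
  connector.version = '0.10',
  connector.properties.topic = 'user-clicks',
  format.type = 'json'
);

Under the grammar Lin Li quotes, the same watermark would instead be written as WATERMARK wm1 FOR click_time AS withOffset(click_time, 5000); in both variants the strategy and the rowtime field are declared together, which is the main point of combining them in one clause.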
Sure Shuyi,
What I hope is that we can reach an agreement on the DDL grammar as soon as possible. There are a few differences between your proposal and ours. Once Lin and Jark propose our design, we can quickly discuss those differences and see how far we are from a unified design.

WRT the external catalog, I think it is an orthogonal topic and we can design it in parallel. I believe @Xuefu and @Bowen are already working on it. We should/will definitely involve them to review the final design of the DDL implementation. I would suggest that we give the DDL implementation a higher priority, as it is a crucial component for the user experience of the SQL CLI.

Regards,
Shaoxuan

On Thu, Nov 29, 2018 at 6:56 AM Shuyi Chen <[hidden email]> wrote:

> Thanks a lot, Shaoxuan, Jark and Lin. We should definitely collaborate here; we also have our own DDL implementation running in production for almost 2 years at Uber. With the joint experience from both companies, we can definitely make the Flink SQL DDL better.
>
> As @shaoxuan suggests, Jark can come up with a doc that talks about the current DDL design in Alibaba, and we can discuss and merge them into one, make it a FLIP, and plan the tasks for implementation. Also, we should take into account the new external catalog effort in the design. What do you guys think?
>
> Shuyi
>
> --
> "So you have to trust that the dots will somehow connect in your future."
|