[DISCUSS] Ground Source and Sink Concepts in Flink SQL

Jark Wu-2
Hi all,

We have prepared a design doc [1] about source and sink concepts in Flink
SQL. This is actually an extended discussion about SQL DDL [2].

In the design doc, we want to settle some conceptual questions. For
example:

1. How do we define boundedness in DDL?
2. How do we define a changelog in DDL, and what is the behavior of a
changelog source and a changelog sink?
3. How do we define a primary key in DDL, and what are the semantics of a
primary key on a table and on a stream?

They are mostly related to DDL because DDL is plain text and we need to
stay as close to the SQL standard as possible.
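
As a rough illustration of what questions 2 and 3 are about (this is not taken from the design doc, and the semantics it settles on may differ), one common reading is that a changelog on a table with a primary key is a sequence of upserts and deletes that can be replayed into the current table contents. The sketch below assumes exactly that reading; `ChangeKind`, `Change`, and `ChangelogSketch` are hypothetical names and not Flink classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative types only; none of these are Flink classes.
enum ChangeKind { UPSERT, DELETE }

final class Change {
    final ChangeKind kind;
    final String key;   // primary key column
    final long value;   // the only non-key column in this toy schema

    Change(ChangeKind kind, String key, long value) {
        this.kind = kind;
        this.key = key;
        this.value = value;
    }
}

final class ChangelogSketch {
    /** Replays a changelog in order and returns the resulting table, keyed by primary key. */
    static Map<String, Long> materialize(List<Change> changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Change c : changelog) {
            if (c.kind == ChangeKind.DELETE) {
                table.remove(c.key);          // a delete drops the row for that key
            } else {
                table.put(c.key, c.value);    // an upsert inserts or overwrites the row
            }
        }
        return table;
    }
}
```

Under this assumption, replaying UPSERT(a, 1), UPSERT(b, 2), UPSERT(a, 3), DELETE(b) leaves a single row for key `a` with value 3. Without a primary key the same stream could only be read as appended rows, which is why the key definition and the changelog definition are discussed together.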

This is an important step before we start refactoring our
TableSource/TableSink/TableFactory interfaces, because we need to know what
changes we must introduce to support these concepts.

Please feel free to leave feedback in the thread or in the design doc.

Regards,
Jark

[1] https://docs.google.com/document/d/1yrKXEIRATfxHJJ0K3t6wUgXAtZq8D-XgvEnvl2uUcr0/edit#
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-SQL-DDL-Design-tt25006.html

Re: [DISCUSS] Ground Source and Sink Concepts in Flink SQL

Kurt Young
Thanks, Jark, for bringing up this topic. I think well-defined concepts are
very important for users of the Table API & SQL, especially so that they
have a clear understanding of how a SQL job behaves. They are also
essential for connector developers to understand why we abstracted the
interfaces in this way, and to have a smooth experience when developing
connectors for Table & SQL.

Best,
Kurt


On Mon, May 27, 2019 at 3:35 PM Jark Wu <[hidden email]> wrote:


Re: [DISCUSS] Ground Source and Sink Concepts in Flink SQL

Timo Walther-2
Thanks for working on this great design document, Jark. I think
well-defined terminology and semantics around tables, changelogs, table
sources/sinks, and DDL should have been established much earlier. I will
take a closer look at the concepts and give feedback soon. I think having
those concepts defined and implemented should be the goal for Flink 1.10.
That would also allow us to align the work with the FLIP-27 efforts.

Introducing a DDL is a step that cannot be evolved easily as a DDL is
basically just a string that is being parsed. We should aim to involve
as many people as possible to have a future-proof design.

Thanks,
Timo

On 27.05.19 at 10:40, Kurt Young wrote:


Re: [DISCUSS] Ground Source and Sink Concepts in Flink SQL

Jark Wu-2
Thanks Timo,

I think it's fine to target it for Flink 1.10. Looking forward to your
feedback.

On Mon, 24 Jun 2019 at 15:07, Timo Walther <[hidden email]> wrote:


Re: [DISCUSS] Ground Source and Sink Concepts in Flink SQL

Hequn Cheng
Hi Jark,

Impressive document!
I have gone over the document quickly and left some comments. I will have a
detailed look later. Below are two main thoughts from my side:

1. In the TableSource interface, can we move the getBoundedness() method
into the underlying Source?
This brings some benefits: we would not have to add `boundedSource()` to
the env in FLIP-27, and it could also be used at the Table API level. We may
also need to target FLIP-27 for Flink 1.10 and coordinate these two big
designs.

2. How are we going to address compatibility?
Are we going to add a completely new TableSource class or come up with a
compatible design? A new TableSource class may be better, since we are
changing the interface substantially.
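
To make point 1 a bit more concrete, here is a minimal sketch of what moving the boundedness down could look like. Apart from the method name getBoundedness(), everything here is an assumption for illustration: `Boundedness`, `SketchSource`, `SketchTableSource`, and the two file-based classes are hypothetical and are not the interfaces from the design doc or from FLIP-27.

```java
// Hypothetical stand-ins; not the actual Flink or FLIP-27 interfaces.
enum Boundedness { BOUNDED, UNBOUNDED }

/** Stand-in for the underlying source, which owns the boundedness in this sketch. */
interface SketchSource<T> {
    Boundedness getBoundedness();
}

/** Stand-in for a table source that wraps a source instead of declaring boundedness itself. */
interface SketchTableSource<T> {
    SketchSource<T> getSource();

    // Convenience accessor for the table level; it only delegates to the wrapped source.
    default Boundedness getBoundedness() {
        return getSource().getBoundedness();
    }
}

/** A toy bounded source, e.g. reading a finite file. */
final class FileSketchSource implements SketchSource<String> {
    @Override
    public Boundedness getBoundedness() {
        return Boundedness.BOUNDED;
    }
}

/** A toy table source backed by the bounded source above. */
final class FileSketchTableSource implements SketchTableSource<String> {
    private final SketchSource<String> source = new FileSketchSource();

    @Override
    public SketchSource<String> getSource() {
        return source;
    }
}
```

With this shape, boundedness is declared once on the source, and the table level reads it through delegation rather than carrying its own flag. Whether that is enough to drop a separate `boundedSource()` entry point depends on the FLIP-27 design.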

What do you think?

Best, Hequn


On Mon, Jun 24, 2019 at 3:29 PM Jark Wu <[hidden email]> wrote:
