Hi all,

We have prepared a design doc [1] about source and sink concepts in Flink SQL. It extends the earlier discussion about SQL DDL [2].

In the design doc, we want to settle some conceptual questions. For example:

1. How to define boundedness in DDL.
2. How to define a changelog in DDL, and what the behavior of a changelog source and a changelog sink is.
3. How to define a primary key in DDL, and what the semantics are when a primary key is declared on a table and on a stream.

These questions mostly relate to DDL because DDL is plain text and we need to stay as close to the standard as possible.

This is an important step before we start refactoring our TableSource/TableSink/TableFactory interfaces, because we need to know what changes are required to support these concepts.

Please feel free to leave feedback in this thread or in the design doc.

Regards,
Jark

[1] https://docs.google.com/document/d/1yrKXEIRATfxHJJ0K3t6wUgXAtZq8D-XgvEnvl2uUcr0/edit#
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-SQL-DDL-Design-tt25006.html
Thanks Jark for bringing up this topic. I think having proper concepts is very important for users of the Table API & SQL, especially so that they clearly understand the behavior of their SQL jobs. It is also essential for connector developers, so that they understand why we abstracted the interfaces in this way and have a smooth experience when developing connectors for Table API & SQL.

Best,
Kurt

(In reply to Jark Wu's message of Mon, May 27, 2019 at 3:35 PM.)
Thanks for working on this great design document, Jark. I think well-defined terminology and semantics around tables, changelogs, table sources/sinks, and DDL should have been established much earlier. I will take a closer look at the concepts and give feedback soon. I think having those concepts defined and implemented should be the goal for Flink 1.10. That would also allow us to align this work with the efforts of FLIP-27.

Introducing a DDL is a step that cannot be evolved easily afterwards, as a DDL is basically just a string that is being parsed. We should aim to involve as many people as possible to arrive at a future-proof design.

Thanks,
Timo

(In reply to Kurt Young's message of 27.05.19, 10:40.)
Thanks Timo,

I think it's fine to target Flink 1.10. Looking forward to your feedback.

(In reply to Timo Walther's message of Mon, 24 Jun 2019 at 15:07.)
Hi Jark,

Impressive document! I have gone over it quickly and left some comments. I will take a more detailed look later.

Below are my two main thoughts:

1. In the TableSource interface, can we move the getBoundedness() method into the underlying Source? This brings some benefits: we would not have to add `boundedSource()` to the env in FLIP-27, and the method could also be used at the Table API level. We may also need to target FLIP-27 for Flink 1.10 and coordinate these two big designs.

2. How are we going to address the compatibility problem? Are we going to add a totally new TableSource class, or come up with a compatible design? Maybe a new TableSource class is better, as we are changing the interface quite substantially.

What do you think?

Best,
Hequn

(In reply to Jark Wu's message of Mon, Jun 24, 2019 at 3:29 PM.)
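For readers following Hequn's first point, here is a minimal Java sketch of what "moving getBoundedness() into the underlying Source" could look like. Everything in it is an assumption made for illustration: the `Boundedness` enum, the `Source` and `TableSource` interfaces, and the example classes are hypothetical stand-ins, not the interfaces from the design doc or from FLIP-27. The idea is only that the source itself reports whether it is bounded, and the table-level wrapper derives the answer from it instead of declaring it separately.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical marker for whether a source is finite (not Flink's actual type). */
enum Boundedness { BOUNDED, UNBOUNDED }

/** Hypothetical low-level source: boundedness is declared here, once. */
interface Source<T> {
    Boundedness getBoundedness();
}

/** Hypothetical table-level wrapper: it delegates instead of declaring boundedness itself. */
interface TableSource<T> {
    Source<T> getSource();

    default Boundedness getBoundedness() {
        return getSource().getBoundedness();
    }
}

/** A finite, in-memory source, used here as the bounded example. */
final class CollectionSource<T> implements Source<T> {
    private final List<T> elements;

    CollectionSource(List<T> elements) {
        this.elements = elements;
    }

    List<T> getElements() {
        return elements;
    }

    @Override
    public Boundedness getBoundedness() {
        return Boundedness.BOUNDED;
    }
}

/** A source that conceptually never ends, used here as the unbounded example. */
final class EndlessCounterSource implements Source<Long> {
    @Override
    public Boundedness getBoundedness() {
        return Boundedness.UNBOUNDED;
    }
}

/** Trivial TableSource implementation that just wraps a Source. */
final class SimpleTableSource<T> implements TableSource<T> {
    private final Source<T> source;

    SimpleTableSource(Source<T> source) {
        this.source = source;
    }

    @Override
    public Source<T> getSource() {
        return source;
    }
}

final class BoundednessSketch {
    /** A single entry point can inspect the source; no separate boundedSource() method is needed. */
    static String describe(Source<?> source) {
        return source.getBoundedness() == Boundedness.BOUNDED ? "batch-like" : "streaming";
    }

    public static void main(String[] args) {
        TableSource<Integer> finite =
                new SimpleTableSource<>(new CollectionSource<>(Arrays.asList(1, 2, 3)));
        TableSource<Long> endless = new SimpleTableSource<>(new EndlessCounterSource());

        System.out.println(finite.getBoundedness() + " -> " + describe(finite.getSource()));
        System.out.println(endless.getBoundedness() + " -> " + describe(endless.getSource()));
    }
}
```

In this sketch the table layer and any environment-level entry point read the same property from one place, which is the benefit the message attributes to the move. Whether the real interfaces should take this shape is exactly the open question in the thread.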