Hi everyone,
I would like to start discussion about how to support interpreting external changelog into Flink SQL, and how to emit changelog from Flink SQL. This topic has already been mentioned several times in the past. CDC (Change Data Capture) data has been a very important streaming data in the world. Connect to CDC is a significant feature for Flink, it fills the missing piece for Flink's streaming processing. In FLIP-105, we propose 2 approaches to achieve. One is introducing new TableSource interface (higher priority), the other is introducing new SQL syntax to interpret and emit changelog. FLIP-105: https://docs.google.com/document/d/1onyIUUdWAHfr_Yd5nZOE7SOExBc6TiW5C4LiL5FrjtQ/edit# Thanks for any feedback! Best, Jark |
Hi everyone,
Since this FLIP was proposed, the community has discussed a lot about the first approach: introducing new TableSource and TableSink interfaces to support changelog. And yes, that is FLIP-95 which has been accepted last week. So most of the work has been merged into FLIP-95. In order to support the goal of FLIP-105, there is still a little things to discuss: how to connect external CDC formats. We propose to introduce 2 new formats: Debezium format and Canal format. They are the most popular CDC tools according to the survey in user [1] and user-zh [2] mailing list. I have updated the FLIP: https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL Welcome feedbacks! Best, Jark [1]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SURVEY-What-Change-Data-Capture-tools-are-you-using-td33569.html [2]: http://apache-flink.147419.n8.nabble.com/SURVEY-CDC-td1910.html On Fri, 14 Feb 2020 at 22:08, Jark Wu <[hidden email]> wrote: > Hi everyone, > > I would like to start discussion about how to support interpreting > external changelog into Flink SQL, and how to emit changelog from Flink SQL. > > This topic has already been mentioned several times in the past. CDC > (Change Data Capture) data has been a very important streaming data in the > world. Connect to CDC is a significant feature for Flink, it fills the > missing piece for Flink's streaming processing. > > In FLIP-105, we propose 2 approaches to achieve. > One is introducing new TableSource interface (higher priority), > the other is introducing new SQL syntax to interpret and emit changelog. > > FLIP-105: > https://docs.google.com/document/d/1onyIUUdWAHfr_Yd5nZOE7SOExBc6TiW5C4LiL5FrjtQ/edit# > > Thanks for any feedback! > > Best, > Jark > |
One minor comment, is there any other encoding or format in debezium? I'm
asking because the format name is debezium-json, i'm wondering whether debezium is enough. This also applies to canal. Best, Kurt On Tue, Apr 7, 2020 at 11:49 AM Jark Wu <[hidden email]> wrote: > Hi everyone, > > Since this FLIP was proposed, the community has discussed a lot about the > first approach: introducing new TableSource and TableSink interfaces to > support changelog. > And yes, that is FLIP-95 which has been accepted last week. So most of the > work has been merged into FLIP-95. > > In order to support the goal of FLIP-105, there is still a little things to > discuss: how to connect external CDC formats. > We propose to introduce 2 new formats: Debezium format and Canal format. > They are the most popular CDC tools according to the survey in user [1] and > user-zh [2] mailing list. > > I have updated the FLIP: > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL > > Welcome feedbacks! > > Best, > Jark > > [1]: > > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SURVEY-What-Change-Data-Capture-tools-are-you-using-td33569.html > [2]: http://apache-flink.147419.n8.nabble.com/SURVEY-CDC-td1910.html > > > On Fri, 14 Feb 2020 at 22:08, Jark Wu <[hidden email]> wrote: > > > Hi everyone, > > > > I would like to start discussion about how to support interpreting > > external changelog into Flink SQL, and how to emit changelog from Flink > SQL. > > > > This topic has already been mentioned several times in the past. CDC > > (Change Data Capture) data has been a very important streaming data in > the > > world. Connect to CDC is a significant feature for Flink, it fills the > > missing piece for Flink's streaming processing. > > > > In FLIP-105, we propose 2 approaches to achieve. > > One is introducing new TableSource interface (higher priority), > > the other is introducing new SQL syntax to interpret and emit changelog. > > > > FLIP-105: > > > https://docs.google.com/document/d/1onyIUUdWAHfr_Yd5nZOE7SOExBc6TiW5C4LiL5FrjtQ/edit# > > > > Thanks for any feedback! > > > > Best, > > Jark > > > |
Hi Kurt,
The JSON encoding of Debezium can be configured to include or exclude the message schema using the `value.converter.schemas.enable` properties. That's why we propose to have a `format.schema-include` property key to config how to parse the json. Besides, the encoding format of debezium is stable and unified across different databases (MySQL, Oracle, SQL Server, DB2, PostgresSQL). However, because of the limitation of some special databases, some databases CDC encoding are different (Cassandra and MongoDB). If we want to support them in the future, we can introduce an optional property key, e.g. `format.encoding-connector=mongodb`, to recognize this special encoding. Canal currently only support to capture changes from MySQL, so there is only one encoding in Canal. But both Canal and Debezium may evolve their encoding in the future. We can also introduce a `format.encoding-version` in the future if needed. Best, Jark On Wed, 8 Apr 2020 at 14:26, Kurt Young <[hidden email]> wrote: > One minor comment, is there any other encoding or format in debezium? I'm > asking because the format > name is debezium-json, i'm wondering whether debezium is enough. This also > applies to canal. > > Best, > Kurt > > > On Tue, Apr 7, 2020 at 11:49 AM Jark Wu <[hidden email]> wrote: > > > Hi everyone, > > > > Since this FLIP was proposed, the community has discussed a lot about the > > first approach: introducing new TableSource and TableSink interfaces to > > support changelog. > > And yes, that is FLIP-95 which has been accepted last week. So most of > the > > work has been merged into FLIP-95. > > > > In order to support the goal of FLIP-105, there is still a little things > to > > discuss: how to connect external CDC formats. > > We propose to introduce 2 new formats: Debezium format and Canal format. > > They are the most popular CDC tools according to the survey in user [1] > and > > user-zh [2] mailing list. > > > > I have updated the FLIP: > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL > > > > Welcome feedbacks! > > > > Best, > > Jark > > > > [1]: > > > > > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SURVEY-What-Change-Data-Capture-tools-are-you-using-td33569.html > > [2]: http://apache-flink.147419.n8.nabble.com/SURVEY-CDC-td1910.html > > > > > > On Fri, 14 Feb 2020 at 22:08, Jark Wu <[hidden email]> wrote: > > > > > Hi everyone, > > > > > > I would like to start discussion about how to support interpreting > > > external changelog into Flink SQL, and how to emit changelog from Flink > > SQL. > > > > > > This topic has already been mentioned several times in the past. CDC > > > (Change Data Capture) data has been a very important streaming data in > > the > > > world. Connect to CDC is a significant feature for Flink, it fills the > > > missing piece for Flink's streaming processing. > > > > > > In FLIP-105, we propose 2 approaches to achieve. > > > One is introducing new TableSource interface (higher priority), > > > the other is introducing new SQL syntax to interpret and emit > changelog. > > > > > > FLIP-105: > > > > > > https://docs.google.com/document/d/1onyIUUdWAHfr_Yd5nZOE7SOExBc6TiW5C4LiL5FrjtQ/edit# > > > > > > Thanks for any feedback! > > > > > > Best, > > > Jark > > > > > > |
Hi,
After a short offline discussion with Kurt, It seems that I misunderstood Kurt's meaning. Kurt meant: is `format=debezium` is enough, or split into two options `format=debezium` and `format.encoding=json`. Debezium not only support JSON encoding, but also Avro. Canal supports JSON and Protobuf. So a single `format=debezium` is not enough (in the long term). The reason I proposed a single option `format=debezium-json` instead of two: - It's simpler to write a single option instead of two, we also make this design decision for "connector" and "version". - I didn't find a good name for separate option keys, because JSON is also a format, not an encoding, but `format.format=json` is weird. Hi everyone, If there are no further concerns, I would like to start a voting thread by tomorrow. Best, Jark On Wed, 8 Apr 2020 at 15:37, Jark Wu <[hidden email]> wrote: > Hi Kurt, > > The JSON encoding of Debezium can be configured to include or exclude the > message schema using the `value.converter.schemas.enable` properties. > That's why we propose to have a `format.schema-include` property key to > config how to parse the json. > > Besides, the encoding format of debezium is stable and unified across > different databases (MySQL, Oracle, SQL Server, DB2, PostgresSQL). > However, because of the limitation of some special databases, some > databases CDC encoding are different (Cassandra and MongoDB). > If we want to support them in the future, we can introduce an optional > property key, e.g. `format.encoding-connector=mongodb`, to recognize this > special encoding. > > Canal currently only support to capture changes from MySQL, so there is > only one encoding in Canal. But both Canal and Debezium may evolve their > encoding in the future. > We can also introduce a `format.encoding-version` in the future if needed. > > Best, > Jark > > > On Wed, 8 Apr 2020 at 14:26, Kurt Young <[hidden email]> wrote: > >> One minor comment, is there any other encoding or format in debezium? I'm >> asking because the format >> name is debezium-json, i'm wondering whether debezium is enough. This also >> applies to canal. >> >> Best, >> Kurt >> >> >> On Tue, Apr 7, 2020 at 11:49 AM Jark Wu <[hidden email]> wrote: >> >> > Hi everyone, >> > >> > Since this FLIP was proposed, the community has discussed a lot about >> the >> > first approach: introducing new TableSource and TableSink interfaces to >> > support changelog. >> > And yes, that is FLIP-95 which has been accepted last week. So most of >> the >> > work has been merged into FLIP-95. >> > >> > In order to support the goal of FLIP-105, there is still a little >> things to >> > discuss: how to connect external CDC formats. >> > We propose to introduce 2 new formats: Debezium format and Canal format. >> > They are the most popular CDC tools according to the survey in user [1] >> and >> > user-zh [2] mailing list. >> > >> > I have updated the FLIP: >> > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL >> > >> > Welcome feedbacks! >> > >> > Best, >> > Jark >> > >> > [1]: >> > >> > >> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SURVEY-What-Change-Data-Capture-tools-are-you-using-td33569.html >> > [2]: http://apache-flink.147419.n8.nabble.com/SURVEY-CDC-td1910.html >> > >> > >> > On Fri, 14 Feb 2020 at 22:08, Jark Wu <[hidden email]> wrote: >> > >> > > Hi everyone, >> > > >> > > I would like to start discussion about how to support interpreting >> > > external changelog into Flink SQL, and how to emit changelog from >> Flink >> > SQL. >> > > >> > > This topic has already been mentioned several times in the past. CDC >> > > (Change Data Capture) data has been a very important streaming data in >> > the >> > > world. Connect to CDC is a significant feature for Flink, it fills the >> > > missing piece for Flink's streaming processing. >> > > >> > > In FLIP-105, we propose 2 approaches to achieve. >> > > One is introducing new TableSource interface (higher priority), >> > > the other is introducing new SQL syntax to interpret and emit >> changelog. >> > > >> > > FLIP-105: >> > > >> > >> https://docs.google.com/document/d/1onyIUUdWAHfr_Yd5nZOE7SOExBc6TiW5C4LiL5FrjtQ/edit# >> > > >> > > Thanks for any feedback! >> > > >> > > Best, >> > > Jark >> > > >> > >> > |
Free forum by Nabble | Edit this page |