I hadn’t heard a response on this, so I’m going to expand this to the dev email list.
If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR. Regards, Dylan Forciea Oseberg From: Dylan Forciea <[hidden email]> Date: Thursday, October 1, 2020 at 5:14 PM To: "[hidden email]" <[hidden email]> Subject: autoCommit for postgres jdbc streaming in Table/SQL API Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API. I see that the API has the scan.fetch-size option, but not scan.auto-commit per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load a large table in, but it completely slurped it into memory before starting the streaming. I modified the flink source code to add a scan.auto-commit option, and I was then able to immediately start streaming and cut my memory usage way down. I see in this thread that there was a similar issue resolved for JDBCInputFormat in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that in the Table/SQL API. Am I missing something on how to pull this off? Regards, Dylan Forciea Oseberg |
I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it.
Thanks, Dylan From: Shengkai Fang <[hidden email]> Date: Wednesday, October 7, 2020 at 9:06 AM To: Dylan Forciea <[hidden email]> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Sorry for late response. +1 to support it. I will open a jira about it later. Dylan Forciea <[hidden email]<mailto:[hidden email]>>于2020年10月7日 周三下午9:53写道: I hadn’t heard a response on this, so I’m going to expand this to the dev email list. If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR. Regards, Dylan Forciea Oseberg From: Dylan Forciea <[hidden email]<mailto:[hidden email]>> Date: Thursday, October 1, 2020 at 5:14 PM To: "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>> Subject: autoCommit for postgres jdbc streaming in Table/SQL API Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API. I see that the API has the scan.fetch-size option, but not scan.auto-commit per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load a large table in, but it completely slurped it into memory before starting the streaming. I modified the flink source code to add a scan.auto-commit option, and I was then able to immediately start streaming and cut my memory usage way down. I see in this thread that there was a similar issue resolved for JDBCInputFormat in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that in the Table/SQL API. Am I missing something on how to pull this off? Regards, Dylan Forciea Oseberg |
Hi Dylan,
thanks for reaching out to the Flink community and excuse our late response. I am not an expert for the Table API and its JDBC connector but what you describe sounds like a missing feature. Also given that FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard who worked on the JdbcTableSource and might help you to get this feature into Flink. Their response could take a week because they are currently on vacation if I am not mistaken. What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal. [1] https://issues.apache.org/jira/browse/FLINK-12198 Cheers, Till On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <[hidden email]> wrote: > I appreciate it! Let me know if you want me to submit a PR against the > issue after it is created. It wasn’t a huge amount of code, so it’s > probably not a big deal if you wanted to redo it. > > Thanks, > Dylan > > From: Shengkai Fang <[hidden email]> > Date: Wednesday, October 7, 2020 at 9:06 AM > To: Dylan Forciea <[hidden email]> > Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API > > Sorry for late response. +1 to support it. I will open a jira about it > later. > > Dylan Forciea <[hidden email]<mailto:[hidden email]>>于2020年10月7日 > 周三下午9:53写道: > > > > > > > > > > > > > > I hadn’t heard a response on this, so I’m going to expand this to the dev > email list. > > > > If this is indeed an issue and not my misunderstanding, I have most of a > patch already coded up. Please let me know, and I can create a JIRA issue > and send out a PR. > > > > Regards, > > Dylan Forciea > > Oseberg > > > > > From: Dylan Forciea <[hidden email]<mailto:[hidden email]>> > > > Date: Thursday, October 1, 2020 at 5:14 PM > > > To: "[hidden email]<mailto:[hidden email]>" < > [hidden email]<mailto:[hidden email]>> > > > Subject: autoCommit for postgres jdbc streaming in Table/SQL API > > > > > > > Hi! I’ve just recently started evaluating Flink for our ETL needs, and I > ran across an issue with streaming postgres data via the Table/SQL API. > > > > I see that the API has the scan.fetch-size option, but not > scan.auto-commit per > > > > > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html > . I had attempted to load a large table in, but it completely slurped it > into memory before starting the streaming. I modified the flink source code > to add a scan.auto-commit > > option, and I was then able to immediately start streaming and cut my > memory usage way down. > > > > I see in this thread that there was a similar issue resolved for > JDBCInputFormat in this thread: > > > > > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html > , but I don’t see a way to utilize that in the Table/SQL API. > > > > Am I missing something on how to pull this off? > > > > Regards, > > Dylan Forciea > > Oseberg > > > > > > |
Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR.
Thanks, Dylan From: Till Rohrmann <[hidden email]> Date: Wednesday, October 7, 2020 at 10:55 AM To: dev <[hidden email]> Cc: Shengkai Fang <[hidden email]>, "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu <[hidden email]> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Hi Dylan, thanks for reaching out to the Flink community and excuse our late response. I am not an expert for the Table API and its JDBC connector but what you describe sounds like a missing feature. Also given that FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard who worked on the JdbcTableSource and might help you to get this feature into Flink. Their response could take a week because they are currently on vacation if I am not mistaken. What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal. [1] https://issues.apache.org/jira/browse/FLINK-12198 Cheers, Till On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <[hidden email]<mailto:[hidden email]>> wrote: I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it. Thanks, Dylan From: Shengkai Fang <[hidden email]<mailto:[hidden email]>> Date: Wednesday, October 7, 2020 at 9:06 AM To: Dylan Forciea <[hidden email]<mailto:[hidden email]>> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Sorry for late response. +1 to support it. I will open a jira about it later. Dylan Forciea <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>>于2020年10月7日 周三下午9:53写道: I hadn’t heard a response on this, so I’m going to expand this to the dev email list. If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR. Regards, Dylan Forciea Oseberg From: Dylan Forciea <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> Date: Thursday, October 1, 2020 at 5:14 PM To: "[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>" <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> Subject: autoCommit for postgres jdbc streaming in Table/SQL API Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API. I see that the API has the scan.fetch-size option, but not scan.auto-commit per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load a large table in, but it completely slurped it into memory before starting the streaming. I modified the flink source code to add a scan.auto-commit option, and I was then able to immediately start streaming and cut my memory usage way down. I see in this thread that there was a similar issue resolved for JDBCInputFormat in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that in the Table/SQL API. Am I missing something on how to pull this off? Regards, Dylan Forciea Oseberg |
Actually…. It looks like what I did covers both cases. I’ll see about getting some unit tests and documentation updated.
Dylan From: Dylan Forciea <[hidden email]> Date: Wednesday, October 7, 2020 at 11:47 AM To: Till Rohrmann <[hidden email]>, dev <[hidden email]> Cc: Shengkai Fang <[hidden email]>, "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu <[hidden email]> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR. Thanks, Dylan From: Till Rohrmann <[hidden email]> Date: Wednesday, October 7, 2020 at 10:55 AM To: dev <[hidden email]> Cc: Shengkai Fang <[hidden email]>, "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu <[hidden email]> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Hi Dylan, thanks for reaching out to the Flink community and excuse our late response. I am not an expert for the Table API and its JDBC connector but what you describe sounds like a missing feature. Also given that FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard who worked on the JdbcTableSource and might help you to get this feature into Flink. Their response could take a week because they are currently on vacation if I am not mistaken. What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal. [1] https://issues.apache.org/jira/browse/FLINK-12198 Cheers, Till On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <[hidden email]<mailto:[hidden email]>> wrote: I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it. Thanks, Dylan From: Shengkai Fang <[hidden email]<mailto:[hidden email]>> Date: Wednesday, October 7, 2020 at 9:06 AM To: Dylan Forciea <[hidden email]<mailto:[hidden email]>> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Sorry for late response. +1 to support it. I will open a jira about it later. Dylan Forciea <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>>于2020年10月7日 周三下午9:53写道: I hadn’t heard a response on this, so I’m going to expand this to the dev email list. If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR. Regards, Dylan Forciea Oseberg From: Dylan Forciea <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> Date: Thursday, October 1, 2020 at 5:14 PM To: "[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>" <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> Subject: autoCommit for postgres jdbc streaming in Table/SQL API Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API. I see that the API has the scan.fetch-size option, but not scan.auto-commit per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load a large table in, but it completely slurped it into memory before starting the streaming. I modified the flink source code to add a scan.auto-commit option, and I was then able to immediately start streaming and cut my memory usage way down. I see in this thread that there was a similar issue resolved for JDBCInputFormat in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that in the Table/SQL API. Am I missing something on how to pull this off? Regards, Dylan Forciea Oseberg |
This sounds good. Maybe there are others in the community who can help with
the review before the Jark and Leonard are back. Cheers, Till On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <[hidden email]> wrote: > Actually…. It looks like what I did covers both cases. I’ll see about > getting some unit tests and documentation updated. > > > > Dylan > > > > *From: *Dylan Forciea <[hidden email]> > *Date: *Wednesday, October 7, 2020 at 11:47 AM > *To: *Till Rohrmann <[hidden email]>, dev <[hidden email]> > *Cc: *Shengkai Fang <[hidden email]>, "[hidden email]" < > [hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu < > [hidden email]> > *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API > > > > Ok, I have created FLINK-19522 describing the issue. I have the code I > made so far checked in at > https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but > this only fixes the SQL API. It sounds like there may be another change > needed for the Table API… I’ll look into that and see if I can figure it > out on my own while they’re out. I will also need to add some unit tests > and update some documentation to get this ready for a PR. > > > > Thanks, > > Dylan > > > > *From: *Till Rohrmann <[hidden email]> > *Date: *Wednesday, October 7, 2020 at 10:55 AM > *To: *dev <[hidden email]> > *Cc: *Shengkai Fang <[hidden email]>, "[hidden email]" < > [hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu < > [hidden email]> > *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API > > > > Hi Dylan, > > > > thanks for reaching out to the Flink community and excuse our late > response. I am not an expert for the Table API and its JDBC connector but > what you describe sounds like a missing feature. Also given that > FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we > might simply need to make it configurable from the JdbcTableSource. I am > pulling in Jark and Leonard who worked on the JdbcTableSource and might > help you to get this feature into Flink. Their response could take a week > because they are currently on vacation if I am not mistaken. > > > > What you could already do is to open an issue linking FLINK-12198 and > describing the problem and your solution proposal. > > > > [1] https://issues.apache.org/jira/browse/FLINK-12198 > > > > Cheers, > > Till > > > > On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <[hidden email]> wrote: > > I appreciate it! Let me know if you want me to submit a PR against the > issue after it is created. It wasn’t a huge amount of code, so it’s > probably not a big deal if you wanted to redo it. > > Thanks, > Dylan > > From: Shengkai Fang <[hidden email]> > Date: Wednesday, October 7, 2020 at 9:06 AM > To: Dylan Forciea <[hidden email]> > Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API > > Sorry for late response. +1 to support it. I will open a jira about it > later. > > Dylan Forciea <[hidden email]<mailto:[hidden email]>>于2020年10月7日 周三下午 > 9:53写道: > > > > > > > > > > > > > > I hadn’t heard a response on this, so I’m going to expand this to the dev > email list. > > > > If this is indeed an issue and not my misunderstanding, I have most of a > patch already coded up. Please let me know, and I can create a JIRA issue > and send out a PR. > > > > Regards, > > Dylan Forciea > > Oseberg > > > > > From: Dylan Forciea <[hidden email]<mailto:[hidden email]>> > > > Date: Thursday, October 1, 2020 at 5:14 PM > > > To: "[hidden email]<mailto:[hidden email]>" < > [hidden email]<mailto:[hidden email]>> > > > Subject: autoCommit for postgres jdbc streaming in Table/SQL API > > > > > > > Hi! I’ve just recently started evaluating Flink for our ETL needs, and I > ran across an issue with streaming postgres data via the Table/SQL API. > > > > I see that the API has the scan.fetch-size option, but not > scan.auto-commit per > > > > > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html > . I had attempted to load a large table in, but it completely slurped it > into memory before starting the streaming. I modified the flink source code > to add a scan.auto-commit > > option, and I was then able to immediately start streaming and cut my > memory usage way down. > > > > I see in this thread that there was a similar issue resolved for > JDBCInputFormat in this thread: > > > > > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html > , but I don’t see a way to utilize that in the Table/SQL API. > > > > Am I missing something on how to pull this off? > > > > Regards, > > Dylan Forciea > > Oseberg > > > > > > |
I’ve updated the unit tests and documentation, and I was running the azure test pipeline as described in the instructions. However, it appears that what seems to be an unrelated test for the JMX code failed. Is this a matter of me not setting things up correctly? I wanted to ensure everything looked good before I submitted the PR.
[ERROR] Failures: [ERROR] JMXReporterFactoryTest.testPortRangeArgument:46 Expected: (a value equal to or greater than <9000> and a value less than or equal to <9010>) but: a value less than or equal to <9010> <9040> was greater than <9010> [ERROR] JMXReporterFactoryTest.testWithoutArgument:60 [INFO] [ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0 Thanks, Dylan Forciea From: Till Rohrmann <[hidden email]> Date: Thursday, October 8, 2020 at 2:29 AM To: Dylan Forciea <[hidden email]> Cc: dev <[hidden email]>, Shengkai Fang <[hidden email]>, "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu <[hidden email]> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API This sounds good. Maybe there are others in the community who can help with the review before the Jark and Leonard are back. Cheers, Till On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <[hidden email]<mailto:[hidden email]>> wrote: Actually…. It looks like what I did covers both cases. I’ll see about getting some unit tests and documentation updated. Dylan From: Dylan Forciea <[hidden email]<mailto:[hidden email]>> Date: Wednesday, October 7, 2020 at 11:47 AM To: Till Rohrmann <[hidden email]<mailto:[hidden email]>>, dev <[hidden email]<mailto:[hidden email]>> Cc: Shengkai Fang <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, Leonard Xu <[hidden email]<mailto:[hidden email]>> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR. Thanks, Dylan From: Till Rohrmann <[hidden email]<mailto:[hidden email]>> Date: Wednesday, October 7, 2020 at 10:55 AM To: dev <[hidden email]<mailto:[hidden email]>> Cc: Shengkai Fang <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, Leonard Xu <[hidden email]<mailto:[hidden email]>> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Hi Dylan, thanks for reaching out to the Flink community and excuse our late response. I am not an expert for the Table API and its JDBC connector but what you describe sounds like a missing feature. Also given that FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard who worked on the JdbcTableSource and might help you to get this feature into Flink. Their response could take a week because they are currently on vacation if I am not mistaken. What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal. [1] https://issues.apache.org/jira/browse/FLINK-12198 Cheers, Till On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <[hidden email]<mailto:[hidden email]>> wrote: I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it. Thanks, Dylan From: Shengkai Fang <[hidden email]<mailto:[hidden email]>> Date: Wednesday, October 7, 2020 at 9:06 AM To: Dylan Forciea <[hidden email]<mailto:[hidden email]>> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Sorry for late response. +1 to support it. I will open a jira about it later. Dylan Forciea <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>>于2020年10月7日 周三下午9:53写道: I hadn’t heard a response on this, so I’m going to expand this to the dev email list. If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR. Regards, Dylan Forciea Oseberg From: Dylan Forciea <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> Date: Thursday, October 1, 2020 at 5:14 PM To: "[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>" <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> Subject: autoCommit for postgres jdbc streaming in Table/SQL API Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API. I see that the API has the scan.fetch-size option, but not scan.auto-commit per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load a large table in, but it completely slurped it into memory before starting the streaming. I modified the flink source code to add a scan.auto-commit option, and I was then able to immediately start streaming and cut my memory usage way down. I see in this thread that there was a similar issue resolved for JDBCInputFormat in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that in the Table/SQL API. Am I missing something on how to pull this off? Regards, Dylan Forciea Oseberg |
Hi Dylan,
Sorry for the late reply. We've just come back from a long holiday. Thanks for reporting this problem. First, I think this is a bug that `autoCommit` is false by default (JdbcRowDataInputFormat.Builder). We can fix the default to true in 1.11 series, and I think this can solve your problem in a short term? Besides, we should expose the connector options to set auto commit and this can be another issue to be implemented in master. I'm glad to review the code. What do you think? Regarding to the failed JMXReporterFactoryTest, I think this is a known issue, see FLINK-19539 [1] Best, Jark [1]: https://issues.apache.org/jira/browse/FLINK-19539 On Fri, 9 Oct 2020 at 01:29, Dylan Forciea <[hidden email]> wrote: > I’ve updated the unit tests and documentation, and I was running the azure > test pipeline as described in the instructions. However, it appears that > what seems to be an unrelated test for the JMX code failed. Is this a > matter of me not setting things up correctly? I wanted to ensure everything > looked good before I submitted the PR. > > > > [ERROR] Failures: > > [ERROR] JMXReporterFactoryTest.testPortRangeArgument:46 > > Expected: (a value equal to or greater than <9000> and a value less than > or equal to <9010>) > > but: a value less than or equal to <9010> <9040> was greater than > <9010> > > [ERROR] JMXReporterFactoryTest.testWithoutArgument:60 > > [INFO] > > [ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0 > > > > Thanks, > > Dylan Forciea > > > > *From: *Till Rohrmann <[hidden email]> > *Date: *Thursday, October 8, 2020 at 2:29 AM > *To: *Dylan Forciea <[hidden email]> > *Cc: *dev <[hidden email]>, Shengkai Fang <[hidden email]>, " > [hidden email]" <[hidden email]>, "[hidden email]" < > [hidden email]>, Leonard Xu <[hidden email]> > *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API > > > > This sounds good. Maybe there are others in the community who can help > with the review before the Jark and Leonard are back. > > > > Cheers, > > Till > > > > On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <[hidden email]> wrote: > > Actually…. It looks like what I did covers both cases. I’ll see about > getting some unit tests and documentation updated. > > > > Dylan > > > > *From: *Dylan Forciea <[hidden email]> > *Date: *Wednesday, October 7, 2020 at 11:47 AM > *To: *Till Rohrmann <[hidden email]>, dev <[hidden email]> > *Cc: *Shengkai Fang <[hidden email]>, "[hidden email]" < > [hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu < > [hidden email]> > *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API > > > > Ok, I have created FLINK-19522 describing the issue. I have the code I > made so far checked in at > https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but > this only fixes the SQL API. It sounds like there may be another change > needed for the Table API… I’ll look into that and see if I can figure it > out on my own while they’re out. I will also need to add some unit tests > and update some documentation to get this ready for a PR. > > > > Thanks, > > Dylan > > > > *From: *Till Rohrmann <[hidden email]> > *Date: *Wednesday, October 7, 2020 at 10:55 AM > *To: *dev <[hidden email]> > *Cc: *Shengkai Fang <[hidden email]>, "[hidden email]" < > [hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu < > [hidden email]> > *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API > > > > Hi Dylan, > > > > thanks for reaching out to the Flink community and excuse our late > response. I am not an expert for the Table API and its JDBC connector but > what you describe sounds like a missing feature. Also given that > FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we > might simply need to make it configurable from the JdbcTableSource. I am > pulling in Jark and Leonard who worked on the JdbcTableSource and might > help you to get this feature into Flink. Their response could take a week > because they are currently on vacation if I am not mistaken. > > > > What you could already do is to open an issue linking FLINK-12198 and > describing the problem and your solution proposal. > > > > [1] https://issues.apache.org/jira/browse/FLINK-12198 > > > > Cheers, > > Till > > > > On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <[hidden email]> wrote: > > I appreciate it! Let me know if you want me to submit a PR against the > issue after it is created. It wasn’t a huge amount of code, so it’s > probably not a big deal if you wanted to redo it. > > Thanks, > Dylan > > From: Shengkai Fang <[hidden email]> > Date: Wednesday, October 7, 2020 at 9:06 AM > To: Dylan Forciea <[hidden email]> > Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API > > Sorry for late response. +1 to support it. I will open a jira about it > later. > > Dylan Forciea <[hidden email]<mailto:[hidden email]>>于2020年10月7日 周三下午 > 9:53写道: > > > > > > > > > > > > > > I hadn’t heard a response on this, so I’m going to expand this to the dev > email list. > > > > If this is indeed an issue and not my misunderstanding, I have most of a > patch already coded up. Please let me know, and I can create a JIRA issue > and send out a PR. > > > > Regards, > > Dylan Forciea > > Oseberg > > > > > From: Dylan Forciea <[hidden email]<mailto:[hidden email]>> > > > Date: Thursday, October 1, 2020 at 5:14 PM > > > To: "[hidden email]<mailto:[hidden email]>" < > [hidden email]<mailto:[hidden email]>> > > > Subject: autoCommit for postgres jdbc streaming in Table/SQL API > > > > > > > Hi! I’ve just recently started evaluating Flink for our ETL needs, and I > ran across an issue with streaming postgres data via the Table/SQL API. > > > > I see that the API has the scan.fetch-size option, but not > scan.auto-commit per > > > > > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html > . I had attempted to load a large table in, but it completely slurped it > into memory before starting the streaming. I modified the flink source code > to add a scan.auto-commit > > option, and I was then able to immediately start streaming and cut my > memory usage way down. > > > > I see in this thread that there was a similar issue resolved for > JDBCInputFormat in this thread: > > > > > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html > , but I don’t see a way to utilize that in the Table/SQL API. > > > > Am I missing something on how to pull this off? > > > > Regards, > > Dylan Forciea > > Oseberg > > > > > |
Jark,
Thank you! I had actually mistyped the JIRA issue; autoCommit needs to be set to false for streaming to work. The default on the driver is true when the option isn’t specified. I’ve updated the issue accordingly. Setting this to false automatically on the read path would fix my issue. However, I’m only certain that this is proper for Postgres. I’m not sure if this should be done for other drivers, although my gut would say it should be fine if it’s only done for reading. My patch as it is will set the builder to not specify whether to set autoCommit if the option is not specified, which means it would then be left at the default of true. That would conflict with the 1.11 patch you suggested. Let me know if you think I should make the default true in the SQL API. https://github.com/apache/flink/pull/13570 Regards, Dylan From: Jark Wu <[hidden email]> Date: Thursday, October 8, 2020 at 10:15 PM To: Dylan Forciea <[hidden email]> Cc: Till Rohrmann <[hidden email]>, dev <[hidden email]>, Shengkai Fang <[hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu <[hidden email]> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Hi Dylan, Sorry for the late reply. We've just come back from a long holiday. Thanks for reporting this problem. First, I think this is a bug that `autoCommit` is false by default (JdbcRowDataInputFormat.Builder). We can fix the default to true in 1.11 series, and I think this can solve your problem in a short term? Besides, we should expose the connector options to set auto commit and this can be another issue to be implemented in master. I'm glad to review the code. What do you think? Regarding to the failed JMXReporterFactoryTest, I think this is a known issue, see FLINK-19539 [1] Best, Jark [1]: https://issues.apache.org/jira/browse/FLINK-19539 On Fri, 9 Oct 2020 at 01:29, Dylan Forciea <[hidden email]<mailto:[hidden email]>> wrote: I’ve updated the unit tests and documentation, and I was running the azure test pipeline as described in the instructions. However, it appears that what seems to be an unrelated test for the JMX code failed. Is this a matter of me not setting things up correctly? I wanted to ensure everything looked good before I submitted the PR. [ERROR] Failures: [ERROR] JMXReporterFactoryTest.testPortRangeArgument:46 Expected: (a value equal to or greater than <9000> and a value less than or equal to <9010>) but: a value less than or equal to <9010> <9040> was greater than <9010> [ERROR] JMXReporterFactoryTest.testWithoutArgument:60 [INFO] [ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0 Thanks, Dylan Forciea From: Till Rohrmann <[hidden email]<mailto:[hidden email]>> Date: Thursday, October 8, 2020 at 2:29 AM To: Dylan Forciea <[hidden email]<mailto:[hidden email]>> Cc: dev <[hidden email]<mailto:[hidden email]>>, Shengkai Fang <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, Leonard Xu <[hidden email]<mailto:[hidden email]>> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API This sounds good. Maybe there are others in the community who can help with the review before the Jark and Leonard are back. Cheers, Till On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <[hidden email]<mailto:[hidden email]>> wrote: Actually…. It looks like what I did covers both cases. I’ll see about getting some unit tests and documentation updated. Dylan From: Dylan Forciea <[hidden email]<mailto:[hidden email]>> Date: Wednesday, October 7, 2020 at 11:47 AM To: Till Rohrmann <[hidden email]<mailto:[hidden email]>>, dev <[hidden email]<mailto:[hidden email]>> Cc: Shengkai Fang <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, Leonard Xu <[hidden email]<mailto:[hidden email]>> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR. Thanks, Dylan From: Till Rohrmann <[hidden email]<mailto:[hidden email]>> Date: Wednesday, October 7, 2020 at 10:55 AM To: dev <[hidden email]<mailto:[hidden email]>> Cc: Shengkai Fang <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, Leonard Xu <[hidden email]<mailto:[hidden email]>> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Hi Dylan, thanks for reaching out to the Flink community and excuse our late response. I am not an expert for the Table API and its JDBC connector but what you describe sounds like a missing feature. Also given that FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard who worked on the JdbcTableSource and might help you to get this feature into Flink. Their response could take a week because they are currently on vacation if I am not mistaken. What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal. [1] https://issues.apache.org/jira/browse/FLINK-12198 Cheers, Till On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <[hidden email]<mailto:[hidden email]>> wrote: I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it. Thanks, Dylan From: Shengkai Fang <[hidden email]<mailto:[hidden email]>> Date: Wednesday, October 7, 2020 at 9:06 AM To: Dylan Forciea <[hidden email]<mailto:[hidden email]>> Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API Sorry for late response. +1 to support it. I will open a jira about it later. Dylan Forciea <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>>于2020年10月7日 周三下午9:53写道: I hadn’t heard a response on this, so I’m going to expand this to the dev email list. If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR. Regards, Dylan Forciea Oseberg From: Dylan Forciea <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> Date: Thursday, October 1, 2020 at 5:14 PM To: "[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>" <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> Subject: autoCommit for postgres jdbc streaming in Table/SQL API Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API. I see that the API has the scan.fetch-size option, but not scan.auto-commit per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html . I had attempted to load a large table in, but it completely slurped it into memory before starting the streaming. I modified the flink source code to add a scan.auto-commit option, and I was then able to immediately start streaming and cut my memory usage way down. I see in this thread that there was a similar issue resolved for JDBCInputFormat in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html , but I don’t see a way to utilize that in the Table/SQL API. Am I missing something on how to pull this off? Regards, Dylan Forciea Oseberg |
Thanks for the explanation. I will review the pull request. Let's move the
discussion to the PR. Best, Jark On Fri, 9 Oct 2020 at 21:06, Dylan Forciea <[hidden email]> wrote: > Jark, > > > > Thank you! I had actually mistyped the JIRA issue; autoCommit needs to be > set to false for streaming to work. The default on the driver is true when > the option isn’t specified. I’ve updated the issue accordingly. > > > > Setting this to false automatically on the read path would fix my issue. > However, I’m only certain that this is proper for Postgres. I’m not sure if > this should be done for other drivers, although my gut would say it should > be fine if it’s only done for reading. My patch as it is will set the > builder to not specify whether to set autoCommit if the option is not > specified, which means it would then be left at the default of true. That > would conflict with the 1.11 patch you suggested. Let me know if you think > I should make the default true in the SQL API. > > > > https://github.com/apache/flink/pull/13570 > > > > Regards, > > Dylan > > > > *From: *Jark Wu <[hidden email]> > *Date: *Thursday, October 8, 2020 at 10:15 PM > *To: *Dylan Forciea <[hidden email]> > *Cc: *Till Rohrmann <[hidden email]>, dev <[hidden email]>, > Shengkai Fang <[hidden email]>, "[hidden email]" < > [hidden email]>, Leonard Xu <[hidden email]> > *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API > > > > Hi Dylan, > > > > Sorry for the late reply. We've just come back from a long holiday. > > > > Thanks for reporting this problem. First, I think this is a bug that > `autoCommit` is false by default (JdbcRowDataInputFormat.Builder). > > We can fix the default to true in 1.11 series, and I think this can solve > your problem in a short term? > > Besides, we should expose the connector options to set auto commit and > this can be another issue to be implemented in master. > > I'm glad to review the code. > > > > What do you think? > > > > Regarding to the failed JMXReporterFactoryTest, I think this is a known > issue, see FLINK-19539 [1] > > > > Best, > > Jark > > > > [1]: https://issues.apache.org/jira/browse/FLINK-19539 > > > > On Fri, 9 Oct 2020 at 01:29, Dylan Forciea <[hidden email]> wrote: > > I’ve updated the unit tests and documentation, and I was running the azure > test pipeline as described in the instructions. However, it appears that > what seems to be an unrelated test for the JMX code failed. Is this a > matter of me not setting things up correctly? I wanted to ensure everything > looked good before I submitted the PR. > > > > [ERROR] Failures: > > [ERROR] JMXReporterFactoryTest.testPortRangeArgument:46 > > Expected: (a value equal to or greater than <9000> and a value less than > or equal to <9010>) > > but: a value less than or equal to <9010> <9040> was greater than > <9010> > > [ERROR] JMXReporterFactoryTest.testWithoutArgument:60 > > [INFO] > > [ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0 > > > > Thanks, > > Dylan Forciea > > > > *From: *Till Rohrmann <[hidden email]> > *Date: *Thursday, October 8, 2020 at 2:29 AM > *To: *Dylan Forciea <[hidden email]> > *Cc: *dev <[hidden email]>, Shengkai Fang <[hidden email]>, " > [hidden email]" <[hidden email]>, "[hidden email]" < > [hidden email]>, Leonard Xu <[hidden email]> > *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API > > > > This sounds good. Maybe there are others in the community who can help > with the review before the Jark and Leonard are back. > > > > Cheers, > > Till > > > > On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <[hidden email]> wrote: > > Actually…. It looks like what I did covers both cases. I’ll see about > getting some unit tests and documentation updated. > > > > Dylan > > > > *From: *Dylan Forciea <[hidden email]> > *Date: *Wednesday, October 7, 2020 at 11:47 AM > *To: *Till Rohrmann <[hidden email]>, dev <[hidden email]> > *Cc: *Shengkai Fang <[hidden email]>, "[hidden email]" < > [hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu < > [hidden email]> > *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API > > > > Ok, I have created FLINK-19522 describing the issue. I have the code I > made so far checked in at > https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but > this only fixes the SQL API. It sounds like there may be another change > needed for the Table API… I’ll look into that and see if I can figure it > out on my own while they’re out. I will also need to add some unit tests > and update some documentation to get this ready for a PR. > > > > Thanks, > > Dylan > > > > *From: *Till Rohrmann <[hidden email]> > *Date: *Wednesday, October 7, 2020 at 10:55 AM > *To: *dev <[hidden email]> > *Cc: *Shengkai Fang <[hidden email]>, "[hidden email]" < > [hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu < > [hidden email]> > *Subject: *Re: autoCommit for postgres jdbc streaming in Table/SQL API > > > > Hi Dylan, > > > > thanks for reaching out to the Flink community and excuse our late > response. I am not an expert for the Table API and its JDBC connector but > what you describe sounds like a missing feature. Also given that > FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we > might simply need to make it configurable from the JdbcTableSource. I am > pulling in Jark and Leonard who worked on the JdbcTableSource and might > help you to get this feature into Flink. Their response could take a week > because they are currently on vacation if I am not mistaken. > > > > What you could already do is to open an issue linking FLINK-12198 and > describing the problem and your solution proposal. > > > > [1] https://issues.apache.org/jira/browse/FLINK-12198 > > > > Cheers, > > Till > > > > On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <[hidden email]> wrote: > > I appreciate it! Let me know if you want me to submit a PR against the > issue after it is created. It wasn’t a huge amount of code, so it’s > probably not a big deal if you wanted to redo it. > > Thanks, > Dylan > > From: Shengkai Fang <[hidden email]> > Date: Wednesday, October 7, 2020 at 9:06 AM > To: Dylan Forciea <[hidden email]> > Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API > > Sorry for late response. +1 to support it. I will open a jira about it > later. > > Dylan Forciea <[hidden email]<mailto:[hidden email]>>于2020年10月7日 周三下午 > 9:53写道: > > > > > > > > > > > > > > I hadn’t heard a response on this, so I’m going to expand this to the dev > email list. > > > > If this is indeed an issue and not my misunderstanding, I have most of a > patch already coded up. Please let me know, and I can create a JIRA issue > and send out a PR. > > > > Regards, > > Dylan Forciea > > Oseberg > > > > > From: Dylan Forciea <[hidden email]<mailto:[hidden email]>> > > > Date: Thursday, October 1, 2020 at 5:14 PM > > > To: "[hidden email]<mailto:[hidden email]>" < > [hidden email]<mailto:[hidden email]>> > > > Subject: autoCommit for postgres jdbc streaming in Table/SQL API > > > > > > > Hi! I’ve just recently started evaluating Flink for our ETL needs, and I > ran across an issue with streaming postgres data via the Table/SQL API. > > > > I see that the API has the scan.fetch-size option, but not > scan.auto-commit per > > > > > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html > . I had attempted to load a large table in, but it completely slurped it > into memory before starting the streaming. I modified the flink source code > to add a scan.auto-commit > > option, and I was then able to immediately start streaming and cut my > memory usage way down. > > > > I see in this thread that there was a similar issue resolved for > JDBCInputFormat in this thread: > > > > > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html > , but I don’t see a way to utilize that in the Table/SQL API. > > > > Am I missing something on how to pull this off? > > > > Regards, > > Dylan Forciea > > Oseberg > > > > |
Free forum by Nabble | Edit this page |