I didn't see any docs on this. Is there a JDBC driver that offers the same functionality as the SQL CLI?
If not, is it on the roadmap?
Hi Hanan,

I'm not aware of any plans to add a JDBC driver.

One issue with the JDBC interface is that it only works well for queries on batch data and for a subset of queries on streaming data.

Many streaming SQL queries cannot emit final results (or need to update previously emitted results). Take, for instance, a query like

SELECT colA, COUNT(*)
FROM tab
GROUP BY colA;

If tab is a continuously growing table, no row of the query's result will ever be final, because a new row with any value of colA can be added at any point in time. JDBC does not support retracting or updating result rows that were emitted before.

Best, Fabian

In reply to: Hanan Yehudai <[hidden email]>, Sun, 7 Apr 2019, 11:31
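To make the retraction problem concrete, here is a minimal sketch in plain Java. It is not Flink or JDBC code, and every class and method name in it is hypothetical. It keeps a running COUNT(*) per colA value and reports, for each new input row, which previously emitted result row has to be withdrawn and which row replaces it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Flink or JDBC.
class RunningCountSketch {

    /** One change to the query result: add == true inserts a row, add == false retracts one. */
    static final class Change {
        final boolean add;
        final String colA;
        final long count;

        Change(boolean add, String colA, long count) {
            this.add = add;
            this.colA = colA;
            this.count = count;
        }

        @Override
        public String toString() {
            return (add ? "+" : "-") + "(" + colA + ", " + count + ")";
        }
    }

    // Current count per key, i.e. the latest version of each result row.
    private final Map<String, Long> counts = new HashMap<>();

    /** Processes one new input row of tab and returns the resulting changes to the query result. */
    List<Change> onRow(String colA) {
        List<Change> changes = new ArrayList<>();
        Long old = counts.get(colA);
        if (old != null) {
            // The row emitted earlier for this key is no longer valid and must be retracted.
            changes.add(new Change(false, colA, old));
        }
        long updated = (old == null ? 0L : old) + 1;
        counts.put(colA, updated);
        changes.add(new Change(true, colA, updated));
        return changes;
    }

    public static void main(String[] args) {
        RunningCountSketch query = new RunningCountSketch();
        for (String value : new String[] {"a", "b", "a"}) {
            System.out.println(value + " -> " + query.onRow(value));
        }
    }
}
```

The second "a" produces a retraction of (a, 1) followed by the new row (a, 2). A JDBC ResultSet is read forward as a sequence of rows and has no way to signal the retraction to the consumer, which is the limitation described above.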
Fabian, looking at your response again:

As I'm currently looking into batch mode only (execution result mode = table), I was thinking that wrapping the SQL CLI code with a Calcite adapter might do the trick.

I don't want to run a different execution engine (like Drill) just to allow ad hoc queries. And JDBC would let me use a lot of third-party display tools (BI tools, notebooks, etc.).

Do you believe it's a viable solution while JDBC and the SQL GW are still a work in progress?

In reply to: Fabian Hueske <[hidden email]>, 8 April 2019, 11:18 (To: dev <[hidden email]>; Subject: Re: SQL CLI and JDBC)
Hi,

I don't have much experience with Calcite connectors.

One potential problem might be fetching the results. The CLI client uses the DataSet.collect() method, which collects all results from all TaskManagers (TMs) in the JobManager (JM) and (AFAIK) transfers them in a single RPC message back to the client. Hence, this only works for small results (a few MBs) and breaks if the result size exceeds the maximum message size of RPC calls. Even larger results might crash the JM. You would need a robust mechanism to collect results from multiple TMs.

Best, Fabian

In reply to: Hanan Yehudai <[hidden email]>, Sun, 14 Apr 2019, 09:28
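The size limit can be illustrated with a small sketch in plain Java. This is not how Flink transfers results; the class, the method names, and the byte limit are assumptions chosen for illustration. It contrasts sending a whole result as one message, which fails once the result outgrows the limit, with splitting the result into batches that each stay within it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; the limit stands in for an RPC maximum message size.
class ChunkedResultSketch {

    /** Size of a row encoded as UTF-8; a stand-in for real serialization. */
    static int sizeOf(String row) {
        return row.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Single-message transfer: fails as soon as the whole result exceeds the limit. */
    static List<String> sendAsOneMessage(List<String> rows, int maxMessageBytes) {
        long total = 0;
        for (String row : rows) {
            total += sizeOf(row);
        }
        if (total > maxMessageBytes) {
            throw new IllegalStateException(
                    "result of " + total + " bytes exceeds limit of " + maxMessageBytes);
        }
        return new ArrayList<>(rows);
    }

    /** Batched transfer: groups rows so that every batch fits into one message. */
    static List<List<String>> toBatches(List<String> rows, int maxMessageBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String row : rows) {
            int size = sizeOf(row);
            if (size > maxMessageBytes) {
                throw new IllegalArgumentException("a single row exceeds the message limit");
            }
            if (currentBytes + size > maxMessageBytes) {
                // The current batch is full; start a new one.
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("aaaa", "bbbb", "cccc", "dddd", "eeee");
        System.out.println(toBatches(rows, 10));
    }
}
```

With five 4-byte rows and a 10-byte limit, sendAsOneMessage throws, while toBatches yields three batches that a client could fetch one at a time. A real mechanism would also have to avoid materializing the full result on the JM, which this sketch does not address.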
Yes, I know.

I'm going to replace this with Kafka once the approach works for me 😊

In reply to: Fabian Hueske <[hidden email]>, 15 April 2019, 11:46 (To: dev <[hidden email]>; Subject: Re: SQL CLI and JDBC)
I think this problem sounds fixable. Having proper JDBC support through the SQL client would be really cool!

Adding Timo and Shaoxuan here:

Let's assume that the "collect()" call supports large results (I think we can get that support through the blob manager with some changes). What do you think about adding JDBC support?

In reply to: Hanan Yehudai <[hidden email]>, Tue, 16 Apr 2019, 9:19 AM
Also +1 to support JDBC.

Best, Kurt

In reply to: Stephan Ewen <[hidden email]>, Wed, 17 Apr 2019, 7:38 PM
+1 for supporting JDBC. One concern is that we need to provide a dedicated service for JDBC support. But sql-client is not designed to be a service, IIUC: it doesn't expose any API to users, and it is designed to be used by a single user, not by multiple users concurrently.

IMHO, we might need to create a new dedicated service for JDBC support, something like Hive's Thrift server.

Best Regards
Jeff Zhang

In reply to: Kurt Young <[hidden email]>, Wed, 17 Apr 2019, 7:40 PM
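As a rough sketch of what "multiple users and concurrent usage" asks of such a service, the plain Java below keeps one isolated session per connected client in a thread-safe registry. It is an assumption-laden illustration, not the design of sql-client, HiveServer2, or any Flink component; all names are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not taken from Flink, Hive, or any JDBC driver.
class SessionServiceSketch {

    /** Per-client state, e.g. settings that must not leak between users. */
    static final class Session {
        final String user;
        final Map<String, String> properties = new ConcurrentHashMap<>();

        Session(String user) {
            this.user = user;
        }
    }

    // Thread-safe registry so that many clients can open and close sessions concurrently.
    private final Map<String, Session> sessions = new ConcurrentHashMap<>();

    /** Opens a new session for the given user and returns its identifier. */
    String openSession(String user) {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new Session(user));
        return id;
    }

    /** Looks up a session; an unknown identifier is treated as an error. */
    Session get(String sessionId) {
        Session session = sessions.get(sessionId);
        if (session == null) {
            throw new IllegalArgumentException("unknown session: " + sessionId);
        }
        return session;
    }

    /** Closes a session and drops its state. */
    void closeSession(String sessionId) {
        sessions.remove(sessionId);
    }

    int openSessionCount() {
        return sessions.size();
    }
}
```

A JDBC connection would map to one such session on the server side, so that settings made by one user stay invisible to the others. A single-user CLI has no need for this bookkeeping, which is the gap pointed out above.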