SQL CLI and JDBC

Hanan Yehudai
I didn’t see any docs on this. Is there a JDBC driver that offers the same functionality as the SQL CLI?
If not, is it on the roadmap?


Re: SQL CLI and JDBC

Fabian Hueske-2
Hi Hanan,

I'm not aware of any plans to add a JDBC Driver.

One issue with the JDBC interface is that it only works well for queries on
batch data and a subset of queries on streaming data.

Many streaming SQL queries are not able to emit final results (or need to
update previously emitted results).
Take for instance a query like

SELECT colA, COUNT(*)
FROM tab
GROUP BY colA;

If tab is a continuously growing table, no row of the query's result will
ever be final because a new row with any value of colA can be added at any
point in time.
JDBC does not support retracting or updating result rows that were emitted
before.
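
To make the update problem concrete, here is a minimal Python sketch that replays the query above over a growing input. It is not Flink code: the function name `count_by_key` and the "+"/"-" change markers are made up for illustration, and the input is assumed to be just the stream of colA values.

```python
from collections import defaultdict


def count_by_key(rows):
    """Yield (op, key, count) changes for a continuous GROUP BY count.

    op is "+" for a newly emitted result row and "-" for the retraction
    of a result row that was emitted earlier (hypothetical markers).
    """
    counts = defaultdict(int)
    for key in rows:
        if counts[key] > 0:
            # The count emitted earlier for this key is now stale.
            yield ("-", key, counts[key])
        counts[key] += 1
        yield ("+", key, counts[key])


# Three rows arrive over time with colA values a, b, a.
changelog = list(count_by_key(["a", "b", "a"]))
for change in changelog:
    print(change)
```

The second "a" forces the earlier result row for "a" to be withdrawn and replaced, and a forward-only result set has no way to express that withdrawal.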

Best, Fabian


On Sun, Apr 7, 2019 at 11:31, Hanan Yehudai <[hidden email]> wrote:

> I didn’t see any docs on this. Is there a JDBC driver that offers the
> same functionality as the SQL CLI?
> If not, is it on the roadmap?

RE: SQL CLI and JDBC

Hanan Yehudai
Fabian, looking at the response below again:

As I’m currently looking only into batch mode (execution result mode = table),
I was thinking that wrapping the SQL CLI code with a Calcite adapter might do the trick.

I don’t want a different execution engine (like Drill) just to allow ad hoc queries, and JDBC would let me use a lot of third-party display tools (BI tools, notebooks, etc.).

Do you believe it’s a viable solution while JDBC and the SQL GW are still work in progress?


-----Original Message-----
From: Fabian Hueske <[hidden email]>
Sent: 8 April 2019 11:18
To: dev <[hidden email]>
Subject: Re: SQL CLI and JDBC

Hi Hanan,

I'm not aware of any plans to add a JDBC Driver.

One issue with the JDBC interface is that it only works well for queries on batch data and a subset of queries on streaming data.

Many streaming SQL queries are not able to emit final results (or need to update previously emitted results).
Take for instance a query like

SELECT colA, COUNT(*)
FROM tab
GROUP BY colA;

If tab is a continuously growing table, no row of the query's result will ever be final because a new row with any value of colA can be added at any point in time.
JDBC does not support retracting or updating result rows that were emitted before.

Best, Fabian



Re: SQL CLI and JDBC

Fabian Hueske-2
Hi,

I don't have much experience with Calcite connectors.

One potential problem might be fetching the results. The CLI client uses
the DataSet.collect() method, which collects all results from all TMs
(TaskManagers) in the JM (JobManager) and (AFAIK) transfers them in a single
RPC message back to the client.
Hence, this only works for small results (a few MBs) and breaks if the
result size exceeds the max message size of RPC calls. Even larger
results might crash the JM.
You would need a robust mechanism to collect results from multiple TMs.
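
One way to stay under a message size limit is to hand results back in bounded pages instead of a single message. The following is a minimal Python sketch of that idea only. It assumes rows are strings and the limit is a plain byte count; `paginate` and `max_bytes` are hypothetical names, not Flink APIs, and the sketch makes no claim about how collect() is implemented.

```python
def paginate(rows, max_bytes):
    """Split rows into pages whose encoded size stays within max_bytes."""
    page, size = [], 0
    for row in rows:
        row_size = len(row.encode("utf-8"))
        if row_size > max_bytes:
            raise ValueError("a single row exceeds the message size limit")
        if page and size + row_size > max_bytes:
            # The current page is full: ship it and start a new one.
            yield page
            page, size = [], 0
        page.append(row)
        size += row_size
    if page:
        yield page


rows = ["row-%d" % i for i in range(5)]  # each row encodes to 5 bytes
pages = list(paginate(rows, max_bytes=12))
for page in pages:
    print(page)
```

A client could then request one page per call, so neither a single message nor the collecting process has to hold the whole result at once.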

Best, Fabian


On Sun, Apr 14, 2019 at 09:28, Hanan Yehudai <[hidden email]> wrote:

> Fabian, looking at the response below again:
>
> As I’m currently looking only into batch mode (execution result mode
> = table), I was thinking that wrapping the SQL CLI code with a Calcite
> adapter might do the trick.
>
> I don’t want a different execution engine (like Drill) just to
> allow ad hoc queries, and JDBC would let me use a lot of third-party
> display tools (BI tools, notebooks, etc.).
>
> Do you believe it’s a viable solution while JDBC and the SQL GW are still
> work in progress?

RE: SQL CLI and JDBC

Hanan Yehudai
Yes, I know.

I'm going to replace this with Kafka once the approach works for me 😊


-----Original Message-----
From: Fabian Hueske <[hidden email]>
Sent: 15 April 2019 11:46
To: dev <[hidden email]>
Subject: Re: SQL CLI and JDBC

Hi,

I don't have much experience with Calcite connectors.

One potential problem might be fetching the results. The CLI client uses the DataSet.collect() method, which collects all results from all TMs in the JM and (AFAIK) transfers them in a single RPC message back to the client.
Hence, this only works for small results (a few MBs) and breaks if the result size exceeds the max message size of RPC calls. Even larger results might crash the JM.
You would need a robust mechanism to collect results from multiple TMs.

Best, Fabian



Re: SQL CLI and JDBC

Stephan Ewen
I think this problem sounds fixable. Having proper JDBC support through the
SQL client would be really cool!

Adding Timo and Shaoxuan here:

Let's assume that the "collect()" call supports large results (I think we
can get that support through the blob manager with some changes).
What do you think about adding JDBC support?

On Tue, Apr 16, 2019 at 9:19 AM Hanan Yehudai <[hidden email]>
wrote:

> Yes, I know.
>
> I'm going to replace this with Kafka once the approach works for me 😊

Re: SQL CLI and JDBC

Kurt Young
Also +1 to support JDBC.

Best,
Kurt


On Wed, Apr 17, 2019 at 7:38 PM Stephan Ewen <[hidden email]> wrote:

> I think this problem sounds fixable. Having proper JDBC support through the
> SQL client would be really cool!
>
> Adding Timo and Shaoxuan here:
>
> Let's assume that the "collect()" call supports large results (I think we
> can get that support through the blob manager with some changes).
> What do you think about adding JDBC support?

Re: SQL CLI and JDBC

Jeff Zhang
+1 for supporting JDBC. One concern is that we need to provide a dedicated
service for JDBC support. But sql-client is not designed to be a service,
IIUC: it doesn't expose any API for users, and it is designed to be used by
a single user, not by multiple users concurrently.

IMHO, we might need to create a new dedicated service for JDBC support,
something like Hive's Thrift server.
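
As a rough illustration of what "dedicated service" means here, the Python sketch below keeps separate per-session state behind a lock so that several users can submit statements concurrently. Everything in it is hypothetical (the class `SqlGateway`, its methods, the session layout), and it only records statements rather than executing them; it is not a design for Flink or a description of Hive's server.

```python
import threading
import uuid


class SqlGateway:
    """Toy multi-user front end: each client gets its own session state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def open_session(self, user):
        session_id = str(uuid.uuid4())
        with self._lock:
            self._sessions[session_id] = {"user": user, "statements": []}
        return session_id

    def submit(self, session_id, statement):
        """Record a statement and return its position within the session."""
        with self._lock:
            session = self._sessions[session_id]  # KeyError if unknown
            session["statements"].append(statement)
            return len(session["statements"])

    def close_session(self, session_id):
        with self._lock:
            self._sessions.pop(session_id, None)


gateway = SqlGateway()
alice = gateway.open_session("alice")
bob = gateway.open_session("bob")
first = gateway.submit(alice, "SELECT 1")
second = gateway.submit(alice, "SELECT 2")
other = gateway.submit(bob, "SELECT 3")
print(first, second, other)
```

The point of the session handle is that a JDBC connection could map to one session, which is the kind of API surface the single-user sql-client does not expose.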

On Wed, Apr 17, 2019 at 7:40 PM Kurt Young <[hidden email]> wrote:

> Also +1 to support JDBC.
>
> Best,
> Kurt


--
Best Regards

Jeff Zhang

Re: SQL CLI and JDBC

zhang yue


> On Apr 17, 2019, at 9:14 PM, Jeff Zhang <[hidden email]> wrote:
>
> +1 for supporting JDBC. One concern is that we need to provide a dedicated
> service for JDBC support. But sql-client is not designed to be a service,
> IIUC: it doesn't expose any API for users, and it is designed to be used by
> a single user, not by multiple users concurrently.
>
> IMHO, we might need to create a new dedicated service for JDBC support,
> something like Hive's Thrift server.