(DEPRECATED) Apache Flink Mailing List archive.

SQL CLI and JDBC

Hanan Yehudai

SQL CLI and JDBC

I didn’t see any docs on this. Is there a JDBC driver that offers the same functionality as the SQL CLI?
If not, is it on the roadmap?

Fabian Hueske-2

Re: SQL CLI and JDBC

Hi Hanan,

I'm not aware of any plans to add a JDBC Driver.

One issue with the JDBC interface is that it only works well for queries on
batch data and a subset of queries on streaming data.

Many streaming SQL queries are not able to emit final results (or need to
update previously emitted results).
Take for instance a query like

SELECT colA, COUNT(*)
FROM tab
GROUP BY colA;

If tab is a continuously growing table, no row of the query's result will
ever be final because a new row with any value of colA can be added at any
point in time.
JDBC does not support retracting or updating result rows that were emitted
before.
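
As an illustration of the point above, here is a minimal sketch in plain Python (not Flink code) of what the result of the GROUP BY query looks like on a growing input. The function name `running_group_count` and the `("+"/"-", key, count)` message shape are assumptions made for this example only; they merely mimic the idea of a stream that retracts earlier rows.

```python
from collections import Counter


def running_group_count(rows):
    """Yield changelog messages for SELECT colA, COUNT(*) ... GROUP BY colA.

    Each message is (op, key, count): "+" emits a result row and "-"
    retracts a previously emitted one. Hypothetical illustration only.
    """
    counts = Counter()
    for key in rows:
        if counts[key] > 0:
            # The row emitted earlier for this key is no longer valid.
            yield ("-", key, counts[key])
        counts[key] += 1
        yield ("+", key, counts[key])


# Three input rows arrive one after another with colA = a, b, a.
changelog = list(running_group_count(["a", "b", "a"]))
for message in changelog:
    print(message)
```

The second "a" forces a retraction of the already emitted row ("a", 1). A plain JDBC ResultSet has no way to express that step, because rows handed to the client cannot be taken back.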

Best, Fabian

On Sun, Apr 7, 2019 at 11:31, Hanan Yehudai <[hidden email]> wrote:

> I didn’t see any docs on this. Is there a JDBC driver that offers the
> same functionality as the SQL CLI?
> If not, is it on the roadmap?

Hanan Yehudai

RE: SQL CLI and JDBC

Fabian, looking at the response below again:

As I’m currently looking only at batch mode (execution result mode = table),
I was thinking that wrapping the SQL CLI code with a Calcite adapter might do the trick.

I don’t want a different execution engine (like Drill) just to allow ad hoc queries, and JDBC would let me use a lot of third-party display tools (BI tools, notebooks, etc.).

Do you believe it’s a viable solution while JDBC and the SQL gateway (GW) are still work in progress?

-----Original Message-----
From: Fabian Hueske <[hidden email]>
Sent: 8 April 2019 11:18
To: dev <[hidden email]>
Subject: Re: SQL CLI and JDBC

Hi Hanan,

I'm not aware of any plans to add a JDBC Driver.

One issue with the JDBC interface is that it only works well for queries on batch data and a subset of queries on streaming data.

Many streaming SQL queries are not able to emit final results (or need to update previously emitted results).
Take for instance a query like

SELECT colA, COUNT(*)
FROM tab
GROUP BY colA;

If tab is a continuously growing table, no row of the query's result will ever be final because a new row with any value of colA can be added at any point in time.
JDBC does not support retracting or updating result rows that were emitted before.

Best, Fabian

Fabian Hueske-2

Re: SQL CLI and JDBC

Hi,

I don't have much experience with Calcite connectors.

One potential problem might be fetching the results. The CLI client uses
the DataSet.collect() method, which collects all results from all
TaskManagers (TMs) in the JobManager (JM) and (AFAIK) transfers them in a
single RPC message back to the client.
Hence, this only works for small results (a few MBs) and breaks if the
result size exceeds the max message size of RPC calls. For even larger
results, it might even crash the JM.
You would need a robust mechanism to collect results from multiple TMs.
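
To illustrate the difference, here is a minimal sketch in plain Python (not the Flink API). `MAX_MESSAGE_ROWS`, `collect_all`, and `fetch_in_chunks` are hypothetical names invented for this example; the row limit stands in for the RPC max message size, and the nested lists stand in for per-TM partial results.

```python
from itertools import islice

# Hypothetical stand-in for the max RPC message size, counted in rows.
MAX_MESSAGE_ROWS = 3


def collect_all(partitions):
    """Single-shot collect: every row from every worker in one message."""
    rows = [row for part in partitions for row in part]
    if len(rows) > MAX_MESSAGE_ROWS:
        raise OverflowError("result exceeds max message size")
    return rows


def fetch_in_chunks(partitions, chunk_rows=MAX_MESSAGE_ROWS):
    """Yield the same result as a sequence of bounded messages."""
    stream = (row for part in partitions for row in part)
    while True:
        chunk = list(islice(stream, chunk_rows))
        if not chunk:
            return
        yield chunk


# Partial results held by three workers.
partitions = [[1, 2], [3, 4, 5], [6]]
chunks = list(fetch_in_chunks(partitions))
print(chunks)
```

With these inputs, `collect_all(partitions)` raises because six rows do not fit into one message, while the chunked variant delivers the full result in messages that never exceed the limit.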

Best, Fabian

On Sun, Apr 14, 2019 at 09:28, Hanan Yehudai <[hidden email]> wrote:

> Fabian, looking at the response below again:
>
> As I’m currently looking only at batch mode (execution result mode = table),
> I was thinking that wrapping the SQL CLI code with a Calcite adapter might
> do the trick.
>
> I don’t want a different execution engine (like Drill) just to allow ad hoc
> queries, and JDBC would let me use a lot of third-party display tools
> (BI tools, notebooks, etc.).
>
> Do you believe it’s a viable solution while JDBC and the SQL gateway (GW)
> are still work in progress?

Hanan Yehudai

RE: SQL CLI and JDBC

Yes, I know.

I'm going to replace this with Kafka once the approach works for me 😊

-----Original Message-----
From: Fabian Hueske <[hidden email]>
Sent: 15 April 2019 11:46
To: dev <[hidden email]>
Subject: Re: SQL CLI and JDBC

Hi,

I don't have much experience with Calcite connectors.

One potential problem might be fetching the results. The CLI client uses the DataSet.collect() method, which collects all results from all TaskManagers (TMs) in the JobManager (JM) and (AFAIK) transfers them in a single RPC message back to the client.
Hence, this only works for small results (a few MBs) and breaks if the result size exceeds the max message size of RPC calls. For even larger results, it might even crash the JM.
You would need a robust mechanism to collect results from multiple TMs.

Best, Fabian

Stephan Ewen

Re: SQL CLI and JDBC

I think this problem sounds fixable. Having proper JDBC support through the
SQL client would be really cool!

Adding Timo and Shaoxuan here:

Let's assume that the "collect()" call supports large results (I think we
can get that support through the blob manager with some changes).
What do you think about adding JDBC support?

On Tue, Apr 16, 2019 at 9:19 AM Hanan Yehudai <[hidden email]>
wrote:

> Yes, I know.
>
> I'm going to replace this with Kafka once the approach works for me 😊

Kurt Young

Re: SQL CLI and JDBC

Also +1 to support JDBC.

Best,
Kurt

On Wed, Apr 17, 2019 at 7:38 PM Stephan Ewen <[hidden email]> wrote:

> I think this problem sounds fixable. Having proper JDBC support through the
> SQL client would be really cool!
>
> Adding Timo and Shaoxuan here:
>
> Let's assume that the "collect()" call supports large results (I think we
> can get that support through the blob manager with some changes).
> What do you think about adding JDBC support?

Jeff Zhang

Re: SQL CLI and JDBC

+1 for supporting JDBC. One concern is that we would need to provide a
dedicated service for JDBC support. But sql-client is not designed to be a
service, IIUC: it doesn't expose any API for users, and it is designed to be
used by a single user, not by multiple users concurrently.

IMHO, we might need to create a new dedicated service for JDBC support,
something like Hive's Thrift server.
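
As a rough sketch of what "a service for multiple users" implies, the plain-Python example below keeps one isolated session per client behind a lock. The `SessionService` class and its methods are hypothetical and exist only for illustration; they are not part of Flink, the SQL client, or Hive.

```python
import threading
import uuid


class SessionService:
    """Hypothetical multi-user front end: one isolated session per client."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def open_session(self, user):
        """Create a session and return its id."""
        session_id = str(uuid.uuid4())
        with self._lock:
            self._sessions[session_id] = {"user": user, "statements": []}
        return session_id

    def submit(self, session_id, statement):
        """Record a statement; return how many this session has submitted."""
        with self._lock:
            # Raises KeyError for an unknown or already closed session.
            session = self._sessions[session_id]
            session["statements"].append(statement)
            return len(session["statements"])

    def close_session(self, session_id):
        """Drop the session state; closing twice is harmless."""
        with self._lock:
            self._sessions.pop(session_id, None)


service = SessionService()
alice = service.open_session("alice")
bob = service.open_session("bob")
service.submit(alice, "SELECT 1")
```

A single-user CLI needs none of this bookkeeping, which is one way to read the concern above: session isolation and concurrent access are properties of a service, not of an interactive client.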

On Wed, Apr 17, 2019 at 7:40 PM, Kurt Young <[hidden email]> wrote:

> Also +1 to support JDBC.
>
> Best,
> Kurt

--
Best Regards

Jeff Zhang

zhang yue

Re: SQL CLI and JDBC

> On Apr 17, 2019, at 9:14 PM, Jeff Zhang <[hidden email]> wrote:
>
> +1 for supporting JDBC. One concern is that we would need to provide a
> dedicated service for JDBC support. But sql-client is not designed to be a
> service, IIUC: it doesn't expose any API for users, and it is designed to be
> used by a single user, not by multiple users concurrently.
>
> IMHO, we might need to create a new dedicated service for JDBC support,
> something like Hive's Thrift server.