SQL on Flink

SQL on Flink

Timo Walther
Hey everyone,

I would be interested in having a complete SQL API in Flink. What is the
status there? Is someone already working on it? If not, I would like to
work on it. I found http://ijcsi.org/papers/IJCSI-12-1-1-169-174.pdf, but
I couldn't find anything on the mailing list or in Jira. Otherwise, I would
open an issue and start a discussion about it there.

Regards,
Timo

Re: SQL on Flink

Fabian Hueske
Hi,

Flink's Table API is pretty close to what SQL provides. IMO, the best
approach would be to leverage that and build a SQL parser (maybe together
with a logical optimizer) on top of the Table API. The parser (and
optimizer) could be built using Apache Calcite, which provides exactly this.
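
For example, a minimal sketch of the Calcite parser entry point (untested;
it assumes a Calcite 1.x dependency, and the class names below are from
Calcite's public API):

    import org.apache.calcite.sql.SqlNode;
    import org.apache.calcite.sql.parser.SqlParseException;
    import org.apache.calcite.sql.parser.SqlParser;

    public class CalciteParseSketch {
        public static void main(String[] args) throws SqlParseException {
            // Parse a query string into Calcite's SqlNode AST. A SQL layer
            // for Flink could validate and optimize this tree and then emit
            // the equivalent chain of Table API operations.
            SqlParser parser = SqlParser.create(
                "SELECT name, COUNT(*) FROM users GROUP BY name");
            SqlNode ast = parser.parseQuery();
            System.out.println(ast);
        }
    }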

Since the Table API is still a fairly new component and not very
feature-rich, it might make sense to extend and strengthen it before
putting something major on top.

Cheers, Fabian


Re: SQL on Flink

Ted Dunning
It would also be relatively simple (I think) to retarget Drill to Flink if
Flink doesn't provide enough typing metadata to do traditional SQL.

Re: SQL on Flink

Robert Metzger
I didn't know that paper...  Thanks for sharing.

I worked on a SQL layer for Stratosphere some time ago, using Apache
Calcite (called Optiq back then). I think the project provides a lot of
very good tooling for creating a SQL layer. So if we decide to go for SQL
on Flink, I would suggest using Calcite.
I can also help you a bit with getting started with Calcite.

I agree with Fabian that it would probably make more sense for now to
enhance the Table API.
I think the biggest limitation right now is that it only supports POJOs.
We should also support Tuples (I know that's difficult to do), data from
HCatalog (that includes Parquet & ORC), JSON, ...
Then, I would add filter and projection pushdown to the Table API.
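
To make the pushdown idea concrete, here is a hypothetical sketch (the
interface and its methods are invented for illustration; nothing like this
exists in the Table API yet):

    import java.util.List;

    // Hypothetical: a table source that can apply projections and filters
    // itself, so that less data is read in the first place.
    public interface PushdownTableSource<T> {

        // Return a source that reads only the given fields; a Parquet or
        // ORC reader can then skip all other columns entirely.
        PushdownTableSource<T> projectFields(List<String> fieldNames);

        // Offer a filter predicate to the source. The source may evaluate
        // it during the scan (e.g. using ORC/Parquet min/max statistics)
        // and return a filtered source, or null if it cannot apply it.
        PushdownTableSource<T> applyPredicate(String predicate);
    }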


Re: SQL on Flink

Stephan Ewen
I see no reason why a SQL interface cannot be "bootstrapped" concurrently.
It would initially not support many operations, but it would be a good way
to test and drive functionality in the Table API.


@Ted:

I would like to learn a bit more about the stack and the internal
abstractions of Drill. It may make sense to reuse some of the query
execution operators from Drill. I especially like the "learning schema on
the fly" part of Drill.

Flink DataSets and Streams have a schema, but in several cases it may be a
"schema lower bound", like the greatest common superclass. Those cases may
benefit big time from Drill's ability to refine the schema on the fly.

That may also be useful in the Table API, making it available again to
LINQ-like programs and SQL scripts.
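
To illustrate the "schema lower bound" point in plain Java (all class names
are invented for the example): the declared element type only guarantees
the common supertype, while the actual records may carry richer subtypes
whose extra fields an engine could discover at runtime.

    import java.util.Arrays;
    import java.util.List;

    public class SchemaLowerBound {
        static class Event {
            final String user;
            Event(String user) { this.user = user; }
        }
        static class Click extends Event {
            final String url;
            Click(String user, String url) { super(user); this.url = url; }
        }
        static class Purchase extends Event {
            final double amount;
            Purchase(String user, double amount) { super(user); this.amount = amount; }
        }

        public static void main(String[] args) {
            // Static schema: Event, the greatest common superclass (the
            // "lower bound"). The records expose more fields than that.
            List<Event> events = Arrays.asList(
                new Click("alice", "/home"),
                new Purchase("bob", 9.99));

            // A Drill-style engine could refine the schema per record:
            for (Event e : events) {
                if (e instanceof Click) {
                    System.out.println(e.user + " clicked " + ((Click) e).url);
                } else if (e instanceof Purchase) {
                    System.out.println(e.user + " paid " + ((Purchase) e).amount);
                }
            }
        }
    }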


Re: SQL on Flink

Fabian Hueske
IMO, it is better to have one feature that is reasonably well developed
than two half-baked features. That's why I proposed to advance the Table
API a bit further before starting the next big thing. I played around with
the Table API recently, and I think it definitely needs a bit more
contributor attention and more features to be actually usable. Also, since
all features of the SQL interface need to be supported by the Table API
(given we follow the SQL-on-Table approach), it makes sense IMO to push the
Table API a bit further before going for the next thing.


Re: SQL on Flink

Kostas Tzoumas
I think Fabian's arguments make a lot of sense.

However, if Timo *really wants* to start SQL on top of the Table API, then
that is what he will do a great job at :-) As usual, we can keep it in beta
status in flink-staging until it is mature... and it will help create
issues for the Table API and give direction to its development. Perhaps we
will have a feature-poor SQL for a bit, then switch to hardening the Table
API to support more features, and then go back to SQL.

I'm just advocating for "committer passion" first here :-) Perhaps Timo
should weigh in.


Re: SQL on Flink

Fabian Hueske
+1 for committer passion!

Please don't get me wrong, I think SQL on Flink would be a great feature.
I just wanted to make the point that the Table API needs to mirror all SQL
features if SQL is implemented on top of it.



Re: SQL on Flink

Timo Walther
It's passion for the future of the project rather than passion for SQL ;-)

I always try to think like someone from industry, and IMO people in
industry still think in SQL. If you want to persuade someone coming from
the SQL world, you should first offer a SQL interface so they can run
their legacy code (similar to the Hadoop compatibility operators).
Rewriting old queries in the Table API is not very convenient.

I share Stephan's opinion. Building both APIs concurrently would be a good
way to test and extend the Table API. Currently, the Table API is
half-done, but I think the goal is to have SQL functionality. I can
implement a SQL operator and extend the Table API where functionality is
missing.
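
A rough sketch of the entry point I have in mind (hypothetical; no sql()
or registerTable() method exists in the Table API today, and the
parse/translate step, e.g. via Calcite, is exactly what would need to be
built):

    // fromDataSet() exists in today's Table API; "users" is an assumed
    // DataSet of user POJOs created elsewhere.
    TableEnvironment tableEnv = new TableEnvironment();
    Table usersTable = tableEnv.fromDataSet(users);

    // Both calls below are hypothetical: register the table under a name
    // so SQL can refer to it, then parse and translate the query into the
    // equivalent Table API operations, roughly
    //   usersTable.groupBy("name").select("name, name.count")
    tableEnv.registerTable("users", usersTable);
    Table result = tableEnv.sql(
        "SELECT name, COUNT(*) FROM users GROUP BY name");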


Re: SQL on Flink

Ufuk Celebi

Very exciting! :-) +1

As suggested, I think the best thing is to do this hand-in-hand with the Table API. I don't think that there was any real disagreement. Everyone agrees that the SQL layer should be built on top of the Table API, which is great for both the Table API and the SQL layer. :-)


Re: SQL on Flink

Kostas Tzoumas
Very excited to see this starting!


Re: SQL on Flink

Aljoscha Krettek
+1 to what Ufuk said. :D