Status of Flink-Calcite integration

Status of Flink-Calcite integration

Haohui Mai
Hi,

We are currently exploring building a streaming SQL solution on top of
Flink. We think that Flink is a good fit for the problem.

I'm curious about the current status of the Flink-Calcite integration. Is
it being actively developed, and is it on the road map?

We are also very open to contributing back if it aligns with the interests
of the community.

Your answers are appreciated.

Thanks,
Haohui

Re: Status of Flink-Calcite integration

Fabian Hueske-2
Hi Haohui,

The Flink community started porting its Table API on top of Apache Calcite
about a year ago and finalized the integration in June. Calcite is now the
central component of Flink's relational APIs. Flink features SQL and the
Table API, a language-integrated query (LINQ) API. Both APIs are translated
into a common logical representation, optimized by Calcite, and executed on
Flink's DataStream or DataSet APIs, depending on whether the program runs
in stream or batch mode. The translation of the Table API and SQL and how
Calcite is used are discussed in this blog post [1] and this
presentation [2].
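As a rough mental model of that pipeline (this is a toy Python sketch, not
Flink or Calcite code; all class and rule names are invented for
illustration): a query is first represented as a logical plan, a rewrite
rule optimizes the plan, and the optimized plan is then executed.

```python
# Toy illustration of the translate -> optimize -> execute pipeline.
# Not Flink or Calcite code; names are invented for this sketch.

class Scan:
    def __init__(self, rows):
        self.rows = rows

class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

def optimize(plan):
    """Single rewrite rule: merge two stacked Filters into one."""
    if isinstance(plan, Filter) and isinstance(plan.child, Filter):
        inner = plan.child
        return Filter(optimize(inner.child),
                      lambda r, a=plan.pred, b=inner.pred: a(r) and b(r))
    if isinstance(plan, Filter):
        return Filter(optimize(plan.child), plan.pred)
    return plan

def execute(plan):
    if isinstance(plan, Scan):
        return list(plan.rows)
    if isinstance(plan, Filter):
        return [r for r in execute(plan.child) if plan.pred(r)]
    raise TypeError(plan)

# "SELECT * FROM t WHERE x > 1 AND x < 4" built as two stacked filters,
# as a naive translator might emit it.
plan = Filter(Filter(Scan([0, 1, 2, 3, 4]), lambda x: x > 1),
              lambda x: x < 4)
print(execute(optimize(plan)))  # [2, 3]
```

A real optimizer like Calcite applies many such rules cost-based; the point
here is only that the optimized plan produces the same result from a
simpler operator tree.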

The streaming SQL part is currently limited to simple selections,
projections, and table function joins. The Table API additionally features
grouped window aggregates.
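
What a grouped window aggregate computes can be sketched in plain Python
(this is a simplified semantic sketch, not the Table API; the windowing
here is a bare-bones tumbling window):

```python
# Toy grouped tumbling-window count: (key, timestamp) events are grouped
# by key and by tumbling windows of `size` time units.
from collections import defaultdict

def tumbling_window_counts(events, size):
    counts = defaultdict(int)
    for key, ts in events:
        window_start = (ts // size) * size  # window the event falls into
        counts[(key, window_start)] += 1
    return dict(counts)

events = [("a", 1), ("a", 4), ("b", 7), ("a", 12), ("b", 14)]
print(tumbling_window_counts(events, size=10))
# {('a', 0): 2, ('b', 0): 1, ('a', 10): 1, ('b', 10): 1}
```
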
We have written a design document for our future Stream SQL efforts [3].
The design is based on the concept of dynamic tables, which are derived
from streams using the stream-table duality. Dynamic tables are queried
with regular (batch) SQL and can be converted back into streams or
persisted as materialized views (either internally or in external key-value
stores like Cassandra or HBase).
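
The duality itself can be sketched in a few lines of plain Python (a toy
sketch of the idea, not the actual design): an upsert changelog stream
materializes into a table, and a query over that table can be re-emitted as
a stream of updated results.

```python
# Toy stream-table duality. Not Flink code; a sketch of the concept.

def apply_changelog(changes):
    """Materialize a key -> value table from an upsert stream."""
    table = {}
    for key, value in changes:
        table[key] = value
    return table

def query_as_stream(changes, query):
    """Re-run a query after each change; emit its results as a stream."""
    table, out = {}, []
    for key, value in changes:
        table[key] = value
        out.append(query(table))
    return out

changes = [("u1", "add"), ("u2", "add"), ("u1", "upd")]
print(apply_changelog(changes))       # {'u1': 'upd', 'u2': 'add'}
print(query_as_stream(changes, len))  # [1, 2, 2]
```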

The Table and SQL APIs have received a lot of contributions from many
different people and are very actively developed.
The Flink community welcomes everyone who wants to contribute to, extend,
and improve Flink's relational APIs.

Please let me know if you have any questions.

Cheers, Fabian

[1] http://flink.apache.org/news/2016/05/24/stream-sql.html
[2]
http://www.slideshare.net/FlinkForward/fabian-hueske-taking-a-look-under-the-hood-of-apache-flinks-relational-apis
[3]
https://docs.google.com/document/d/1qVVt_16kdaZQ8RTfA_f4konQPW4tnl8THw6rzGUdaqU


Re: Status of Flink-Calcite integration

Haohui Mai
Hi Fabian,

Thanks for the reply. The design document is an interesting direction for
the road ahead.

For our business needs, we are interested in adding support for windowing
and joins to the streaming SQL part. How does this fit into the long-term
road map? From a quick skim of the code, it looks like this could be done
by mapping the corresponding RexNode to the window() and join() operators
of the DataStream API. Is that the intended approach?

Your thoughts are highly appreciated.

Regards,
Haohui


Re: Status of Flink-Calcite integration

Fabian Hueske-2
Hi Haohui,

sorry for the delayed reply.
Defining joins and windows on streams with SQL is certainly on Flink's
road map.

However, the challenge here is not so much in mapping RelNodes to Flink's
DataStream operators. It is rather in finding the right SQL syntax and
semantics to define common streaming operations.
For example, the windows defined by DataStream.keyBy().window() are not
equivalent to SQL's OVER / WINDOW clause but rather a special kind of
GROUP BY.
Stream-stream joins need some kind of window bounds, which cannot easily
be defined in SQL.
Moreover, SQL has no syntax to control the behavior of a query for
late-arriving data or to produce preliminary results.
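
The distinction can be made concrete with a toy computation (plain Python,
neither SQL nor Flink): a grouped window emits one result row per key and
window, while an OVER-style aggregate emits one enriched row per input row.

```python
# Toy contrast: grouped tumbling windows vs. an OVER-style running
# aggregate. One result row per (key, window) vs. one per input row.
from collections import defaultdict

events = [("a", 1), ("a", 4), ("b", 7), ("a", 12)]

# GROUP BY-style tumbling window (size 10): one count per key and window.
grouped = defaultdict(int)
for key, ts in events:
    grouped[(key, (ts // 10) * 10)] += 1

# OVER-style running count per key: every input row stays in the output.
seen, over = defaultdict(int), []
for key, ts in events:
    seen[key] += 1
    over.append((key, ts, seen[key]))

print(dict(grouped))  # {('a', 0): 2, ('b', 0): 1, ('a', 10): 1}
print(over)           # [('a', 1, 1), ('a', 4, 2), ('b', 7, 1), ('a', 12, 3)]
```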

The Calcite community has Stream SQL on its road map [1] and is working on
built-in functions for common types of windows [2] (see TUMBLE, HOP, and
SESSION in [1]).
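
Of the three, session windows are the least conventional; their semantics
can be sketched as grouping events that are separated by at most a gap
(plain Python, not Calcite's SESSION function):

```python
# Toy session windows: consecutive timestamps at most `gap` apart belong
# to the same session. A semantic sketch, not Calcite's SESSION.
def sessionize(timestamps, gap):
    sessions, current = [], []
    for ts in sorted(timestamps):
        if current and ts - current[-1] > gap:
            sessions.append(current)  # gap exceeded: close the session
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

print(sessionize([1, 2, 3, 10, 11, 30], gap=5))
# [[1, 2, 3], [10, 11], [30]]
```
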
The document I shared aims to overcome many of these issues by converting
streams into dynamic tables and applying standard SQL to these tables.

Best, Fabian

[1] http://calcite.apache.org/docs/stream.html
[2] https://issues.apache.org/jira/browse/CALCITE-1345
