Worst-case optimal join processing on Streams

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Worst-case optimal join processing on Streams

Laurens VIJNCK
Dear folks,

DISCLAIMER: With this mail, my sole intention is to establish contact with the community and trade ideas on how to realize the goal described below.

I'm a starting PhD researcher in distributed systems and databases who is particularly interested in worst-case optimal (multiway) join processing on streams. I have performed preliminary tests with a new join algorithm that shows rather promising results. However, the limitation is that the algorithm operates in a centralized fashion. My goal is to extend the capabilities of the algorithm to operate in a distributed environment. To showcase my results, I want to implement a proof-of-concept in Apache Flink. I know this is a rather ambitious project, hence why I am reaching out to the community.

I have traversed most of the application development documentation on the website (e.g., [1, 2, 3, 4]) but I am now eager the learn more about the internals thereof. Specifically, I want to gain some more insights in the lifecycle of a query in Flink. Is there some additional documentation available on this subject?

Thanks in advance.

[1] https://flink.apache.org/news/2015/04/13/release-0.9.0-milestone1.html
[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/dynamic_tables.html
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/streaming/joins.html
[4] https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals

Kind regards,

Laurens Vijnck
Reply | Threaded
Open this post in threaded view
|

Re: Worst-case optimal join processing on Streams

Kurt Young
Hi Laurens,

Good to hear that you are interested with optimizing Flink's join strategy.
If you want
to learn more about the lifecycle of a query in Flink, I would
recommend you to read
the original design doc of Flink Table&SQL module [1], hope it helps.

Best,
Kurt

[1]
https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI/


On Sat, Dec 14, 2019 at 10:52 PM Laurens VIJNCK <[hidden email]>
wrote:

> Dear folks,
>
> DISCLAIMER: With this mail, my sole intention is to establish contact with
> the community and trade ideas on how to realize the goal described below.
>
> I'm a starting PhD researcher in distributed systems and databases who is
> particularly interested in worst-case optimal (multiway) join processing on
> streams. I have performed preliminary tests with a new join algorithm that
> shows rather promising results. However, the limitation is that the
> algorithm operates in a centralized fashion. My goal is to extend the
> capabilities of the algorithm to operate in a distributed environment. To
> showcase my results, I want to implement a proof-of-concept in Apache
> Flink. I know this is a rather ambitious project, hence why I am reaching
> out to the community.
>
> I have traversed most of the application development documentation on the
> website (e.g., [1, 2, 3, 4]) but I am now eager the learn more about the
> internals thereof. Specifically, I want to gain some more insights in the
> lifecycle of a query in Flink. Is there some additional documentation
> available on this subject?
>
> Thanks in advance.
>
> [1] https://flink.apache.org/news/2015/04/13/release-0.9.0-milestone1.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/dynamic_tables.html
> [3]
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/streaming/joins.html
> [4] https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals
>
> Kind regards,
>
> Laurens Vijnck
>