Dear folks,
DISCLAIMER: With this mail, my sole intention is to establish contact with the community and trade ideas on how to realize the goal described below.

I'm a starting PhD researcher in distributed systems and databases who is particularly interested in worst-case optimal (multiway) join processing on streams. I have performed preliminary tests with a new join algorithm that shows rather promising results. However, the limitation is that the algorithm operates in a centralized fashion. My goal is to extend the algorithm to operate in a distributed environment. To showcase my results, I want to implement a proof of concept in Apache Flink. I know this is a rather ambitious project, which is why I am reaching out to the community.

I have read most of the application development documentation on the website (e.g., [1, 2, 3, 4]), but I am now eager to learn more about Flink's internals. Specifically, I want to gain more insight into the lifecycle of a query in Flink. Is there additional documentation available on this subject?

Thanks in advance.

[1] https://flink.apache.org/news/2015/04/13/release-0.9.0-milestone1.html
[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/dynamic_tables.html
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/streaming/joins.html
[4] https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals

Kind regards,

Laurens Vijnck
Hi Laurens,
Good to hear that you are interested in optimizing Flink's join strategy. If you want to learn more about the lifecycle of a query in Flink, I recommend reading the original design doc of the Flink Table & SQL module [1]. I hope it helps.

Best,
Kurt

[1] https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI/

On Sat, Dec 14, 2019 at 10:52 PM Laurens VIJNCK <[hidden email]> wrote:
> Dear folks,
> [...]