Hi,
I was reading some FLIP documents related to the new design of the Flink Schedule [1] and unification of batch and stream [2]. Then I created two different programs to learn how Flink optimizes the Query Plan in Batch and in Stream mode (and how much further it goes). One using batch [3] and one using Stream [4]. During the code debugging and also as it is depicted on the document [2], the batch program uses the org.apache.flink.optimizer.Optimizer class which generates a "org.apache.flink.optimizer.plan.OptimizedPlan" while stream program uses the "org.apache.flink.streaming.api.graph.StreamGraph" and every transformation inside the packet "org.apache.flink.streaming.api.transformations". When I am showing the execution plan with "env.getExecutionPlan()" I see exactly I have written on the Flink program (which it is expected). However, I was looking for where I can see the optimized plan. I mean decisions of operators reordering based on cost or statistics. For batch I could find the "org.apache.flink.optimizer.costs.CostEstimator" and "org.apache.flink.optimizer.DataStatistics". But for Stream I only found the creation of the plan. How can I debug that? Or have a better understanding of what Flink is doing. Do you advise me to read some other reference about this? Kind Regards, Felipe [1] Group-aware scheduling for Flink - https://docs.google.com/document/d/1q7NOqt05HIN-PlKEEPB36JiuU1Iu9fnxxVGJzylhsxU/edit#heading=h.k15nfgsa5bnk [2] Unified Core API for Streaming and Batch - https://docs.google.com/document/d/1G0NUIaaNJvT6CMrNCP6dRXGv88xNhDQqZFrQEuJ0rVU/edit# [3] https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/batch/MatrixMultiplication.java [4] https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/stream/SensorsReadingMqttJoinQEP.java *--* *-- Felipe Gutierrez* *-- skype: felipe.o.gutierrez* *--* *https://felipeogutierrez.blogspot.com <https://felipeogutierrez.blogspot.com>* |
Hi Felipe,
for streaming Flink currently does not optimize the data flow graph. I think the best reference is actually going through the code as you've done for the batch case. Cheers, Till On Wed, Dec 19, 2018 at 3:14 PM Felipe Gutierrez < [hidden email]> wrote: > Hi, > > I was reading some FLIP documents related to the new design of the Flink > Schedule [1] and unification of batch and stream [2]. Then I created two > different programs to learn how Flink optimizes the Query Plan in Batch and > in Stream mode (and how much further it goes). One using batch [3] and one > using Stream [4]. During the code debugging and also as it is depicted on > the document [2], the batch program uses the > org.apache.flink.optimizer.Optimizer class which generates a > "org.apache.flink.optimizer.plan.OptimizedPlan" while stream program uses > the "org.apache.flink.streaming.api.graph.StreamGraph" and every > transformation inside the packet > "org.apache.flink.streaming.api.transformations". > > When I am showing the execution plan with "env.getExecutionPlan()" I see > exactly I have written on the Flink program (which it is expected). > However, I was looking for where I can see the optimized plan. I mean > decisions of operators reordering based on cost or statistics. For batch I > could find the "org.apache.flink.optimizer.costs.CostEstimator" and > "org.apache.flink.optimizer.DataStatistics". But for Stream I only found > the creation of the plan. How can I debug that? Or have a better > understanding of what Flink is doing. Do you advise me to read some other > reference about this? > > Kind Regards, > Felipe > > [1] Group-aware scheduling for Flink - > https://docs.google.com/document/d/1q7NOqt05HIN-PlKEEPB36JiuU1Iu9fnxxVGJzylhsxU/edit#heading=h.k15nfgsa5bnk > [2] Unified Core API for Streaming and Batch - > https://docs.google.com/document/d/1G0NUIaaNJvT6CMrNCP6dRXGv88xNhDQqZFrQEuJ0rVU/edit# > [3] > https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/batch/MatrixMultiplication.java > [4] > https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/stream/SensorsReadingMqttJoinQEP.java > > *--* > *-- Felipe Gutierrez* > > *-- skype: felipe.o.gutierrez* > *--* *https://felipeogutierrez.blogspot.com > <https://felipeogutierrez.blogspot.com>* > |
Free forum by Nabble | Edit this page |