Hi all,
I have a question regarding partitioning. Does calling the withPartitioner() method on a coGroup operation has the same effect as performing partitionCustom on both datasets beforehand? i.e. Is 1. a.coGroup(b).where(…).equalTo(…).withPartitioner(…).with(…) equivalent to: 1. DataSet a = aa.partitionCustom(…) 2. DataSet b = bb.partitionCustom(…) 3. a.coGroup(b).where(…).equalTo(…).with(…) Do both snippets perform the same low-level physical partitioning? Thank you, Giannis |
Hi Giannis,
logically the resulting plans should be identical, meaning that they both will use the custom partitioner to create the partitions and then co group both inputs. Physically, the latter plan adds an additional partition operator before the coGroup operator. You can see this is you call env.getExecutionPlan() and then use Flink's plan visualizer [1]. The partition operator adds another task which instantiates another thread. Consequently, coGroup(b).where(...).equalTo(...).withPartitioner(...) should be slightly more efficient. [1] https://flink.apache.org/visualizer/ Cheers, Till On Sun, Dec 2, 2018 at 1:35 PM Giannis Evagorou <[hidden email]> wrote: > Hi all, > > I have a question regarding partitioning. > > Does calling the withPartitioner() method on a coGroup operation has the > same effect as performing partitionCustom on both datasets beforehand? > i.e. > > Is > > 1. a.coGroup(b).where(…).equalTo(…).withPartitioner(…).with(…) > > equivalent to: > > > 1. DataSet a = aa.partitionCustom(…) > 2. DataSet b = bb.partitionCustom(…) > 3. a.coGroup(b).where(…).equalTo(…).with(…) > > > Do both snippets perform the same low-level physical partitioning? > > Thank you, > Giannis > > |
Hi Till,
Thank you for your answer. Giannis ________________________________ From: Till Rohrmann <[hidden email]> Sent: Monday, December 3, 2018 1:46 PM To: [hidden email] Subject: Re: withPartitioner() vs calling partitionCustom() beforehand Hi Giannis, logically the resulting plans should be identical, meaning that they both will use the custom partitioner to create the partitions and then co group both inputs. Physically, the latter plan adds an additional partition operator before the coGroup operator. You can see this is you call env.getExecutionPlan() and then use Flink's plan visualizer [1]. The partition operator adds another task which instantiates another thread. Consequently, coGroup(b).where(...).equalTo(...).withPartitioner(...) should be slightly more efficient. [1] https://flink.apache.org/visualizer/ Cheers, Till On Sun, Dec 2, 2018 at 1:35 PM Giannis Evagorou <[hidden email]> wrote: > Hi all, > > I have a question regarding partitioning. > > Does calling the withPartitioner() method on a coGroup operation has the > same effect as performing partitionCustom on both datasets beforehand? > i.e. > > Is > > 1. a.coGroup(b).where(…).equalTo(…).withPartitioner(…).with(…) > > equivalent to: > > > 1. DataSet a = aa.partitionCustom(…) > 2. DataSet b = bb.partitionCustom(…) > 3. a.coGroup(b).where(…).equalTo(…).with(…) > > > Do both snippets perform the same low-level physical partitioning? > > Thank you, > Giannis > > |
Free forum by Nabble | Edit this page |