Hi all,
I've got one question concerning custom partitioning in the DataStream API. Is it possible to define or change the parallelism when doing a 'partitionCustom' transformation?

As far as I can tell, there is no way to call 'setParallelism()' on the 'PartitionTransformation' because it's hidden from the user.

In another thread (http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Range-partitioning-td10947.html) I found a short note about custom partitioners: 'a partitioning will only be valid to the point that you change the parallelism.' So changing the parallelism in the subsequent transformation won't solve my problem.

How can I make sure that both the partitioner and the number of partitions are valid for subsequent transformations?

Thank you
Hi,
it should be possible to set the parallelism on the actual downstream operation. The partitioning operation is just an intermediate step.

Cheers,
Aljoscha

(In reply to vanekjar's message of Mon, 18 Jul 2016 at 20:20.)
Are you sure about that? The mentioned discussion about the range partitioner says exactly the opposite: it says that the downstream operation will use a "random shuffle" when the parallelism changes. Is 'partitionCustom()' a different case?
Hi,
I think that was just related to the DataSet API. If I'm not mistaken, changing the parallelism should work after a "partitionCustom()".

Cheers,
Aljoscha

On Tue, 19 Jul 2016 at 19:25 Jaromir Vanek <[hidden email]> wrote:
> Are you sure about that? The mentioned discussion
> <http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Range-partitioning-tp10947p10953.html>
> about the range partitioner says exactly the opposite:
>
> > A partitioning will only be valid to the point that you change the
> > parallelism.
> >
> > In the modified program the data will be correctly partitioned (let's say
> > into 8 partitions if the default parallelism is 8).
> > After the partitioning, the 8 partitions have to be reduced to 3
> > partitions as defined by the map-partition operator with parallelism 3.
> > This is done by randomly shuffling, which destroys the range-partitioning.
>
> It says that the downstream operation will use a "random shuffle" when
> the parallelism changes. Is 'partitionCustom()' a different case?
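
To make the reply above concrete, here is a minimal sketch in plain Java with no Flink dependency. It assumes, as I understand the DataStream API, that the partitioner is called with numPartitions equal to the parallelism of the operator that consumes the partitioned stream, so a chain like stream.partitionCustom(partitioner, keySelector).map(...).setParallelism(3) would call the partitioner with numPartitions = 3. The nested Partitioner interface only mirrors the shape of Flink's partition(key, numPartitions) method; PartitionSketch, HASH and route are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionSketch {

    // Hypothetical stand-in with the same shape as Flink's
    // Partitioner#partition(key, numPartitions); not the Flink interface itself.
    interface Partitioner<K> {
        int partition(K key, int numPartitions);
    }

    // Example partitioner: non-negative hash modulo the number of channels.
    // It never hard-codes a partition count, so it stays valid for any parallelism.
    static final Partitioner<String> HASH =
            (key, numPartitions) -> Math.floorMod(key.hashCode(), numPartitions);

    // Simulates the routing step: the channel count comes from the downstream
    // operator's parallelism (assumption stated in the lead-in), not from the
    // partitioning step itself.
    static <K> List<List<K>> route(List<K> keys, Partitioner<K> partitioner,
                                   int downstreamParallelism) {
        List<List<K>> channels = new ArrayList<>();
        for (int i = 0; i < downstreamParallelism; i++) {
            channels.add(new ArrayList<>());
        }
        for (K key : keys) {
            int target = partitioner.partition(key, downstreamParallelism);
            if (target < 0 || target >= downstreamParallelism) {
                throw new IllegalStateException("partitioner returned invalid channel " + target);
            }
            channels.get(target).add(key);
        }
        return channels;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f");
        // The same partitioner works for both parallelism values because it
        // always uses the numPartitions argument it is given.
        for (int parallelism : new int[] {8, 3}) {
            System.out.println(parallelism + " channels: " + route(keys, HASH, parallelism));
        }
    }
}
```

The design point is that a custom partitioner should derive its result from the numPartitions argument rather than from a fixed constant. Under the assumption above, the partitioner and the number of partitions then stay consistent whatever parallelism is set on the downstream operation.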