(DEPRECATED) Apache Flink Mailing List archive.

DataStream.partitionCustom() - define parallelism

Classic

List

Threaded

4 messages Options

Jaromir Vanek

DataStream.partitionCustom() - define parallelism

Hi all,

I've got one question considering custom partitioning in DataStream API. Is it possible to define/change parallelism when doing 'partitionCustom' transformation?

As far as I discovered there is no way how to call 'setParallelism()' on the 'PartitionTransformation' because it's hidden from the user.

In another thread (http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Range-partitioning-td10947.html) I found a short notice about custom partitioner - 'a partitioning will only be valid to the point that you change the parallelism.' So the possibility of changing parallelism in the subsequent transformation won't solve my problem.

How to make sure that both partitioner and number of partitions are valid for subsequent transformations?

Thank you

Aljoscha Krettek-2

Re: DataStream.partitionCustom() - define parallelism

Hi,
it should be possible to set the parallelism on the actual downstream
operation. The partitioning operation is just an intermediate.

Cheers,
Aljoscha

On Mon, 18 Jul 2016 at 20:20 vanekjar <[hidden email]> wrote:

> Hi all,
>
> I've got one question considering custom partitioning in DataStream API. Is
> it possible to define/change parallelism when doing 'partitionCustom'
> transformation?
>
> As far as I discovered there is no way how to call 'setParallelism()' on
> the
> 'PartitionTransformation' because it's hidden from the user.
>
> In another thread
> (
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Range-partitioning-td10947.html
> )
> I found a short notice about custom partitioner - 'a partitioning will only
> be valid to the point that you change the parallelism.' So the possibility
> of changing parallelism in the subsequent transformation won't solve my
> problem.
>
> How to make sure that both partitioner and number of partitions are valid
> for subsequent transformations?
>
> Thank you
>
>
>
> --
> View this message in context:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DataStream-partitionCustom-define-parallelism-tp12597.html
> Sent from the Apache Flink Mailing List archive. mailing list archive at
> Nabble.com.
>

Jaromir Vanek

Re: DataStream.partitionCustom() - define parallelism

Aljoscha Krettek-2 wrote

Hi,
it should be possible to set the parallelism on the actual downstream
operation. The partitioning operation is just an intermediate.

Cheers,
Aljoscha

Are you sure about that? The mentioned discussion about range partitioner says exactly opposite:

A partitioning will only be valid to the point that you change the parallelism.

In the modified program the data will be correctly partitioned (lets say into 8 partitions if the default parallelism is 8).
After the partitioning, the 8 partitions have to be reduced to 3 partitions as defined by the map-partition
operator with parallelism 3. This is done by randomly shuffling which destroys the range-partitioning.

It says the the downstream operation will use "random shuffle" when changing parallelism. Is 'customPartition()' different case?

Aljoscha Krettek-2

Re: DataStream.partitionCustom() - define parallelism

Hi,
I think that was just related to the DataSet API. If I'm not mistaken
changing the parallelism should work after a "partitionCustom()".

Cheers,
Aljoscha

On Tue, 19 Jul 2016 at 19:25 Jaromir Vanek <[hidden email]> wrote:

> Aljoscha Krettek-2 wrote
> > Hi,
> > it should be possible to set the parallelism on the actual downstream
> > operation. The partitioning operation is just an intermediate.
> >
> > Cheers,
> > Aljoscha
>
> Are you sure about that? The mentioned discussion
> <
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Range-partitioning-tp10947p10953.html
> >
> about range partitioner says exactly opposite:
>
>
> > A partitioning will only be valid to the point that you change the
> > parallelism.
> >
> > In the modified program the data will be correctly partitioned (lets say
> > into 8 partitions if the default parallelism is 8).
> > After the partitioning, the 8 partitions have to be reduced to 3
> > partitions as defined by the map-partition
> > operator with parallelism 3. This is done by randomly shuffling which
> > destroys the range-partitioning.
>
> It says the the downstream operation will use "random shuffle" when
> changing
> parallelism. Is 'customPartition()' different case?
>
>
>
> --
> View this message in context:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DataStream-partitionCustom-define-parallelism-tp12597p12617.html
> Sent from the Apache Flink Mailing List archive. mailing list archive at
> Nabble.com.
>