Disable local data transportation

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Disable local data transportation

Chris Miller
 

Hi all,

let's have a look at a simple Join with two DataSources and parallelism
p=5.

The whole Job consists of 3 parts:

1. DataSource Task

2. Join Task

3. DataSink Task

In the first task, the data is provided and prepared for the Join task.
In particular each DataSource task creates a ResultPartition which is
divided into 5 subpartitions. Since 1/5 of the Join Task will be located
in the same node, one of these subpartitions does not have to be shipped
over the network.

This one subpartition will be shipped to a LocalInputChannel (not
RemoteInputChannel) and therefore will not get in touch with the
network.

Now I made some changes in the network part for my research and would
like them to affect all subpartitions.

Question:

Is there a feature build into flink to completely disable the local
stuff and send all subpartitions via network even if they have the same
location and destination?

If not - does anyone have an idea where to tweak this?

Thanks.

Chris

 
Reply | Threaded
Open this post in threaded view
|

Re: Disable local data transportation

Zhijiang(wangzhijiang999)
Hi Chris,

I am not sure why you do not want to use local channel. Are there any problems for local channel in your case?

The root cause of local channel is determined by scheduler which schedules both producer and consumer tasks into the same task manager. So if you want to change this behaviour, it is better to change the logic or limit in scheduler instead of network stack.  Another simple way for your requirement is setting only one slot per task manager, then there would be only one task running in each task manager.

Best,
Zhijiang


------------------------------------------------------------------
From:Chris Miller <[hidden email]>
Send Time:2019年1月14日(星期一) 00:18
To:dev <[hidden email]>
Subject:Disable local data transportation



Hi all,

let's have a look at a simple Join with two DataSources and parallelism
p=5.

The whole Job consists of 3 parts:

1. DataSource Task

2. Join Task

3. DataSink Task

In the first task, the data is provided and prepared for the Join task.
In particular each DataSource task creates a ResultPartition which is
divided into 5 subpartitions. Since 1/5 of the Join Task will be located
in the same node, one of these subpartitions does not have to be shipped
over the network.

This one subpartition will be shipped to a LocalInputChannel (not
RemoteInputChannel) and therefore will not get in touch with the
network.

Now I made some changes in the network part for my research and would
like them to affect all subpartitions.

Question:

Is there a feature build into flink to completely disable the local
stuff and send all subpartitions via network even if they have the same
location and destination?

If not - does anyone have an idea where to tweak this?

Thanks.

Chris


Reply | Threaded
Open this post in threaded view
|

Re: Disable local data transportation

Biao Liu
HI Chris,
I'm not sure what you want to test. As far as I know there isn't an option
that forcing the data must be through network. And I don't think it's a
generic feature we should support. I think zhijiang has given a good
suggestion. Changing the runtime codes would be a fast way to satisfy the
requirement. Another choice is that changing the code of Execution.java.
Force generating the "LocationType.REMOTE" type of
"ResultPartitionLocation". It probably works.

zhijiang <[hidden email]> 于2019年1月14日周一 下午3:44写道:

> Hi Chris,
>
> I am not sure why you do not want to use local channel. Are there any
> problems for local channel in your case?
>
> The root cause of local channel is determined by scheduler which schedules
> both producer and consumer tasks into the same task manager. So if you want
> to change this behaviour, it is better to change the logic or limit in
> scheduler instead of network stack.  Another simple way for your
> requirement is setting only one slot per task manager, then there would be
> only one task running in each task manager.
>
> Best,
> Zhijiang
>
>
> ------------------------------------------------------------------
> From:Chris Miller <[hidden email]>
> Send Time:2019年1月14日(星期一) 00:18
> To:dev <[hidden email]>
> Subject:Disable local data transportation
>
>
>
> Hi all,
>
> let's have a look at a simple Join with two DataSources and parallelism
> p=5.
>
> The whole Job consists of 3 parts:
>
> 1. DataSource Task
>
> 2. Join Task
>
> 3. DataSink Task
>
> In the first task, the data is provided and prepared for the Join task.
> In particular each DataSource task creates a ResultPartition which is
> divided into 5 subpartitions. Since 1/5 of the Join Task will be located
> in the same node, one of these subpartitions does not have to be shipped
> over the network.
>
> This one subpartition will be shipped to a LocalInputChannel (not
> RemoteInputChannel) and therefore will not get in touch with the
> network.
>
> Now I made some changes in the network part for my research and would
> like them to affect all subpartitions.
>
> Question:
>
> Is there a feature build into flink to completely disable the local
> stuff and send all subpartitions via network even if they have the same
> location and destination?
>
> If not - does anyone have an idea where to tweak this?
>
> Thanks.
>
> Chris
>
>
>