Hi all, let's have a look at a simple Join with two DataSources and parallelism p=5. The whole Job consists of 3 parts: 1. DataSource Task 2. Join Task 3. DataSink Task In the first task, the data is provided and prepared for the Join task. In particular each DataSource task creates a ResultPartition which is divided into 5 subpartitions. Since 1/5 of the Join Task will be located in the same node, one of these subpartitions does not have to be shipped over the network. This one subpartition will be shipped to a LocalInputChannel (not RemoteInputChannel) and therefore will not get in touch with the network. Now I made some changes in the network part for my research and would like them to affect all subpartitions. Question: Is there a feature build into flink to completely disable the local stuff and send all subpartitions via network even if they have the same location and destination? If not - does anyone have an idea where to tweak this? Thanks. Chris |
Hi Chris,
I am not sure why you do not want to use local channel. Are there any problems for local channel in your case? The root cause of local channel is determined by scheduler which schedules both producer and consumer tasks into the same task manager. So if you want to change this behaviour, it is better to change the logic or limit in scheduler instead of network stack. Another simple way for your requirement is setting only one slot per task manager, then there would be only one task running in each task manager. Best, Zhijiang ------------------------------------------------------------------ From:Chris Miller <[hidden email]> Send Time:2019年1月14日(星期一) 00:18 To:dev <[hidden email]> Subject:Disable local data transportation Hi all, let's have a look at a simple Join with two DataSources and parallelism p=5. The whole Job consists of 3 parts: 1. DataSource Task 2. Join Task 3. DataSink Task In the first task, the data is provided and prepared for the Join task. In particular each DataSource task creates a ResultPartition which is divided into 5 subpartitions. Since 1/5 of the Join Task will be located in the same node, one of these subpartitions does not have to be shipped over the network. This one subpartition will be shipped to a LocalInputChannel (not RemoteInputChannel) and therefore will not get in touch with the network. Now I made some changes in the network part for my research and would like them to affect all subpartitions. Question: Is there a feature build into flink to completely disable the local stuff and send all subpartitions via network even if they have the same location and destination? If not - does anyone have an idea where to tweak this? Thanks. Chris |
HI Chris,
I'm not sure what you want to test. As far as I know there isn't an option that forcing the data must be through network. And I don't think it's a generic feature we should support. I think zhijiang has given a good suggestion. Changing the runtime codes would be a fast way to satisfy the requirement. Another choice is that changing the code of Execution.java. Force generating the "LocationType.REMOTE" type of "ResultPartitionLocation". It probably works. zhijiang <[hidden email]> 于2019年1月14日周一 下午3:44写道: > Hi Chris, > > I am not sure why you do not want to use local channel. Are there any > problems for local channel in your case? > > The root cause of local channel is determined by scheduler which schedules > both producer and consumer tasks into the same task manager. So if you want > to change this behaviour, it is better to change the logic or limit in > scheduler instead of network stack. Another simple way for your > requirement is setting only one slot per task manager, then there would be > only one task running in each task manager. > > Best, > Zhijiang > > > ------------------------------------------------------------------ > From:Chris Miller <[hidden email]> > Send Time:2019年1月14日(星期一) 00:18 > To:dev <[hidden email]> > Subject:Disable local data transportation > > > > Hi all, > > let's have a look at a simple Join with two DataSources and parallelism > p=5. > > The whole Job consists of 3 parts: > > 1. DataSource Task > > 2. Join Task > > 3. DataSink Task > > In the first task, the data is provided and prepared for the Join task. > In particular each DataSource task creates a ResultPartition which is > divided into 5 subpartitions. Since 1/5 of the Join Task will be located > in the same node, one of these subpartitions does not have to be shipped > over the network. > > This one subpartition will be shipped to a LocalInputChannel (not > RemoteInputChannel) and therefore will not get in touch with the > network. > > Now I made some changes in the network part for my research and would > like them to affect all subpartitions. > > Question: > > Is there a feature build into flink to completely disable the local > stuff and send all subpartitions via network even if they have the same > location and destination? > > If not - does anyone have an idea where to tweak this? > > Thanks. > > Chris > > > |
Free forum by Nabble | Edit this page |