Sajeev Ramakrishnan created FLINK-4902:
------------------------------------------
Summary: Flink Task Chain not getting input in a distributed manner
Key: FLINK-4902
URL: https://issues.apache.org/jira/browse/FLINK-4902
Project: Flink
Issue Type: Bug
Components: DataSet API
Affects Versions: 1.1.0
Environment: RHEL 6.6
Reporter: Sajeev Ramakrishnan
Dear Team,
I have the following operators chained into a single task:
left outer join -> filter -> map -> flatMap.
This chain has two inputs:
memberPlan - 22 million records
groupPlan - 1 million records
I am running the entire job with a parallelism of 16. Before this task chain, I perform two other left outer joins.
The problem is that one slot receives the 22 million records, while the remaining 15 slots receive the input from groupPlan.
This makes the entire execution very slow, probably about 4 hours slower.
Could you please shed some light on this?
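One thing that may be worth checking is whether the join key itself is skewed. This is an assumption; the report does not confirm the cause. The sketch below is plain Java with no Flink dependency. It counts how many records each of 16 partitions would receive if records were assigned by key hash. The class name, the modulo-based assignment, and the sample keys are all hypothetical; Flink's own partitioner differs in detail, but records that share one key still end up in one partition.

```java
import java.util.Arrays;

// Plain-Java sketch (no Flink dependency): counts how many records each
// partition would receive if records are assigned by the hash of their key.
class PartitionSkewCheck {

    // Assumption: partition = non-negative key hash modulo parallelism.
    // This is a simplification, not Flink's actual partitioning function.
    static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Returns an array where index i holds the number of keys sent to partition i.
    static long[] countPerPartition(Object[] keys, int parallelism) {
        long[] counts = new long[parallelism];
        for (Object key : keys) {
            counts[partitionFor(key, parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int parallelism = 16;

        // Hypothetical data: 1,000 records that all share a single join key.
        String[] skewed = new String[1000];
        Arrays.fill(skewed, "PLAN-A");

        // Hypothetical data: 1,000 records with distinct join keys.
        Integer[] spread = new Integer[1000];
        for (int i = 0; i < spread.length; i++) {
            spread[i] = i;
        }

        System.out.println("skewed: " + Arrays.toString(countPerPartition(skewed, parallelism)));
        System.out.println("spread: " + Arrays.toString(countPerPartition(spread, parallelism)));
    }
}
```

If the real memberPlan keys show a distribution like the skewed case, with nearly all records in one partition, key skew would be a plausible explanation for a single slot receiving most of the input. If they spread evenly, the cause is more likely elsewhere, for example in the join strategy chosen for the plan.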
Regards,
Sajeev