Haibo Suen created FLINK-11256:
----------------------------------
Summary: Referencing StreamNode objects directly in StreamEdge causes the sizes of JobGraph and TDD to become unnecessarily large
Key: FLINK-11256
URL:
https://issues.apache.org/jira/browse/FLINK-11256 Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 1.7.1, 1.7.0
Reporter: Haibo Suen
Assignee: Haibo Suen
When a job graph is generated from StreamGraph, StreamEdge(s) on the stream graph are serialized to StreamConfig and stored into the job graph. After that, the serialized bytes will be included in the TDD and distributed to TM. Because StreamEdge directly reference to StreamNode objects including sourceVertex and targetVertex, these objects are also written transitively on serializing StreamEdge. But these StreamNode objects are not needed at runtime. For a large size topology, this will causes JobGraph/TDD to become much larger than that actually need, and more likely to occur rpc timeout when transmitted.
In Streamedge, only the ID of StreamNode should be stored to avoid this situation.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)