[jira] [Created] (FLINK-11256) Referencing StreamNode objects directly in StreamEdge causes the sizes of JobGraph and TDD to become unnecessarily large

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (FLINK-11256) Referencing StreamNode objects directly in StreamEdge causes the sizes of JobGraph and TDD to become unnecessarily large

Shang Yuanchun (Jira)
Haibo Suen created FLINK-11256:
----------------------------------

             Summary: Referencing StreamNode objects directly in StreamEdge causes the sizes of JobGraph and TDD to become unnecessarily large
                 Key: FLINK-11256
                 URL: https://issues.apache.org/jira/browse/FLINK-11256
             Project: Flink
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.7.1, 1.7.0
            Reporter: Haibo Suen
            Assignee: Haibo Suen


When a job graph is generated from StreamGraph, StreamEdge(s) on the stream graph are serialized to StreamConfig and stored into the job graph. After that, the serialized bytes will be included in the TDD and distributed to TM. Because StreamEdge directly reference to StreamNode objects including sourceVertex and targetVertex, these objects are also written transitively on serializing StreamEdge. But these StreamNode objects are not needed at runtime. For a large size topology, this will causes JobGraph/TDD to become much larger than that actually need, and more likely to occur rpc timeout when transmitted.

In Streamedge, only the ID of StreamNode should be stored to avoid this situation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)