(DEPRECATED) Apache Flink Mailing List archive.

[jira] [Created] (FLINK-4175) Broadcast data sent increases with # slots per TM

Shang Yuanchun (Jira)

Felix Neutatz created FLINK-4175:
------------------------------------

Summary: Broadcast data sent increases with # slots per TM
Key: FLINK-4175
URL: https://issues.apache.org/jira/browse/FLINK-4175
Project: Flink
Issue Type: Improvement
Components: Core, TaskManager
Affects Versions: 1.0.3
Reporter: Felix Neutatz
Assignee: Felix Neutatz

Problem:
We observe an unexpected increase in the data sent over the network for broadcasts as the number of slots per TaskManager increases.

We provide a benchmark [1]. The effect not only increases the amount of data sent over the network but also hurts performance, as the preliminary results below show. In these results, cloud-11 has 25 nodes and ibm-power has 8 nodes, and the number of slots per node is scaled from 1 to 16.

+-----------------------+--------------+-------------+
| suite                 | name         | median_time |
+=======================+==============+=============+
| broadcast.cloud-11    | broadcast.01 | 8796        |
| broadcast.cloud-11    | broadcast.02 | 14802       |
| broadcast.cloud-11    | broadcast.04 | 30173       |
| broadcast.cloud-11    | broadcast.08 | 56936       |
| broadcast.cloud-11    | broadcast.16 | 117507      |
| broadcast.ibm-power-1 | broadcast.01 | 6807        |
| broadcast.ibm-power-1 | broadcast.02 | 8443        |
| broadcast.ibm-power-1 | broadcast.04 | 11823       |
| broadcast.ibm-power-1 | broadcast.08 | 21655       |
| broadcast.ibm-power-1 | broadcast.16 | 37426       |
+-----------------------+--------------+-------------+
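
To make the trend in the table easier to read, the following sketch computes the slowdown of each run relative to the single-slot run of the same suite. It assumes that the numeric suffix in the name column (broadcast.01 to broadcast.16) is the number of slots per node; the times are copied from the table and their unit is not stated in the report.

```python
# Median times copied from the table above, keyed by suite and by the
# number of slots per node (assumed to be the suffix of the run name).
median_time = {
    "cloud-11": {1: 8796, 2: 14802, 4: 30173, 8: 56936, 16: 117507},
    "ibm-power-1": {1: 6807, 2: 8443, 4: 11823, 8: 21655, 16: 37426},
}


def slowdown(times):
    """Return each median time divided by the single-slot median time."""
    base = times[1]
    return {slots: t / base for slots, t in times.items()}


for suite, times in median_time.items():
    for slots, factor in slowdown(times).items():
        print(f"{suite:12s} slots={slots:2d} slowdown={factor:.2f}x")
```

On cloud-11 the factor grows roughly in step with the slot count, while on ibm-power-1 it grows more slowly.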

After looking into the code base, it seems that the data is deserialized only once per TM, but the actual data is sent to every slot running the operator with broadcast variables and is simply discarded if it has already been deserialized.
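
The behaviour described above can be put into numbers with a small model. This is only a sketch of our reading of the code, not a measurement: it assumes one operator instance per slot and counts whole copies of the broadcast set. The function name per_slot_broadcast is made up for this illustration.

```python
def per_slot_broadcast(nodes, slots_per_tm):
    """Count copies of the broadcast set under the described behaviour.

    Assumption: one copy is shipped to every slot, and each TM
    deserializes only the first copy it receives.
    """
    sent = nodes * slots_per_tm       # one copy per slot crosses the network
    deserialized = nodes              # one copy per TM is actually used
    discarded = sent - deserialized   # the remaining copies are thrown away
    return sent, deserialized, discarded


# Cluster sizes from the benchmark: cloud-11 (25 nodes), ibm-power (8 nodes).
for nodes in (25, 8):
    for slots in (1, 2, 4, 8, 16):
        sent, used, wasted = per_slot_broadcast(nodes, slots)
        print(f"nodes={nodes:2d} slots={slots:2d} "
              f"sent={sent:3d} used={used:2d} discarded={wasted:3d}")
```

Under this model the number of copies sent grows linearly with the slots per TM, while the number of copies actually used stays constant.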

We do not see a reason why the data cannot be shared among the slots of a TM and therefore be sent only once.
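
As a rough illustration of the sharing idea, the sketch below uses threads to stand in for the slots of one TM. The class TaskManagerBroadcastCache and the helper run_slots are hypothetical and are not Flink APIs; the fetch callback stands in for whatever transfers the broadcast set over the network.

```python
import threading


class TaskManagerBroadcastCache:
    """Hypothetical per-TM cache: the first slot fetches, the others reuse."""

    def __init__(self, fetch):
        self._fetch = fetch            # stand-in for the network transfer
        self._lock = threading.Lock()
        self._values = {}
        self.fetch_count = 0

    def get(self, name):
        # Holding the lock during the fetch makes the other slots wait
        # for the first one instead of requesting their own copy.
        with self._lock:
            if name not in self._values:
                self._values[name] = self._fetch(name)
                self.fetch_count += 1
            return self._values[name]


def run_slots(cache, name, slots):
    """Simulate `slots` parallel tasks on one TM reading the same variable."""
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(cache.get(name)))
        for _ in range(slots)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


cache = TaskManagerBroadcastCache(lambda name: list(range(1000)))
values = run_slots(cache, "broadcast-set", slots=16)
print(f"slots served: {len(values)}, transfers: {cache.fetch_count}")
```

With such a cache the number of transfers per TM would no longer depend on the number of slots.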

[1] https://github.com/TU-Berlin-DIMA/flink-broadcast

This Jira will continue the discussion started here: https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3C1465386300767.94345@...%3E

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)