Chesnay Schepler created FLINK-1927:
---------------------------------------
Summary: [Py] Rework operator distribution
Key: FLINK-1927
URL: https://issues.apache.org/jira/browse/FLINK-1927
Project: Flink
Issue Type: Improvement
Components: Python API
Affects Versions: 0.9
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
Priority: Minor
Fix For: 0.9
Currently, the Python operator is created when executing the Python plan file, serialized using dill, and saved as a byte[] in the Java function. It is then deserialized at runtime on each node.
The current implementation is fairly hacky and imposes limitations that make it hard to work with. Chaining, or more generally saving other user code, always requires a separate deserialization step after the operator itself has been deserialized.
These issues can be easily circumvented by rebuilding the (Python) plan on each node instead of serializing the operator. Plan creation is deterministic, and every operator is uniquely identified by an ID that is already known to the Java function, so each node can locate its operator in the rebuilt plan.
This change will allow us to easily support custom serializers.
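
A minimal sketch of the proposed approach, to make the idea concrete. Everything here is hypothetical and for illustration only: the Plan class, build_plan and load_operator are not part of the Flink Python API. It assumes IDs are handed out in registration order, so that re-running the plan yields the same ID for the same operator on every node.

```python
import itertools


class Plan:
    """Hypothetical stand-in for a Python plan; not the Flink Python API."""

    def __init__(self):
        # IDs are assigned in registration order, so a deterministic
        # plan yields the same ID for the same operator on every run.
        self._ids = itertools.count()
        self.operators = {}

    def add_operator(self, func):
        op_id = next(self._ids)
        self.operators[op_id] = func
        return op_id


def build_plan():
    """Stands in for the user's plan file; must be deterministic."""
    plan = Plan()
    plan.add_operator(lambda x: x * 2)  # receives ID 0
    plan.add_operator(lambda x: x + 1)  # receives ID 1
    return plan


def load_operator(op_id):
    """What a node would do: rebuild the plan, then look up by ID.

    No operator is serialized; only the ID has to be known.
    """
    return build_plan().operators[op_id]


double = load_operator(0)
increment = load_operator(1)
```

Because nothing but the ID crosses the Java/Python boundary in this sketch, the operators may hold arbitrary user code (such as the lambdas above) without any extra deserialization step.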
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)