[jira] [Commented] (FLINK-939) Distribute required JAR files with separate service


Shang Yuanchun (Jira)

    [ https://issues.apache.org/jira/browse/FLINK-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032485#comment-14032485 ]

Stephan Ewen commented on FLINK-939:
------------------------------------

After looking a little more into the code, I think we need something more powerful, like a blob manager. Issues we need to address are the following:

  1. Distribute large JAR files from (a) client to job manager and (b) job manager to task managers
  2. Ship large program functions (closures) from (a) client to job manager and (b) job manager to task managers
  3. Ship intermediate results from (a) task managers to job manager and (b) job manager to client

All those can be transferred in the form of blobs or large blocks/frames.

What we could use is a BlobManager on the JobManager that accepts the requests "put(JobID, key, byte[])" and "get(JobID, key)". Clients and task managers can put data there and transmit only the key via RPC.

The BlobManager needs to store the data on disk where necessary to prevent OutOfMemoryErrors. I would suggest starting with a simple service that keeps a "Map<JobID, Map<Key, Pair<Length, File>>>" and writes all received blobs to disk in the temp directory.
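
To make the proposal concrete, here is a minimal sketch of such a service. It is only an illustration under stated assumptions, not existing Flink code: the class name BlobManagerSketch and the helper BlobEntry are hypothetical, plain Strings stand in for JobID and the key type, and the network/RPC side is left out entirely.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed service; names and types are assumptions.
final class BlobManagerSketch {

    /** Length and on-disk location of one blob (the "Pair<Length, File>" from the proposal). */
    static final class BlobEntry {
        final long length;
        final File file;

        BlobEntry(long length, File file) {
            this.length = length;
            this.file = file;
        }
    }

    // jobId -> (key -> entry). Strings are stand-ins for JobID and the key type.
    private final Map<String, Map<String, BlobEntry>> blobs = new ConcurrentHashMap<>();

    /** Writes the blob to a file in the JVM temp directory and remembers where it is. */
    void put(String jobId, String key, byte[] data) throws IOException {
        File file = File.createTempFile("blob-", ".bin");
        file.deleteOnExit();
        Files.write(file.toPath(), data);

        BlobEntry previous = blobs
                .computeIfAbsent(jobId, id -> new ConcurrentHashMap<>())
                .put(key, new BlobEntry(data.length, file));
        if (previous != null) {
            // The key was overwritten, so the old file is no longer reachable.
            previous.file.delete();
        }
    }

    /** Reads the blob back from disk, or returns null if the job or key is unknown. */
    byte[] get(String jobId, String key) throws IOException {
        Map<String, BlobEntry> perJob = blobs.get(jobId);
        BlobEntry entry = (perJob == null) ? null : perJob.get(key);
        return (entry == null) ? null : Files.readAllBytes(entry.file.toPath());
    }
}
```

Only file handles and lengths stay on the heap between calls. Note that put and get as written still pass a whole byte[] per request, so a very large JAR would eventually need a streaming variant; the sketch does not attempt that.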

What is your opinion on that?

> Distribute required JAR files with separate service
> ---------------------------------------------------
>
>                 Key: FLINK-939
>                 URL: https://issues.apache.org/jira/browse/FLINK-939
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Ufuk Celebi
>            Assignee: Daniel Warneke
>
> Currently, required user JAR files are distributed via the RPC service in {{JobGraph.writeRequiredJarFiles(DataOutput, AbstractJobVertex[])}}. The RPC service then tries to allocate a buffer on the client-side heap to write the on-disk JAR, which [can lead to problems|https://github.com/apache/incubator-flink/pull/18].
> This should be replaced with a separate service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)