[jira] [Commented] (FLINK-939) Distribute required JAR files with separate service

Shang Yuanchun (Jira)

    [ https://issues.apache.org/jira/browse/FLINK-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034295#comment-14034295 ]

Stephan Ewen commented on FLINK-939:
------------------------------------

Daniel, great to hear it!

I think a first prototype can make the "in memory" assumption. We may need to move to a "blob on disk" solution fairly soon, though. We have already heard from a user whose JobManager (running with 512 MB in a YARN container) may hit a limit when it receives a 90 MB jar file (because of heap fragmentation, two copies of the array due to resizing, and other data structures occupying space). A "blob" on disk that uses something like a weak hash map to cache in memory might be a cool solution for the future.
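
To make the "blob on disk plus weak cache" idea a bit more concrete, here is a minimal sketch. It is not existing Flink code: the class name DiskBlobStore, its methods, and the use of plain string keys as file names are all assumptions made for illustration. It also uses a map of WeakReference values rather than java.util.WeakHashMap, because WeakHashMap holds its keys weakly, not its values, and here it is the cached bytes that should be reclaimable.

```java
import java.io.IOException;
import java.lang.ref.WeakReference;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical disk-backed blob store (illustrative only, not a Flink API). */
class DiskBlobStore {

    private final Path dir;

    // Values are weakly referenced, so the GC may reclaim cached bytes
    // whenever no caller holds them; the copy on disk stays authoritative.
    private final Map<String, WeakReference<byte[]>> cache = new ConcurrentHashMap<>();

    DiskBlobStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    /** Writes the blob to disk first, then caches it weakly in memory. */
    void put(String key, byte[] data) throws IOException {
        // Assumption: keys are safe, flat file names (no path separators).
        Files.write(dir.resolve(key), data);
        cache.put(key, new WeakReference<>(data));
    }

    /** Returns the cached bytes if still reachable, else reloads them from disk. */
    byte[] get(String key) throws IOException {
        WeakReference<byte[]> ref = cache.get(key);
        byte[] data = (ref != null) ? ref.get() : null;
        if (data == null) {
            data = Files.readAllBytes(dir.resolve(key));
            cache.put(key, new WeakReference<>(data));
        }
        return data;
    }

    /** Drops the in-memory entry only; the file on disk is kept. */
    void evict(String key) {
        cache.remove(key);
    }
}
```

With something along these lines, the heap only has to hold a jar while someone is actually using it. A real service would probably stream the file in chunks instead of calling readAllBytes, and might prefer soft references over weak ones so that cached blobs survive longer under light memory pressure.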

Concerning the integration with RPC: I think the current one does not urgently need that right now (having enough memory solves the issue for the time being), so it may not be worth investing time there. I am not 100% sure how far along the Akka branch is, so I would suggest building it standalone in the current branch; we can then make use of it when we merge the Akka branch.



> Distribute required JAR files with separate service
> ---------------------------------------------------
>
>                 Key: FLINK-939
>                 URL: https://issues.apache.org/jira/browse/FLINK-939
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Ufuk Celebi
>            Assignee: Daniel Warneke
>
> Currently, required user JAR files are distributed via the RPC service in {{JobGraph.writeRequiredJarFiles(DataOutput, AbstractJobVertex[])}}. The RPC service then tries to allocate a buffer on the client side heap to write the on-disk JAR, which [can lead to problems|https://github.com/apache/incubator-flink/pull/18].
> This should be replaced with a separate service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)