[jira] [Commented] (FLINK-944) Serialization problem of CollectionInputFormat

Shang Yuanchun (Jira)

    [ https://issues.apache.org/jira/browse/FLINK-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034067#comment-14034067 ]

ASF GitHub Bot commented on FLINK-944:
--------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/incubator-flink/pull/25

    FLINK-944 Changed serialization logic of CollectionInputFormat to use TypeSerializer

    The CollectionInputFormat did not support collection elements that do not implement the Serializable interface. By using the TypeSerializer generated from the TypeInformation, we can circumvent this problem. This allows arbitrary classes to be used in a collection data source.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/incubator-flink FLINK-944

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/25.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #25
   
----
commit 56a0730a4769d74f1a3381a07d0259ad99bdc2c4
Author: Till Rohrmann <[hidden email]>
Date:   2014-06-17T17:21:02Z

    CollectionInputFormat now uses the TypeSerializer to serialize the collection entries. This allows objects that do not implement the Serializable interface to be used as collection elements.

----


> Serialization problem of CollectionInputFormat
> ----------------------------------------------
>
>                 Key: FLINK-944
>                 URL: https://issues.apache.org/jira/browse/FLINK-944
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> The CollectionInputFormat uses only the standard serialization mechanism provided by the JVM. Thus, data types that can be serialized with a TypeSerializer but do not implement the Serializable interface cannot be used with a CollectionDataSource. Even worse, if one uses a composite type such as a tuple, only the top-level object is checked for serializability, so a non-serializable field goes undetected and the job crashes at runtime.
> It would be more user-friendly not to require that a data type implement the Serializable interface. Instead, we should use the generated TypeSerializer to do the serialization. That way, we are more flexible.
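
The failure mode described above can be reproduced without Flink. The following is a minimal sketch in plain Java, not the code from the pull request: `Point`, `Pair`, `ElementSerializer`, and `PairSerializer` are hypothetical names introduced for illustration, and `ElementSerializer` is only a simplified stand-in for the idea of a type-specific serializer, not Flink's actual TypeSerializer API. It shows that a Serializable container with a non-Serializable field passes a top-level check but fails when written with JVM serialization, while a field-by-field serializer handles it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class CollectionSerializationSketch {

    // Hypothetical element type that does NOT implement Serializable.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Hypothetical tuple-like container; it is Serializable, its fields may not be.
    static final class Pair<A, B> implements Serializable {
        private static final long serialVersionUID = 1L;
        final A first;
        final B second;
        Pair(A first, B second) { this.first = first; this.second = second; }
    }

    // Assumed, simplified stand-in for a type-specific serializer.
    interface ElementSerializer<T> {
        void write(T value, DataOutputStream out) throws IOException;
        T read(DataInputStream in) throws IOException;
    }

    // Writes a Pair<String, Point> field by field; no Serializable required.
    static final class PairSerializer implements ElementSerializer<Pair<String, Point>> {
        @Override
        public void write(Pair<String, Point> value, DataOutputStream out) throws IOException {
            out.writeUTF(value.first);
            out.writeInt(value.second.x);
            out.writeInt(value.second.y);
        }

        @Override
        public Pair<String, Point> read(DataInputStream in) throws IOException {
            String first = in.readUTF();
            int x = in.readInt();
            int y = in.readInt();
            return new Pair<>(first, new Point(x, y));
        }
    }

    // Returns false if JVM serialization rejects the object graph.
    static boolean javaSerializationWorks(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serializes the value to bytes and reads it back with the given serializer.
    static <T> T roundTrip(T value, ElementSerializer<T> serializer) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                serializer.write(value, out);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return serializer.read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Pair<String, Point> element = new Pair<>("a", new Point(1, 2));

        // The top-level check passes even though the nested Point is not Serializable.
        System.out.println("top-level Serializable: " + (element instanceof Serializable));

        // Writing the object with JVM serialization fails only at runtime.
        System.out.println("JVM serialization works: " + javaSerializationWorks(element));

        // The field-by-field serializer round-trips the same element.
        Pair<String, Point> copy = roundTrip(element, new PairSerializer());
        System.out.println("round trip: " + copy.first + " " + copy.second.x + " " + copy.second.y);
    }
}
```

In this sketch the serializer is written by hand; the issue proposes using the serializer that Flink generates from the element's TypeInformation, so users would not have to write one themselves.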



--
This message was sent by Atlassian JIRA
(v6.2#6252)