[jira] [Created] (FLINK-1459) Collect DataSet to client


Shang Yuanchun (Jira)
John Sandiford created FLINK-1459:
-------------------------------------

             Summary: Collect DataSet to client
                 Key: FLINK-1459
                 URL: https://issues.apache.org/jira/browse/FLINK-1459
             Project: Flink
          Issue Type: Improvement
            Reporter: John Sandiford


Hi, I may well have missed something obvious here, but I cannot find an easy way to extract the values in a DataSet to the client. Spark has collect, collectAsMap, etc.

(I need to pass the values from a small aggregated DataSet back to a machine learning library which is controlling the iterations.)

The only way I could find to do this was to implement my own in-memory OutputFormat. This is not ideal, but it does work.

Many thanks, John
 


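import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)

val result = data.reduce((a, b) => a)
// ??? marks the missing piece: a call that returns the values to the client
val valuesOnClient = result.???

env.execute("Simple example")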
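
The in-memory OutputFormat workaround mentioned above might look roughly like the following. This is only a sketch, not the reporter's actual code: the names `CollectedValues`, `InMemoryOutputFormat` and `CollectExample` are made up for illustration, and it assumes the job runs in a local environment, so that the sink and the client share one JVM. On a real cluster the queue would be filled on a worker and the client would see nothing.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

import scala.collection.JavaConverters._

import org.apache.flink.api.common.io.OutputFormat
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration

// Hypothetical holder for the collected values. It lives in the JVM that
// runs the sink, so the client only sees the values under local execution.
object CollectedValues {
  val values = new ConcurrentLinkedQueue[Double]()
}

// Hypothetical in-memory sink: each record is appended to the shared queue.
class InMemoryOutputFormat extends OutputFormat[Double] {
  override def configure(parameters: Configuration): Unit = {}
  override def open(taskNumber: Int, numTasks: Int): Unit = {}
  override def writeRecord(record: Double): Unit = CollectedValues.values.add(record)
  override def close(): Unit = {}
}

object CollectExample {
  def main(args: Array[String]): Unit = {
    // Assumption: local execution, so sink and client share the same JVM.
    val env = ExecutionEnvironment.createLocalEnvironment()

    val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
    val result = data.reduce((a, b) => a)

    // Attach the in-memory sink, then run the job.
    result.output(new InMemoryOutputFormat)
    env.execute("Simple example")

    // After execute() returns, the values can be read back on the client.
    val valuesOnClient = CollectedValues.values.asScala.toList
    println(valuesOnClient)
  }
}
```

A thread-safe queue is used because several sink instances may write concurrently. As far as I know, later Flink releases added a `collect()` method on `DataSet` that covers this use case directly.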



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: [jira] [Created] (FLINK-1459) Collect DataSet to client

aalexandrov
There is already an ongoing discussion and an issue open about that:

http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Gather-a-distributed-dataset-td3216.html

Unfortunately, I am currently pressed for time with other things, but if
nobody else handles this, I expect to be able to work on it within two weeks.

Regards,
Alex

2015-01-28 10:46 GMT+01:00 John Sandiford (JIRA) <[hidden email]>:
