[jira] [Commented] (FLINK-668) API Proposal - NamedDataSets

Shang Yuanchun (Jira)

    [ https://issues.apache.org/jira/browse/FLINK-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035919#comment-14035919 ]

Markus Holzemer commented on FLINK-668:
---------------------------------------

The discussion on this topic continues in a newer issue (FLINK-947).

> API Proposal - NamedDataSets
> ----------------------------
>
>                 Key: FLINK-668
>                 URL: https://issues.apache.org/jira/browse/FLINK-668
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> @StephanEwen, @aljoscha and I were discussing a further stage / alternative version of the new Java API that we called NamedDataSets. Instead of dealing with specific types that are checked at compile time, users should be able to just use the names of fields to operate on. The types would be checked not at compile time but at pre-flight time. That would give a feel more similar to SQL.
> Currently users often have to remember which position in the tuple a specific field has, which can get a little annoying when dealing with bigger queries. Using names instead would perhaps make this more manageable.
> I have created a first proposal for the syntax that we can use as a basis for discussion:
> ```
> NamedDataSet nds = get3TupleDataSet(env).named("ID", "Number", "Comment");
>
> NamedDataSet join = get3TupleDataSet(env).named("ID", "Number", "Comment");
>
> NamedDataSet join_result = nds.join(join).where("ID").equalTo("ID");
>
> NamedDataSet group_result = nds.groupBy("ID");
> // To apply a UDF
> NamedDataSet reduceDs = nds.get("ID", "Number", "Comment").types(Integer.class, Long.class, String.class)
>         .groupBy(1).reduce(new Tuple3Reduce("B-)")).named("ID", "Number", "Comment");
>
> reduceDs.get("ID", "Number", "Comment").types(Integer.class, Long.class, String.class).print();
> env.execute();
> ```
> My current development progress can be looked at here:
> https://github.com/markus-h/stratosphere/compare/named_dataset
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/668
> Created by: [markus-h|https://github.com/markus-h]
> Labels: enhancement, java api, user satisfaction,
> Created at: Tue Apr 08 13:31:59 CEST 2014
> State: open
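
For readers following the quoted proposal, here is a minimal sketch of what "checked at pre-flight time" could mean in practice: field names are resolved to tuple positions, and join key types are compared before any job would run. The class `NamedSchema` and its methods are hypothetical names made up for this illustration. They are not part of the proposal, of the linked branch, or of Flink.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (an assumption for illustration, not Flink API):
 * maps field names to tuple positions and remembers each field's type.
 */
public class NamedSchema {
    private final Map<String, Integer> positions = new LinkedHashMap<>();
    private final List<Class<?>> types = new ArrayList<>();

    public NamedSchema(String[] names, Class<?>[] fieldTypes) {
        if (names.length != fieldTypes.length) {
            throw new IllegalArgumentException(
                    "Expected " + fieldTypes.length + " names, got " + names.length);
        }
        for (int i = 0; i < names.length; i++) {
            // put() returns the previous mapping, so non-null means a duplicate name
            if (positions.put(names[i], i) != null) {
                throw new IllegalArgumentException("Duplicate field name: " + names[i]);
            }
            types.add(fieldTypes[i]);
        }
    }

    /** Resolves a field name to its tuple position, failing early on unknown names. */
    public int positionOf(String name) {
        Integer pos = positions.get(name);
        if (pos == null) {
            throw new IllegalArgumentException(
                    "Unknown field '" + name + "', known fields: " + positions.keySet());
        }
        return pos;
    }

    /** Returns the declared type of a named field. */
    public Class<?> typeOf(String name) {
        return types.get(positionOf(name));
    }

    /** Pre-flight check: both join keys must exist and have the same declared type. */
    public static void checkJoinKeys(NamedSchema left, String leftKey,
                                     NamedSchema right, String rightKey) {
        Class<?> l = left.typeOf(leftKey);
        Class<?> r = right.typeOf(rightKey);
        if (!l.equals(r)) {
            throw new IllegalArgumentException("Join key type mismatch: " + leftKey
                    + " is " + l.getSimpleName() + ", " + rightKey
                    + " is " + r.getSimpleName());
        }
    }

    public static void main(String[] args) {
        String[] names = {"ID", "Number", "Comment"};
        Class<?>[] fieldTypes = {Integer.class, Long.class, String.class};
        NamedSchema nds = new NamedSchema(names, fieldTypes);
        NamedSchema join = new NamedSchema(names, fieldTypes);

        // Name-based lookup replaces remembering the tuple position by hand.
        System.out.println(nds.positionOf("Number")); // prints 1

        // Passes: both "ID" fields are declared as Integer.
        checkJoinKeys(nds, "ID", join, "ID");

        // Fails before execution: Integer key joined against a Long key.
        try {
            checkJoinKeys(nds, "ID", join, "Number");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch a misspelled or mistyped field surfaces as an exception while the plan is being built rather than as a compile error, which is the trade-off the proposal describes.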



--
This message was sent by Atlassian JIRA
(v6.2#6252)