How can I handle a left outer join for any two datasets, where each dataset may include any number of fields?


hager sallah
How can I handle a left outer join for any two datasets, where each dataset may include any number of fields? Example with two datasets:

Dataset one:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple4<Integer, String, String, String>> customer = env
    .readCsvFile("/home/hadoop/Desktop/Dataset/customer.csv")
    .fieldDelimiter('|')
    .includeFields("11110000")
    .ignoreFirstLine()
    .types(Integer.class, String.class, String.class, String.class);

Dataset two:

DataSet<Tuple3<Integer, String, String>> orders = env
    .readCsvFile("/home/hadoop/Desktop/Dataset/order.csv")
    .fieldDelimiter('|')
    .includeFields("1110000")
    .ignoreFirstLine()
    .types(Integer.class, String.class, String.class);

Re: How can I handle a left outer join for any two datasets, where each dataset may include any number of fields?

aalexandrov
Hey there,

Please use the user mailing list for user-related questions (this list is
for Flink internals only).

At the moment outer joins are not directly supported in Flink, but there
are good indications that this will change in the next 4-8 weeks. For the
time being, you can use a CoGroup with a custom UDF to implement the
semantics of a left outer join.
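The CoGroup workaround groups both inputs by the join key and hands the matching groups to a UDF, which can then emit a padded result for left-side elements that have no right-side partner. Here is a minimal plain-Java sketch of that left-outer-join semantics (no Flink dependency; the Customer/Order record names and fields are illustrative, not from the original posts):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeftOuterJoinSketch {

    record Customer(int id, String name) {}
    record Order(int customerId, String item) {}
    // Right side may be null when there is no match (the "outer" part).
    record Joined(Customer c, Order o) {}

    // CoGroup-style left outer join: index the right input by key, then for
    // each left element emit all matches, or (left, null) if none exist.
    static List<Joined> leftOuterJoin(List<Customer> left, List<Order> right) {
        Map<Integer, List<Order>> byKey = new HashMap<>();
        for (Order o : right) {
            byKey.computeIfAbsent(o.customerId(), k -> new ArrayList<>()).add(o);
        }
        List<Joined> out = new ArrayList<>();
        for (Customer c : left) {
            List<Order> matches = byKey.get(c.id());
            if (matches == null) {
                out.add(new Joined(c, null)); // unmatched left row survives
            } else {
                for (Order o : matches) {
                    out.add(new Joined(c, o));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(new Customer(1, "Ada"), new Customer(2, "Bob"));
        List<Order> orders = List.of(new Order(1, "book"));
        for (Joined j : leftOuterJoin(customers, orders)) {
            System.out.println(j.c().name() + " -> " + (j.o() == null ? "null" : j.o().item()));
        }
    }
}
```

In Flink itself the same idea would be expressed roughly as customer.coGroup(orders).where(0).equalTo(0).with(...), where the CoGroupFunction receives both groups for a key and emits a null-padded tuple whenever the orders group is empty.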

If you dig through the mailing list archives for the past 2-3 weeks and
search for "outer join" you will find a thread discussing the details of
the workaround implementation.

Regards,
Alexander


2015-04-26 21:07 GMT+02:00 hager sallah <[hidden email]>:
