How can I handle a left outer join for any two datasets, where each dataset may include any number of fields?


hager sallah
How can I handle a left outer join for any two datasets, where each dataset may include any number of fields? Example with two datasets:

Dataset one:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple4<Integer, String, String, String>> customer = env
    .readCsvFile("/home/hadoop/Desktop/Dataset/customer.csv")
    .fieldDelimiter('|')
    .includeFields("11110000")
    .ignoreFirstLine()
    .types(Integer.class, String.class, String.class, String.class);

Dataset two:

DataSet<Tuple3<Integer, String, String>> orders = env
    .readCsvFile("/home/hadoop/Desktop/Dataset/order.csv")
    .fieldDelimiter('|')
    .includeFields("1110000")
    .ignoreFirstLine()
    .types(Integer.class, String.class, String.class);

Re: How can I handle a left outer join for any two datasets, where each dataset may include any number of fields?

aalexandrov
Hey there,

Please use the user mailing list for user-related questions (this list is
for Flink internals only).

At the moment outer joins are not directly supported in Flink, but there
are good indications that this will change in the next 4-8 weeks. For the
time being, you can use a CoGroup with a custom UDF to implement the
semantics of a left outer join.
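The CoGroup workaround groups both inputs by the join key and hands the matching groups to a UDF, which can then emit a padded result for left-side elements that have no right-side partner. Here is a minimal plain-Java sketch of that left-outer-join semantics (no Flink dependency; the Customer/Order record names and fields are illustrative, not from the original posts):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeftOuterJoinSketch {

    record Customer(int id, String name) {}
    record Order(int customerId, String item) {}
    // Right side may be null when there is no match (the "outer" part).
    record Joined(Customer c, Order o) {}

    // CoGroup-style left outer join: index the right input by key, then for
    // each left element emit all matches, or (left, null) if none exist.
    static List<Joined> leftOuterJoin(List<Customer> left, List<Order> right) {
        Map<Integer, List<Order>> byKey = new HashMap<>();
        for (Order o : right) {
            byKey.computeIfAbsent(o.customerId(), k -> new ArrayList<>()).add(o);
        }
        List<Joined> out = new ArrayList<>();
        for (Customer c : left) {
            List<Order> matches = byKey.get(c.id());
            if (matches == null) {
                out.add(new Joined(c, null)); // unmatched left row survives
            } else {
                for (Order o : matches) {
                    out.add(new Joined(c, o));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(new Customer(1, "Ada"), new Customer(2, "Bob"));
        List<Order> orders = List.of(new Order(1, "book"));
        for (Joined j : leftOuterJoin(customers, orders)) {
            System.out.println(j.c().name() + " -> " + (j.o() == null ? "null" : j.o().item()));
        }
    }
}
```

In Flink itself the same idea would be expressed roughly as customer.coGroup(orders).where(0).equalTo(0).with(...), where the CoGroupFunction receives both groups for a key and emits a null-padded tuple whenever the orders group is empty.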

If you dig through the mailing list archives for the past 2-3 weeks and
search for "outer join" you will find a thread discussing the details of
the workaround implementation.

Regards,
Alexander


2015-04-26 21:07 GMT+02:00 hager sallah <[hidden email]>:
