Very wide csv files

Very wide csv files

Anton Solovev
Hi,

I'm working on https://issues.apache.org/jira/browse/FLINK-2186

As I understand it, Flink cannot read wide-column files into a Tuple, only into a POJO.
So far we must create that POJO manually, which is convenient when the number of columns is small.
When there are over a thousand columns, it hardly seems possible.

To solve this issue I see these ways:

- Create an InputFormat that reads each column with the proper type serializer and keeps the values in a common container such as Object[], along with meta-information about the field types. Some chunks of code from an attempt: https://github.com/apache/flink/compare/master...tonycox:FLINK-2186

- Use a complex combination of Tuples and/or POJOs

- Use code generation to create a POJO with a very large number of fields
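
To make the first option more concrete, here is a minimal sketch in plain Java of the core idea: one parser per column, with each parsed row stored in an Object[] and the column types kept alongside as meta-information. This is not the code from the linked branch and it does not use any Flink API; the names WideCsvRowParser and ColumnType are hypothetical, and the sketch assumes a single-character delimiter with no quoting or escaping.

```java
import java.util.function.Function;

/** Hypothetical sketch: parses one delimited line into an Object[] row. */
class WideCsvRowParser {

    /** Hypothetical set of column types; a real format would support more. */
    enum ColumnType {
        STRING(s -> s),
        INT(Integer::valueOf),
        LONG(Long::valueOf),
        DOUBLE(Double::valueOf),
        BOOLEAN(Boolean::valueOf);

        private final Function<String, Object> parser;

        ColumnType(Function<String, Object> parser) {
            this.parser = parser;
        }

        /** Empty fields become null; everything else goes through the parser. */
        Object parse(String raw) {
            return raw.isEmpty() ? null : parser.apply(raw);
        }
    }

    // Meta-information about the field types, one entry per column.
    private final ColumnType[] types;
    private final char delimiter;

    WideCsvRowParser(ColumnType[] types, char delimiter) {
        this.types = types.clone();
        this.delimiter = delimiter;
    }

    ColumnType[] getTypes() {
        return types.clone();
    }

    /** Splits the line on the delimiter and parses each field by its column type. */
    Object[] parseLine(String line) {
        Object[] row = new Object[types.length];
        int start = 0;
        for (int i = 0; i < types.length; i++) {
            boolean last = (i == types.length - 1);
            int end = line.indexOf(delimiter, start);
            if (end < 0) {
                if (!last) {
                    throw new IllegalArgumentException("Too few columns: " + (i + 1));
                }
                end = line.length();
            } else if (last) {
                throw new IllegalArgumentException("More than " + types.length + " columns");
            }
            row[i] = types[i].parse(line.substring(start, end));
            start = end + 1;
        }
        return row;
    }

    public static void main(String[] args) {
        // Build a 1000-column schema: one INT key followed by DOUBLE values.
        int width = 1000;
        ColumnType[] types = new ColumnType[width];
        StringBuilder line = new StringBuilder("7");
        types[0] = ColumnType.INT;
        for (int i = 1; i < width; i++) {
            types[i] = ColumnType.DOUBLE;
            line.append(',').append(i).append(".5");
        }
        Object[] row = new WideCsvRowParser(types, ',').parseLine(line.toString());
        System.out.println(row.length + " fields, first = " + row[0] + ", last = " + row[width - 1]);
    }
}
```

The schema is just an array, so it could be built from a header line or a config file instead of being written by hand, which is the point when the column count is in the thousands. The cost is that field access is by index and untyped at compile time.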

What do you think?

Best regards,
Anton
Re: Very wide csv files

Flavio Pompermaier
I usually use Apache Commons CSV for that, as you can see here (inside the
*parseWithApacheCommonsCsv* branch of the if):

https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/datalinks/batch/flink/datasourcemanager/importers/Csv2RowExample.java

I hope this helps!
Flavio

On Wed, Nov 23, 2016 at 2:48 PM, Anton Solovev <[hidden email]> wrote:

> [original message quoted in full; see the first post above]