load avro example


Martin Neumann
Hej,

I'm looking for some example code on how to load an Avro-formatted data set.

I have the Avro schema (it's horribly twisted and nested on several levels)
and I want to load that into Java classes to make it easier to process.

I'm using the latest Flink 0.7 snapshot.


Thanks for the help.
Cheers, Martin

Re: load avro example

Stephan Ewen
Hi Martin!

Using Avro-formatted data should be straightforward with the
AvroInputFormat
(https://github.com/apache/incubator-flink/blob/master/flink-addons/flink-avro/src/main/java/org/apache/flink/api/java/io/AvroInputFormat.java)

Just add "flink-avro" as a Maven dependency and create a DataSet using

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<MyAvroType> data = env.createInput(
    new AvroInputFormat<MyAvroType>(new Path("/path/to/file"), MyAvroType.class));

You can also have a look at line 225 here:
https://github.com/apache/incubator-flink/blob/master/flink-addons/flink-avro/src/test/java/org/apache/flink/api/avro/testjar/AvroExternalJarProgram.java#L225

Greetings,
Stephan


On Mon, Sep 22, 2014 at 2:16 PM, Martin Neumann <[hidden email]> wrote:

> Hej,
>
> I'm looking for some example code on how to load an Avro-formatted data set.
>
> I have the Avro schema (it's horribly twisted and nested on several levels)
> and I want to load that into Java classes to make it easier to process.
>
> I'm using the latest Flink 0.7 snapshot.
>
> Thanks for the help.
> Cheers, Martin

Re: load avro example

Martin Neumann
Hej,

Thanks for the example.

Is there a way to auto-create the class (MyUser in the example) directly
from the schema? The schema I have is 4 pages long; transforming it into a
class by hand would be painful.

Cheers, Martin

On Mon, Sep 22, 2014 at 2:34 PM, Stephan Ewen <[hidden email]> wrote:

> Hi Martin!
>
> Using Avro-formatted data should be straightforward with the
> AvroInputFormat
> (https://github.com/apache/incubator-flink/blob/master/flink-addons/flink-avro/src/main/java/org/apache/flink/api/java/io/AvroInputFormat.java)
>
> Just add "flink-avro" as a Maven dependency and create a DataSet using
>
> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> DataSet<MyAvroType> data = env.createInput(
>     new AvroInputFormat<MyAvroType>(new Path("/path/to/file"), MyAvroType.class));
>
> You can also have a look at line 225 here:
>
> https://github.com/apache/incubator-flink/blob/master/flink-addons/flink-avro/src/test/java/org/apache/flink/api/avro/testjar/AvroExternalJarProgram.java#L225
>
> Greetings,
> Stephan

Re: load avro example

Robert Metzger
Hi,

Yes, have a look here:
http://avro.apache.org/docs/1.7.7/gettingstartedjava.html
You can either add the Avro Maven plugin to your project or use the Avro
tools (avro-tools) to generate the classes from the schema.
You can also extract the schema from your Avro files if you don't have it
as JSON.
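
To illustrate why the schema can be recovered from a data file: an Avro
object container file starts with the magic bytes "Obj" 0x01, followed by a
metadata map whose "avro.schema" entry holds the writer schema as JSON. The
sketch below reads just that header using only the JDK. The class name
AvroHeaderSchema is made up for this example and is not part of Avro or
Flink; in practice the Avro library (DataFileReader.getSchema()) or the
avro-tools "getschema" command would be the more usual route.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: pulls the schema JSON out of an Avro container file header. */
final class AvroHeaderSchema {

    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns the value stored under "avro.schema", or null if the header has no such key. */
    static String readSchema(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] magic = new byte[4];
        data.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Not an Avro object container file");
        }
        String schema = null;
        // The metadata is an Avro map: blocks of key/value pairs, ended by a zero count.
        for (long count = readLong(data); count != 0; count = readLong(data)) {
            if (count < 0) {
                count = -count;
                readLong(data); // a negative count is followed by the block size in bytes; skip it
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(data), StandardCharsets.UTF_8);
                byte[] value = readBytes(data);
                if (key.equals("avro.schema")) {
                    schema = new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        return schema;
    }

    /** Decodes an Avro long: variable-length, zig-zag encoded. */
    private static long readLong(DataInputStream data) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = data.readUnsignedByte();
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0 && shift < 64);
        return (raw >>> 1) ^ -(raw & 1);
    }

    /** Reads a length-prefixed byte sequence (used for both map keys and values). */
    private static byte[] readBytes(DataInputStream data) throws IOException {
        long length = readLong(data);
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IOException("Invalid length: " + length);
        }
        byte[] bytes = new byte[(int) length];
        data.readFully(bytes);
        return bytes;
    }
}
```

Calling AvroHeaderSchema.readSchema(new FileInputStream("data.avro")) (the
file name is a placeholder) should return the schema JSON, which can then be
saved as an .avsc file and fed to the class generation described above.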


By the way, I've added a JIRA ticket to document the Avro input and the
other formats: https://issues.apache.org/jira/browse/FLINK-1107


On Mon, Sep 22, 2014 at 2:41 PM, Martin Neumann <[hidden email]> wrote:

> Hej,
>
> Thanks for the example.
>
> Is there a way to auto-create the class (MyUser in the example) directly
> from the schema? The schema I have is 4 pages long; transforming it into a
> class by hand would be painful.
>
> Cheers, Martin