Keeping around temp datasets

aalexandrov
Hi there,

I have to implement some generic fallback strategy on top of a more
abstract DSL in order to keep datasets in a temp space (e.g. Tachyon). My
implementation is based on the 0.8 release. At the moment I am undecided
between three options:

   - BinaryInputFormat / BinaryOutputFormat
   - AvroInputFormat / AvroOutputFormat
   - Something better (?)

What are your suggestions?

Re: Keeping around temp datasets

Robert Metzger
I would not recommend using the AvroInput/Output format because it is meant
to be used with Avro types (usually POJOs generated from an Avro schema).

I would use the TypeSerializerInputFormat / TypeSerializerOutputFormat. Then
you can be sure that it is able to read/write all types supported by our system.
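
To illustrate why a serializer-driven binary format works as a generic fallback, here is a minimal stand-alone sketch in plain Java. It is not Flink's API: `RecordSerializer` and `TempDatasetSketch` are hypothetical names invented for this example, and a local temp file stands in for the temp space. The point is only that the read/write code is written once and stays generic, while the per-type serializer decides how each record is encoded.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a per-type serializer; NOT a Flink interface.
interface RecordSerializer<T> {
    void write(T record, DataOutputStream out) throws IOException;
    T read(DataInputStream in) throws IOException;
}

final class TempDatasetSketch {

    // Example serializer for String records (an assumption for illustration).
    static final RecordSerializer<String> STRINGS = new RecordSerializer<String>() {
        @Override
        public void write(String record, DataOutputStream out) throws IOException {
            out.writeUTF(record);
        }

        @Override
        public String read(DataInputStream in) throws IOException {
            return in.readUTF();
        }
    };

    // Writes a record count followed by each record, encoded by the serializer.
    static <T> void writeAll(Path file, List<T> records, RecordSerializer<T> serializer)
            throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(records.size());
            for (T record : records) {
                serializer.write(record, out);
            }
        }
    }

    // Reads the record count, then decodes that many records with the same serializer.
    static <T> List<T> readAll(Path file, RecordSerializer<T> serializer) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            int count = in.readInt();
            List<T> records = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                records.add(serializer.read(in));
            }
            return records;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("dataset", ".bin");
        writeAll(tmp, Arrays.asList("a", "b", "c"), STRINGS);
        System.out.println(readAll(tmp, STRINGS));
        Files.delete(tmp);
    }
}
```

Supporting another record type in this sketch only requires another `RecordSerializer` instance; `writeAll` and `readAll` stay unchanged.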

On Tue, Jan 20, 2015 at 7:51 PM, Alexander Alexandrov <
[hidden email]> wrote:

> Hi there,
>
> I have to implement some generic fallback strategy on top of a more
> abstract DSL in order to keep datasets in a temp space (e.g. Tachyon). My
> implementation is based on the 0.8 release. At the moment I am undecided
> between three options:
>
>    - BinaryInputFormat / BinaryOutputFormat
>    - AvroInputFormat / AvroOutputFormat
>    - Something better (?)
>
> What are your suggestions?
>

Re: Keeping around temp datasets

Stephan Ewen
I agree, the type serializer IO formats should be the best match. They
should also work rather efficiently.

On Tue, Jan 20, 2015 at 2:18 PM, Robert Metzger <[hidden email]> wrote:

> I would not recommend using the AvroInput/Output format because it is meant
> to be used with Avro types (usually POJOs generated from an Avro schema).
>
> I would use the TypeSerializerInputFormat / TypeSerializerOutputFormat. Then
> you can be sure that it is able to read/write all types supported by our system.