Re: Greetings and question

Tzu-Li (Gordon) Tai
Hi Robert,

Thanks for your interest in contributing this.
AFAIK, there are no ongoing efforts on an ORC table sink yet. I’ll loop in Fabian (CC'ed), who might know more about this.
The only complicated part of designing a sink is deciding which delivery guarantees it will provide and how to provide them using Flink’s checkpointing mechanism.
I would suggest opening a JIRA (if there isn’t one already) and elaborating the details there to collect feedback before jumping right in.

Cheers,
Gordon

On 17 July 2017 at 3:47:02 AM, Robert Rapplean ([hidden email]) wrote:

Hey, everyone.  

I have a need for Flink to write to ORCFile tables in the near future.  
Could someone educate me on the current challenges that might make that  
hard to do? I've worked quite a bit with the HCat libraries, and may be  
overconfident about how complicated this is. Is anyone currently working on  
the issue?  

I'd go ahead and submit a Jira ticket for this, but am deterred by the  
thought that someone should have already created such a ticket, and  
wondering why it isn't already there. It may be a priority thing, but this  
is my personal priority at the moment.  

Best,  

Robert  

Fabian Hueske-2
Hi Robert,

I don't think anybody is working on an ORC file sink.
Are you interested in a sink for data streams or a batch sink?

Implementing a batch sink shouldn't be very hard.
You can either implement an OutputFormat that internally uses the ORC Java
API, or try to use Flink's HadoopOutputFormat, which can wrap Hadoop
OutputFormats.
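
To make the first option more concrete, here is a minimal sketch of the open / writeRecord / close lifecycle that a batch output format goes through. It does not depend on Flink or ORC: SimpleOutputFormat is a hypothetical stand-in whose method names echo Flink's OutputFormat (the configure step is left out), and PerTaskFileOutputFormat is likewise a made-up name that writes plain text lines where a real implementation would drive an ORC writer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in echoing the lifecycle of a Flink-style output format. */
interface SimpleOutputFormat<T> {
    void open(int taskNumber, int numTasks) throws IOException;
    void writeRecord(T record) throws IOException;
    void close() throws IOException;
}

/**
 * Writes one file per parallel task. Assumption: a real ORC sink would
 * create an ORC writer in open() instead of this plain text writer.
 */
class PerTaskFileOutputFormat implements SimpleOutputFormat<String> {
    private final Path directory;
    private BufferedWriter writer;

    PerTaskFileOutputFormat(Path directory) {
        this.directory = directory;
    }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        // Each parallel task gets its own file, so tasks never share a writer.
        Files.createDirectories(directory);
        writer = Files.newBufferedWriter(directory.resolve("part-" + taskNumber));
    }

    @Override
    public void writeRecord(String record) throws IOException {
        writer.write(record);
        writer.newLine();
    }

    @Override
    public void close() throws IOException {
        // Closing flushes buffered data; for ORC this is where the file footer would be written.
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }
}
```

In a batch job there is no checkpointing to coordinate with, which is why this lifecycle is all that is needed: the file is complete once close() returns.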

If you need a streaming ORC sink, things become a bit more challenging
because you would need to integrate the sink with Flink's checkpointing
mechanism.
I would recommend having a look at the BucketingSink and its JavaDocs.
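
As a rough illustration of what integrating with checkpointing can mean, the sketch below moves files through in-progress, pending and finished states, loosely following the idea behind the BucketingSink. It is self-contained Java with no Flink dependency: CheckpointedFileSink is a hypothetical class, its method names only echo Flink's sink and checkpoint callbacks, and rolling the file at every checkpoint is a simplifying assumption. A real sink would also have to store the pending list in checkpointed state and clean up after a restore, which is omitted here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: files become visible only after their checkpoint completes. */
class CheckpointedFileSink {
    private final Path directory;
    // Files closed at a checkpoint but not yet confirmed, keyed by checkpoint id.
    private final TreeMap<Long, List<Path>> pendingByCheckpoint = new TreeMap<>();
    private BufferedWriter writer;
    private Path inProgress;
    private int fileCounter = 0;

    CheckpointedFileSink(Path directory) throws IOException {
        this.directory = Files.createDirectories(directory);
    }

    /** Called per record: appends to the current in-progress file. */
    void invoke(String record) throws IOException {
        if (writer == null) {
            inProgress = directory.resolve("part-" + fileCounter + ".in-progress");
            fileCounter++;
            writer = Files.newBufferedWriter(inProgress);
        }
        writer.write(record);
        writer.newLine();
    }

    /** Called when a checkpoint is taken: close the file and mark it pending. */
    void snapshotState(long checkpointId) throws IOException {
        if (writer == null) {
            return; // nothing written since the last checkpoint
        }
        writer.close();
        writer = null;
        Path pending = swapSuffix(inProgress, ".in-progress", ".pending");
        Files.move(inProgress, pending);
        pendingByCheckpoint.computeIfAbsent(checkpointId, id -> new ArrayList<>()).add(pending);
    }

    /** Called once a checkpoint is confirmed: publish files up to and including it. */
    void notifyCheckpointComplete(long checkpointId) throws IOException {
        Iterator<List<Path>> it =
                pendingByCheckpoint.headMap(checkpointId, true).values().iterator();
        while (it.hasNext()) {
            for (Path pending : it.next()) {
                Files.move(pending, swapSuffix(pending, ".pending", ""));
            }
            it.remove();
        }
    }

    private static Path swapSuffix(Path file, String oldSuffix, String newSuffix) {
        String name = file.getFileName().toString();
        String base = name.substring(0, name.length() - oldSuffix.length());
        return file.resolveSibling(base + newSuffix);
    }
}
```

The point of the two-step rename is that readers only ever see files whose contents are covered by a completed checkpoint; anything still in-progress or pending at the time of a failure can be discarded or reconciled on recovery.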

Best,
Fabian
