How to clean up resources in a UDF?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

How to clean up resources in a UDF?

Boyuan Zhang
Hi team,

I'm building a UDF by implementing AbstractRichFunction, where I want to do
some resource cleanup per input element when the processing result is
committed. I can perform such cleanup in streaming by implementing
*CheckpointListener.notifyCheckpointComplete() *but it seems like there is
no checkpoint mechanism in batch processing.
I'm wondering is* AbstractRichFunction.close() *the good place to do so?
How does flink deal with fault tolerance in batch?

Thanks for your help!
Reply | Threaded
Open this post in threaded view
|

Re: How to clean up resources in a UDF?

Aljoscha Krettek-2
Hi!

Yes, AbstractRichFunction.close() would be the right place to do
cleanup. This method is called both in case of successful finishing and
also in the case of failures.

For BATCH execution, Flink will do backtracking upwards from the failed
task(s) to see if intermediate results from previous tasks are still
available. If they are available, computation can restart from there.
Otherwise the whole job will have to be restarted.

Best,
Aljoscha

On 28.09.20 21:44, Boyuan Zhang wrote:

> Hi team,
>
> I'm building a UDF by implementing AbstractRichFunction, where I want to do
> some resource cleanup per input element when the processing result is
> committed. I can perform such cleanup in streaming by implementing
> *CheckpointListener.notifyCheckpointComplete() *but it seems like there is
> no checkpoint mechanism in batch processing.
> I'm wondering is* AbstractRichFunction.close() *the good place to do so?
> How does flink deal with fault tolerance in batch?
>
> Thanks for your help!
>

Reply | Threaded
Open this post in threaded view
|

Re: How to clean up resources in a UDF?

Boyuan Zhang
Thanks, Aljoscha! That's really helpful.

I think I only want to do my cleanup when the task successfully finishes,
which means the cleanup should only be invoked when the task is
guaranteed not to be executed again in one given batch execution. Is there
any way to do so?

Thanks for your help!

On Thu, Oct 1, 2020 at 2:55 AM Aljoscha Krettek <[hidden email]> wrote:

> Hi!
>
> Yes, AbstractRichFunction.close() would be the right place to do
> cleanup. This method is called both in case of successful finishing and
> also in the case of failures.
>
> For BATCH execution, Flink will do backtracking upwards from the failed
> task(s) to see if intermediate results from previous tasks are still
> available. If they are available, computation can restart from there.
> Otherwise the whole job will have to be restarted.
>
> Best,
> Aljoscha
>
> On 28.09.20 21:44, Boyuan Zhang wrote:
> > Hi team,
> >
> > I'm building a UDF by implementing AbstractRichFunction, where I want to
> do
> > some resource cleanup per input element when the processing result is
> > committed. I can perform such cleanup in streaming by implementing
> > *CheckpointListener.notifyCheckpointComplete() *but it seems like there
> is
> > no checkpoint mechanism in batch processing.
> > I'm wondering is* AbstractRichFunction.close() *the good place to do so?
> > How does flink deal with fault tolerance in batch?
> >
> > Thanks for your help!
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: How to clean up resources in a UDF?

Aljoscha Krettek-2
Unfortunately, there is no such hook right now. However, we're working
on this in the context of FLIP-134 [1] and FLIP-143 [2].

Best,
Aljoscha

[1] https://cwiki.apache.org/confluence/x/4i94CQ
[2] https://cwiki.apache.org/confluence/x/KEJ4CQ

On 01.10.20 20:35, Boyuan Zhang wrote:

> Thanks, Aljoscha! That's really helpful.
>
> I think I only want to do my cleanup when the task successfully finishes,
> which means the cleanup should only be invoked when the task is
> guaranteed not to be executed again in one given batch execution. Is there
> any way to do so?
>
> Thanks for your help!
>
> On Thu, Oct 1, 2020 at 2:55 AM Aljoscha Krettek <[hidden email]> wrote:
>
>> Hi!
>>
>> Yes, AbstractRichFunction.close() would be the right place to do
>> cleanup. This method is called both in case of successful finishing and
>> also in the case of failures.
>>
>> For BATCH execution, Flink will do backtracking upwards from the failed
>> task(s) to see if intermediate results from previous tasks are still
>> available. If they are available, computation can restart from there.
>> Otherwise the whole job will have to be restarted.
>>
>> Best,
>> Aljoscha
>>
>> On 28.09.20 21:44, Boyuan Zhang wrote:
>>> Hi team,
>>>
>>> I'm building a UDF by implementing AbstractRichFunction, where I want to
>> do
>>> some resource cleanup per input element when the processing result is
>>> committed. I can perform such cleanup in streaming by implementing
>>> *CheckpointListener.notifyCheckpointComplete() *but it seems like there
>> is
>>> no checkpoint mechanism in batch processing.
>>> I'm wondering is* AbstractRichFunction.close() *the good place to do so?
>>> How does flink deal with fault tolerance in batch?
>>>
>>> Thanks for your help!
>>>
>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: How to clean up resources in a UDF?

Boyuan Zhang
Thanks, Aljoscha! The context is really helpful!

On Fri, Oct 2, 2020 at 1:19 AM Aljoscha Krettek <[hidden email]> wrote:

> Unfortunately, there is no such hook right now. However, we're working
> on this in the context of FLIP-134 [1] and FLIP-143 [2].
>
> Best,
> Aljoscha
>
> [1] https://cwiki.apache.org/confluence/x/4i94CQ
> [2] https://cwiki.apache.org/confluence/x/KEJ4CQ
>
> On 01.10.20 20:35, Boyuan Zhang wrote:
> > Thanks, Aljoscha! That's really helpful.
> >
> > I think I only want to do my cleanup when the task successfully finishes,
> > which means the cleanup should only be invoked when the task is
> > guaranteed not to be executed again in one given batch execution. Is
> there
> > any way to do so?
> >
> > Thanks for your help!
> >
> > On Thu, Oct 1, 2020 at 2:55 AM Aljoscha Krettek <[hidden email]>
> wrote:
> >
> >> Hi!
> >>
> >> Yes, AbstractRichFunction.close() would be the right place to do
> >> cleanup. This method is called both in case of successful finishing and
> >> also in the case of failures.
> >>
> >> For BATCH execution, Flink will do backtracking upwards from the failed
> >> task(s) to see if intermediate results from previous tasks are still
> >> available. If they are available, computation can restart from there.
> >> Otherwise the whole job will have to be restarted.
> >>
> >> Best,
> >> Aljoscha
> >>
> >> On 28.09.20 21:44, Boyuan Zhang wrote:
> >>> Hi team,
> >>>
> >>> I'm building a UDF by implementing AbstractRichFunction, where I want
> to
> >> do
> >>> some resource cleanup per input element when the processing result is
> >>> committed. I can perform such cleanup in streaming by implementing
> >>> *CheckpointListener.notifyCheckpointComplete() *but it seems like there
> >> is
> >>> no checkpoint mechanism in batch processing.
> >>> I'm wondering is* AbstractRichFunction.close() *the good place to do
> so?
> >>> How does flink deal with fault tolerance in batch?
> >>>
> >>> Thanks for your help!
> >>>
> >>
> >>
> >
>
>