Ignore operator failure

Dominik Wosiński
Hey,
I have a question that I have not been able to find an answer to in the
docs or in any other source. Suppose we have a business system that uses an
Elasticsearch sink, not for the business case itself, but rather for keeping
info on the data that is flowing through the system. The Elasticsearch part
is not crucial for the application, so I would like to keep the application
running even if Elasticsearch itself is failing (for example, because the
external system is down). Is there a way to exclude some task from
checkpointing and ignore its failure, so that the job is not restarted if
only one of the sinks is down?

Thanks in advance,
Best Regards,
Dom.

Re: Ignore operator failure

vino yang
Hi Dom,

If you only want to ignore checkpoint failures, you can use this API:
setTolerableCheckpointFailureNumber [1].
However, for jobs with checkpointing enabled, if the failed operator holds
state, Flink cannot ignore the failure without restarting the job.
Subsequent regional recovery may be appropriate for your scenario.
For now, if you don't want non-critical operators to trigger a restart, you
may need to customize the relevant implementations so that their exceptions
are not thrown to the Flink framework.
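
To illustrate the last point, here is a minimal sketch in plain Java of a
wrapper that catches a non-critical writer's exceptions instead of
rethrowing them. It deliberately does not use any Flink classes:
RecordWriter, BestEffortWriter and droppedCount are hypothetical names made
up for this example, and in a real job the same try/catch would have to
live inside your own sink implementation. Note that swallowed records are
simply lost, so this only fits data you can afford to drop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class BestEffortSinkSketch {

    /** Hypothetical stand-in for a sink's per-record method; NOT a Flink interface. */
    interface RecordWriter<T> {
        void write(T record) throws Exception;
    }

    /** Wraps a non-critical writer and swallows its failures instead of rethrowing. */
    static final class BestEffortWriter<T> implements RecordWriter<T> {
        private final RecordWriter<T> delegate;
        private final AtomicLong dropped = new AtomicLong();

        BestEffortWriter(RecordWriter<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(T record) {
            try {
                delegate.write(record);
            } catch (Exception e) {
                // The exception stops here, so the caller never sees it.
                dropped.incrementAndGet();
                System.err.println("Dropping record, non-critical sink failed: " + e.getMessage());
            }
        }

        long droppedCount() {
            return dropped.get();
        }
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();

        // Simulated external system that rejects some records.
        RecordWriter<String> flaky = record -> {
            if (record.startsWith("bad")) {
                throw new IllegalStateException("external system is down");
            }
            stored.add(record);
        };

        BestEffortWriter<String> writer = new BestEffortWriter<String>(flaky);
        writer.write("ok-1");
        writer.write("bad-2"); // fails inside the delegate, but is swallowed
        writer.write("ok-3");

        System.out.println("stored=" + stored + " dropped=" + writer.droppedCount());
        // prints: stored=[ok-1, ok-3] dropped=1
    }
}
```

Counting the dropped records, rather than discarding them silently, at
least leaves you something to alert on when the external system is down.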

Best,
Vino

[1]: https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L319



On Mon, Oct 14, 2019 at 8:16 PM, Dominik Wosiński <[hidden email]> wrote:

> Hey,
> I have a question that I have not been able to find an answer to in the
> docs or in any other source. Suppose we have a business system that uses an
> Elasticsearch sink, not for the business case itself, but rather for keeping
> info on the data that is flowing through the system. The Elasticsearch part
> is not crucial for the application, so I would like to keep the application
> running even if Elasticsearch itself is failing (for example, because the
> external system is down). Is there a way to exclude some task from
> checkpointing and ignore its failure, so that the job is not restarted if
> only one of the sinks is down?
>
> Thanks in advance,
> Best Regards,
> Dom.
>