(DEPRECATED) Apache Flink Mailing List archive.

Check pointing for simple pipeline

Prasanna kumar

Jul 07, 2020; 12:43pm

Check pointing for simple pipeline

Hi,

I have a pipeline: Source -> Map (JSON transform) -> Sink.

Both the source and the sink are Kafka.

What is the best checkpointing mechanism?

Is enabling incremental checkpoints a good option? What should I be careful of?

I am running it on AWS EMR.

Will checkpointing slow down processing?

Thanks,
Prasanna.

Yun Tang

Jul 08, 2020; 3:25am

Re: Check pointing for simple pipeline

Hi Prasanna,

Using incremental checkpoints is always better than not using them, as they are faster and consume less memory.
However, incremental checkpoints are only supported by the RocksDB state backend.
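
For illustration, a minimal flink-conf.yaml sketch of the setup described above, assuming the Flink 1.11 configuration keys. The checkpoint path is a hypothetical placeholder, not something given in this thread:

```yaml
# flink-conf.yaml (sketch, assuming Flink 1.11)
# Use RocksDB as the state backend; incremental checkpoints require it.
state.backend: rocksdb
# Upload only the changes since the last checkpoint instead of the full state.
state.backend.incremental: true
# Hypothetical placeholder location; replace with your own durable storage path.
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```

These keys only select the backend and the checkpoint location. Checkpointing itself still has to be enabled for the job, for example with env.enableCheckpointing(intervalMs) on the StreamExecutionEnvironment.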

Best
Yun Tang
dwysakowicz

Jul 09, 2020; 9:40am

Re: Check pointing for simple pipeline

Hi Prasanna,

I'd like to add my two cents here. I would not say that using incremental checkpoints is always the best choice. They can have downsides when restoring from a checkpoint, because the restore has to apply all the deltas. Restoring from a non-incremental checkpoint might therefore be faster.

As Yun Tang mentioned, incremental checkpoints are supported by RocksDB only. You don't necessarily need the RocksDB state backend in all cases. If you are sure that the state will fit into memory (which is probably the case for such a simple job, especially if the map function is stateless), you should be fine with the Filesystem state backend[1]. This state backend should be faster, as it does not need to spill anything to disk and keeps everything in deserialized form at runtime.
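
As a rough sketch of the Filesystem state backend option, the corresponding flink-conf.yaml entries might look like this, assuming the Flink 1.11 configuration keys. The checkpoint path is a hypothetical placeholder:

```yaml
# flink-conf.yaml (sketch, assuming Flink 1.11)
# Keep working state on the JVM heap and write checkpoints to a file system.
state.backend: filesystem
# Hypothetical placeholder location; replace with your own durable storage path.
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```

With this backend the working state has to fit in TaskManager memory, which is the condition stated above.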

You might also find this short post[2] helpful.

Best,

Dawid

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/state_backends.html#the-fsstatebackend

[2] https://www.ververica.com/blog/stateful-stream-processing-apache-flink-state-backends
