Question regarding end-to-end exactly once guarantee implementation using
2PC? As I understand how it operates, the pre-phase state is when the checkpoint is initiated and the checkpoint barrier advances from source to sink. Once the pre-phase is complete (and successful), then the next step in the process is where the "Sink" operator is expected to "Commit" the transaction (the data that was part of the checkpointed state). The datasource backed by the Sink operator is expected to commit the transaction. Say if the transaction times out and data cannot be committed, is there an option to roll back the checkpointed state without incurring the data loss? As of now, it does not work this way but I am trying to understand the challenges/limitations with respect to discarding the checkpoint in question? Once reason could be with respect to (what to do with) the subsequent checkpoints that might have advanced during this scenario? Anything else? Regards Vijay -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ |
Pinging again on this thread to see if anyone has any recommendations?
Particularly, I am interested to understand whether this scenario is also applicable for Kafka Connector? -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ |
In reply to this post by vijikarthi
Hi,
Sorry for late answer. > As I understand how it operates, the pre-phase state is when the checkpoint > is initiated and the checkpoint barrier advances from source to sink. Once > the pre-phase is complete (and successful), then the next step in the > process is where the "Sink" operator is expected to "Commit" the transaction > (the data that was part of the checkpointed state). The datasource backed by > the Sink operator is expected to commit the transaction. Yes you are correct. However keep in mind, that before we move to “Commit” phase, all of the participants of the pre commit must have completed it successfully. > Say if the transaction times out and data cannot be committed, is there an > option to roll back the checkpointed state without incurring the data loss? > As of now, it does not work this way but I am trying to understand the > challenges/limitations with respect to discarding the checkpoint in > question? No, it is impossible to rollback the data in case of timeout, because some of the other actors (different parallel instances of the sink) could have already committed the data. Once everybody acknowledged “pre-commit” AND the job master started “commit” phase, there is no going back, all commits must succeed eventually (intermittent failures are allowed) or there will be a data loss. > > Once reason could be with respect to (what to do with) the subsequent > checkpoints that might have advanced during this scenario? Anything else? Sorry, I’m not sure if understood the question. Depending on an external system, but generally if a transaction has timed out’ed, the data is lost, regardless of whether there was a more recent batch of records written in some more recent transaction, that could have be committed successfully. Piotrek > On 22 May 2019, at 02:30, vijikarthi <[hidden email]> wrote: > > Question regarding end-to-end exactly once guarantee implementation using > 2PC? > > As I understand how it operates, the pre-phase state is when the checkpoint > is initiated and the checkpoint barrier advances from source to sink. Once > the pre-phase is complete (and successful), then the next step in the > process is where the "Sink" operator is expected to "Commit" the transaction > (the data that was part of the checkpointed state). The datasource backed by > the Sink operator is expected to commit the transaction. > > Say if the transaction times out and data cannot be committed, is there an > option to roll back the checkpointed state without incurring the data loss? > As of now, it does not work this way but I am trying to understand the > challenges/limitations with respect to discarding the checkpoint in > question? > > Once reason could be with respect to (what to do with) the subsequent > checkpoints that might have advanced during this scenario? Anything else? > > Regards > Vijay > > > > -- > Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ |
Free forum by Nabble | Edit this page |