hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure "exactly once."Does the DataSet job use the same mechanism to ensue "exactly once" if a map task is failed?thanks
|
No, it doesn't; periodic snapshots are not needed in DataSet programs, as
DataSets are of finite size and failed partitions can be replayed completely. On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <[hidden email]> wrote: > hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure > "exactly once."Does the DataSet job use the same mechanism to ensue > "exactly once" if a map task is failed?thanks > |
In reply to this post by Ma GuoWei
As Kostas mentioned the failure mechanisms for streaming and batch
processing are different, but you can expect exactly once processing guarantees from both of them. On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <[hidden email]> wrote: > hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure > "exactly once."Does the DataSet job use the same mechanism to ensue > "exactly once" if a map task is failed?thanks > |
In reply to this post by Kostas Tzoumas-2
DataExchangeMode is Piped
If Two operators use Piped Mode to exchange the data , Failed partitions have already send some data to the receiver before it failed.So Does Replaying all the failed partitions cause some duplicate records ? > Date: Thu, 9 Jul 2015 14:47:29 +0200 > Subject: Re: Does DataSet job also use Barriers to ensure "exactly once."? > From: [hidden email] > To: [hidden email] > > No, it doesn't; periodic snapshots are not needed in DataSet programs, as > DataSets are of finite size and failed partitions can be replayed > completely. > > > On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <[hidden email]> wrote: > > > hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure > > "exactly once."Does the DataSet job use the same mechanism to ensue > > "exactly once" if a map task is failed?thanks > > |
Currently, Flink restarts the entire job upon failure.
There is WIP that restricts this to all tasks involved in the pipeline of the failed task. Let's say we have pipelined MapReduce. If a mapper fails, the reducers that have received some data already have to be restarted as well. In that case, pipelined exchange works like "speculatively" starting the reducers early. It helps when no failure occurs. When a failure occurs, the reducers do still not start later than in a batch exchange mode, where they are started only once the mappers are done (and no failure can occur any more). On Thu, Jul 9, 2015 at 3:34 PM, 马国维 <[hidden email]> wrote: > DataExchangeMode is Piped > If Two operators use Piped Mode to exchange the data , Failed partitions > have already send some data to the receiver before it failed.So Does > Replaying all the failed partitions cause some duplicate records ? > > > > Date: Thu, 9 Jul 2015 14:47:29 +0200 > > Subject: Re: Does DataSet job also use Barriers to ensure "exactly > once."? > > From: [hidden email] > > To: [hidden email] > > > > No, it doesn't; periodic snapshots are not needed in DataSet programs, as > > DataSets are of finite size and failed partitions can be replayed > > completely. > > > > > > On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <[hidden email]> wrote: > > > > > hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure > > > "exactly once."Does the DataSet job use the same mechanism to ensue > > > "exactly once" if a map task is failed?thanks > > > > > |
DataSet<String> result = in.rebalance()
.map(new Mapper());In the case does the 'map' receive all the data then begin to worker?Will rebalance operator failed cause some duplicate record if the above answer is false ? > Date: Thu, 9 Jul 2015 15:40:18 +0200 > Subject: Re: Does DataSet job also use Barriers to ensure "exactly once."? > From: [hidden email] > To: [hidden email] > > Currently, Flink restarts the entire job upon failure. > > There is WIP that restricts this to all tasks involved in the pipeline of > the failed task. > > Let's say we have pipelined MapReduce. If a mapper fails, the reducers that > have received some data already have to be restarted as well. > > In that case, pipelined exchange works like "speculatively" starting the > reducers early. It helps when no failure occurs. > When a failure occurs, the reducers do still not start later than in a > batch exchange mode, where they are started only once the mappers are done > (and no failure can occur any more). > > > On Thu, Jul 9, 2015 at 3:34 PM, 马国维 <[hidden email]> wrote: > > > DataExchangeMode is Piped > > If Two operators use Piped Mode to exchange the data , Failed partitions > > have already send some data to the receiver before it failed.So Does > > Replaying all the failed partitions cause some duplicate records ? > > > > > > > Date: Thu, 9 Jul 2015 14:47:29 +0200 > > > Subject: Re: Does DataSet job also use Barriers to ensure "exactly > > once."? > > > From: [hidden email] > > > To: [hidden email] > > > > > > No, it doesn't; periodic snapshots are not needed in DataSet programs, as > > > DataSets are of finite size and failed partitions can be replayed > > > completely. > > > > > > > > > On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <[hidden email]> wrote: > > > > > > > hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure > > > > "exactly once."Does the DataSet job use the same mechanism to ensue > > > > "exactly once" if a map task is failed?thanks > > > > > > > > |
In pipelined execution, the mapper will start once it receives data. In
batch-only execution, the mapper will start once it received all data. In either case, there won't be any duplicate records. If an error occurs, the entire job will be restarted. As Stephan mentioned, we will soon have a per-task fault tolerance for non-pipelined tasks. Eventually, we also want to support pipelined jobs with persisted intermediate results. On Thu, Jul 9, 2015 at 4:13 PM, 马国维 <[hidden email]> wrote: > DataSet<String> result = in.rebalance() > .map(new Mapper());In the case does the 'map' > receive all the data then begin to worker?Will rebalance operator failed > cause some duplicate record if the above answer is false ? > > Date: Thu, 9 Jul 2015 15:40:18 +0200 > > Subject: Re: Does DataSet job also use Barriers to ensure "exactly > once."? > > From: [hidden email] > > To: [hidden email] > > > > Currently, Flink restarts the entire job upon failure. > > > > There is WIP that restricts this to all tasks involved in the pipeline of > > the failed task. > > > > Let's say we have pipelined MapReduce. If a mapper fails, the reducers > that > > have received some data already have to be restarted as well. > > > > In that case, pipelined exchange works like "speculatively" starting the > > reducers early. It helps when no failure occurs. > > When a failure occurs, the reducers do still not start later than in a > > batch exchange mode, where they are started only once the mappers are > done > > (and no failure can occur any more). > > > > > > On Thu, Jul 9, 2015 at 3:34 PM, 马国维 <[hidden email]> wrote: > > > > > DataExchangeMode is Piped > > > If Two operators use Piped Mode to exchange the data , Failed > partitions > > > have already send some data to the receiver before it failed.So Does > > > Replaying all the failed partitions cause some duplicate records ? > > > > > > > > > > Date: Thu, 9 Jul 2015 14:47:29 +0200 > > > > Subject: Re: Does DataSet job also use Barriers to ensure "exactly > > > once."? > > > > From: [hidden email] > > > > To: [hidden email] > > > > > > > > No, it doesn't; periodic snapshots are not needed in DataSet > programs, as > > > > DataSets are of finite size and failed partitions can be replayed > > > > completely. > > > > > > > > > > > > On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <[hidden email]> wrote: > > > > > > > > > hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure > > > > > "exactly once."Does the DataSet job use the same mechanism to ensue > > > > > "exactly once" if a map task is failed?thanks > > > > > > > > > > > > > |
In reply to this post by Ma GuoWei
Any operator in a batch job will receive all of its elements in one
complete successful run. The mapper starts its work immediately. On a failure, a fresh mapper is used, and all of the data is replayed. You can think of it as if there was only a single checkpoint at the very beginning (before any data was sent) that they fall back to. For mapper-internal state, there can be no duplicates. For the interaction with the outside world, there can always be duplicates, for example if the mapper inserts data into a database. The database would have data from the initial run (that failed or was canceled) and the recovery run. On Thu, Jul 9, 2015 at 4:13 PM, 马国维 <[hidden email]> wrote: > DataSet<String> result = in.rebalance() > .map(new Mapper());In the case does the 'map' > receive all the data then begin to worker?Will rebalance operator failed > cause some duplicate record if the above answer is false ? > > Date: Thu, 9 Jul 2015 15:40:18 +0200 > > Subject: Re: Does DataSet job also use Barriers to ensure "exactly > once."? > > From: [hidden email] > > To: [hidden email] > > > > Currently, Flink restarts the entire job upon failure. > > > > There is WIP that restricts this to all tasks involved in the pipeline of > > the failed task. > > > > Let's say we have pipelined MapReduce. If a mapper fails, the reducers > that > > have received some data already have to be restarted as well. > > > > In that case, pipelined exchange works like "speculatively" starting the > > reducers early. It helps when no failure occurs. > > When a failure occurs, the reducers do still not start later than in a > > batch exchange mode, where they are started only once the mappers are > done > > (and no failure can occur any more). > > > > > > On Thu, Jul 9, 2015 at 3:34 PM, 马国维 <[hidden email]> wrote: > > > > > DataExchangeMode is Piped > > > If Two operators use Piped Mode to exchange the data , Failed > partitions > > > have already send some data to the receiver before it failed.So Does > > > Replaying all the failed partitions cause some duplicate records ? > > > > > > > > > > Date: Thu, 9 Jul 2015 14:47:29 +0200 > > > > Subject: Re: Does DataSet job also use Barriers to ensure "exactly > > > once."? > > > > From: [hidden email] > > > > To: [hidden email] > > > > > > > > No, it doesn't; periodic snapshots are not needed in DataSet > programs, as > > > > DataSets are of finite size and failed partitions can be replayed > > > > completely. > > > > > > > > > > > > On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <[hidden email]> wrote: > > > > > > > > > hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure > > > > > "exactly once."Does the DataSet job use the same mechanism to ensue > > > > > "exactly once" if a map task is failed?thanks > > > > > > > > > > > > > |
thank you!I see.
thank all of you. 发自我的 iPhone > 在 2015年7月9日,下午10:48,Stephan Ewen <[hidden email]> 写道: > > Any operator in a batch job will receive all of its elements in one > complete successful run. > > The mapper starts its work immediately. On a failure, a fresh mapper is > used, and all of the data is replayed. You can think of it as if there was > only a single checkpoint at the very beginning (before any data was sent) > that they fall back to. For mapper-internal state, there can be no > duplicates. > > For the interaction with the outside world, there can always be duplicates, > for example if the mapper inserts data into a database. The database would > have data from the initial run (that failed or was canceled) and the > recovery run. > > > > >> On Thu, Jul 9, 2015 at 4:13 PM, 马国维 <[hidden email]> wrote: >> >> DataSet<String> result = in.rebalance() >> .map(new Mapper());In the case does the 'map' >> receive all the data then begin to worker?Will rebalance operator failed >> cause some duplicate record if the above answer is false ? >>> Date: Thu, 9 Jul 2015 15:40:18 +0200 >>> Subject: Re: Does DataSet job also use Barriers to ensure "exactly >> once."? >>> From: [hidden email] >>> To: [hidden email] >>> >>> Currently, Flink restarts the entire job upon failure. >>> >>> There is WIP that restricts this to all tasks involved in the pipeline of >>> the failed task. >>> >>> Let's say we have pipelined MapReduce. If a mapper fails, the reducers >> that >>> have received some data already have to be restarted as well. >>> >>> In that case, pipelined exchange works like "speculatively" starting the >>> reducers early. It helps when no failure occurs. >>> When a failure occurs, the reducers do still not start later than in a >>> batch exchange mode, where they are started only once the mappers are >> done >>> (and no failure can occur any more). >>> >>> >>>> On Thu, Jul 9, 2015 at 3:34 PM, 马国维 <[hidden email]> wrote: >>>> >>>> DataExchangeMode is Piped >>>> If Two operators use Piped Mode to exchange the data , Failed >> partitions >>>> have already send some data to the receiver before it failed.So Does >>>> Replaying all the failed partitions cause some duplicate records ? >>>> >>>> >>>>> Date: Thu, 9 Jul 2015 14:47:29 +0200 >>>>> Subject: Re: Does DataSet job also use Barriers to ensure "exactly >>>> once."? >>>>> From: [hidden email] >>>>> To: [hidden email] >>>>> >>>>> No, it doesn't; periodic snapshots are not needed in DataSet >> programs, as >>>>> DataSets are of finite size and failed partitions can be replayed >>>>> completely. >>>>> >>>>> >>>>>> On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <[hidden email]> wrote: >>>>>> >>>>>> hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure >>>>>> "exactly once."Does the DataSet job use the same mechanism to ensue >>>>>> "exactly once" if a map task is failed?thanks >> >> |
BTW: Duplicates on interaction with the outside world cannot be avoided in
the general case. If any program (batch or streaming) inserts data into some outside system (for example a database), then that outside system needs to cooperate to prevent duplicates. - Either, the system could eliminate duplicated based on a key - Or, the system must be part of the checkpoint transaction. We are working on some extension to make this very easy for outside systems that support transactions. Keep following the Flink announcements, if you want updates on this topic! On Thu, Jul 9, 2015 at 4:50 PM, 马国维 <[hidden email]> wrote: > thank you!I see. > thank all of you. > > 发自我的 iPhone > > > 在 2015年7月9日,下午10:48,Stephan Ewen <[hidden email]> 写道: > > > > Any operator in a batch job will receive all of its elements in one > > complete successful run. > > > > The mapper starts its work immediately. On a failure, a fresh mapper is > > used, and all of the data is replayed. You can think of it as if there > was > > only a single checkpoint at the very beginning (before any data was sent) > > that they fall back to. For mapper-internal state, there can be no > > duplicates. > > > > For the interaction with the outside world, there can always be > duplicates, > > for example if the mapper inserts data into a database. The database > would > > have data from the initial run (that failed or was canceled) and the > > recovery run. > > > > > > > > > >> On Thu, Jul 9, 2015 at 4:13 PM, 马国维 <[hidden email]> wrote: > >> > >> DataSet<String> result = in.rebalance() > >> .map(new Mapper());In the case does the 'map' > >> receive all the data then begin to worker?Will rebalance operator failed > >> cause some duplicate record if the above answer is false ? > >>> Date: Thu, 9 Jul 2015 15:40:18 +0200 > >>> Subject: Re: Does DataSet job also use Barriers to ensure "exactly > >> once."? > >>> From: [hidden email] > >>> To: [hidden email] > >>> > >>> Currently, Flink restarts the entire job upon failure. > >>> > >>> There is WIP that restricts this to all tasks involved in the pipeline > of > >>> the failed task. > >>> > >>> Let's say we have pipelined MapReduce. If a mapper fails, the reducers > >> that > >>> have received some data already have to be restarted as well. > >>> > >>> In that case, pipelined exchange works like "speculatively" starting > the > >>> reducers early. It helps when no failure occurs. > >>> When a failure occurs, the reducers do still not start later than in a > >>> batch exchange mode, where they are started only once the mappers are > >> done > >>> (and no failure can occur any more). > >>> > >>> > >>>> On Thu, Jul 9, 2015 at 3:34 PM, 马国维 <[hidden email]> wrote: > >>>> > >>>> DataExchangeMode is Piped > >>>> If Two operators use Piped Mode to exchange the data , Failed > >> partitions > >>>> have already send some data to the receiver before it failed.So Does > >>>> Replaying all the failed partitions cause some duplicate records ? > >>>> > >>>> > >>>>> Date: Thu, 9 Jul 2015 14:47:29 +0200 > >>>>> Subject: Re: Does DataSet job also use Barriers to ensure "exactly > >>>> once."? > >>>>> From: [hidden email] > >>>>> To: [hidden email] > >>>>> > >>>>> No, it doesn't; periodic snapshots are not needed in DataSet > >> programs, as > >>>>> DataSets are of finite size and failed partitions can be replayed > >>>>> completely. > >>>>> > >>>>> > >>>>>> On Thu, Jul 9, 2015 at 2:43 PM, 马国维 <[hidden email]> wrote: > >>>>>> > >>>>>> hi, everyoneThe doc say Flink Streaming use "Barriers" to ensure > >>>>>> "exactly once."Does the DataSet job use the same mechanism to ensue > >>>>>> "exactly once" if a map task is failed?thanks > >> > >> > |
Free forum by Nabble | Edit this page |