Hello Everyone,
I would like to log certain stats during the iterations of a bulk iterative job. My approach is to store the values I want at each iteration and flush everything to HDFS once all the iterations are done. To do that, I need to know when the last iteration is invoked. However, the close() method of the RichMapFunction is executed at the end of each iteration.

Is there any way to know when I am in the last invocation? Or would you have a better suggestion to achieve what I am trying to do?

Thank you and best regards,
Tran Nam-Luc
Hey Tran Nam-Luc,
there is currently no way to do this.

The iteration sync task keeps track of iteration convergence and the maximum number of iterations, and signals termination to the iteration head. After this, the head flushes the produced result to the next task (after the iteration), and the intermediate iteration tasks finish without calling close() again.

Because there is no "final" no-op iteration, the iteration tasks don't know when the last iteration happened.

I'm not sure what the best way is to implement this at the moment.

What kind of stats are you recording?

– Ufuk

On 15 Jun 2015, at 15:53, Nam-Luc Tran <[hidden email]> wrote:
> [...]
Hi Ufuk,
The kinds of things we'd like to log are: time spent in the iteration, residual of the algorithm (convergence), and current iteration.

Best regards,
Tran Nam-Luc

On Monday, 15/06/2015 at 16:15, Ufuk Celebi wrote:
> [...]
Are you running a fixed number of iterations or do you use a dynamic termination criterion? For fixed iterations, you can get the id of the current iteration ... which allows you to find out when you are running the last iteration.

Would it be feasible for you to just write these statistics to the log file? You can retrieve them once the job has finished.

On Mon, Jun 15, 2015 at 7:32 AM, Nam-Luc Tran <[hidden email]> wrote:
> [...]
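For the fixed-iteration case, the check suggested above could look roughly like the sketch below. It is plain Java without Flink dependencies and rests on several assumptions: the superstep number is 1-based, the maximum number of iterations is known to the function (for example, passed in through its constructor), and the same function instance is reused across supersteps, as the original post implies. The class IterationStatsBuffer and its methods are hypothetical helpers, not part of Flink. It does not cover a dynamic termination criterion, where the job may stop before the maximum is reached.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Flink class): buffers one stats line per superstep. */
final class IterationStatsBuffer {

    // Assumption: the iteration count is fixed and known up front.
    private final int maxIterations;
    private final List<String> lines = new ArrayList<>();

    IterationStatsBuffer(int maxIterations) {
        if (maxIterations < 1) {
            throw new IllegalArgumentException("maxIterations must be >= 1");
        }
        this.maxIterations = maxIterations;
    }

    /** True if the given superstep (assumed 1-based) is the last of a fixed-length iteration. */
    boolean isLastSuperstep(int superstep) {
        return superstep == maxIterations;
    }

    /** Buffers the stats named in the thread: iteration number, elapsed time, residual. */
    void record(int superstep, long elapsedMillis, double residual) {
        lines.add(superstep + "\t" + elapsedMillis + "\t" + residual);
    }

    /** Returns a copy of all buffered lines and clears the buffer. */
    List<String> drain() {
        List<String> copy = new ArrayList<>(lines);
        lines.clear();
        return copy;
    }
}
```

Inside close(), one would call record(...) and then, only when isLastSuperstep(getIterationRuntimeContext().getSuperstepNumber()) is true, write the result of drain() to HDFS. Each parallel instance holds its own buffer, so this produces one set of lines per subtask.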
Hi Nam-Luc!
Having per-iteration statistics and accumulators is on the roadmap. The way I have done this so far is to create accumulators as shown below, which creates a new accumulator for each superstep:

    import org.apache.flink.api.common.accumulators.LongCounter;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    class MyFunction extends RichMapFunction<Long, Long> {

        private LongCounter counter;

        @Override
        public void open(Configuration cfg) {
            // open() runs at the start of every superstep, so each superstep
            // gets its own accumulator named "counter" + superstep number.
            counter = getRuntimeContext().getLongCounter(
                    "counter" + getIterationRuntimeContext().getSuperstepNumber());
        }

        . . .
    }

On Sun, Jun 21, 2015 at 1:35 AM, Robert Metzger <[hidden email]> wrote:
> [...]
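Once the job has finished, the per-superstep counters can be read back by name. The following is a minimal sketch in plain Java, assuming the accumulator results are available as a Map<String, Object> (the shape returned by JobExecutionResult.getAllAccumulatorResults()) and that the accumulators were named "counter" followed by the superstep number, as in the message above. The class SuperstepCounters and its method are hypothetical helpers, not part of Flink.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper (not a Flink class): groups per-superstep counters by superstep number. */
final class SuperstepCounters {

    private SuperstepCounters() {}

    /**
     * Picks every entry whose name is prefix + digits and whose value is a Long,
     * and returns the values ordered by superstep number.
     */
    static SortedMap<Integer, Long> bySuperstep(Map<String, Object> accumulatorResults,
                                                String prefix) {
        SortedMap<Integer, Long> result = new TreeMap<>();
        for (Map.Entry<String, Object> entry : accumulatorResults.entrySet()) {
            String name = entry.getKey();
            if (!name.startsWith(prefix) || !(entry.getValue() instanceof Long)) {
                continue; // unrelated accumulator or unexpected value type
            }
            String suffix = name.substring(prefix.length());
            if (!suffix.matches("\\d{1,9}")) {
                continue; // shares the prefix but is not prefix + superstep number
            }
            result.put(Integer.parseInt(suffix), (Long) entry.getValue());
        }
        return result;
    }
}
```

With the map from the finished job, SuperstepCounters.bySuperstep(results, "counter") yields one entry per superstep in order, which can then be written to HDFS in a single step from the client program instead of from inside the iteration.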