Iteration stats logging

Nam-Luc Tran
Hello Everyone,

I would like to log certain stats during iterations in a bulk
iterative job. The way I do this is store the things I want at each
iteration and plan to flush everything to HDFS once all the iterations
are done. To do that I would need to know when the last iteration is
invoked in order to flush the data. However, the close() method in the
RichMapFunction is executed at the end of each iteration.

Is there any way to know when I am in the last invocation? Or would you
have a better suggestion to achieve what I am trying to do?

Thank you and best regards,

Tran Nam-Luc 


Re: Iteration stats logging

Ufuk Celebi-2
Hey Tran Nam-Luc,

there is currently no way to do this.

The iteration sync task keeps track of convergence and of the maximum number of iterations, and signals termination to the iteration head. After this, the head flushes the produced result to the next task (after the iteration), and the intermediate iteration tasks finish without calling close() again.

Because there is no "final" no-op iteration happening, the iteration tasks don't know when the last iteration happened.
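To make this termination sequence concrete, here is a toy model in plain Java. It is not Flink's runtime: SuperstepModel and StatsFunction are hypothetical stand-ins. It only assumes what is described above, namely that the convergence check runs after a superstep's close() has already returned and that no extra "final" round follows, so the user function cannot tell the last superstep apart from the others.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a bulk iteration; SuperstepModel and StatsFunction are
// hypothetical stand-ins, not Flink classes.
class SuperstepModel {

    // Lifecycle calls observed by the user function, in order.
    static final List<String> events = new ArrayList<>();

    // Stand-in for a RichMapFunction inside the iteration body.
    static class StatsFunction {
        void open(int superstep) { events.add("open " + superstep); }
        double map(double x) { return x / 2.0; }
        void close(int superstep) { events.add("close " + superstep); }
    }

    // Runs supersteps until the value falls below the threshold or
    // maxIterations is reached; returns the number of supersteps executed.
    static int run(double start, double threshold, int maxIterations) {
        events.clear();
        StatsFunction fn = new StatsFunction();
        double value = start;
        int superstep = 0;
        while (superstep < maxIterations) {
            superstep++;
            fn.open(superstep);
            value = fn.map(value);
            fn.close(superstep);
            // The termination decision (the sync task's job in the real
            // runtime) is made only after close() has returned, and no
            // extra "final" open()/close() round follows.
            if (value < threshold) {
                break;
            }
        }
        return superstep;
    }

    public static void main(String[] args) {
        int supersteps = run(10.0, 1.0, 100);
        System.out.println(supersteps + " supersteps: " + events);
    }
}
```

With a start value of 10 and a threshold of 1, the loop stops after four supersteps, and the last event the function sees is just "close 4": nothing marks it as the final one.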

I'm not sure what the best way is to implement this at the moment.

What kind of stats are you recording?

– Ufuk

(In reply to Nam-Luc Tran, 15 Jun 2015, 15:53)

Re: Iteration stats logging

Nam-Luc Tran
Hi Ufuk,

The kind of things we'd like to log are: the time spent in each iteration,
the residual of the algorithm (convergence), and the current iteration number.

Best regards,

Tran Nam-Luc
 

(In reply to Ufuk Celebi, Monday, 15/06/2015, 16:15)


Re: Iteration stats logging

Robert Metzger
Are you running a fixed number of iterations or do you use a dynamic
termination criterion?
For a fixed number of iterations, you can get the ID of the current iteration ... which
allows you to find out when you are running the last iteration.
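For a fixed iteration count, the check could look roughly like this. It is a plain-Java sketch under several assumptions: StatsBuffer is a hypothetical helper; the superstep number is passed in rather than read from getIterationRuntimeContext().getSuperstepNumber() inside close(); maxIterations is assumed to be the same value given to DataSet.iterate(maxIterations); and the same function instance (with its buffer) is assumed to survive across supersteps, as the original question's approach does. Each parallel instance would hold its own buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: buffer per-superstep stats and flush them once the
// superstep number reaches the fixed iteration count.
class StatsBuffer {
    private final int maxIterations;              // assumed equal to the iterate(...) argument
    private final List<String> buffered = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>();

    StatsBuffer(int maxIterations) {
        this.maxIterations = maxIterations;
    }

    // Only reliable when the job always runs exactly maxIterations supersteps,
    // i.e. no convergence criterion can stop it earlier.
    boolean isLastSuperstep(int superstep) {
        return superstep == maxIterations;
    }

    // Meant to be called from close(): record this superstep, flush after the last.
    void onSuperstepEnd(int superstep, String stats) {
        buffered.add(superstep + ": " + stats);
        if (isLastSuperstep(superstep)) {
            flushed.addAll(buffered);             // stand-in for the HDFS write
            buffered.clear();
        }
    }

    List<String> flushed() {
        return flushed;
    }

    public static void main(String[] args) {
        StatsBuffer buffer = new StatsBuffer(3);
        for (int superstep = 1; superstep <= 3; superstep++) {
            buffer.onSuperstepEnd(superstep, "residual=" + (1.0 / superstep));
        }
        System.out.println(buffer.flushed());
    }
}
```

With a dynamic termination criterion this check never fires when the job converges early, which is exactly the limitation described earlier in the thread.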

Would it be feasible for you to just log these statistics to the log file?
You can retrieve the statistics once the job has finished.
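A minimal sketch of the log-file approach, covering the three stats mentioned earlier in the thread (iteration number, time per iteration, residual). IterationStatsLogger is a hypothetical name; Flink itself logs through SLF4J, and java.util.logging is used here only to keep the sketch dependency-free. How the residual is computed is left to the algorithm.

```java
import java.util.Locale;
import java.util.logging.Logger;

// Hypothetical helper: write one grep-friendly log line per superstep.
class IterationStatsLogger {
    private static final Logger LOG =
            Logger.getLogger(IterationStatsLogger.class.getName());

    private long startNanos;

    // Meant to be called from open(): remember when this superstep started.
    void superstepStarted() {
        startNanos = System.nanoTime();
    }

    // Meant to be called from close(): log superstep, elapsed time and residual.
    String superstepFinished(int superstep, double residual) {
        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
        String line = format(superstep, elapsedMillis, residual);
        LOG.info(line);
        return line;
    }

    // A fixed prefix and key=value pairs make the lines easy to extract later.
    static String format(int superstep, long elapsedMillis, double residual) {
        return String.format(Locale.ROOT,
                "iteration-stats superstep=%d timeMs=%d residual=%.6f",
                superstep, elapsedMillis, residual);
    }

    public static void main(String[] args) {
        IterationStatsLogger stats = new IterationStatsLogger();
        stats.superstepStarted();
        System.out.println(stats.superstepFinished(1, 0.25));
    }
}
```

After the job finishes, filtering the log files for the "iteration-stats" prefix would recover one line per superstep and parallel instance.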

(In reply to Nam-Luc Tran, Mon, Jun 15, 2015 at 7:32 AM)

Re: Iteration stats logging

Stephan Ewen
Hi Nam-Luc!

Having per-iteration statistics and accumulators is on the roadmap.

The way I have done this so far is to create accumulators as shown below,
which registers a new accumulator for each superstep:


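import org.apache.flink.api.common.accumulators.LongCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

class MyFunction extends RichMapFunction<Long, Long> {
    private LongCounter counter;

    @Override
    public void open(Configuration cfg) {
        // open() runs again at the start of every superstep, so each
        // superstep registers its own accumulator: "counter1", "counter2", ...
        counter = getRuntimeContext().getLongCounter(
                "counter" + getIterationRuntimeContext().getSuperstepNumber());
    }

    @Override
    public Long map(Long value) {
        counter.add(1L); // illustrative body: count the records of this superstep
        return value;
    }
}
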
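Once the job has finished, the per-superstep accumulators can be read back and ordered by superstep. A sketch, assuming the results are available as the Map<String, Object> returned by JobExecutionResult.getAllAccumulatorResults() and that they follow the "counter" + superstep naming used above; SuperstepAccumulators is a hypothetical helper, and the values in main are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: collect "counter<N>" accumulators into a map
// ordered by superstep number N.
class SuperstepAccumulators {

    static TreeMap<Integer, Long> bySuperstep(Map<String, Object> results, String prefix) {
        TreeMap<Integer, Long> ordered = new TreeMap<>();
        for (Map.Entry<String, Object> e : results.entrySet()) {
            String name = e.getKey();
            if (!name.startsWith(prefix)) {
                continue;
            }
            String suffix = name.substring(prefix.length());
            if (suffix.isEmpty() || !suffix.chars().allMatch(Character::isDigit)) {
                continue; // another accumulator that merely shares the prefix
            }
            ordered.put(Integer.parseInt(suffix), ((Number) e.getValue()).longValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Stand-in for result.getAllAccumulatorResults(); values are illustrative.
        Map<String, Object> results = new HashMap<>();
        results.put("counter2", 40L);
        results.put("counter10", 7L);
        results.put("counter1", 50L);
        results.put("other", 1);
        System.out.println(bySuperstep(results, "counter"));
    }
}
```

Sorting by the parsed number rather than by name keeps "counter10" after "counter2"; the example prints {1=50, 2=40, 10=7}.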

(In reply to Robert Metzger, Sun, Jun 21, 2015 at 1:35 AM)