Hi guys,
I was debugging an InputFormat and discovered that there is no way to find out how many records have been processed in a split. So I added a counter to my input format that is incremented on every nextRecord() call.

Do you think adding something like "public int getProcessedRecordsCount()" to the InputFormat interface could be useful? Or are you going to manage this count from the caller of nextRecord()?

Best,
Flavio
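A minimal sketch of the kind of counter described above, written against a hypothetical stand-in interface rather than the real InputFormat. RecordReader, ListReader, and CountingReader are illustrative names that only mirror the reachedEnd()/nextRecord() shape. The count lives in a wrapper, which is roughly what managing it from the caller side could look like.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the reading part of an input format.
// It only mirrors the reachedEnd()/nextRecord() shape; it is NOT the real InputFormat.
interface RecordReader<T> {
    boolean reachedEnd();
    T nextRecord();
}

// Reads records from an in-memory list, standing in for one split.
final class ListReader<T> implements RecordReader<T> {
    private final Iterator<T> it;

    ListReader(List<T> records) {
        this.it = records.iterator();
    }

    @Override
    public boolean reachedEnd() {
        return !it.hasNext();
    }

    @Override
    public T nextRecord() {
        return it.next();
    }
}

// Decorator that counts every record handed out by nextRecord().
final class CountingReader<T> implements RecordReader<T> {
    private final RecordReader<T> delegate;
    private long processed; // long rather than int: an assumption that splits can be large

    CountingReader(RecordReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean reachedEnd() {
        return delegate.reachedEnd();
    }

    @Override
    public T nextRecord() {
        T record = delegate.nextRecord();
        processed++; // counted only after the delegate returned successfully
        return record;
    }

    long getProcessedRecordsCount() {
        return processed;
    }
}

class CountingReaderDemo {
    public static void main(String[] args) {
        CountingReader<String> reader =
                new CountingReader<>(new ListReader<>(Arrays.asList("a", "b", "c")));
        while (!reader.reachedEnd()) {
            reader.nextRecord();
        }
        System.out.println("processed records: " + reader.getProcessedRecordsCount());
    }
}
```

Wrapping the reader keeps the counting out of each individual input format, at the cost of one extra indirection per record.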
Hi Flavio,
We have a few recently started efforts to implement the collection of monitoring and runtime/data statistics. Counting the number of elements emitted by an operator (or data source) will be included.

Do you want to count the number of produced tuples to monitor progress, or do you see a different use case?

2014-11-28 9:37 GMT+01:00 Flavio Pompermaier <[hidden email]>:
> [...]
In my specific use case, I was interested in understanding why the scans of the splits were taking a long time, so I wanted statistics about the number of records contained in each split and the rate at which they are read. Do you think this could be useful in general?

On Dec 2, 2014 9:56 PM, "Fabian Hueske" <[hidden email]> wrote:
> [...]
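To make the two statistics mentioned here concrete (records per split and read rate), the following is a rough sketch under stated assumptions. SplitReadStats and SplitTimer are hypothetical helper names, not part of any existing API, and the timing relies only on System.nanoTime().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable result for one split: how many records, how long it took.
final class SplitReadStats {
    final int splitNumber;
    final long records;
    final long elapsedNanos;

    SplitReadStats(int splitNumber, long records, long elapsedNanos) {
        this.splitNumber = splitNumber;
        this.records = records;
        this.elapsedNanos = elapsedNanos;
    }

    // Read rate in records per second; 0.0 if no time was measured.
    double recordsPerSecond() {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return records * 1_000_000_000.0 / elapsedNanos;
    }
}

// Hypothetical helper an input format could call from its open/nextRecord/close steps.
final class SplitTimer {
    private final int splitNumber;
    private long records;
    private long startNanos;

    SplitTimer(int splitNumber) {
        this.splitNumber = splitNumber;
    }

    void open() {
        records = 0;
        startNanos = System.nanoTime();
    }

    void recordRead() {
        records++;
    }

    SplitReadStats close() {
        return new SplitReadStats(splitNumber, records, System.nanoTime() - startNanos);
    }
}

class SplitReadStatsDemo {
    public static void main(String[] args) {
        int[] splitSizes = {1_000, 250_000}; // made-up sizes, for illustration only
        List<SplitReadStats> all = new ArrayList<>();
        for (int i = 0; i < splitSizes.length; i++) {
            SplitTimer timer = new SplitTimer(i);
            timer.open();
            for (int r = 0; r < splitSizes[i]; r++) {
                timer.recordRead(); // a real reader would fetch a record here
            }
            all.add(timer.close());
        }
        for (SplitReadStats s : all) {
            System.out.printf("split %d: %d records, %.0f records/s%n",
                    s.splitNumber, s.records, s.recordsPerSecond());
        }
    }
}
```

Comparing the per-split record counts against the per-split rates would show whether a slow scan comes from an oversized split or from slow reads.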
Yes, sure.
Tracking records per split and UDF execution time per call (min, max, avg, or histogram) would be valuable information when debugging the performance of a program.

2014-12-02 22:08 GMT+01:00 Flavio Pompermaier <[hidden email]>:
> [...]
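As an illustration of the per-call timing mentioned above, here is a minimal sketch that tracks min, max, and average execution time of a user function. CallTimeStats and TimedFunction are hypothetical names, the UDF is modeled as a plain java.util.function.Function, and the histogram variant is left out.

```java
import java.util.function.Function;

// Hypothetical accumulator for per-call execution times, in nanoseconds.
final class CallTimeStats {
    private long count;
    private long totalNanos;
    private long minNanos = Long.MAX_VALUE;
    private long maxNanos = Long.MIN_VALUE;

    void record(long nanos) {
        count++;
        totalNanos += nanos;
        minNanos = Math.min(minNanos, nanos);
        maxNanos = Math.max(maxNanos, nanos);
    }

    long count() {
        return count;
    }

    long min() {
        return count == 0 ? 0 : minNanos;
    }

    long max() {
        return count == 0 ? 0 : maxNanos;
    }

    double avg() {
        return count == 0 ? 0.0 : (double) totalNanos / count;
    }
}

// Wraps a user function and times every call, including calls that throw.
final class TimedFunction<I, O> implements Function<I, O> {
    private final Function<I, O> udf;
    private final CallTimeStats stats = new CallTimeStats();

    TimedFunction(Function<I, O> udf) {
        this.udf = udf;
    }

    @Override
    public O apply(I input) {
        long start = System.nanoTime();
        try {
            return udf.apply(input);
        } finally {
            stats.record(System.nanoTime() - start);
        }
    }

    CallTimeStats stats() {
        return stats;
    }
}

class TimedFunctionDemo {
    public static void main(String[] args) {
        TimedFunction<String, Integer> length = new TimedFunction<>(String::length);
        for (String s : new String[] {"a", "bb", "ccc"}) {
            length.apply(s);
        }
        CallTimeStats st = length.stats();
        System.out.printf("calls=%d min=%dns max=%dns avg=%.1fns%n",
                st.count(), st.min(), st.max(), st.avg());
    }
}
```

Calling System.nanoTime() twice per record adds overhead of its own, so for very cheap functions sampling only some calls may be the better trade-off.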