Hi,
I'll try to make it quick this time. I think we need to make information about the event time of an element and information about windows in which it resides accessible to the user. A simple example would be the aggregation of some user behaviour, for example:

    in = clickSource()

    analysedData = in
      .window(10 minutes).every(5 minutes)
      .groupBy("userId")
      .filter(is something interesting)
      .sum("something")

    analysedData.storeToMySystem()

Now the results of this window aggregation tell me that at some point, there was some window and in this window some attribute summed up to this. This might not be very helpful. What might be helpful is the information that there occurred a spike in something at 12:45 on Wednesday. Therefore, I think we need to make this information available somehow.

I only have some rough ideas about how this might work, but I would first like to discuss whether others even think this necessary. So fire away...

Cheers,
Aljoscha

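To make the request concrete, here is a minimal sketch in plain Python (not the Flink API) of the same 10-minute window sliding every 5 minutes, where each result carries the bounds of the window it came from. The function name `sliding_window_sums`, the result field names and the sample clicks are assumptions made up for illustration.

```python
from collections import defaultdict

MINUTE = 60  # timestamps below are in seconds


def sliding_window_sums(events, size, slide):
    """Sum `value` per (user, window) and keep the window bounds in the result.

    events: iterable of (event_time, user_id, value).
    Windows are [start, start + size) with starts at multiples of `slide`.
    """
    sums = defaultdict(int)
    for ts, user, value in events:
        # Latest window start at or before ts, then walk back over
        # every earlier window that still contains ts.
        start = ts - ts % slide
        while start > ts - size:
            sums[(user, start)] += value
            start -= slide
    return [
        {"userId": user, "windowStart": start, "windowEnd": start + size, "sum": total}
        for (user, start), total in sorted(sums.items())
    ]


# Hypothetical click events: (event time, user, amount of "something").
clicks = [(6 * MINUTE, "alice", 1), (7 * MINUTE, "alice", 2), (12 * MINUTE, "alice", 4)]
results = sliding_window_sums(clicks, size=10 * MINUTE, slide=5 * MINUTE)
for row in results:
    print(row)
```

With the bounds attached, a consumer can say when a spike happened instead of only knowing that some window produced some sum.
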
Hi,
This was the exact need that motivated me to rework the windowing and introduce the StreamWindow abstraction, which can hold any metadata that represents the current window. At the moment it only contains a unique id, but this could be extended easily.

When the user has created a WindowedDataStream by applying some windowed transformation, they can call .getDiscretizedStream(), which returns a stream of StreamWindows from which the user can extract any metadata afterwards.

So this is practically something we can already add easily, with no need to rewrite any logic.

Cheers,
Gyula

On Tuesday, May 12, 2015, Aljoscha Krettek <[hidden email]> wrote:
> I think we need to make information about the event time of an element
> and information about windows in which it resides accessible to the user.
> [...]

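As a rough illustration of a window object that carries metadata alongside its elements, the following plain-Python sketch groups timestamped elements into tumbling windows. It is not Flink's StreamWindow class: `WindowWithMetadata`, `discretize`, and the `start` and `end` fields are hypothetical names, and only the unique id corresponds to what the message says exists today.

```python
from dataclasses import dataclass, field
from typing import Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class WindowWithMetadata(Generic[T]):
    """Hypothetical stand-in for a window container: elements plus metadata."""
    window_id: int          # unique id, as mentioned in the thread
    start: int              # inclusive bound (assumed extra metadata)
    end: int                # exclusive bound (assumed extra metadata)
    elements: List[T] = field(default_factory=list)


def discretize(timestamped, size):
    """Group (timestamp, element) pairs into tumbling windows of length `size`."""
    windows = {}
    for ts, element in timestamped:
        start = ts - ts % size
        if start not in windows:
            windows[start] = WindowWithMetadata(
                window_id=start // size, start=start, end=start + size
            )
        windows[start].elements.append(element)
    return [windows[s] for s in sorted(windows)]


stream = [(3, "a"), (12, "b"), (14, "c")]
summary = [(w.window_id, w.start, w.end, len(w.elements)) for w in discretize(stream, size=10)]
print(summary)
```

Downstream code then reads the metadata from the window object rather than from the aggregated value.
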
Hi,

I think timestamping results of a time window operator is essential. Without timestamps in the results, it is not possible to execute two time window operators one after the other.

Cheers,
Bruno

On 12.05.2015 18:30, Aljoscha Krettek wrote:
> I think we need to make information about the event time of an element
> and information about windows in which it resides accessible to the user.
> [...]

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dr. Bruno Cadonna
Postdoctoral Researcher

Databases and Information Systems
Department of Computer Science
Humboldt-Universität zu Berlin

http://www.informatik.hu-berlin.de/~cadonnab

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

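The point about chaining can be sketched in plain Python (again, not the Flink API): the second window operator can only assign the first operator's results to windows if those results carry a timestamp. Stamping each result with the last instant of its window (`end - 1`) is one possible convention and is an assumption here, as are the function name `tumbling_sum` and the sample events.

```python
from collections import defaultdict


def tumbling_sum(timestamped_values, size):
    """Sum values per tumbling window [start, start + size).

    Each result is stamped with the last instant of its window (end - 1),
    an assumed convention, so a downstream time window can place it.
    """
    sums = defaultdict(int)
    for ts, value in timestamped_values:
        sums[ts - ts % size] += value
    return [(start + size - 1, total) for start, total in sorted(sums.items())]


# (event time, count) pairs; the values are made up for illustration.
events = [(1, 1), (4, 1), (6, 1), (11, 1), (19, 1), (21, 1)]

per_5 = tumbling_sum(events, size=5)     # first time window operator
per_10 = tumbling_sum(per_5, size=10)    # second operator consumes timestamped results
print(per_5)
print(per_10)
```

If `per_5` held bare sums without timestamps, the second call would have nothing to window on.
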
This is a pretty central question, actually (timestamping the results of windows). Let us kick off a separate thread for this...

On Wed, May 13, 2015 at 9:20 AM, Bruno Cadonna <[hidden email]> wrote:
> I think timestamping results of a time window operator is essential.
> Without timestamps in the results, it is not possible to execute two
> time window operators one after the other.

I think this is that thread :)

But as I said it is just a matter of what we want to add, and we can already do it.

On Wed, May 13, 2015 at 11:37 AM, Stephan Ewen <[hidden email]> wrote:
> This is a pretty central question, actually (timestamping the results of
> windows). Let us kick off a separate thread for this...

Okay, I thought that this thread was about how to make timestamps and window information of a record accessible to the user code. This involves how to represent the information about which window a record is in, whether to attach it to the record, etc.

The semantics of what happens if you have multiple windows one after another (on time and other metrics) is orthogonal to that, as far as I can tell. That deals with the question of whether and how timestamps are re-assigned, how punctuations and watermarks propagate...

On Wed, May 13, 2015 at 11:40 AM, Gyula Fóra <[hidden email]> wrote:
> I think this is that thread :)
>
> But as I said it is just a matter of what we want to add, and we can
> already do it.

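One way to picture the representation question is the sketch below, in plain Python. With sliding windows a record usually sits in several windows at once, so attaching window information to the record means attaching a list of bounds. The names `windows_for` and `attach_windows` and the dictionary layout are hypothetical; this shows only one of the options mentioned, not a proposed design.

```python
def windows_for(ts, size, slide):
    """Return the [start, end) bounds of every sliding window containing ts."""
    start = ts - ts % slide
    bounds = []
    while start > ts - size:
        bounds.append((start, start + size))
        start -= slide
    return sorted(bounds)


def attach_windows(record, ts, size, slide):
    """Carry the event time and window membership together with the record."""
    return {"value": record, "timestamp": ts, "windows": windows_for(ts, size, slide)}


tagged = attach_windows("click", ts=12, size=10, slide=5)
print(tagged)
```

The alternative would be to keep records bare and expose this information through the window object handed to user code.
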
Hi,

Aljoscha originally wanted to discuss whether it is necessary

<quote>
to make information about the event time of an element and information about windows in which it resides accessible to the user.
</quote>

I guess we all agree that this information is necessary to timestamp the results and that timestamping the result is important. Don't we?

Cheers,
Bruno

On 13.05.2015 11:45, Stephan Ewen wrote:
> Okay, I thought that this thread is about how to make timestamps
> and window information of a record accessible to the user code.
> [...]
