i think this job would be chained completely and never do any serialization.
On 04.08.2015 14:25, Matthias J. Sax wrote:
> Works for batch job, too. See enclosed.
>
> On 08/04/2015 01:34 PM, Matthias J. Sax wrote:
>> Yes, that is what the program does. However, streaming is not lazy so
>> deserialization should have happened.
>>
>> I will try a batch job, later today.
>>
>> On 08/04/2015 01:27 PM, Chesnay Schepler wrote:
>>> so I'm not too much into the streaming API, but as i see it this program
>>> creates an infinite number of tuples and then counts them, right?
>>>
>>> The problem with serialization as i understand it is that the receiver
>>> can't tell how many Tuple0 are sent, since you never actually read any
>>> data when deserializing a tuple. it's even more likely that it's not
>>> even attempted.
>>>
>>> As such, I'd be curious to see what happens when you create a batch job
>>> with a limited number of starting tuples.
>>>
>>> On 04.08.2015 13:08, Matthias J. Sax wrote:
>>>> Hi,
>>>>
>>>> I just opened a PR for this. https://github.com/apache/flink/pull/983
>>>>
>>>> However, I was not able to "reproduce" serialization issues... I tested
>>>> Tuple0 (see enclosed code) in a cluster, and the program worked. Do I
>>>> miss anything?
>>>>
>>>> -Matthias
>>>>
>>>> On 08/03/2015 01:01 AM, Matthias J. Sax wrote:
>>>>> Thanks for the advice about Tuple0.
>>>>>
>>>>> I personally don't see any advantage in having a "flink-tuple" project.
>>>>> Do I miss anything about it? Furthermore, I am not sure if it is a good
>>>>> idea to have too many too small projects.
>>>>>
>>>>> On 08/03/2015 12:48 AM, Stephan Ewen wrote:
>>>>>> Tuple0 would need special serialization and comparator logic. If that
>>>>>> is given, I see no reason not to support it.
>>>>>>
>>>>>> There is BTW, the request to create a dedicated "flink-tuple" project,
>>>>>> that only contains the tuple classes. Any opinions on that?
>>>>>>
>>>>>> On Mon, Aug 3, 2015 at 12:45 AM, Matthias J. Sax <[hidden email]> wrote:
>>>>>>
>>>>>>> Thanks for the explanation!
>>>>>>>
>>>>>>> As I mentioned before, Tuple0 might also be helpful for streaming.
>>>>>>> And I guess I will need it for the Storm compatibility layer, too.
>>>>>>> (I need to double check, but Storm supports zero-attribute tuples,
>>>>>>> too).
>>>>>>>
>>>>>>> With regard to the information I collected during the discussion, I
>>>>>>> vote for keeping Tuple0 in Flink core, and fixing the serialization
>>>>>>> problem. Should we have another JIRA for this? Or should I extend the
>>>>>>> existing JIRA? (https://issues.apache.org/jira/browse/FLINK-2457)
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 08/03/2015 12:22 AM, Chesnay Schepler wrote:
>>>>>>>> First of all, it was a really good idea to start a discussion
>>>>>>>> about this.
>>>>>>>>
>>>>>>>> So the general idea behind Tuple0 was this:
>>>>>>>>
>>>>>>>> The Python API maps python tuples to flink tuples. Python can have
>>>>>>>> empty tuples, so i thought "well duh, let's make a Tuple0 class!".
>>>>>>>> What i did not wanna do is create some non-Tuple object to represent
>>>>>>>> empty tuples, I'd rather have them treated the same, because it's
>>>>>>>> less work and creates simpler code.
>>>>>>>>
>>>>>>>> When transferring the plan to java, certain parameters for operations
>>>>>>>> are tuples, which can be empty as well.
>>>>>>>> This is where the Tuple0 class is really useful, because these empty
>>>>>>>> tuples go through the same logic as other tuples.
>>>>>>>> This is also why i want to keep the class, at least in the python
>>>>>>>> project, for now.
>>>>>>>>
>>>>>>>> For the actual program execution, I need a new solution. Funny story,
>>>>>>>> while writing this reply i noticed that the Python API can't handle
>>>>>>>> Tuple0 at runtime as well. ha...ha... -.-
>>>>>>>>
>>>>>>>> Guess I now know what I'm working on next.
>>>>>>>>
>>>>>>>> On 02.08.2015 21:24, Matthias J. Sax wrote:
>>>>>>>>> Can you elaborate how and why Python used Tuple0? If it cannot be
>>>>>>>>> serialized similar to regular Tuples, what is the usage in Python?
>>>>>>>>> Right now it seems as if there is no special serialization code for
>>>>>>>>> Tuple0.
>>>>>>>>>
>>>>>>>>> I just want to understand the topic in detail.
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> On 08/01/2015 03:38 PM, Stephan Ewen wrote:
>>>>>>>>>> I think a Tuple0 cannot be implemented like the current tuples, at
>>>>>>>>>> least with respect to runtime serialization.
>>>>>>>>>>
>>>>>>>>>> The system makes the assumption that it makes progress in consuming
>>>>>>>>>> bytes when deserializing values. If a Tuple0 never consumes data
>>>>>>>>>> from the byte stream, this assumption is broken. It would need at
>>>>>>>>>> least one marker byte.
>>>>>>>>>> Then it effectively is a Tuple1<Byte> disguising itself as a Tuple0.
>>>>>>>>>>
>>>>>>>>>> On Sat, Aug 1, 2015 at 1:38 PM, Matthias J. Sax <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I just double checked. Scala does not have type Tuple0. IMHO, it
>>>>>>>>>>> would be best to remove Tuple0 for consistency. Having Tuple types
>>>>>>>>>>> is for consistency reasons with Scala in the first place, right?
>>>>>>>>>>> Please give feedback.
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>> On 08/01/2015 01:04 PM, Matthias J. Sax wrote:
>>>>>>>>>>>> I see.
>>>>>>>>>>>>
>>>>>>>>>>>> I think that it might be useful to have Tuple0, because in rare
>>>>>>>>>>>> cases, you only want to "notify" a downstream operator (talking
>>>>>>>>>>>> about streaming) that something happened but there is no actual
>>>>>>>>>>>> data to be processed. Furthermore, if Flink cannot deal with
>>>>>>>>>>>> Tuple0 it should be removed completely for consistency IMHO.
>>>>>>>>>>>>
>>>>>>>>>>>> I will open a JIRA for it.
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>> On 07/31/2015 10:44 PM, Chesnay Schepler wrote:
>>>>>>>>>>>>> also, I'm not sure if I ever sent a Tuple0 through a program, it
>>>>>>>>>>>>> could be that the system freaks out.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 31.07.2015 22:40, Chesnay Schepler wrote:
>>>>>>>>>>>>>> there's no specific reason. it was added fairly recently by me
>>>>>>>>>>>>>> (mid of april), and you're most likely the second person to use
>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> i didn't integrate into all our tuple related stuff because,
>>>>>>>>>>>>>> well, i never thought anyone would actually need it, so i saved
>>>>>>>>>>>>>> myself the trouble.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> is there any specific reason why Tuple.getTupleClass(int arity)
>>>>>>>>>>>>>>> does not support arity zero? There is a class Tuple0, but it
>>>>>>>>>>>>>>> cannot be generated by Tuple.getTupleClass(...). Is it a
>>>>>>>>>>>>>>> missing feature? (I would like to have it.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
I set the parallelism of map to 4 (and I double-checked that the 4 mappers
are running on different machines). Furthermore, the fromElements() source
has a parallelism of 1. Thus, some data is going over the network for sure.

On 08/04/2015 02:31 PM, Chesnay Schepler wrote:
> i think this job would be chained completely and never do any
> serialization.
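The marker-byte idea raised earlier in the thread (a reader that counts records by consuming bytes cannot count records that occupy zero bytes) can be illustrated with a minimal sketch. This is not Flink's serializer code and does not use any Flink API; the class `MarkerByteTuple0Sketch`, its methods, and the `MARKER` value are hypothetical names chosen for illustration, built only on `java.io` byte-array streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class MarkerByteTuple0Sketch {

    // Hypothetical marker value (an assumption); any single byte would do.
    public static final byte MARKER = 42;

    /**
     * Serializes `count` empty tuples. Without a marker, an empty tuple has
     * no fields to write, so nothing at all ends up in the buffer.
     */
    public static byte[] write(int count, boolean useMarker) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            if (useMarker) {
                out.write(MARKER); // one byte per record
            }
        }
        return out.toByteArray();
    }

    /**
     * Counts records by consuming one marker byte per record until the
     * buffer is exhausted, i.e. the reader relies on making progress.
     */
    public static int countRecords(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int records = 0;
        while (in.available() > 0) {
            int b = in.read();
            if (b != MARKER) {
                throw new IllegalStateException("unexpected byte: " + b);
            }
            records++;
        }
        return records;
    }

    public static void main(String[] args) {
        // Three empty tuples without a marker: zero bytes, so zero records seen.
        System.out.println(write(3, false).length);        // prints 0
        System.out.println(countRecords(write(3, false))); // prints 0
        // With a marker byte, each record consumes one byte and can be counted.
        System.out.println(countRecords(write(3, true)));  // prints 3
    }
}
```

Whether the real runtime actually hits this case depends on how records are framed on the wire, which this sketch does not model; it only shows why a zero-byte record is invisible to a reader that relies on consumed bytes.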