Tuple


Re: Tuple

Chesnay Schepler
i think this job would be chained completely and never do any serialization.

On 04.08.2015 14:25, Matthias J. Sax wrote:

> Works for batch job, too. See enclosed.
>
> On 08/04/2015 01:34 PM, Matthias J. Sax wrote:
>> Yes, that is what the program does. However, streaming is not lazy, so
>> deserialization should have happened.
>>
>> I will try a batch job, later today.
>>
>> On 08/04/2015 01:27 PM, Chesnay Schepler wrote:
>>> So I'm not too much into the streaming API, but as I see it this program
>>> creates an infinite number of tuples and then counts them, right?
>>>
>>> The problem with serialization as i understand it is that the receiver
>>> can't tell how many Tuple0 are sent, since you never actually read any
>>> data when deserializing a tuple. it's even more likely that it's not
>>> even attempted.
>>>
>>> As such, I'd be curious to see what happens when you create a batch job
>>> with a limited number of starting tuples.
>>>
>>> On 04.08.2015 13:08, Matthias J. Sax wrote:
>>>> Hi,
>>>>
>>>> I just opened a PR for this. https://github.com/apache/flink/pull/983
>>>>
>>>> However, I was not able to "reproduce" serialization issues... I tested
>>>> Tuple0 (see enclosed code) in a cluster, and the program worked. Am I
>>>> missing anything?
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>> On 08/03/2015 01:01 AM, Matthias J. Sax wrote:
>>>>> Thanks for the advice about Tuple0.
>>>>>
>>>>> I personally don't see any advantage in having a "flink-tuple" project. Am
>>>>> I missing anything about it? Furthermore, I am not sure if it is a good
>>>>> idea to have too many small projects.
>>>>>
>>>>>
>>>>> On 08/03/2015 12:48 AM, Stephan Ewen wrote:
>>>>>> Tuple0 would need special serialization and comparator logic. If that is
>>>>>> given, I see no reason not to support it.
>>>>>>
>>>>>> There is, BTW, the request to create a dedicated "flink-tuple" project that
>>>>>> only contains the tuple classes. Any opinions on that?
>>>>>>
>>>>>> On Mon, Aug 3, 2015 at 12:45 AM, Matthias J. Sax <
>>>>>> [hidden email]> wrote:
>>>>>>
>>>>>>> Thanks for the explanation!
>>>>>>>
>>>>>>> As I mentioned before, Tuple0 might also be helpful for streaming. And I
>>>>>>> guess I will need it for the Storm compatibility layer, too. (I need to
>>>>>>> double check, but Storm supports zero-attribute tuples, too.)
>>>>>>>
>>>>>>> With regard to the information I collected during the discussion, I vote
>>>>>>> for keeping Tuple0 in Flink core and fixing the serialization problem.
>>>>>>> Should we have another JIRA for this? Or should I extend the existing
>>>>>>> JIRA? (https://issues.apache.org/jira/browse/FLINK-2457)
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 08/03/2015 12:22 AM, Chesnay Schepler wrote:
>>>>>>>> First of all, it was a really good idea to start a discussion about this.
>>>>>>>>
>>>>>>>> So the general idea behind Tuple0 was this:
>>>>>>>>
>>>>>>>> The Python API maps Python tuples to Flink tuples. Python can have empty
>>>>>>>> tuples, so I thought "well duh, let's make a Tuple0 class!". What I did
>>>>>>>> not wanna do is create some non-Tuple object to represent empty tuples;
>>>>>>>> I'd rather have them treated the same, because it's less work and
>>>>>>>> creates simpler code.
>>>>>>>>
>>>>>>>> When transferring the plan to Java, certain parameters for operations
>>>>>>>> are tuples, which can be empty as well.
>>>>>>>> This is where the Tuple0 class is really useful, because these empty
>>>>>>>> tuples go through the same logic as other tuples.
>>>>>>>> This is also why I want to keep the class, at least in the Python
>>>>>>>> project, for now.
>>>>>>>>
>>>>>>>> For the actual program execution, I need a new solution. Funny story,
>>>>>>>> while writing this reply I noticed that the Python API can't handle
>>>>>>>> Tuple0 at runtime either. ha...ha... -.-
>>>>>>>>
>>>>>>>> Guess I now know what I'm working on next.
>>>>>>>>
>>>>>>>> On 02.08.2015 21:24, Matthias J. Sax wrote:
>>>>>>>>> Can you elaborate how and why Python used Tuple0? If it cannot be
>>>>>>>>> serialized like regular Tuples, what is its use in Python? Right
>>>>>>>>> now it seems as if there is no special serialization code for Tuple0.
>>>>>>>>>
>>>>>>>>> I just want to understand the topic in detail.
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 08/01/2015 03:38 PM, Stephan Ewen wrote:
>>>>>>>>>> I think a Tuple0 cannot be implemented like the current tuples, at least
>>>>>>>>>> with respect to runtime serialization.
>>>>>>>>>>
>>>>>>>>>> The system makes the assumption that it makes progress in consuming bytes
>>>>>>>>>> when deserializing values. If a Tuple0 never consumes data from the byte
>>>>>>>>>> stream, this assumption is broken. It would need at least one marker byte.
>>>>>>>>>> Then it effectively is a Tuple1<Byte> disguising itself as a Tuple0.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Aug 1, 2015 at 1:38 PM, Matthias J. Sax <
>>>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I just double checked. Scala does not have a Tuple0 type. IMHO, it would
>>>>>>>>>>> be best to remove Tuple0 for consistency. Having Tuple types is for
>>>>>>>>>>> consistency with Scala in the first place, right? Please give
>>>>>>>>>>> feedback.
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 08/01/2015 01:04 PM, Matthias J. Sax wrote:
>>>>>>>>>>>> I see.
>>>>>>>>>>>>
>>>>>>>>>>>> I think that it might be useful to have Tuple0, because in rare cases
>>>>>>>>>>>> you only want to "notify" a downstream operator (talking about
>>>>>>>>>>>> streaming) that something happened but there is no actual data to be
>>>>>>>>>>>> processed. Furthermore, if Flink cannot deal with Tuple0, it should be
>>>>>>>>>>>> removed completely for consistency, IMHO.
>>>>>>>>>>>>
>>>>>>>>>>>> I will open a JIRA for it.
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>> On 07/31/2015 10:44 PM, Chesnay Schepler wrote:
>>>>>>>>>>>>> Also, I'm not sure if I ever sent a Tuple0 through a program; it could
>>>>>>>>>>>>> be that the system freaks out.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 31.07.2015 22:40, Chesnay Schepler wrote:
>>>>>>>>>>>>>> There's no specific reason. It was added fairly recently by me (mid
>>>>>>>>>>>>>> April), and you're most likely the second person to use it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I didn't integrate it into all our tuple-related stuff because, well, I
>>>>>>>>>>>>>> never thought anyone would actually need it, so I saved myself the
>>>>>>>>>>>>>> trouble.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there any specific reason why Tuple.getTupleClass(int arity) does
>>>>>>>>>>>>>>> not support arity zero? There is a class Tuple0, but it cannot be
>>>>>>>>>>>>>>> generated by Tuple.getTupleClass(...). Is it a missing feature? (I
>>>>>>>>>>>>>>> would like to have it.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
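
For readers following the quoted thread, the marker-byte idea mentioned in Stephan's reply can be sketched in plain Java. This is only an illustration under stated assumptions, not Flink's actual serializer: the class name MarkerByteDemo, the MARKER value, and both helper methods are hypothetical and use nothing but java.io. Each empty record is written as exactly one byte, so the reader consumes one byte per record and can tell how many records were sent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical illustration only; this is not Flink code.
public class MarkerByteDemo {

    // Assumed marker value; any single byte would do.
    static final byte MARKER = 42;

    // Writes n empty records, each encoded as exactly one marker byte.
    static byte[] writeRecords(int n) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        for (int i = 0; i < n; i++) {
            out.writeByte(MARKER);
        }
        out.flush();
        return buffer.toByteArray();
    }

    // Reads records until the stream is exhausted and returns how many were seen.
    // Every record read consumes one byte, so the reader always makes progress.
    static int countRecords(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int count = 0;
        while (true) {
            byte b;
            try {
                b = in.readByte();
            } catch (EOFException e) {
                return count; // end of stream: no more records
            }
            if (b != MARKER) {
                throw new IOException("unexpected byte " + b);
            }
            count++;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeRecords(3);
        System.out.println(bytes.length + " bytes, " + countRecords(bytes) + " records");
        // prints "3 bytes, 3 records"
    }
}
```

If the records were written as zero bytes instead, writeRecords would return an empty array for any n, and a reader working from the bytes alone could not recover the record count.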


Re: Tuple

Matthias J. Sax
I set the parallelism of the map to 4 (and I double-checked that the 4 mappers
are running on different machines). Furthermore, the fromElements() source
has a parallelism of 1. Thus, some data is going over the network for sure.


On 08/04/2015 02:31 PM, Chesnay Schepler wrote:

> i think this job would be chained completely and never do any
> serialization.

