Hello to my squirrels,
I ran into an NPE for some iterations code and it looks like what's described in FLINK-2443 <https://issues.apache.org/jira/browse/FLINK-2443>. I'm trying to understand the problem and I could really use your help :)

So far, it seems that the exception is caused by a null value returned by CompactingHashTable.getMatchFor(PT probeSideRecord).

This method returns null in the following cases:
- when the hash table is "closed"
- when the segment is done
- if the serializer actually returns a null record

It seems that on the join/cogroup driver side there is no check or special handling when the build side record is null, i.e. the null record is still passed to the join function.
Is this correct and if not, what should the driver do in this case?

Thank you!

Cheers,
Vasia.

Hi Vasia,
I looked into the code. A serializer should never return null when deserializing. If it does, it failed to detect that something went wrong with the deserialization; it should throw an exception instead.

Regarding the handling of null returns in the drivers: if there is no entry in the HT for a certain key, the HT will return null, which is expected.
If a CoGroupWithSolutionSet*Driver receives a null value, it gives an empty iterator to the user function. The JoinWithSolutionSet*Driver calls the join function with a null value. Both behaviors are expected. A join with a solution set is actually an outer join, and a join function in such a join needs to be able to handle null values on the solution set side.

Cheers, Fabian

2015-09-15 17:41 GMT+02:00 Vasiliki Kalavri <[hidden email]>:

> It seems that on the join/cogroup driver side there is no check or special
> handling when the build side record is null, i.e. the null record is still
> passed to the join function.
> Is this correct and if not, what should the driver do in this case?

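The driver behavior described above can be modeled with a small, Flink-free sketch. Everything in it is an assumption made for illustration: `SketchJoin`, `SketchCoGroup` and `SolutionSetJoinSketch` are hypothetical names, and a plain `HashMap` stands in for the solution set hash table. It only shows the shape of the contract: the join function sees null for a missing key, while the co-group function sees an empty iterable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a join function (not Flink's JoinFunction). */
interface SketchJoin<W, S, O> {
    /** solutionSetRecord is null when the solution set has no entry for the key. */
    O join(W worksetRecord, S solutionSetRecord);
}

/** Hypothetical stand-in for a co-group function (not Flink's CoGroupFunction). */
interface SketchCoGroup<W, S, O> {
    O coGroup(W worksetRecord, Iterable<S> solutionSetRecords);
}

class SolutionSetJoinSketch {

    /** Models the join driver: the lookup result is passed on as-is, null included. */
    static <K, S, O> List<O> joinWithSolutionSet(
            List<K> probeKeys, Map<K, S> solutionSet, SketchJoin<K, S, O> fn) {
        List<O> out = new ArrayList<>();
        for (K key : probeKeys) {
            S match = solutionSet.get(key); // null if the key is absent
            out.add(fn.join(key, match));
        }
        return out;
    }

    /** Models the co-group driver: a missing entry becomes an empty iterable. */
    static <K, S, O> List<O> coGroupWithSolutionSet(
            List<K> probeKeys, Map<K, S> solutionSet, SketchCoGroup<K, S, O> fn) {
        List<O> out = new ArrayList<>();
        for (K key : probeKeys) {
            S match = solutionSet.get(key);
            Iterable<S> group = (match == null)
                    ? Collections.<S>emptyList()
                    : Collections.singletonList(match);
            out.add(fn.coGroup(key, group));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> solutionSet = new HashMap<>();
        solutionSet.put("a", 1);
        List<String> probes = Arrays.asList("a", "b"); // "b" is not in the solution set

        // The join function must handle a null solution set record itself.
        List<String> joined = joinWithSolutionSet(probes, solutionSet,
                (key, value) -> value == null ? key + ":none" : key + ":" + value);
        System.out.println(joined); // prints [a:1, b:none]

        // The co-group function just sees zero elements for the missing key.
        List<String> grouped = coGroupWithSolutionSet(probes, solutionSet, (key, group) -> {
            int n = 0;
            for (Integer ignored : group) {
                n++;
            }
            return key + ":" + n;
        });
        System.out.println(grouped); // prints [a:1, b:0]
    }
}
```

A join function written without the null branch would throw an NPE on "b" in this model, which is the outer-join behavior the reply describes.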
Hi,
thanks a lot Fabian!

I didn't know that join with the solution set is an outer join. That's a surprise :)

So, if I understand correctly, I should have a null value when my other input to the join contains some key that doesn't exist in the solution set, right? That's not the case in my application; I'm not generating any new keys.

Also, when setting the solutionSetUnManaged option, the exception doesn't occur anymore. Are the join semantics different when the solution set is in unmanaged memory?

Cheers,
Vasia.

On 16 September 2015 at 16:50, Fabian Hueske <[hidden email]> wrote:

> A join with a solution set is actually an outer join and a join function in
> such a join needs to be able to handle null values on the solution set side.

Yes, probing the HashTable with a key that does not exist will yield a join function call with a null value (or empty iterator in case of CoGroup).

The semantics of the join are the same regardless of the hash table implementation.
The fact that the error only occurs with the managed HT indicates that there is a bug somewhere :-(

2015-09-16 17:26 GMT+02:00 Vasiliki Kalavri <[hidden email]>:

> Also, when setting the solutionSetUnManaged option, the exception doesn't
> occur anymore. Are the join semantics different when the solution set is in
> unmanaged memory?

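When the application never produces keys that are missing from the solution set, a null on the solution set side points to a bug rather than to a legitimate outer-join miss. One way to make that visible is to wrap the join logic so it fails with the offending record instead of a bare NPE. This is only a sketch under that assumption: `StrictSolutionSetJoin` and `requireMatch` are hypothetical names, and `BiFunction` stands in for the real join function type.

```java
import java.util.function.BiFunction;

class StrictSolutionSetJoin {

    /**
     * Wraps join logic that assumes a solution set match always exists.
     * A null solution set record raises an exception naming the workset record.
     */
    static <W, S, O> BiFunction<W, S, O> requireMatch(BiFunction<W, S, O> joinLogic) {
        return (worksetRecord, solutionSetRecord) -> {
            if (solutionSetRecord == null) {
                throw new IllegalStateException(
                        "No solution set entry for workset record: " + worksetRecord);
            }
            return joinLogic.apply(worksetRecord, solutionSetRecord);
        };
    }

    public static void main(String[] args) {
        BiFunction<String, Integer, String> join =
                requireMatch((key, value) -> key + "=" + value);

        System.out.println(join.apply("a", 1)); // prints a=1

        try {
            join.apply("b", null); // simulates a lookup that unexpectedly returned null
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
            // prints No solution set entry for workset record: b
        }
    }
}
```

The message then shows which key the lookup missed, which helps tell a genuinely absent key apart from a hash table that lost an entry.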
Thanks Fabian! At least now I know the bug is probably not in the driver where I was looking :)

On 16 September 2015 at 17:33, Fabian Hueske <[hidden email]> wrote:

> The fact that the error only occurs with the managed HT, indicates that
> there is a bug somewhere :-(