Hi,

I was recently using the RowSerializer (package org.apache.flink.api.java.typeutils.runtime) to serialize Rows to a file, so that they can be read back later. I observed a strange behavior that I would like to double-check with you, in case it is a serious problem that needs to be addressed.

When the RowSerializer is used to convert data back, there is no check for the consistency of the data (e.g., the size of the object that was serialized, or a checksum). As a result, for random reads of bytes, inconsistent objects can be deserialized, which of course can lead to inconsistent data.

For example, if we serialize an object of the form (Int, Long, Double, String, String) and only 1/3 of the bytes are available, we can end up reading back an object such as (0, 0, 0, null, null) rather than getting an error. This is not the only way in which an object can be incorrectly deserialized.

Hence, I wanted to double-check whether this behavior is intended for some reason, and whether we should consider fixing the RowSerializer to guarantee the integrity of the objects that are deserialized.
Best regards,

Dr. Radu Tudoran
Staff Research Engineer - Big Data Expert
IT R&D Division
HUAWEI TECHNOLOGIES Duesseldorf GmbH
German Research Center
Munich Office
Riesstrasse 25, 80992 München
E-mail: [hidden email]
Mobile: +49 15209084330
Telephone: +49 891588344173

HUAWEI TECHNOLOGIES Duesseldorf GmbH

This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!
Hi Radu,
none of Flink's serializers adds checksums to ensure data integrity. It would be possible to implement a wrapping serializer that adds a checksum to each record, but that would come at the cost of performance. I am not sure whether this is done at some point in Flink, maybe for savepoints.

Best, Fabian

(In reply to the message from Radu Tudoran <[hidden email]> of 2018-02-23 14:44 GMT+01:00.)
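To make the idea of a checksum-adding wrapper more concrete, here is a minimal sketch of the framing such a wrapper might apply to each record. It is plain Java with no Flink dependency, and everything in it is an assumption for illustration: the class name `ChecksumFraming`, the methods `frame` and `unframe`, and the layout (length, payload, CRC32) are hypothetical and not part of Flink. In an actual wrapper, the payload would presumably be the bytes produced by the wrapped serializer.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical helper (not part of Flink): frames one serialized record with its length and a CRC32. */
final class ChecksumFraming {

    private ChecksumFraming() {}

    /** Returns the frame [int length][payload][long crc32(payload)]. */
    static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + payload.length + Long.BYTES);
        buf.putInt(payload.length);
        buf.put(payload);
        buf.putLong(crc.getValue());
        return buf.array();
    }

    /** Returns the payload, or throws IllegalStateException if the frame is truncated or corrupted. */
    static byte[] unframe(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        try {
            int length = buf.getInt();
            // The declared payload plus the 8-byte checksum must fit in what is left.
            if (length < 0 || length > buf.remaining() - Long.BYTES) {
                throw new IllegalStateException("Truncated or corrupt record: bad length " + length);
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            long expected = buf.getLong();
            CRC32 crc = new CRC32();
            crc.update(payload, 0, payload.length);
            if (crc.getValue() != expected) {
                throw new IllegalStateException("Checksum mismatch: record is corrupt");
            }
            return payload;
        } catch (BufferUnderflowException e) {
            // Fewer than 4 bytes were available, so not even the length could be read.
            throw new IllegalStateException("Truncated record", e);
        }
    }
}
```

With this framing, a reader that only has a fraction of the bytes gets an exception instead of a silently wrong object. The price is 12 extra bytes and one CRC32 pass per record, which is the performance cost mentioned above.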
Hi,
Thanks for the confirmation. In that case, should we just leave things as they are, so that anyone who is interested in consistency checks can build the wrapper you mention?

(In reply to the message "Re: RowSerializer" from Fabian Hueske [mailto:[hidden email]], sent Monday, February 26, 2018 9:34 AM.)