Jeffrey Charles created FLINK-21350:
---------------------------------------
Summary: ParquetInputFormat incorrectly interprets timestamps encoded in microseconds as timestamps encoded in milliseconds
Key: FLINK-21350
URL: https://issues.apache.org/jira/browse/FLINK-21350
Project: Flink
Issue Type: Bug
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.12.1, 1.12.0
Reporter: Jeffrey Charles
Given a Parquet file whose schema has a field with a physical type of INT64 and a logical type of TIMESTAMP_MICROS, all of the ParquetInputFormat subclasses deserialize the timestamp as a value tens of thousands of years in the future.
Looking at the code at https://github.com/apache/flink/blob/release-1.12.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/utils/RowConverter.java#L326, it looks to me like the row converter interprets the field value as if it contained milliseconds rather than microseconds. Specifically, the millisecond and microsecond cases share the same code path to instantiate a java.sql.Timestamp, whose constructor takes a value in milliseconds, but the microsecond case statement passes it a value in microseconds. I tested a change locally where I divide the value by 1000 in the microseconds case statement, and that produces a timestamp with the expected value.
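For illustration, here is a minimal standalone sketch (not the actual RowConverter code) of the conversion I believe is happening and of the divide-by-1000 fix I tested; the setNanos step is my own addition to preserve sub-millisecond precision and was not part of the change I tried locally:

{code:java}
import java.sql.Timestamp;

public class TimestampMicrosSketch {

    // Suspected bug: the raw INT64 TIMESTAMP_MICROS value is passed
    // straight to java.sql.Timestamp, whose constructor expects
    // milliseconds since the epoch, so microseconds are read as millis.
    static Timestamp convertBuggy(long timestampMicros) {
        return new Timestamp(timestampMicros);
    }

    // Fix I tested locally: divide by 1000 to get milliseconds.
    // setNanos restores the sub-millisecond fraction (assumption, not
    // part of my local change).
    static Timestamp convertFixed(long timestampMicros) {
        Timestamp ts = new Timestamp(timestampMicros / 1000);
        ts.setNanos((int) ((timestampMicros % 1_000_000) * 1000));
        return ts;
    }

    public static void main(String[] args) {
        long micros = 1_613_000_000_000_000L; // ~2021-02-10 in microseconds
        System.out.println(convertBuggy(micros)); // year ~53000, far future
        System.out.println(convertFixed(micros)); // expected 2021 timestamp
    }
}
{code}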
--
This message was sent by Atlassian Jira
(v8.3.4#803005)