Yun Gao created FLINK-20295:
-------------------------------
Summary: File Source lost data when reading from directories created by FileSystemTableSink with JSON format
Key: FLINK-20295
URL: https://issues.apache.org/jira/browse/FLINK-20295
Project: Flink
Issue Type: Bug
Reporter: Yun Gao
Fix For: 1.12.0
Attachments: compaction.tgz
When testing the compaction functionality of the FileSystemTableSink, I found that with the JSON format the produced directories cannot be read correctly by the file source: only part of the records are read.
I checked the produced directories and the number of records in them matches the expected count, so this appears to be an issue on the source side.
The issue only occurs with the JSON format.
The data is produced by [FileCompactionTest|https://github.com/gaoyunhaii/flink1.12test/blob/main/src/main/java/FileCompactionTest.java] and read by [FileCompactionCheckTest|https://github.com/gaoyunhaii/flink1.12test/blob/main/src/main/java/FileCompactionCheckTest.java]. A tar file of an example output directory containing 8000 records is also attached (compaction.tgz).
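For anyone who wants to repeat the record-count check on the produced directories without going through the file source, a minimal sketch follows. It is not taken from the linked test programs. It assumes the JSON output holds one record per line, and that files whose names start with "." or "_" (in-progress files, markers) are not part of the committed output; the class name JsonRecordCounter is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink or of the linked test repository.
public class JsonRecordCounter {

    // Assumption: names starting with '.' or '_' mark in-progress or marker
    // files, so they are skipped when counting committed records.
    static boolean isDataFile(Path p) {
        String name = p.getFileName().toString();
        return Files.isRegularFile(p) && !name.startsWith(".") && !name.startsWith("_");
    }

    // Assumption: one JSON record per non-blank line.
    static long countLines(Path file) {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Walks the directory tree (including partition subdirectories) and sums
    // the per-file record counts.
    static long countRecords(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(JsonRecordCounter::isDataFile)
                        .mapToLong(JsonRecordCounter::countLines)
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: JsonRecordCounter <output-directory>");
            return;
        }
        System.out.println(countRecords(Paths.get(args[0])));
    }
}
```

Running it against the unpacked attachment and comparing the printed number with the count returned by the file source should show whether the records are missing on disk or dropped while reading.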
--
This message was sent by Atlassian Jira
(v8.3.4#803005)