[jira] [Created] (FLINK-22202) Thread safety in ParquetColumnarRowInputFormat

Shang Yuanchun (Jira)
Jingsong Lee created FLINK-22202:
------------------------------------

             Summary: Thread safety in ParquetColumnarRowInputFormat
                 Key: FLINK-22202
                 URL: https://issues.apache.org/jira/browse/FLINK-22202
             Project: Flink
          Issue Type: Bug
          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
            Reporter: Jingsong Lee
            Assignee: Jingsong Lee
             Fix For: 1.13.0


In a {{VectorizedColumnBatch}}, the dictionary is deserialized lazily.

If multiple batches are in use at the same time, there may be thread-safety problems, because deserializing the dictionary depends on some internal structures.

We need to set numBatchesToCirculate to 1 for ParquetColumnarRowInputFormat.
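
To illustrate why circulating a single batch would avoid the problem, here is a minimal sketch of a fixed-size batch pool. It is not Flink code: BatchPoolSketch, Batch, Pool, and maxBatchesInFlight are hypothetical names, and the pool is only an assumed model of what numBatchesToCirculate controls. With a pool of size 1, the reader thread cannot hand out a second batch until the consumer has finished with the first one and recycled it, so lazy dictionary deserialization never runs on two batches concurrently.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; none of these types are Flink classes. */
final class BatchPoolSketch {

    /** Stand-in for a reusable columnar batch. */
    static final class Batch {
        final int id;
        Batch(int id) { this.id = id; }
    }

    /** Fixed-size pool: a batch must be recycled before it can be handed out again. */
    static final class Pool {
        private final BlockingQueue<Batch> free;

        Pool(int numBatchesToCirculate) {
            free = new ArrayBlockingQueue<>(numBatchesToCirculate);
            for (int i = 0; i < numBatchesToCirculate; i++) {
                free.add(new Batch(i));
            }
        }

        Batch take() throws InterruptedException { return free.take(); }

        void recycle(Batch batch) { free.add(batch); }
    }

    /** Returns the largest number of batches that were handed out but not yet recycled. */
    static int maxBatchesInFlight(int numBatchesToCirculate, int batchesToRead) {
        Pool pool = new Pool(numBatchesToCirculate);
        BlockingQueue<Batch> handover = new ArrayBlockingQueue<>(numBatchesToCirculate);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();

        // Reader thread: fills batches and hands them to the consumer.
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < batchesToRead; i++) {
                    Batch batch = pool.take(); // blocks while every batch is in flight
                    maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    handover.put(batch);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        try {
            // Consumer (this thread): lazy dictionary decoding would happen here.
            for (int i = 0; i < batchesToRead; i++) {
                Batch batch = handover.take();
                inFlight.decrementAndGet();
                pool.recycle(batch); // only now can the reader reuse the batch
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return maxInFlight.get();
    }
}
```

Calling BatchPoolSketch.maxBatchesInFlight(1, 100) returns 1, since the single batch is always recycled before it is handed out again. With a larger pool the result can be anywhere up to the pool size, which is the situation the issue describes as unsafe.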


