[jira] [Created] (FLINK-16296) Improve performance of BaseRowSerializer#serialize() for GenericRow



Shang Yuanchun (Jira)
Jark Wu created FLINK-16296:
-------------------------------

             Summary: Improve performance of BaseRowSerializer#serialize() for GenericRow
                 Key: FLINK-16296
                 URL: https://issues.apache.org/jira/browse/FLINK-16296
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / Runtime
            Reporter: Jark Wu


Currently, when serializing a {{GenericRow}} using {{BaseRowSerializer#serialize()}}, there are two memory copies. The first is GenericRow -> BinaryRow, the second is BinaryRow -> DataOutputView.

However, in theory, we can serialize a GenericRow into the DataOutputView directly, because we already have all the column values and types. We can serialize the null bits for all columns, then the fixed-length part for all columns, and then the variable-length part.

For example, when the column is a BinaryString, we can serialize its position and length, calculate the new variable-length part length, and then serialize the next column. If there is a generic type in the row, serialization falls back to the previous approach. But generic types in SQL are rare.
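
The three passes described above could look roughly like the sketch below. It is only an illustration under simplifying assumptions, not the actual BaseRowSerializer code: {{DirectRowWriter}} is a hypothetical class, the row is a plain Object[], a plain java.io.DataOutput stands in for DataOutputView, only Long, Integer and String columns are handled, and the layout (8-byte null-bit words, 8-byte fixed slots, unpadded variable part) is simplified and is not claimed to match the BinaryRow format byte for byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: writes a generic row straight to a DataOutput. */
public class DirectRowWriter {

    /** Serializes fields (Long, Integer, String or null) without an intermediate binary row. */
    public static void serialize(Object[] fields, DataOutput out) throws IOException {
        int arity = fields.length;
        int nullWords = (arity + 63) / 64;          // one bit per column, rounded up to 8-byte words
        int fixedSize = nullWords * 8 + arity * 8;  // the variable part starts right after the fixed slots

        // Pass 1: collect null bits, encode strings, and reject unsupported types
        // before anything is written to the output.
        long[] nullBits = new long[nullWords];
        byte[][] varData = new byte[arity][];
        for (int i = 0; i < arity; i++) {
            Object f = fields[i];
            if (f == null) {
                nullBits[i / 64] |= 1L << (i % 64);
            } else if (f instanceof String) {
                varData[i] = ((String) f).getBytes(StandardCharsets.UTF_8);
            } else if (!(f instanceof Long) && !(f instanceof Integer)) {
                // A real implementation would fall back to the two-copy path here.
                throw new IllegalArgumentException("Unsupported column type: " + f.getClass());
            }
        }
        for (long word : nullBits) {
            out.writeLong(word);
        }

        // Pass 2: fixed-length slots. A string slot stores (offset << 32 | length)
        // and advances the running offset into the variable part.
        int cursor = fixedSize;
        for (int i = 0; i < arity; i++) {
            if (fields[i] == null) {
                out.writeLong(0L);
            } else if (varData[i] != null) {
                out.writeLong(((long) cursor << 32) | varData[i].length);
                cursor += varData[i].length;
            } else {
                out.writeLong(((Number) fields[i]).longValue());
            }
        }

        // Pass 3: variable-length payloads, in column order.
        for (byte[] data : varData) {
            if (data != null) {
                out.write(data);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        serialize(new Object[] {42L, "flink", null, 7}, new DataOutputStream(buffer));
        System.out.println(buffer.size() + " bytes written");
    }
}
```

In this sketch the string is still encoded to UTF-8 once, which is an artifact of using java.lang.String; with a BinaryString the bytes already exist and could be written to the output as they are.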

This is a general improvement and can benefit every operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)