[jira] [Created] (FLINK-16818) Optimize data skew when flink write data to hive dynamic partition table


Shang Yuanchun (Jira)
Jun Zhang created FLINK-16818:
---------------------------------

             Summary: Optimize data skew when flink write data to hive dynamic partition table
                 Key: FLINK-16818
                 URL: https://issues.apache.org/jira/browse/FLINK-16818
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Hive
    Affects Versions: 1.10.0
         Environment:
            Reporter: Jun Zhang
             Fix For: 1.11.0


I read data from a Hive source table through Flink SQL and then write it into a Hive target table. The target table is partitioned. When one partition holds a particularly large amount of data, data skew occurs and the job takes an especially long time to finish.

With the default configuration, the same SQL takes about five minutes on Hive on Spark, but around 40 minutes on Flink.

example:

 
{code:sql}
-- the schema of myparttable
CREATE TABLE myparttable (
  name string,
  age int
) PARTITIONED BY (
  type string,
  day string
);

INSERT OVERWRITE myparttable SELECT name, age, type, day FROM sourcetable;
{code}
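A common mitigation for this kind of skew (a sketch of the general technique, not a confirmed fix for this ticket) is to "salt" the hot partition: add a small random bucket value to the shuffle key so that rows destined for one large partition are spread across several parallel writer subtasks. The example below uses Hive's {{DISTRIBUTE BY}} clause for illustration; Flink SQL in 1.10 may not accept this clause, and the bucket count of 10 is an arbitrary assumption:

{code:sql}
-- Hypothetical workaround (Hive dialect): spread one hot (type, day)
-- partition across up to 10 writer tasks by salting the shuffle key.
INSERT OVERWRITE TABLE myparttable
SELECT name, age, type, day
FROM (
  SELECT name, age, type, day,
         CAST(RAND() * 10 AS INT) AS bucket  -- random salt, 10 sub-buckets
  FROM sourcetable
) t
DISTRIBUTE BY type, day, bucket;
{code}

The trade-off is that each partition's output is split into more files, so the bucket count should be kept small relative to the writer parallelism.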
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)