Jun Zhang created FLINK-16818:
---------------------------------
Summary: Optimize data skew when Flink writes data to a Hive dynamic partition table
Key: FLINK-16818
URL: https://issues.apache.org/jira/browse/FLINK-16818
Project: Flink
Issue Type: Improvement
Components: Connectors / Hive
Affects Versions: 1.10.0
Reporter: Jun Zhang
Fix For: 1.11.0
I read data from a Hive source table through Flink SQL and then write it into a Hive target table. The target table is partitioned. When one partition holds particularly large amounts of data, data skew occurs, and the job takes an unusually long time to execute.
With the default configuration, the same SQL takes about five minutes on Hive on Spark, while Flink takes about 40 minutes.
example:
{code:sql}
-- the schema of myparttable
CREATE TABLE myparttable (
  name string,
  age int
) PARTITIONED BY (
  type string,
  day string
);

INSERT OVERWRITE myparttable SELECT name, age, type, day FROM sourcetable;
{code}
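For context, a minimal sketch of how such a job can be submitted with the Table API and a HiveCatalog on the Blink batch planner; the catalog name, default database, and Hive conf directory below are hypothetical placeholders, not taken from this issue:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveDynamicPartitionInsert {
    public static void main(String[] args) throws Exception {
        // Batch job on the Blink planner, as used for Hive reads/writes in 1.10.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inBatchMode()
                .build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);

        // Hypothetical catalog name, default database, and Hive conf dir.
        HiveCatalog hiveCatalog = new HiveCatalog("myhive", "default", "/opt/hive/conf");
        tableEnv.registerCatalog("myhive", hiveCatalog);
        tableEnv.useCatalog("myhive");

        // The statement from the issue: a dynamic-partition write, since the
        // partition columns (type, day) come from the query itself. This is
        // where the reported skew shows up when one partition is much larger
        // than the others.
        tableEnv.sqlUpdate(
                "INSERT OVERWRITE myparttable SELECT name, age, type, day FROM sourcetable");
        tableEnv.execute("hive-dynamic-partition-insert");
    }
}
{code}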