[jira] [Created] (FLINK-16818) Optimize data skew when flink write data to hive dynamic partition table


Shang Yuanchun (Jira)
Jun Zhang created FLINK-16818:
---------------------------------

             Summary: Optimize data skew when flink write data to hive dynamic partition table
                 Key: FLINK-16818
                 URL: https://issues.apache.org/jira/browse/FLINK-16818
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Hive
    Affects Versions: 1.10.0
         Environment:
            Reporter: Jun Zhang
             Fix For: 1.11.0


I read data from a Hive source table through Flink SQL and then write it into a Hive target table. The target table is partitioned. When one partition holds a particularly large amount of data, data skew occurs and the job takes an especially long time to finish.

With the default configuration, the same SQL takes about five minutes on Hive on Spark, but around 40 minutes on Flink.

example:

 
{code:sql}
-- the schema of myparttable
CREATE TABLE myparttable (
  name string,
  age int
) PARTITIONED BY (
  type string,
  day string
);

INSERT OVERWRITE myparttable SELECT name, age, type, day FROM sourcetable;
{code}
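A common mitigation for this kind of skew (a sketch of the general technique, not a confirmed fix for this ticket) is to "salt" the hot partition: add a small random bucket value to the shuffle key so that rows destined for one large partition are spread across several parallel writer subtasks. The example below uses Hive's {{DISTRIBUTE BY}} clause for illustration; Flink SQL in 1.10 may not accept this clause, and the bucket count of 10 is an arbitrary assumption:

{code:sql}
-- Hypothetical workaround (Hive dialect): spread one hot (type, day)
-- partition across up to 10 writer tasks by salting the shuffle key.
INSERT OVERWRITE TABLE myparttable
SELECT name, age, type, day
FROM (
  SELECT name, age, type, day,
         CAST(RAND() * 10 AS INT) AS bucket  -- random salt, 10 sub-buckets
  FROM sourcetable
) t
DISTRIBUTE BY type, day, bucket;
{code}

The trade-off is that each partition's output is split into more files, so the bucket count should be kept small relative to the writer parallelism.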
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)