Hive Streaming write compaction

chenqin
Hi there,

We are testing writing from Kafka to a Hive table in Parquet format.
Currently, users have to accept lots of small files in minute-level
folders to get the latency benefits. I recall that folks at Flink Forward
Global 2020 mentioned implementing compaction logic at checkpoint time.
How is that going? We would love to collaborate on this topic.

Chen
Pinterest

Re: Hive Streaming write compaction

Kurt Young
We just added this feature in 1.12 [1][2]. It would be great if you could
download the 1.12 RC, test it out, and give us some feedback.

In case you wonder why I linked two JIRA issues: the FileSystem and Hive
connectors share the same options and the same implementation.
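
For readers who want to try this, here is a minimal sketch of a filesystem sink DDL with compaction turned on. It is not taken from the thread: the option keys `auto-compaction` and `compaction.file-size` are given as I recall them from the Flink 1.12 filesystem connector documentation, so check them against the release you test. The table name, columns, path, and the 128MB target size are hypothetical. The code only builds and prints the statement, so it runs without Flink on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactionDdlSketch {

    // Builds a CREATE TABLE statement for a partitioned Parquet filesystem sink
    // with compaction enabled. The schema below is purely illustrative.
    static String buildSinkDdl(String tableName, String path, String targetFileSize) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "filesystem");
        options.put("path", path);
        options.put("format", "parquet");
        // Assumed option keys (Flink 1.12 filesystem connector docs):
        options.put("auto-compaction", "true");
        options.put("compaction.file-size", targetFileSize);

        // Render the options as a comma-separated WITH clause.
        StringBuilder with = new StringBuilder();
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (with.length() > 0) {
                with.append(",\n");
            }
            with.append("  '").append(e.getKey()).append("' = '")
                .append(e.getValue()).append("'");
        }

        return "CREATE TABLE " + tableName + " (\n"
            + "  user_id STRING,\n"
            + "  event_time TIMESTAMP(3),\n"
            + "  dt STRING,\n"
            + "  hr STRING\n"
            + ") PARTITIONED BY (dt, hr) WITH (\n"
            + with + "\n)";
    }

    public static void main(String[] args) {
        // Hypothetical table name and path.
        String ddl = buildSinkDdl("events_sink", "hdfs:///tmp/events_sink", "128MB");
        System.out.println(ddl);
        // In a real job the string would be passed to TableEnvironment#executeSql(ddl).
    }
}
```

For a Hive table, the same keys would presumably be set as table properties rather than in a WITH clause, since both connectors share the options.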

[1] https://issues.apache.org/jira/browse/FLINK-19875
[2] https://issues.apache.org/jira/browse/FLINK-19886

Best,
Kurt


On Thu, Nov 19, 2020 at 2:31 PM Chen Qin <[hidden email]> wrote:

> Hi there,
>
> We are testing writing from Kafka to a Hive table in Parquet format.
> Currently, users have to accept lots of small files in minute-level
> folders to get the latency benefits. I recall that folks at Flink Forward
> Global 2020 mentioned implementing compaction logic at checkpoint time.
> How is that going? We would love to collaborate on this topic.
>
> Chen
> Pinterest
>

Re: Hive Streaming write compaction

Jingsong Li
Hi Chen,

File compaction for the Table FileSystem/Hive sink has been merged into
master; details are in [1]. It is included in Flink 1.12.

I hope you can give it a try and test it.

[1] https://issues.apache.org/jira/browse/FLINK-19345

Best,
Jingsong

--
Best, Jingsong Lee