[jira] [Created] (FLINK-20538) sink.rolling-policy.file-size does not work in filesystem connector

Shang Yuanchun (Jira)
zhuxiaoshang created FLINK-20538:
------------------------------------

             Summary: sink.rolling-policy.file-size does not work in filesystem connector
                 Key: FLINK-20538
                 URL: https://issues.apache.org/jira/browse/FLINK-20538
             Project: Flink
          Issue Type: Bug
          Components: Connectors / FileSystem
    Affects Versions: 1.11.1
            Reporter: zhuxiaoshang


When I use the SQL filesystem connector to write data to HDFS with sink.rolling-policy.file-size set to 50MB, the setting does not seem to take effect: files larger than 100 MB are still produced.

My table DDL is:

{code:java}
CREATE TABLE cpc_bd_recall_log_hdfs (
   log_timestamp BIGINT,
   ip STRING,
   `raw` STRING,
   `day` STRING, `hour` STRING,`minute` STRING
) PARTITIONED BY (`day` , `hour` ,`minute`) WITH (
   'connector'='filesystem',
   'path'='hdfs://xxx/test.db/hdfs_test',
   'format'='parquet',
   'parquet.compression'='SNAPPY',
   'sink.rolling-policy.file-size' = '50MB',
   'sink.partition-commit.policy.kind' = 'success-file',
   'sink.partition-commit.delay'='60s'
);
{code}
The resulting HDFS files are:

{code:java}
     0 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/_SUCCESS
-rw-r--r--   3 hadoop hadoop     31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-0-2500
-rw-r--r--   3 hadoop hadoop    121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-0-2501
-rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-1-2499
-rw-r--r--   3 hadoop hadoop    122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-1-2500
-rw-r--r--   3 hadoop hadoop     31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-10-2501
-rw-r--r--   3 hadoop hadoop    121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-10-2502
-rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-11-2500
-rw-r--r--   3 hadoop hadoop    122.2 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-11-2501
-rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-12-2500
-rw-r--r--   3 hadoop hadoop    122.2 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-12-2501
-rw-r--r--   3 hadoop hadoop     31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-13-2499
-rw-r--r--   3 hadoop hadoop    122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-13-2500
-rw-r--r--   3 hadoop hadoop     31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-14-2500
-rw-r--r--   3 hadoop hadoop    122.1 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-14-2501
-rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-15-2498
-rw-r--r--   3 hadoop hadoop    121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-15-2499
-rw-r--r--   3 hadoop hadoop     31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-16-2501
-rw-r--r--   3 hadoop hadoop    122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-16-2502
-rw-r--r--   3 hadoop hadoop     31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-17-2500
-rw-r--r--   3 hadoop hadoop    122.5 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-17-2501
-rw-r--r--   3 hadoop hadoop     31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-18-2500
-rw-r--r--   3 hadoop hadoop    121.7 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-18-2501
-rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-19-2501
-rw-r--r--   3 hadoop hadoop    121.7 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-19-2502
-rw-r--r--   3 hadoop hadoop     31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-2-2499
-rw-r--r--   3 hadoop hadoop    121.6 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-2-2500
-rw-r--r--   3 hadoop hadoop     31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-3-2500
-rw-r--r--   3 hadoop hadoop    121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-3-2501
-rw-r--r--   3 hadoop hadoop     31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-4-2499
-rw-r--r--   3 hadoop hadoop    122.1 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-4-2500
-rw-r--r--   3 hadoop hadoop     31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-5-2499
-rw-r--r--   3 hadoop hadoop    121.8 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-5-2500
-rw-r--r--   3 hadoop hadoop     31.8 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-6-2499
-rw-r--r--   3 hadoop hadoop    121.5 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-6-2500
-rw-r--r--   3 hadoop hadoop     31.6 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-7-2500
-rw-r--r--   3 hadoop hadoop    122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-7-2501
-rw-r--r--   3 hadoop hadoop     31.7 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-8-2501
-rw-r--r--   3 hadoop hadoop    122.0 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-8-2502
-rw-r--r--   3 hadoop hadoop     31.9 M 2020-12-04 14:55 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-9-2501
-rw-r--r--   3 hadoop hadoop    121.9 M 2020-12-04 14:56 hdfs://xxx/test.db/hdfs_test/day=2020-12-04/hour=14/minute=55/part-3dca3b00-fd94-4f49-bdf8-a8b65bcfa92c-9-2502
{code}
 

 

However, when I dug into the source code, I found that writing an element to a bucket invokes `shouldRollOnEvent` in TableRollingPolicy, so I would expect the size limit to be checked on every write.

I don't understand how this can happen. Is this a bug, or am I getting something wrong?
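
To make the expectation concrete, here is a minimal sketch of a per-record size check. It is a simplified, hypothetical model and not Flink's actual TableRollingPolicy: the class `SizeRollingSketch`, the `simulate` method, and the two-argument `shouldRollOnEvent` signature are made up for illustration, and it assumes the writer can always see the exact number of bytes written so far.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of size-based rolling; NOT Flink's TableRollingPolicy. */
class SizeRollingSketch {

    /** Assumed policy: roll once the in-progress file exceeds the threshold. */
    public static boolean shouldRollOnEvent(long currentFileSizeBytes, long maxFileSizeBytes) {
        return currentFileSizeBytes > maxFileSizeBytes;
    }

    /** Writes records of the given sizes and returns the sizes of the resulting files. */
    public static List<Long> simulate(long[] recordSizes, long maxFileSizeBytes) {
        List<Long> fileSizes = new ArrayList<>();
        long current = 0L;
        for (long recordSize : recordSizes) {
            // The check runs before each write, using the size observed so far.
            if (current > 0 && shouldRollOnEvent(current, maxFileSizeBytes)) {
                fileSizes.add(current);
                current = 0L;
            }
            current += recordSize;
        }
        // The last in-progress file is closed at the end of the input.
        if (current > 0) {
            fileSizes.add(current);
        }
        return fileSizes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] records = new long[150];
        Arrays.fill(records, mb); // 150 records of 1 MB each
        for (long size : simulate(records, 50 * mb)) {
            System.out.println((size / mb) + " MB");
        }
    }
}
```

Under these assumptions no file grows much beyond the 50 MB threshold (at most one record past it), which is what I expected from the setting and is not what the listing above shows.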

 


