[jira] [Created] (FLINK-14043) SavepointMigrationTestBase is super slow


Shang Yuanchun (Jira)
Till Rohrmann created FLINK-14043:
-------------------------------------

             Summary: SavepointMigrationTestBase is super slow
                 Key: FLINK-14043
                 URL: https://issues.apache.org/jira/browse/FLINK-14043
             Project: Flink
          Issue Type: Bug
          Components: Runtime / State Backends, Tests
    Affects Versions: 1.9.0, 1.8.1, 1.10.0
            Reporter: Till Rohrmann
            Assignee: Till Rohrmann
             Fix For: 1.10.0, 1.9.1, 1.8.3


The subclasses of {{SavepointMigrationTestBase}} take super long to execute. On my local machine

* {{TypeSerializerSnapshotMigrationITCase}} takes 2min 30s
* {{StatefulJobWBroadcastStateMigrationITCase}} takes 1min 45s
* {{StatefulJobSavepointMigrationITCase}} takes 2min 5s

to execute. The reason for the long runtimes seems to be that we are using the {{AccumulatorCountingSink}}, which uses accumulators to signal when a job is done. Since the accumulators are sent with the TM heartbeats, the heartbeat interval determines how fast the client realizes that the job can be shut down. The default heartbeat interval is {{10 s}}, and hence it always takes at least 10 seconds until the client stops the job.
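
To illustrate the reasoning above, here is a minimal sketch of the delay, assuming a deliberately simplified model in which accumulator updates reach the client only on heartbeat ticks at fixed multiples of the interval. This is not Flink code: the class {{HeartbeatDelayModel}}, the method {{observedAtMillis}}, and the 1200 ms job duration are hypothetical and exist only for this example.

```java
public final class HeartbeatDelayModel {

    /**
     * Simplified model (an assumption, not Flink's implementation): heartbeats fire at
     * multiples of intervalMillis, and the client can only notice that the job is done
     * on the first heartbeat at or after the moment the job finished.
     */
    static long observedAtMillis(long finishedAtMillis, long intervalMillis) {
        if (finishedAtMillis < 0 || intervalMillis <= 0) {
            throw new IllegalArgumentException("times must be non-negative, interval positive");
        }
        // Round up to the next heartbeat tick.
        long ticks = (finishedAtMillis + intervalMillis - 1) / intervalMillis;
        return ticks * intervalMillis;
    }

    public static void main(String[] args) {
        long finishedAt = 1_200L; // hypothetical: the job's real work is done after 1.2 s

        // Compare the default 10 s interval with the proposed 500 ms interval.
        for (long interval : new long[] {10_000L, 500L}) {
            long observedAt = observedAtMillis(finishedAt, interval);
            System.out.printf(
                "interval=%d ms -> client notices at %d ms (waited %d ms)%n",
                interval, observedAt, observedAt - finishedAt);
        }
    }
}
```

Under this model a job that finishes after 1.2 s is noticed at 10 s with the default interval, but at 1.5 s with a 500 ms interval, which is in line with the per-job overhead described above.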

I suggest decreasing the heartbeat interval in the {{SavepointMigrationTestBase}} to 500ms in order to speed up the tests. On my machine the test runtimes with this setting are:

* {{TypeSerializerSnapshotMigrationITCase}} takes 13s
* {{StatefulJobWBroadcastStateMigrationITCase}} takes 10s
* {{StatefulJobSavepointMigrationITCase}} takes 11s




--
This message was sent by Atlassian Jira
(v8.3.2#803003)