Till Rohrmann created FLINK-14043:
-------------------------------------
Summary: SavepointMigrationTestBase is super slow
Key: FLINK-14043
URL: https://issues.apache.org/jira/browse/FLINK-14043
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends, Tests
Affects Versions: 1.9.0, 1.8.1, 1.10.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Fix For: 1.10.0, 1.9.1, 1.8.3
The subclasses of {{SavepointMigrationTestBase}} take super long to execute. On my local machine
* {{TypeSerializerSnapshotMigrationITCase}} takes 2min 30s
* {{StatefulJobWBroadcastStateMigrationITCase}} takes 1min 45s
* {{StatefulJobSavepointMigrationITCase}} takes 2min 5s
to execute. The reason for the long runtimes seems to be that we are using the {{AccumulatorCountingSink}}, which uses accumulators to signal when a job is done. Since the accumulators are sent with the TM heartbeats, the heartbeat interval determines how fast the client realizes that the job can be shut down. The default heartbeat interval is {{10 s}}, and hence it always takes at least 10 seconds until the client stops the job.
I suggest decreasing the heartbeat interval in {{SavepointMigrationTestBase}} to 500 ms in order to speed up the tests. On my machine, the test runtimes with this setting are:
* {{TypeSerializerSnapshotMigrationITCase}} takes 13s
* {{StatefulJobWBroadcastStateMigrationITCase}} takes 10s
* {{StatefulJobSavepointMigrationITCase}} takes 11s
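To illustrate why the heartbeat interval dominates the runtime, here is a minimal sketch of the delay. It is a simplified model and not Flink code: it assumes accumulator updates reach the client only on heartbeat ticks, that ticks happen at fixed multiples of the interval counted from job start, and that the job's real work finishes after a hypothetical 1200 ms. The class and method names are made up for this example. In Flink itself the interval is controlled by the {{heartbeat.interval}} option (in milliseconds).

```java
/**
 * Simplified model (an assumption, not Flink code): the client can only
 * observe job completion on a heartbeat tick, and ticks occur at
 * interval, 2 * interval, 3 * interval, ... milliseconds after job start.
 */
public class HeartbeatDelayModel {

    /**
     * Returns the time (ms since job start) of the first heartbeat tick
     * at or after the moment the job finished its work.
     */
    public static long observedAtMillis(long finishMillis, long intervalMillis) {
        if (finishMillis < 0 || intervalMillis <= 0) {
            throw new IllegalArgumentException("finish must be >= 0 and interval > 0");
        }
        // Ceiling division: number of ticks needed to reach finishMillis.
        long ticks = (finishMillis + intervalMillis - 1) / intervalMillis;
        // There is no tick at time 0, so at least one tick must elapse.
        return Math.max(1, ticks) * intervalMillis;
    }

    public static void main(String[] args) {
        long finishMillis = 1_200L; // hypothetical time at which the job is done

        for (long interval : new long[] {10_000L, 500L}) {
            System.out.println("interval=" + interval + " ms -> client notices at "
                    + observedAtMillis(finishMillis, interval) + " ms");
        }
        // Prints:
        // interval=10000 ms -> client notices at 10000 ms
        // interval=500 ms -> client notices at 1500 ms
    }
}
```

Under these assumptions, a job that finishes well before the first heartbeat still keeps the client waiting for a full interval. That matches the observation above that every job execution costs at least 10 seconds with the default setting.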
--
This message was sent by Atlassian Jira
(v8.3.2#803003)