[jira] [Created] (FLINK-9678) Remove hard-coded sleeps in HA E2E test

Shang Yuanchun (Jira)
Chesnay Schepler created FLINK-9678:
---------------------------------------

             Summary: Remove hard-coded sleeps in HA E2E test
                 Key: FLINK-9678
                 URL: https://issues.apache.org/jira/browse/FLINK-9678
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Coordination, Tests
    Affects Versions: 1.5.0, 1.6.0
            Reporter: Chesnay Schepler


{{test_ha.sh}} uses two hard-coded sleeps:
{code:bash}
# let the job run for a while to take some checkpoints
sleep 20

for (( c=0; c<${JM_KILLS}; c++ )); do
    # kill the JM and wait for watchdog to
    # create a new one which will take over
    kill_jm
    sleep 60
done{code}
These sleeps are always troublesome: they either make the test brittle by being too small, or cause the test to idle when they are too large.

The first sleep should be replaced with {{wait_num_checkpoints}}.

I'm not entirely sure about the semantics of the second sleep, but I guess we're waiting for the new JM to continue the job execution. In that case I suggest instead querying the job status via REST and waiting until the job is running.
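
A rough sketch of what the REST-based wait could look like, in case it helps the discussion. This is not taken from the existing test scripts: the helper names {{query_job_state}} and {{wait_job_running}}, the variables {{REST_URL}}, {{POLL_INTERVAL}} and {{JOB_ID}}, and the 60 second timeout are all hypothetical. It assumes the REST endpoint is reachable on localhost:8081 and that {{GET /jobs/<job-id>}} returns JSON whose first {{"state"}} field is the job state.

```shell
# Hypothetical defaults; neither variable exists in test_ha.sh.
REST_URL="${REST_URL:-http://localhost:8081}"
POLL_INTERVAL="${POLL_INTERVAL:-1}"   # whole seconds between polls

# Print the job state reported by GET /jobs/<job-id>.
# Prints nothing if the request fails, e.g. while no JM is reachable.
query_job_state() {
    curl -s --max-time 5 "${REST_URL}/jobs/$1" 2>/dev/null \
        | grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | cut -d'"' -f4
}

# Poll until the job is RUNNING.
# Returns 1 once roughly <timeout> seconds of sleeping have passed.
wait_job_running() {
    job_id=$1
    timeout=${2:-60}
    waited=0
    while [ "${waited}" -lt "${timeout}" ]; do
        if [ "$(query_job_state "${job_id}")" = "RUNNING" ]; then
            return 0
        fi
        sleep "${POLL_INTERVAL}"
        waited=$(( waited + POLL_INTERVAL ))
    done
    echo "Job ${job_id} was not RUNNING after ${timeout}s" >&2
    return 1
}

# Possible use inside the kill loop, with JOB_ID as an assumed variable:
#     kill_jm
#     wait_job_running "${JOB_ID}" 60 || exit 1
```

The timeout keeps the upper bound of the old {{sleep 60}} while letting the loop continue as soon as the job has recovered. One open point is whether a query issued right after {{kill_jm}} could still see the old JM reporting RUNNING; if so, the wait would first have to observe the REST endpoint becoming unavailable.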


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)