[jira] [Created] (FLINK-14843) Streaming bucketing end-to-end test can fail with Output hash mismatch


Shang Yuanchun (Jira)
Gary Yao created FLINK-14843:
--------------------------------

             Summary: Streaming bucketing end-to-end test can fail with Output hash mismatch
                 Key: FLINK-14843
                 URL: https://issues.apache.org/jira/browse/FLINK-14843
             Project: Flink
          Issue Type: Bug
          Components: Connectors / FileSystem, Tests
    Affects Versions: 1.10.0
         Environment: rev: dcc1330375826b779e4902176bb2473704dabb11
            Reporter: Gary Yao


*Description*
The streaming bucketing end-to-end test ({{test_streaming_bucketing.sh}}) can fail intermittently with an "Output hash mismatch" error.

{noformat}
Number of running task managers has reached 4.
Job (67212178694f8b2a9bc9d9572567a53f) is running.
Waiting until all values have been produced
Truncating buckets
Number of produced values 26325/60000
Truncating buckets
Number of produced values 31315/60000
Truncating buckets
Number of produced values 36735/60000
Truncating buckets
Number of produced values 40705/60000
Truncating buckets
Number of produced values 46125/60000
Truncating buckets
Number of produced values 51135/60000
Truncating buckets
Number of produced values 56555/60000
Truncating buckets
Number of produced values 61935/60000
Cancelling job 67212178694f8b2a9bc9d9572567a53f.
Cancelled job 67212178694f8b2a9bc9d9572567a53f.
Waiting for job (67212178694f8b2a9bc9d9572567a53f) to reach terminal state CANCELED ...
Job (67212178694f8b2a9bc9d9572567a53f) reached terminal state CANCELED
Job 67212178694f8b2a9bc9d9572567a53f was cancelled, time to verify
FAIL Bucketing Sink: Output hash mismatch.  Got 4e2d1859e41184a38e5bc95090fe9941, expected 01aba5ff77a0ef5e5cf6a727c248bdc3.
head hexdump of actual:
0000000   (   2   ,   1   0   ,   0   ,   S   o   m   e       p   a   y
0000010   l   o   a   d   .   .   .   )  \n   (   2   ,   1   0   ,   1
0000020   ,   S   o   m   e       p   a   y   l   o   a   d   .   .   .
0000030   )  \n   (   2   ,   1   0   ,   2   ,   S   o   m   e       p
0000040   a   y   l   o   a   d   .   .   .   )  \n   (   2   ,   1   0
0000050   ,   3   ,   S   o   m   e       p   a   y   l   o   a   d   .
0000060   .   .   )  \n   (   2   ,   1   0   ,   4   ,   S   o   m   e
0000070       p   a   y   l   o   a   d   .   .   .   )  \n   (   2   ,
0000080   1   0   ,   5   ,   S   o   m   e       p   a   y   l   o   a
0000090   d   .   .   .   )  \n   (   2   ,   1   0   ,   6   ,   S   o
00000a0   m   e       p   a   y   l   o   a   d   .   .   .   )  \n   (
00000b0   2   ,   1   0   ,   7   ,   S   o   m   e       p   a   y   l
00000c0   o   a   d   .   .   .   )  \n   (   2   ,   1   0   ,   8   ,
00000d0   S   o   m   e       p   a   y   l   o   a   d   .   .   .   )
00000e0  \n   (   2   ,   1   0   ,   9   ,   S   o   m   e       p   a
00000f0   y   l   o   a   d   .   .   .   )  \n                        
00000fa
Stopping taskexecutor daemon (pid: 654547) on host gyao-desktop.
Stopping standalonesession daemon (pid: 650368) on host gyao-desktop.
Stopping taskexecutor daemon (pid: 650812) on host gyao-desktop.
Skipping taskexecutor daemon (pid: 651347), because it is not running anymore on gyao-desktop.
Skipping taskexecutor daemon (pid: 651795), because it is not running anymore on gyao-desktop.
Skipping taskexecutor daemon (pid: 652249), because it is not running anymore on gyao-desktop.
Stopping taskexecutor daemon (pid: 653481) on host gyao-desktop.
Stopping taskexecutor daemon (pid: 654099) on host gyao-desktop.
[FAIL] Test script contains errors.
Checking of logs skipped.

[FAIL] 'flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh' failed after 2 minutes and 3 seconds! Test exited with exit code 1
{noformat}
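
When comparing a failed run against a passing one, it may help to check whether the output contains duplicated records and whether the content differs independently of write order. The following is only a diagnostic sketch and not part of the test script: the helper names {{find_duplicate_records}} and {{order_independent_hash}} are hypothetical, the location of the output files has to be supplied by the caller, and the use of {{md5sum}} assumes that the 32-digit hash in the log is an MD5 digest.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers; NOT part of test_streaming_bucketing.sh.

# Print every record that occurs more than once across the given files,
# each prefixed by its occurrence count.
find_duplicate_records() {
  cat -- "$@" | LC_ALL=C sort | uniq -cd
}

# Hash the records of the given files independent of the order in which
# they were written. Assumption: md5sum is available (GNU coreutils).
order_independent_hash() {
  cat -- "$@" | LC_ALL=C sort | md5sum | cut -d' ' -f1
}
```

Both helpers take one or more output files as arguments. An empty result from {{find_duplicate_records}} means that no record was written twice.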


*How to reproduce*
To provoke the issue, comment out the 10-second delay ({{sleep 10}}) that follows the restart of the first TaskManager (TM) in the test script:

{code:bash}
echo "Restarting 1 TM"
$FLINK_DIR/bin/taskmanager.sh start
wait_for_number_of_running_tms 4

#sleep 10

echo "Killing 2 TMs"
kill_random_taskmanager
kill_random_taskmanager
wait_for_number_of_running_tms 2
{code}

Command to run the test:
{noformat}
FLINK_DIR=build-target/ flink-end-to-end-tests/run-single-test.sh skip flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh
{noformat}
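
Because the failure does not necessarily occur on every run, it can be useful to repeat the command several times and count the failures. The following is a minimal sketch; the function name {{count_failures}} and the {{LOG_DIR}} variable are hypothetical and not part of the Flink test scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper; NOT part of the Flink end-to-end test scripts.
# Usage: count_failures <runs> <command> [args...]
# Runs the command <runs> times, keeps one log file per run in
# ${LOG_DIR:-/tmp}, and prints how many runs exited with a non-zero status.
count_failures() {
  local runs=$1
  shift
  local failures=0
  local i
  for ((i = 1; i <= runs; i++)); do
    if ! "$@" > "${LOG_DIR:-/tmp}/run-$i.log" 2>&1; then
      failures=$((failures + 1))
    fi
  done
  echo "$failures/$runs runs failed"
}

# Example invocation with the command from this report (10 runs):
#   export FLINK_DIR=build-target/
#   count_failures 10 flink-end-to-end-tests/run-single-test.sh skip \
#     flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh
```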






--
This message was sent by Atlassian Jira
(v8.3.4#803005)