[jira] [Created] (FLINK-17479) Occasional checkpoint failure due to null pointer exception in Flink version 1.10

Shang Yuanchun (Jira)
nobleyd created FLINK-17479:
-------------------------------

             Summary: Occasional checkpoint failure due to null pointer exception in Flink version 1.10
                 Key: FLINK-17479
                 URL: https://issues.apache.org/jira/browse/FLINK-17479
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.10.0
         Environment: Flink 1.10.0, jdk1.8.0_60
            Reporter: nobleyd
         Attachments: image-2020-04-30-18-44-21-630.png, image-2020-04-30-18-55-53-779.png

I upgraded a standalone cluster (3 machines) from Flink 1.9 to the latest Flink 1.10.0. My job ran normally on Flink 1.9 for about half a year, but on Flink 1.10.0 it occasionally fails with a null pointer exception during checkpointing.

Below is the exception log:

!image-2020-04-30-18-55-53-779.png!

I have checked StreamTask (line 882), which is shown below. I think the only way this line can throw a null pointer exception is if checkpointMetaData is null.

!image-2020-04-30-18-44-21-630.png!
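
To make the hypothesis concrete, here is a minimal sketch of the failure mode I suspect. It is not the actual StreamTask code: the class `NullMetaDataSketch` and its nested `MetaData` type are hypothetical stand-ins. It shows that dereferencing a null reference on JDK 8 gives an exception with no hint about which reference was null, while an explicit `Objects.requireNonNull` check names it.

```java
import java.util.Objects;

public class NullMetaDataSketch {

    // Hypothetical stand-in for the checkpoint metadata object; not Flink's class.
    static final class MetaData {
        private final long checkpointId;

        MetaData(long checkpointId) {
            this.checkpointId = checkpointId;
        }

        long getCheckpointId() {
            return checkpointId;
        }
    }

    // Dereferences the argument directly. With a null argument this throws a
    // NullPointerException that, on JDK 8, carries no detail message.
    static long idUnchecked(MetaData metaData) {
        return metaData.getCheckpointId();
    }

    // Fails early with a message that names the null reference.
    static long idChecked(MetaData metaData) {
        Objects.requireNonNull(metaData, "checkpointMetaData must not be null");
        return metaData.getCheckpointId();
    }

    public static void main(String[] args) {
        System.out.println(idChecked(new MetaData(42L)));
        try {
            idChecked(null);
        } catch (NullPointerException e) {
            // The message identifies which reference was null.
            System.out.println(e.getMessage());
        }
    }
}
```

If a check like this could be added around the failing line, the log would confirm or rule out my guess that checkpointMetaData is the null reference.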

I do not know why this happens. Can anyone help me? So far the problem only occurs in Flink 1.10.0; the job works well in Flink 1.9. Below I also list the configuration options that differ from the defaults, in case the error is caused by a configuration mistake.

Some of the configuration of my Flink 1.10.0 cluster:

 
{code:java}
taskmanager.memory.flink.size: 71680m
taskmanager.memory.framework.heap.size: 512m
taskmanager.memory.framework.off-heap.size: 512m
taskmanager.memory.task.off-heap.size: 17920m
taskmanager.memory.managed.size: 512m
taskmanager.memory.jvm-metaspace.size: 512m

taskmanager.memory.network.fraction: 0.1
taskmanager.memory.network.min: 1024mb
taskmanager.memory.network.max: 1536mb
taskmanager.memory.segment-size: 128kb

rest.port: 8682
historyserver.web.port: 8782
high-availability.jobmanager.port: 13141,13142,13143,13144
blob.server.port: 13146,13147,13148,13149
taskmanager.rpc.port: 13151,13152,13153,13154
taskmanager.data.port: 13156
metrics.internal.query-service.port: 13161,13162,13163,13164,13166,13167,13168,13169
env.java.home: /usr/java/jdk1.8.0_60/bin/java
env.pid.dir: /home/work/flink-1.10.0
{code}
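
As a sanity check on the memory settings above, the sketch below works out how the configured components would add up. It assumes the Flink 1.10 memory model as I understand it: network memory is a fraction of the total Flink memory, clamped to the configured min and max, and task heap is whatever remains of `taskmanager.memory.flink.size` when it is not set explicitly. The class and method names are hypothetical; this is not Flink's own calculation code.

```java
public class MemoryBudgetSketch {

    // Network memory: fraction of total Flink memory, clamped to [minMb, maxMb].
    static long networkMb(long flinkSizeMb, double fraction, long minMb, long maxMb) {
        long derived = (long) (flinkSizeMb * fraction);
        return Math.max(minMb, Math.min(maxMb, derived));
    }

    // Task heap: remainder of total Flink memory after all other components.
    // JVM metaspace is not subtracted here because it is assumed to lie
    // outside taskmanager.memory.flink.size.
    static long taskHeapMb(long flinkSizeMb,
                           long frameworkHeapMb,
                           long frameworkOffHeapMb,
                           long taskOffHeapMb,
                           long managedMb,
                           long networkMb) {
        return flinkSizeMb - frameworkHeapMb - frameworkOffHeapMb
                - taskOffHeapMb - managedMb - networkMb;
    }

    public static void main(String[] args) {
        // Values taken from the configuration above, in megabytes.
        long flinkSize = 71680;
        long network = networkMb(flinkSize, 0.1, 1024, 1536);
        long taskHeap = taskHeapMb(flinkSize, 512, 512, 17920, 512, network);
        System.out.println("network = " + network + " MB");
        System.out.println("task heap = " + taskHeap + " MB");
    }
}
```

Under these assumptions the network memory is capped at 1536 MB (10% of 71680 MB would be 7168 MB) and the derived task heap is 50688 MB, so the components do fit within the configured total.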
 

Hope someone can help me solve it.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)