Dear community, I am running through the following issue. whenever I use
rocksdb as state backend along with incremental checkpoints, I get the following error: *Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator Service Join SuperService (6/8). at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942) ... 6 moreCaused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53) at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:47) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853) ... 5 moreCaused by: java.lang.IllegalStateException at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.runSnapshot(RocksDBKeyedStateBackend.java:2568) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50) ... 7 more* In my case, I am able to use incremental checkopints with rocksdb as long as I disable *fs.default-scheme* property; in any other case, I get the above error. Is this a known issue? Hope this can help, -- *Andrea Spina* Head of R&D @ Radicalbit Srl Via Giovanni Battista Pirelli 11, 20124, Milano - IT |
Hi Andrea
The error happens when Flink try to verify whether your local backup directory existed[1]. If you could reproduce this, would you please share your configuration to RocksDBStateBackend, and what `fs.default-scheme` have you configured. Taskmanager log with more details is also very welcome. [1] https://github.com/apache/flink/blob/6f4148180ba372a2c12c1d54bea8579350af6c98/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L2568 Best Yun Tang ________________________________ From: Andrea Spina <[hidden email]> Sent: Monday, July 1, 2019 20:06 To: [hidden email] Subject: Fwd: Flink 1.6.4 Issue on RocksDB incremental checkpoints and fs.default-scheme Dear community, I am running through the following issue. whenever I use rocksdb as state backend along with incremental checkpoints, I get the following error: *Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator Service Join SuperService (6/8). at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942) ... 6 moreCaused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53) at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:47) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853) ... 5 moreCaused by: java.lang.IllegalStateException at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.runSnapshot(RocksDBKeyedStateBackend.java:2568) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50) ... 7 more* In my case, I am able to use incremental checkopints with rocksdb as long as I disable *fs.default-scheme* property; in any other case, I get the above error. Is this a known issue? Hope this can help, -- *Andrea Spina* Head of R&D @ Radicalbit Srl Via Giovanni Battista Pirelli 11, 20124, Milano - IT |
Hi Yun, rocksDB configuration is set as follows: ``` RocksDB write-buffer size: 512MB RocksDB BlockSize (cache) [B/K/M]: 128MB Checkpoints directory: hdfs:///address-to-hdfs-chkp-dir:8020/flink/checkpoints enable Checkpoints: true Rocksdb cache index and filters true RocksDB thread No.: 4 Checkpoints interval: 60000 RocksDB BlockSize [B/K/M]: 16KB RocksDB write-buffer count: 5 Use incremental checkpoints: true Rocksdb optimize hits: true RocksDB write-buffer number to merge: 2 ``` I use RocksDBStateBackend class, but I recorded the same result by using configuration parameter state.backend.incremental: true. Il giorno lun 1 lug 2019 alle ore 14:41 Yun Tang <[hidden email]> ha scritto: Hi Andrea Andrea Spina Head of R&D @ Radicalbit Srl Via Giovanni Battista Pirelli 11, 20124, Milano - IT tmLOG.zip (234K) Download Attachment |
Hi Andrea
Unfortunately, the tm log provided is not the task manager in which RocksDBStateBackend first failed. All tasks on this task manager are actually canceled by job manager, you could find a lot of "Attempting to cancel task" before any task failed. From your latest description, this problem happened without any relationship to fs.default-schema. And actually I wonder the previous error "Could not materialize checkpoint 1 for operator Service Join SuperService (6/8)" was whether the root cause of your job's first failover, it might be caused by other task failure and then cancelled via JM leading to that directory cleaned up. I think you could search your job manager's log to find the first failed task exception and locate which task manager that task run. That task manager would contain useful messages. If possible, please provide your job manager's log. Best Yun Tang ________________________________ From: Andrea Spina <[hidden email]> Sent: Monday, July 1, 2019 23:14 To: [hidden email] Subject: Re: Flink 1.6.4 Issue on RocksDB incremental checkpoints and fs.default-scheme Hi Yun, rocksDB configuration is set as follows: ``` RocksDB write-buffer size: 512MB RocksDB BlockSize (cache) [B/K/M]: 128MB Checkpoints directory: hdfs:///address-to-hdfs-chkp-dir:8020/flink/checkpoints enable Checkpoints: true Rocksdb cache index and filters true RocksDB thread No.: 4 Checkpoints interval: 60000 RocksDB BlockSize [B/K/M]: 16KB RocksDB write-buffer count: 5 Use incremental checkpoints: true Rocksdb optimize hits: true RocksDB write-buffer number to merge: 2 ``` I use RocksDBStateBackend class, but I recorded the same result by using configuration parameter state.backend.incremental: true. Il giorno lun 1 lug 2019 alle ore 14:41 Yun Tang <[hidden email]<mailto:[hidden email]>> ha scritto: Hi Andrea The error happens when Flink try to verify whether your local backup directory existed[1]. If you could reproduce this, would you please share your configuration to RocksDBStateBackend, and what `fs.default-scheme` have you configured. Taskmanager log with more details is also very welcome. [1] https://github.com/apache/flink/blob/6f4148180ba372a2c12c1d54bea8579350af6c98/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L2568 Best Yun Tang ________________________________ From: Andrea Spina <[hidden email]<mailto:[hidden email]>> Sent: Monday, July 1, 2019 20:06 To: [hidden email]<mailto:[hidden email]> Subject: Fwd: Flink 1.6.4 Issue on RocksDB incremental checkpoints and fs.default-scheme Dear community, I am running through the following issue. whenever I use rocksdb as state backend along with incremental checkpoints, I get the following error: *Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator Service Join SuperService (6/8). at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942) ... 6 moreCaused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53) at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:47) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853) ... 5 moreCaused by: java.lang.IllegalStateException at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.runSnapshot(RocksDBKeyedStateBackend.java:2568) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50) ... 7 more* In my case, I am able to use incremental checkopints with rocksdb as long as I disable *fs.default-scheme* property; in any other case, I get the above error. Is this a known issue? Hope this can help, -- *Andrea Spina* Head of R&D @ Radicalbit Srl Via Giovanni Battista Pirelli 11, 20124, Milano - IT -- Andrea Spina Head of R&D @ Radicalbit Srl Via Giovanni Battista Pirelli 11, 20124, Milano - IT |
Hi, I attached also the JM log. Thereby you can appreciate the exception. I hope that can help. As I said previously, disabling fs.default-scheme property solved my issue. cheers, Il giorno lun 1 lug 2019 alle ore 18:17 Yun Tang <[hidden email]> ha scritto: Hi Andrea Andrea Spina Head of R&D @ Radicalbit Srl Via Giovanni Battista Pirelli 11, 20124, Milano - IT jmLOG.zip (86K) Download Attachment |
Hi Andrea
This should be a bug already fixed by https://issues.apache.org/jira/browse/FLINK-12042 , you could upgrade to at least 1.7.3 version to avoid this bug. Best Yun Tang ________________________________ From: Andrea Spina <[hidden email]> Sent: Wednesday, July 3, 2019 15:46 To: [hidden email] Subject: Re: Flink 1.6.4 Issue on RocksDB incremental checkpoints and fs.default-scheme Hi, I attached also the JM log. Thereby you can appreciate the exception. I hope that can help. As I said previously, disabling fs.default-scheme property solved my issue. cheers, Il giorno lun 1 lug 2019 alle ore 18:17 Yun Tang <[hidden email]<mailto:[hidden email]>> ha scritto: Hi Andrea Unfortunately, the tm log provided is not the task manager in which RocksDBStateBackend first failed. All tasks on this task manager are actually canceled by job manager, you could find a lot of "Attempting to cancel task" before any task failed. From your latest description, this problem happened without any relationship to fs.default-schema. And actually I wonder the previous error "Could not materialize checkpoint 1 for operator Service Join SuperService (6/8)" was whether the root cause of your job's first failover, it might be caused by other task failure and then cancelled via JM leading to that directory cleaned up. I think you could search your job manager's log to find the first failed task exception and locate which task manager that task run. That task manager would contain useful messages. If possible, please provide your job manager's log. Best Yun Tang ________________________________ From: Andrea Spina <[hidden email]<mailto:[hidden email]>> Sent: Monday, July 1, 2019 23:14 To: [hidden email]<mailto:[hidden email]> Subject: Re: Flink 1.6.4 Issue on RocksDB incremental checkpoints and fs.default-scheme Hi Yun, rocksDB configuration is set as follows: ``` RocksDB write-buffer size: 512MB RocksDB BlockSize (cache) [B/K/M]: 128MB Checkpoints directory: hdfs:///address-to-hdfs-chkp-dir:8020/flink/checkpoints enable Checkpoints: true Rocksdb cache index and filters true RocksDB thread No.: 4 Checkpoints interval: 60000 RocksDB BlockSize [B/K/M]: 16KB RocksDB write-buffer count: 5 Use incremental checkpoints: true Rocksdb optimize hits: true RocksDB write-buffer number to merge: 2 ``` I use RocksDBStateBackend class, but I recorded the same result by using configuration parameter state.backend.incremental: true. Il giorno lun 1 lug 2019 alle ore 14:41 Yun Tang <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> ha scritto: Hi Andrea The error happens when Flink try to verify whether your local backup directory existed[1]. If you could reproduce this, would you please share your configuration to RocksDBStateBackend, and what `fs.default-scheme` have you configured. Taskmanager log with more details is also very welcome. [1] https://github.com/apache/flink/blob/6f4148180ba372a2c12c1d54bea8579350af6c98/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L2568 Best Yun Tang ________________________________ From: Andrea Spina <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> Sent: Monday, July 1, 2019 20:06 To: [hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>> Subject: Fwd: Flink 1.6.4 Issue on RocksDB incremental checkpoints and fs.default-scheme Dear community, I am running through the following issue. whenever I use rocksdb as state backend along with incremental checkpoints, I get the following error: *Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator Service Join SuperService (6/8). at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942) ... 6 moreCaused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53) at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:47) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853) ... 5 moreCaused by: java.lang.IllegalStateException at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.runSnapshot(RocksDBKeyedStateBackend.java:2568) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50) ... 7 more* In my case, I am able to use incremental checkopints with rocksdb as long as I disable *fs.default-scheme* property; in any other case, I get the above error. Is this a known issue? Hope this can help, -- *Andrea Spina* Head of R&D @ Radicalbit Srl Via Giovanni Battista Pirelli 11, 20124, Milano - IT -- Andrea Spina Head of R&D @ Radicalbit Srl Via Giovanni Battista Pirelli 11, 20124, Milano - IT -- Andrea Spina Head of R&D @ Radicalbit Srl Via Giovanni Battista Pirelli 11, 20124, Milano - IT |
Dear Yun, thank you for your support then. We will update as soon as we can.
Cheers, On Thu, 4 Jul 2019, 04:57 Yun Tang, <[hidden email]> wrote: > Hi Andrea > > This should be a bug already fixed by > https://issues.apache.org/jira/browse/FLINK-12042 , you could upgrade to > at least 1.7.3 version to avoid this bug. > > Best > Yun Tang > ________________________________ > From: Andrea Spina <[hidden email]> > Sent: Wednesday, July 3, 2019 15:46 > To: [hidden email] > Subject: Re: Flink 1.6.4 Issue on RocksDB incremental checkpoints and > fs.default-scheme > > Hi, I attached also the JM log. Thereby you can appreciate the exception. > I hope that can help. > As I said previously, disabling fs.default-scheme property solved my issue. > > cheers, > > Il giorno lun 1 lug 2019 alle ore 18:17 Yun Tang <[hidden email]<mailto: > [hidden email]>> ha scritto: > Hi Andrea > > Unfortunately, the tm log provided is not the task manager in which > RocksDBStateBackend first failed. All tasks on this task manager are > actually canceled by job manager, you could find a lot of "Attempting to > cancel task" before any task failed. > > From your latest description, this problem happened without any > relationship to fs.default-schema. And actually I wonder the previous error > "Could not materialize checkpoint 1 for operator Service Join SuperService > (6/8)" was whether the root cause of your job's first failover, it might be > caused by other task failure and then cancelled via JM leading to that > directory cleaned up. > > I think you could search your job manager's log to find the first failed > task exception and locate which task manager that task run. That task > manager would contain useful messages. If possible, please provide your job > manager's log. > > Best > Yun Tang > ________________________________ > From: Andrea Spina <[hidden email]<mailto: > [hidden email]>> > Sent: Monday, July 1, 2019 23:14 > To: [hidden email]<mailto:[hidden email]> > Subject: Re: Flink 1.6.4 Issue on RocksDB incremental checkpoints and > fs.default-scheme > > Hi Yun, > rocksDB configuration is set as follows: > ``` > RocksDB write-buffer size: 512MB > RocksDB BlockSize (cache) [B/K/M]: 128MB > Checkpoints directory: > hdfs:///address-to-hdfs-chkp-dir:8020/flink/checkpoints > enable Checkpoints: true > Rocksdb cache index and filters true > RocksDB thread No.: 4 > Checkpoints interval: 60000 > RocksDB BlockSize [B/K/M]: 16KB > RocksDB write-buffer count: 5 > Use incremental checkpoints: true > Rocksdb optimize hits: true > RocksDB write-buffer number to merge: 2 > ``` > > I use RocksDBStateBackend class, but I recorded the same result by using > configuration parameter state.backend.incremental: true. > > Il giorno lun 1 lug 2019 alle ore 14:41 Yun Tang <[hidden email]<mailto: > [hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> ha > scritto: > Hi Andrea > > The error happens when Flink try to verify whether your local backup > directory existed[1]. If you could reproduce this, would you please share > your configuration to RocksDBStateBackend, and what `fs.default-scheme` > have you configured. Taskmanager log with more details is also very welcome. > > > [1] > https://github.com/apache/flink/blob/6f4148180ba372a2c12c1d54bea8579350af6c98/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L2568 > > Best > Yun Tang > ________________________________ > From: Andrea Spina <[hidden email]<mailto: > [hidden email]><mailto:[hidden email]<mailto: > [hidden email]>>> > Sent: Monday, July 1, 2019 20:06 > To: [hidden email]<mailto:[hidden email]><mailto: > [hidden email]<mailto:[hidden email]>> > Subject: Fwd: Flink 1.6.4 Issue on RocksDB incremental checkpoints and > fs.default-scheme > > Dear community, I am running through the following issue. whenever I use > rocksdb as state backend along with incremental checkpoints, I get the > following error: > > > > > > > > > > > > > > > > > *Caused by: java.lang.Exception: Could not materialize checkpoint 1 for > operator Service Join SuperService (6/8). at > > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942) > ... 6 moreCaused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException at > java.util.concurrent.FutureTask.report(FutureTask.java:122) at > java.util.concurrent.FutureTask.get(FutureTask.java:192) at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53) > at > > org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:47) > at > > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853) > ... 5 moreCaused by: java.lang.IllegalStateException at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:179) > at > > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.runSnapshot(RocksDBKeyedStateBackend.java:2568) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50) > ... 7 more* > > In my case, I am able to use incremental checkopints with rocksdb as long > as I disable *fs.default-scheme* property; in any other case, I get the > above error. Is this a known issue? > > Hope this can help, > -- > *Andrea Spina* > Head of R&D @ Radicalbit Srl > Via Giovanni Battista Pirelli 11, 20124, Milano - IT > > > -- > Andrea Spina > Head of R&D @ Radicalbit Srl > Via Giovanni Battista Pirelli 11, 20124, Milano - IT > > > -- > Andrea Spina > Head of R&D @ Radicalbit Srl > Via Giovanni Battista Pirelli 11, 20124, Milano - IT > |
Free forum by Nabble | Edit this page |