Status of a savepoint operation returns Completed but an error was thrown

Diogo Santos
Hi guys,

We developed some scripts to improve the rolling updates in our pipelines.
One of their tasks is to trigger a savepoint and then poll the response
until the status is Completed or the retry limit is reached.

We noticed that the response sometimes has the status Completed even though
the request failed:

{
    "status": {
        "id": "COMPLETED"
    },
    "operation": {
        "failure-cause": {
            "class": "java.util.concurrent.CompletionException",
            "stack-trace": "java.util.concurrent.CompletionException: ....
)\n\t... 47 more\n",
            "serialized-throwable": "..."
        }
    }
}

An easy way to reproduce the issue is to put the job in a restart loop and
trigger a savepoint.

Shouldn't the status be in-progress in that case?

Re: Status of a savepoint operation returns Completed but an error was thrown

Till Rohrmann
Hi Diogo,

The idea is that a savepoint operation can also fail. Because it is an
asynchronous operation, the status only denotes whether the operation is
still in progress or has completed. A savepoint operation counts as
completed whether it succeeded or failed. If it failed, the failure cause
should tell you what went wrong. Does this make sense?
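
For a polling script, that means checking for a failure cause once the status
is COMPLETED instead of treating COMPLETED as success. Below is a minimal
sketch of that check, not a definitive implementation. The function names,
the SavepointFailed exception, and the injected fetch_status callable are
hypothetical, and the IN_PROGRESS label is used here only for "anything that
is not COMPLETED". Only the "status", "operation" and "failure-cause" fields
come from the response shown in this thread.

```python
import time
from typing import Callable


class SavepointFailed(Exception):
    """Hypothetical error: the operation completed but reported a failure cause."""


def savepoint_outcome(response: dict) -> str:
    """Classify a status response as IN_PROGRESS, FAILED or SUCCEEDED."""
    if response.get("status", {}).get("id") != "COMPLETED":
        return "IN_PROGRESS"
    operation = response.get("operation") or {}
    # COMPLETED only means the asynchronous operation has finished.
    # A failure cause in the payload means it finished unsuccessfully.
    if operation.get("failure-cause") is not None:
        return "FAILED"
    return "SUCCEEDED"


def wait_for_savepoint(
    fetch_status: Callable[[], dict],
    max_retries: int = 30,
    delay_seconds: float = 2.0,
) -> dict:
    """Poll until the operation finishes; return its "operation" payload.

    fetch_status is an assumption: any callable that returns the parsed
    JSON body of the savepoint status request.
    """
    for _ in range(max_retries):
        response = fetch_status()
        outcome = savepoint_outcome(response)
        if outcome == "SUCCEEDED":
            return response.get("operation") or {}
        if outcome == "FAILED":
            cause = response["operation"]["failure-cause"]
            raise SavepointFailed(cause.get("class", "unknown failure"))
        time.sleep(delay_seconds)
    raise TimeoutError("savepoint still in progress after the retry limit")
```

With the response from your mail, savepoint_outcome returns "FAILED", so the
script can abort the rolling update instead of continuing as if the savepoint
had been written.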

Cheers,
Till

On Mon, May 17, 2021 at 7:00 AM Diogo Santos <[hidden email]> wrote:

> [...]