Configmaps not deleted Kubernetes HA

Enrique
Hi all,

I am deploying a Flink Cluster in session mode using Kubernetes HA and have
seen it working with the different config maps for the dispatcher,
restserver and resourcemanager. I also have configured storage for
checkpointing and HA metadata.

When I submit a job, I can see that a config map is created for it
containing checkpoint information, which is updated correctly. Yet when I
cancel a job, I would expect the config map to be deleted, but it seems that
it isn't. Is this the intended behaviour? I am worried that as many jobs are
submitted and cancelled on the Flink cluster, a large number of config maps
will remain in the cluster.

Thanks in advance,

Enrique



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/

Re: Configmaps not deleted Kubernetes HA

Till Rohrmann
Hi Enrique,

I think you are running into FLINK-20695 [1]. In a nutshell, Flink
currently deletes the config maps only when it shuts down. We want to
change this with the next release.

[1]  https://issues.apache.org/jira/browse/FLINK-20695

Cheers,
Till

On Wed, May 5, 2021 at 8:36 PM Enrique <[hidden email]> wrote:

> Hi all,
>
> I am deploying a Flink Cluster in session mode using Kubernetes HA and have
> seen it working with the different config maps for the dispatcher,
> restserver and resourcemanager. I also have configured storage for
> checkpointing and HA metadata.
>
> When I submit a job, I can see that a config map is created for it
> containing checkpoint information, which is updated correctly. Yet when I
> cancel a job, I would expect the config map to be deleted, but it seems
> that it isn't. Is this the intended behaviour? I am worried that as many
> jobs are submitted and cancelled on the Flink cluster, a large number of
> config maps will remain in the cluster.
>
> Thanks in advance,
>
> Enrique

Re: Configmaps not deleted Kubernetes HA

Enrique
Hi Till,

I'm not using ZooKeeper HA, but the new native Kubernetes HA. I'm deploying
the Flink cluster using two StatefulSets, one each for the JM and TM, plus
PVC storage for HA metadata, checkpoints and savepoints. When I delete both
StatefulSets and the JM/TM pods terminate, the HA config maps are not deleted.

If I then delete my storage and recreate the Flink cluster, it tries to
restore jobs from the config map data and fails. So to clarify, is the
intended behaviour for the config maps to be deleted as part of the Flink
cluster shutting down? Is there a JIRA ticket raised for native Kubernetes
HA?

Thanks,
Enrique




Re: Configmaps not deleted Kubernetes HA

Yang Wang
In reply to this post by Till Rohrmann
Hi Enrique,

I think it is related to FLINK-20219 [1]. Currently, the HA-related
ConfigMaps/ZNodes cannot be cleaned up properly.

The clean-up mechanism for HA-related ConfigMaps in session mode could be
improved in the following two ways:
* Delete the jobmanager leader ConfigMap once the job reaches a terminal
state (canceled, succeeded, failed)
* Try to clean up all the HA ConfigMaps for terminal jobs when shutting down
the cluster
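
The two clean-up strategies above can be sketched as a small selection
routine. This is only an illustration, not Flink's implementation: the
function names are hypothetical, and the sketch assumes that per-job leader
ConfigMaps are named `<cluster-id>-<job-id>-jobmanager-leader`. Please check
the actual names in your cluster before relying on that.

```python
from typing import Dict, Iterable, List

# Flink JobStatus names for the terminal states mentioned above.
TERMINAL_STATES = {"CANCELED", "FINISHED", "FAILED"}

# Assumed naming scheme for per-job leader ConfigMaps (verify in your cluster).
JOB_LEADER_SUFFIX = "-jobmanager-leader"


def job_leader_configmap(cluster_id: str, job_id: str) -> str:
    """Build the assumed ConfigMap name for one job's leader election data."""
    return f"{cluster_id}-{job_id}{JOB_LEADER_SUFFIX}"


def configmaps_to_delete(
    cluster_id: str,
    configmap_names: Iterable[str],
    job_states: Dict[str, str],
) -> List[str]:
    """Return the existing per-job leader ConfigMaps whose job is terminal.

    ConfigMaps of running jobs and cluster-level ConfigMaps (dispatcher,
    resourcemanager, restserver) are never selected.
    """
    existing = set(configmap_names)
    stale = []
    for job_id, state in sorted(job_states.items()):
        name = job_leader_configmap(cluster_id, job_id)
        if state.upper() in TERMINAL_STATES and name in existing:
            stale.append(name)
    return stale
```

For the first strategy, such a check would run whenever a single job changes
state; for the second, it would run once at cluster shutdown over all known
jobs.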

[1]  https://issues.apache.org/jira/browse/FLINK-20219


Best,
Yang

On Thu, May 6, 2021 at 10:23 PM Till Rohrmann <[hidden email]> wrote:

> Hi Enrique,
>
> I think you are running into FLINK-20695 [1]. In a nutshell, Flink
> currently deletes the config maps only when it shuts down. We want to
> change this with the next release.
>
> [1]  https://issues.apache.org/jira/browse/FLINK-20695
>
> Cheers,
> Till
>

Re: Configmaps not deleted Kubernetes HA

Till Rohrmann
In reply to this post by Enrique
Hi Enrique,

I think you are actually seeing a mixture of FLINK-20219 and FLINK-20695.
If either of these issues is resolved, the problem should go away. Also
note that the K8s HA services won't clean up the ConfigMaps if you delete
the deployment, as documented here [1].

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/ha/kubernetes_ha/#high-availability-data-clean-up
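
Until those tickets are resolved, leftover HA ConfigMaps have to be removed
by hand. The sketch below only builds a `kubectl delete` command for that
purpose and does not run it. It assumes the HA ConfigMaps carry the labels
`app=<cluster-id>` and `configmap-type=high-availability`, as described on
the documentation page linked above; the function name and the example
cluster id and namespace are hypothetical. Running
`kubectl get configmap --show-labels` first is a good way to confirm the
labels in your own setup.

```python
import shlex
from typing import List, Optional


def build_ha_cleanup_command(cluster_id: str, namespace: Optional[str] = None) -> List[str]:
    """Build a kubectl command that deletes the HA ConfigMaps of one cluster.

    The label selector is an assumption taken from the Flink Kubernetes HA
    documentation; verify it against your cluster before running the command.
    """
    if not cluster_id:
        raise ValueError("cluster_id must not be empty")
    selector = f"app={cluster_id},configmap-type=high-availability"
    cmd = ["kubectl", "delete", "configmap", "--selector", selector]
    if namespace:
        cmd += ["--namespace", namespace]
    return cmd


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(build_ha_cleanup_command("my-flink-cluster", "flink")))
```

Deleting these ConfigMaps discards the job recovery pointers, so it only
makes sense when the cluster is stopped and the jobs should not be restored.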

Cheers,
Till

On Fri, May 7, 2021 at 9:28 AM Enrique <[hidden email]> wrote:

> Hi Till,
>
> I'm not using ZooKeeper HA, but the new native Kubernetes HA. I'm deploying
> the Flink cluster using two StatefulSets, one each for the JM and TM, plus
> PVC storage for HA metadata, checkpoints and savepoints. When I delete both
> StatefulSets and the JM/TM pods terminate, the HA config maps are not
> deleted.
>
> If I then delete my storage and recreate the Flink cluster, it tries to
> restore jobs from the config map data and fails. So to clarify, is the
> intended behaviour for the config maps to be deleted as part of the Flink
> cluster shutting down? Is there a JIRA ticket raised for native Kubernetes
> HA?
>
> Thanks,
> Enrique