Hi all,
I am deploying a Flink cluster in session mode using Kubernetes HA and have seen it working with the different ConfigMaps for the dispatcher, restserver and resourcemanager. I have also configured storage for checkpointing and HA metadata.

When I submit a job, I can see that a ConfigMap is created for it containing checkpoint information, which is updated correctly. Yet when I cancel the job, I would assume the ConfigMap gets deleted, but it seems that it isn't. Is this the intended behaviour? I am worried that as many jobs are submitted and cancelled on the Flink cluster, a large number of ConfigMaps would remain behind.

Thanks in advance,

Enrique
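P.S. For reference, the HA-related part of my flink-conf.yaml looks roughly like this (the cluster id and storage paths are placeholders, not my real values):

    kubernetes.cluster-id: my-session-cluster
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: file:///flink-data/ha
    state.checkpoints.dir: file:///flink-data/checkpoints
    state.savepoints.dir: file:///flink-data/savepoints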
Hi Enrique,
I think you are running into FLINK-20695 [1]. In a nutshell, at the moment Flink only deletes the ConfigMaps when the cluster shuts down. We want to change this with the next release.

[1] https://issues.apache.org/jira/browse/FLINK-20695

Cheers,
Till
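P.S. You can see the leftover per-job leader ConfigMaps with a label selector. Assuming the default labels the Kubernetes HA services attach (double-check against your cluster), something like

    kubectl get configmap -l app=<cluster-id>,configmap-type=high-availability

should list them, including the entries for already-cancelled jobs.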
Hi Till,
I'm not using ZooKeeper HA, but the new native Kubernetes HA. I'm deploying the Flink cluster using StatefulSets, one each for the JM and TM, and a PVC to store the HA metadata/checkpointing/savepointing. When I delete both StatefulSets and the JM/TM pods terminate, the HA ConfigMaps are not deleted.

If I then delete my storage and recreate the Flink cluster, it will try to restore jobs from the ConfigMap data and fail. So to clarify, the intended behaviour is for the ConfigMaps to be deleted as part of the Flink cluster shutting down? Is there a JIRA ticket raised for native Kubernetes HA?

Thanks,
Enrique
Hi Enrique,
I think it is related to FLINK-20219 [1]. Currently, the HA-related ConfigMaps/ZNodes are not cleaned up properly. The HA ConfigMap clean-up mechanism for session clusters could be improved in the following two ways:

* Delete the jobmanager leader ConfigMap once the job reaches a terminal state (cancelled, succeeded, failed)
* Try to clean up all the HA ConfigMaps for terminal jobs when shutting down the cluster

[1] https://issues.apache.org/jira/browse/FLINK-20219

Best,
Yang
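P.S. Concretely, in a session cluster the per-job leader ConfigMap follows a pattern like <cluster-id>-<job-id>-jobmanager-leader (naming from memory, so verify against your deployment), so a manual clean-up for a single terminal job would look roughly like:

    kubectl delete configmap <cluster-id>-<job-id>-jobmanager-leader

The cluster-wide leader ConfigMaps (dispatcher, resourcemanager, restserver) should be left alone while the cluster is still running.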
Hi Enrique,
I think you are actually seeing a mixture of FLINK-20219 and FLINK-20695. If either of these problems is solved, the issue should be gone.

Also note that the Kubernetes HA services won't clean up the ConfigMaps if you delete the deployment; this is documented here [1].

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/ha/kubernetes_ha/#high-availability-data-clean-up

Cheers,
Till
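P.S. If you do want to wipe the HA data after tearing down the cluster, the linked documentation describes deleting the remaining ConfigMaps via a label selector, along the lines of (double-check the labels match your setup):

    kubectl delete configmap --selector='app=<cluster-id>,configmap-type=high-availability'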