
How to rotate certificates of a TKGM cluster without upgrading

When TKG management and workload clusters are created, their certificates are generated during the kubeadm initialization phase and expire after one year. Upgrading a cluster rotates these certificates automatically. But what if you cannot upgrade your clusters within that year?

This post walks through rotating certificates without upgrading your clusters. At a high level, the process triggers a rollout of the control plane so that replacement machines go through the kubeadm initialization phase again and generate new certificates. For a similar walkthrough in a TKGS environment, check out this post.

Info

The upstream community is working on a PR to automatically renew control plane machine certificates. This should remove the need for any manual intervention in the future.

This PR achieves certificate rotation on control plane machines by repaving them. It does the following:

  • Add an annotation (machine.cluster.x-k8s.io/certificates-expiry-date) on KubeadmBootstrapConfig objects that captures the certificates' expiry date (1 year from the creation time).
  • Update the machine status with the certificate expiry date by reading that annotation on either the machine object or the bootstrap config object.
  • Add a field to KCP called kcp.spec.rolloutBefore.certificatesExpiryDays that can be used to trigger a rollout if the control plane machine's certificates will expire within the specified number of days.
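
As an illustration of how the new KCP field might be used once your Cluster API version includes it, the sketch below builds a merge patch for spec.rolloutBefore.certificatesExpiryDays. The helper name rollout_before_patch and the value 30 are hypothetical, and the availability of the field in your KCP API version is an assumption to verify before relying on it.

```shell
# Sketch only: build a merge patch that sets spec.rolloutBefore.certificatesExpiryDays.
# Assumption: the KCP API in your Cluster API version already exposes this field.
rollout_before_patch() {
    local days="$1"
    case "$days" in
        # Reject empty input, non-digits, and values with a leading zero (including 0).
        ''|*[!0-9]*|0*)
            echo "days must be a positive integer without leading zeros" >&2
            return 1
            ;;
    esac
    printf '{"spec":{"rolloutBefore":{"certificatesExpiryDays":%s}}}' "$days"
}
```

It could then be applied from the management cluster context with: kubectl patch kcp workload-slot35rp10-control-plane --type merge -p "$(rollout_before_patch 30)"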

Environment Details Before Rotation

Get the list of nodes and check the certificate expiration

kubectl get nodes \
-o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' \
-l node-role.kubernetes.io/master= > nodes

for i in `cat nodes`; do
    printf "\n######\n"
    ssh -o "StrictHostKeyChecking=no" -q capv@$i hostname
    ssh -o "StrictHostKeyChecking=no" -q capv@$i sudo kubeadm certs check-expiration
done;

Sample output

######
workload-slot35rp10-control-plane-ggsmj
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0923 17:51:03.686273 4172778 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 21, 2023 23:13 UTC   363d            ca                      no
apiserver                  Sep 21, 2023 23:13 UTC   363d            ca                      no
apiserver-etcd-client      Sep 21, 2023 23:13 UTC   363d            etcd-ca                 no
apiserver-kubelet-client   Sep 21, 2023 23:13 UTC   363d            ca                      no
controller-manager.conf    Sep 21, 2023 23:13 UTC   363d            ca                      no
etcd-healthcheck-client    Sep 21, 2023 23:13 UTC   363d            etcd-ca                 no
etcd-peer                  Sep 21, 2023 23:13 UTC   363d            etcd-ca                 no
etcd-server                Sep 21, 2023 23:13 UTC   363d            etcd-ca                 no
front-proxy-client         Sep 21, 2023 23:13 UTC   363d            front-proxy-ca          no
scheduler.conf             Sep 21, 2023 23:13 UTC   363d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Sep 18, 2032 23:09 UTC   9y              no
etcd-ca                 Sep 18, 2032 23:09 UTC   9y              no
front-proxy-ca          Sep 18, 2032 23:09 UTC   9y              no
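
Reading this table by eye gets tedious across many nodes. As a rough sketch, the function below filters the check-expiration output for leaf certificates whose residual time falls under a threshold. The name certs_expiring_within is hypothetical, and the parsing assumes the column layout shown in the sample output above (residual time in the seventh column, formatted like 363d or 9y); anything not expressed in days or years is treated as already due.

```shell
# Sketch: print "<certificate> <days-left>" for leaf certificates that expire
# in fewer than the given number of days.
# Reads `kubeadm certs check-expiration` output on stdin.
certs_expiring_within() {
    local threshold="$1"
    awk -v max="$threshold" '
        # Leaf-certificate rows have 9 fields and end with yes or no.
        NF == 9 && $NF ~ /^(yes|no)$/ {
            left = $7
            if (left ~ /^[0-9]+y$/) next          # a year or more remains
            if (left ~ /^[0-9]+d$/) {
                sub(/d$/, "", left)
                days = left + 0
            } else {
                days = 0                          # hours, minutes, or expired
            }
            if (days < max + 0) print $1, days
        }
    '
}
```

For example, inside the loop above: ssh -o "StrictHostKeyChecking=no" -q capv@$i sudo kubeadm certs check-expiration | certs_expiring_within 30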


Certificate rotation using KubeadmControlPlane (KCP) rollout

Switch to the management cluster context (k is an alias for kubectl)

k config use-context mgmt-slot35rp10-admin@mgmt-slot35rp10

Get KCP

k get kcp

NAME                                CLUSTER               INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
workload-slot35rp10-control-plane   workload-slot35rp10   true          true                   3          3       3         0             42h   v1.23.8+vmware.2

Trigger certificate rotation

# Preferred: set rolloutAfter to the current time (UTC, to match the Z suffix)
kubectl patch kcp workload-slot35rp10-control-plane --type merge -p "{\"spec\":{\"rolloutAfter\":\"`date -u +'%Y-%m-%dT%TZ'`\"}}"

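# Alternative: append a harmless entry to preKubeadmCommands; the spec change triggers a rollout
# (example KCP in the tkg-system namespace; the preKubeadmCommands list must already exist)
k patch kcp prz-mgmt-rp03-control-plane \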
-n tkg-system \
--type "json" \
-p '[{"op":"add","path":"/spec/kubeadmConfigSpec/preKubeadmCommands/-","value":"echo \"$(date)\" >> /tmp/kcp_recreate_date.log"}]'
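
The escaped quotes in the preferred command are easy to get wrong. One possible way to sidestep that, shown here as a sketch, is to build the patch body in a small helper. The function name rollout_after_patch is hypothetical; it emits the same rolloutAfter merge patch with a UTC timestamp, or with a timestamp passed as the first argument.

```shell
# Sketch: emit a merge patch that sets spec.rolloutAfter.
# With no argument, the current time in UTC is used (matching the Z suffix).
rollout_after_patch() {
    local ts="${1:-$(date -u +'%Y-%m-%dT%H:%M:%SZ')}"
    printf '{"spec":{"rolloutAfter":"%s"}}' "$ts"
}
```

Usage from the management cluster context: kubectl patch kcp workload-slot35rp10-control-plane --type merge -p "$(rollout_after_patch)"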

Machine rollout begins

k get machines

NAME                                        CLUSTER               NODENAME                                    PROVIDERID                                       PHASE          AGE   VERSION
workload-slot35rp10-control-plane-7z95k     workload-slot35rp10                                                                                                Provisioning   20s   v1.23.8+vmware.2
workload-slot35rp10-control-plane-ggsmj     workload-slot35rp10   workload-slot35rp10-control-plane-ggsmj     vsphere://4201a86e-3c15-879a-1b85-78f76a16c27f   Running        42h   v1.23.8+vmware.2
workload-slot35rp10-control-plane-hxbxb     workload-slot35rp10   workload-slot35rp10-control-plane-hxbxb     vsphere://42014b2e-07e4-216a-24ef-86e2d52d7bbd   Running        42h   v1.23.8+vmware.2
workload-slot35rp10-control-plane-sm4nw     workload-slot35rp10   workload-slot35rp10-control-plane-sm4nw     vsphere://4201cff3-2715-ffe1-c4a6-35fc795995ce   Running        42h   v1.23.8+vmware.2
workload-slot35rp10-md-0-667bcd6b57-79br9   workload-slot35rp10   workload-slot35rp10-md-0-667bcd6b57-79br9   vsphere://420142a2-d141-7d6b-b322-9c2afcc47da5   Running        42h   v1.23.8+vmware.2
workload-slot35rp10-md-1-7bdfdcf7f-rhc8j    workload-slot35rp10   workload-slot35rp10-md-1-7bdfdcf7f-rhc8j    vsphere://420115c0-3672-4da7-dd16-77ef4e0c557f   Running        42h   v1.23.8+vmware.2
workload-slot35rp10-md-2-5bb8468b59-z4jdf   workload-slot35rp10   workload-slot35rp10-md-2-5bb8468b59-z4jdf   vsphere://42019a7e-4900-84ed-2a34-135ee837952f   Running        42h   v1.23.8+vmware.2

Machine status post rollout

k get machines

NAME                                        CLUSTER               NODENAME                                    PROVIDERID                                       PHASE     AGE   VERSION
workload-slot35rp10-control-plane-4xgw8     workload-slot35rp10   workload-slot35rp10-control-plane-4xgw8     vsphere://42011ef0-2abb-b934-a03b-ce995d5e2b8e   Running   13m   v1.23.8+vmware.2
workload-slot35rp10-control-plane-7z95k     workload-slot35rp10   workload-slot35rp10-control-plane-7z95k     vsphere://42018773-23ab-cb58-89b7-0d5e6656aca1   Running   20m   v1.23.8+vmware.2
workload-slot35rp10-control-plane-xwhgj     workload-slot35rp10   workload-slot35rp10-control-plane-xwhgj     vsphere://4201b550-9388-52ad-6848-8f05d885bb9c   Running   17m   v1.23.8+vmware.2
workload-slot35rp10-md-0-667bcd6b57-79br9   workload-slot35rp10   workload-slot35rp10-md-0-667bcd6b57-79br9   vsphere://420142a2-d141-7d6b-b322-9c2afcc47da5   Running   43h   v1.23.8+vmware.2
workload-slot35rp10-md-1-7bdfdcf7f-rhc8j    workload-slot35rp10   workload-slot35rp10-md-1-7bdfdcf7f-rhc8j    vsphere://420115c0-3672-4da7-dd16-77ef4e0c557f   Running   43h   v1.23.8+vmware.2
workload-slot35rp10-md-2-5bb8468b59-z4jdf   workload-slot35rp10   workload-slot35rp10-md-2-5bb8468b59-z4jdf   vsphere://42019a7e-4900-84ed-2a34-135ee837952f   Running   43h   v1.23.8+vmware.2
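
Instead of re-running k get machines by hand, the rollout can be polled. The following is a minimal sketch, assuming the KCP object exposes spec.replicas, status.replicas, status.updatedReplicas, and status.readyReplicas (the values behind the REPLICAS, UPDATED, and READY columns shown earlier). The function names and the KUBECTL override variable are hypothetical. Note that the status may lag for a moment right after the patch, so a "done" result immediately after triggering the rollout should be rechecked.

```shell
# Sketch: succeed when the KCP reports every replica as updated and ready.
# KUBECTL may be overridden (for example with a stub); it defaults to kubectl.
kcp_rollout_done() {
    local name="$1" status want total updated ready
    status="$("${KUBECTL:-kubectl}" get kcp "$name" -o \
        jsonpath='{.spec.replicas},{.status.replicas},{.status.updatedReplicas},{.status.readyReplicas}')" || return 1
    # Split on commas so that a missing (empty) field keeps its position.
    IFS=, read -r want total updated ready <<EOF
$status
EOF
    [ -n "$want" ] && [ "$total" = "$want" ] && [ "$updated" = "$want" ] && [ "$ready" = "$want" ]
}

# Poll until the rollout is done: up to $2 attempts, $3 seconds apart.
wait_for_kcp_rollout() {
    local name="$1" tries="${2:-60}" delay="${3:-30}"
    while [ "$tries" -gt 0 ]; do
        kcp_rollout_done "$name" && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

Usage from the management cluster context: wait_for_kcp_rollout workload-slot35rp10-control-plane && echo "rollout complete"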

Verify cert rotation

Switch to workload cluster context

k config use-context workload-slot35rp10-admin@workload-slot35rp10

Get certificate details after the rollout

kubectl get nodes \
-o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' \
-l node-role.kubernetes.io/master= > nodes

for i in `cat nodes`; do
    printf "\n######\n"
    ssh -o "StrictHostKeyChecking=no" -q capv@$i hostname
    ssh -o "StrictHostKeyChecking=no" -q capv@$i sudo kubeadm certs check-expiration
done;

Sample output

######
workload-slot35rp10-control-plane-4xgw8
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0923 18:10:02.660438   13427 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 23, 2023 18:05 UTC   364d            ca                      no
apiserver                  Sep 23, 2023 18:05 UTC   364d            ca                      no
apiserver-etcd-client      Sep 23, 2023 18:05 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Sep 23, 2023 18:05 UTC   364d            ca                      no
controller-manager.conf    Sep 23, 2023 18:05 UTC   364d            ca                      no
etcd-healthcheck-client    Sep 23, 2023 18:05 UTC   364d            etcd-ca                 no
etcd-peer                  Sep 23, 2023 18:05 UTC   364d            etcd-ca                 no
etcd-server                Sep 23, 2023 18:05 UTC   364d            etcd-ca                 no
front-proxy-client         Sep 23, 2023 18:05 UTC   364d            front-proxy-ca          no
scheduler.conf             Sep 23, 2023 18:05 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Sep 18, 2032 23:09 UTC   9y              no
etcd-ca                 Sep 18, 2032 23:09 UTC   9y              no
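front-proxy-ca          Sep 18, 2032 23:09 UTC   9y              no

  • Certificates have been rotated: the residual time is back to 364 days, and the expiry date has moved from Sep 21, 2023 to Sep 23, 2023. The certificate authorities are unchanged.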
