"Kubernetes and Nfs Volumes"

Since I started running Kubernetes with MicroK8s on my Raspberry Pis, and later on my HP Microserver, there has been one challenge I kept putting off: storage.

That is, persistent storage for my containers. Take this site, running Ghost: it needs persistent storage to hold the configuration and the articles being posted.

Until recently this was solved with local storage, binding the container to a specific node. That is quite the hassle, for fairly obvious reasons, but let me outline the major pains (an example of such a node-bound volume follows the list):

  • The node cannot be drained - unless downtime is acceptable
  • Each node needs local disk space
  • Backups need to happen on the individual nodes
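
For context, the old setup pinned the volume - and therefore the pod - to one node roughly like this. This is a sketch with illustrative names and paths rather than my actual manifest; the nodeAffinity block is what ties everything to kube03:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: skov-codes-content-local   # illustrative name
spec:
  capacity:
    storage: 100M
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  local:
    path: /data/ghost-content      # illustrative path on the node's disk
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - kube03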


NFS to the rescue

Looking for solutions, I had been considering NFS for a while and wanted to try it out - just building an MVP (minimum viable product). It would be set up on a virtual server I have in my Proxmox Virtual Environment. I installed the NFS server:

sudo apt install nfs-kernel-server

Then I created the NFS export structure:

mkdir -p /data/nfs/ghost-skov-codes

Added it to the export definitions in /etc/exports:

/data/nfs/ghost-skov-codes *(rw,sync,no_subtree_check,insecure)
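
One step worth spelling out: after editing /etc/exports, the export table has to be re-read before clients can see the new share. With nfs-kernel-server that is done with exportfs:

sudo exportfs -ra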

Finally, listing the exports on the server with showmount:

showmount -e 10.0.0.100

Export list for 10.0.0.100:
/data/nfs/ghost-skov-codes *

showmount lists exported NFS directories available on a server
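
Worth adding here: every Kubernetes node that is going to mount the share needs the NFS client utilities installed, otherwise the kubelet cannot perform the mount. On Debian/Ubuntu based nodes that is the nfs-common package:

sudo apt install nfs-common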

Persistent volume

Then came the part of getting the persistent volume created:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: skov-codes-content-nfs1
  labels:
    directory: skov-codes-content-nfs1
spec:
  capacity:
    storage: 100M
  accessModes:
    - ReadWriteMany
  storageClassName: manual
  nfs:
    server: 10.0.0.100
    path: /data/nfs/ghost-skov-codes/
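
The volume also needs a PersistentVolumeClaim bound to it before a pod can use it. Mine is generated by my Helm chart, but a minimal claim matching the volume above would look roughly like this - the name is the one visible in the describe output further down, and the selector pins the claim to this specific PV through its directory label:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: skovcodes-local-content-volume-nfs1
  namespace: default
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100M
  selector:
    matchLabels:
      directory: skov-codes-content-nfs1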

And finally, after applying it, running kubectl to describe the persistent volume:

kubectl describe pv skov-codes-content-nfs1

Will give the following result:

Name:            skov-codes-content-nfs1
Labels:          app.kubernetes.io/managed-by=Helm
                 directory=skov-codes-content-nfs1
Annotations:     meta.helm.sh/release-name: skov-codes
                 meta.helm.sh/release-namespace: default
                 pv.kubernetes.io/bound-by-controller: yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    manual
Status:          Bound
Claim:           default/skovcodes-local-content-volume-nfs1
Reclaim Policy:  Retain
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        100M
Node Affinity:   <none>
Message:
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    10.0.0.100
    Path:      /data/nfs/ghost-skov-codes/
    ReadOnly:  false
Events:          <none>

After switching to this new persistent volume, it's possible to drain a node and have the pod continue serving.
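
Wiring the claim into the deployment's pod template is then just a volume plus a volumeMount. The relevant excerpt, sketched rather than copied from my chart - /var/lib/ghost/content is where the official Ghost image keeps its content, and the image line is illustrative:

  template:
    spec:
      containers:
        - name: ghost
          image: ghost            # illustrative; the chart pins its own image
          volumeMounts:
            - name: content
              mountPath: /var/lib/ghost/content
      volumes:
        - name: content
          persistentVolumeClaim:
            claimName: skovcodes-local-content-volume-nfs1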

Listing the pods on the nodes with:

kubectl get pods --output 'jsonpath={range .items[*]}{.spec.nodeName}{" "}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}'

Shows all pods to be on the node "kube03":

kube03 default skov-codes-skov-run-c77684b5d-s5v6k
kube03 default skov-codes-skov-run-c77684b5d-fqggt
kube03 default skov-run-594f65775b-b44tc

Trying to drain one of the nodes with:

kubectl drain kube01 --ignore-daemonsets

Will now show us the pods being evicted:

➜  ~ kubectl drain kube01 --ignore-daemonsets
node/kube01 cordoned
WARNING: ignoring DaemonSet-managed Pods: ingress/nginx-ingress-microk8s-controller-754gx, kube-system/calico-node-cjbjk
evicting pod kube-system/calico-kube-controllers-dc86ccb69-5ggc4
evicting pod default/skov-codes-skov-run-c77684b5d-vx9tt
evicting pod default/skov-codes-skov-run-c77684b5d-g64j2
error when evicting pods/"skov-codes-skov-run-c77684b5d-g64j2" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/skov-codes-skov-run-c77684b5d-g64j2
error when evicting pods/"skov-codes-skov-run-c77684b5d-g64j2" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/skov-codes-skov-run-c77684b5d-vx9tt evicted
pod/skov-codes-skov-run-c77684b5d-g64j2 evicted
node/kube01 drained

After the pods are all evicted, the node is drained and ready for maintenance.
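
Once maintenance is done, the node goes back into rotation by uncordoning it:

kubectl uncordon kube01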

Writing this article taught me quite a few things. Having storage sorted with NFS gives a lot of freedom in terms of shuffling pods around the cluster - however, it does not guarantee zero downtime on its own.

I had to implement a PodDisruptionBudget to make sure at least one pod was always up, as well as run more than one replica. I wasn't sure that would work with the Ghost blog image, but apparently it does, thanks to the NFS storage being "ReadWriteMany".

The PodDisruptionBudget ended up like this:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: skov.codes-skov-run-pdb
  labels:
    app.kubernetes.io/name: skov-run
    app.kubernetes.io/instance: skov.codes
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: skov-run
      app.kubernetes.io/instance: skov.codes
  minAvailable: 1
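
A quick sanity check that the budget is in effect - kubectl get pdb lists minAvailable and how many disruptions are currently allowed:

kubectl get pdb skov.codes-skov-run-pdb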

At the same time I made sure to up the number of replicas to 2:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: skov.codes-skov-run
  labels:
    helm.sh/chart: skov-run-0.1.0
    app.kubernetes.io/name: skov-run
    app.kubernetes.io/instance: skov.codes
    app.kubernetes.io/version: "1.16.0"
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: skov-run
      app.kubernetes.io/instance: skov.codes
  ....

NGiNX

One final thing I had to get in place was my NGiNX configuration. Previously, it pointed only to the master node, which meant I wasn't taking advantage of having 4 nodes available to serve traffic.

I changed it from a simple proxy_pass setup to a load-balancing configuration instead.

The new configuration is split into two parts - first the upstream definition:

upstream app {
    server 10.0.0.150:80;
    server 10.0.0.151:80;
    server 10.0.0.152:80;
    server 10.0.0.153:80;
}

cluster_upstream.conf

And then the change to the proxy_pass directive in my virtual host configuration:

proxy_pass http://app;
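
In context, the relevant part of the virtual host looks roughly like this. It's a sketch rather than my actual config - the server_name is assumed, and the header lines are the usual reverse-proxy boilerplate:

server {
    listen 80;
    server_name skov.codes;

    location / {
        # Hand requests to the upstream group from cluster_upstream.conf
        proxy_pass http://app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}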

Conclusions

I came from a shaky implementation of my Ghost blog site, where a node upgrade would most definitely lead to downtime, and moved to a far more resilient setup. The webserver now has load balancing, and my application deployment on the Kubernetes cluster now has a PodDisruptionBudget making sure a pod is always up. Both of these allow me to drain nodes, and patch and upgrade my cluster, without downtime. Very satisfying for a good nerdy weekend!
