UpTimeRobot and Kubernetes nodes
I recently moved my home Kubernetes cluster to Talos Linux. It takes a new approach to managing nodes: you consume them as immutable resources, which avoids the cumbersome work of maintaining a general-purpose OS.
Previously I ran my cluster with MicroK8s on Ubuntu Cloud Init images. I found myself spending too much time on OS patching, and when I was introduced to Talos Linux through work, I knew I just had to try it out.
Setting up the cluster was incredibly easy, and there is even a great article on how to set it up on Proxmox - my hypervisor of choice :-) That article can be found here
There is a whole list of guides for installing it on different platforms - virtual, cloud and bare metal.
Back to that monitoring thing....
Back on the Ubuntu servers I had an Ansible playbook that added a cronjob, which would hit a heartbeat URL on UptimeRobot - so I would know if a node was down. After moving to Talos Linux, that was no longer possible. So I got to thinking: why not run a cronjob inside my Kubernetes cluster? That way, it would also tell me if scheduling workloads was broken.
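For reference, the old per-node setup amounted to a single crontab entry along these lines (the heartbeat UID is a placeholder, and the exact entry is a sketch of what the Ansible playbook deployed):

```shell
# Ping the UptimeRobot heartbeat every minute; a missed ping marks the node as down
* * * * * wget --spider -q https://heartbeat.uptimerobot.com/<uptime-robot-heartbeat-random-uid>
```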
Updating the heartbeat is a simple HTTPS request, and I chose to execute it with wget. So after doing some thinking, I decided to create my own Docker image based on Alpine with wget installed.
So just a short Dockerfile:
FROM alpine:3.15.4
RUN apk add wget
So build and push it:
docker build . -t simcax/alpine-wget:3.15.4
docker push simcax/alpine-wget:3.15.4
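Before wiring the image into the cluster, it can be sanity-checked locally; this is just a quick smoke test against an arbitrary URL, not part of the actual setup:

```shell
# Verify that wget inside the image can make an HTTPS request
docker run --rm simcax/alpine-wget:3.15.4 wget --spider -q https://example.com && echo "wget OK"
```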
Now I was ready to use it in a cronjob. So I created my cronjob.yaml for the first master:
apiVersion: batch/v1
kind: CronJob
metadata:
  namespace: crons
  name: uptime-robot-master-01
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 30
      template:
        spec:
          nodeName: talos01
          containers:
          - name: uptime-robot-heartbeat-talos-master-01
            image: simcax/alpine-wget:3.15.4
            imagePullPolicy: IfNotPresent
            command:
            - wget
            - --spider
            - https://heartbeat.uptimerobot.com/<uptime-robot-heartbeat-random-uid>
          restartPolicy: OnFailure
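Assuming the manifest is saved as cronjob.yaml, deploying it is the usual kubectl routine - the crons namespace just has to exist first:

```shell
# Create the namespace once, then apply the CronJob
kubectl create namespace crons
kubectl apply -f cronjob.yaml

# Watch the jobs fire every minute
kubectl get jobs -n crons --watch
```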
A couple of things of note:
- I stuck with best practices and had the jobs created in a namespace of their own - crons
- The ttlSecondsAfterFinished makes sure to get the jobs cleaned up once they are done
And that was it - now I have a monitor that tells me not only that the nodes are up, but also that they are schedulable!
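Since each node needs its own pinned CronJob, one way to avoid copy-pasting manifests is to generate them from a template. A minimal sketch with sed, assuming hypothetical node names talos01 through talos03 (in practice each node would also get its own heartbeat UID, left as a placeholder here):

```shell
#!/bin/sh
# Generate one CronJob manifest per node; __NODE__ is substituted
# into the metadata name, container name and nodeName fields.
for NODE in talos01 talos02 talos03; do
  sed "s/__NODE__/$NODE/g" > "cronjob-$NODE.yaml" <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  namespace: crons
  name: uptime-robot-__NODE__
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 30
      template:
        spec:
          nodeName: __NODE__
          containers:
          - name: uptime-robot-heartbeat-__NODE__
            image: simcax/alpine-wget:3.15.4
            imagePullPolicy: IfNotPresent
            command:
            - wget
            - --spider
            - https://heartbeat.uptimerobot.com/<uptime-robot-heartbeat-random-uid>
          restartPolicy: OnFailure
EOF
done
```

The generated files can then be applied in one go with kubectl apply -f on the directory.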