"UptimeRobot Monitoring Talos"

UptimeRobot and Kubernetes nodes

I recently moved my home Kubernetes cluster to Talos Linux, a new take on how to run nodes: you consume them as immutable resources, which avoids the cumbersome work of maintaining an OS.

Previously I ran my cluster on Ubuntu Cloud Init images with MicroK8s on top. However, I found myself spending time on OS patching, and when I was introduced to Talos Linux through work, I knew I just had to try it out.

Setting up the cluster was incredibly easy, and there was even a great article on how to set it up on Proxmox - my hypervisor of choice :-) That article can be found here

There is also a whole list of guides on how to install it on different platforms - virtual, cloud and bare metal.

Back to that monitoring thing....

Back on the Ubuntu servers I had an Ansible script that added a cronjob, which would hit a heartbeat on UptimeRobot, so I would know if a node was down. After moving to Talos Linux, that was no longer possible, since Talos nodes have no shell or SSH access to set up cron with. So I got to thinking: why not get a cronjob running inside my Kubernetes cluster? That way, it would also tell me if scheduling workloads was out of order.

Updating the heartbeat is a simple HTTPS request, and I chose to make it with wget. After some thinking, I decided to create my own Docker image based on Alpine and install wget in it.
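To make the request concrete, here is a minimal sketch of the heartbeat call as a shell function. The function name `send_heartbeat` and the 10 second timeout are my own assumptions for illustration; only `wget --spider` against the heartbeat URL comes from the setup described here.

```shell
#!/bin/sh
# Hypothetical helper: send one heartbeat ping to the given URL.
# Returns wget's exit status, so a failed request is non-zero.
send_heartbeat() {
  url=$1
  if [ -z "$url" ]; then
    echo "usage: send_heartbeat <heartbeat-url>" >&2
    return 2
  fi
  # --spider: only check the URL, do not save the response body
  # -q: quiet output; -T 10: time out after 10 seconds (assumed value)
  wget --spider -q -T 10 "$url"
}
```

It would be called with the heartbeat URL from the UptimeRobot monitor, for example `send_heartbeat "https://heartbeat.uptimerobot.com/<uptime-robot-heartbeat-random-uid>"`.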

So just a short Dockerfile:

FROM alpine:3.15.4
# --no-cache avoids storing the apk package index in the image
RUN apk add --no-cache wget

So build and push it:

docker build . -t simcax/alpine-wget:3.15.4
docker push simcax/alpine-wget:3.15.4
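Before wiring the image into a cronjob, it can be worth checking that wget is actually in it. The sketch below is one way to do that; the function name `image_has_wget` is hypothetical, and it assumes Docker is available locally.

```shell
#!/bin/sh
# Hypothetical helper: run `wget --version` inside the image.
# Succeeds only if the container starts and wget is present.
image_has_wget() {
  image=$1
  docker run --rm "$image" wget --version >/dev/null
}
```

For the image built above that would be `image_has_wget simcax/alpine-wget:3.15.4 && echo "wget found"`.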

Now I was ready to use it in a cronjob. So I created my cronjob.yaml for the first master:

apiVersion: batch/v1
kind: CronJob
metadata:
  namespace: crons
  name: uptime-robot-master-01
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 30
      template:
        spec:
          nodeName: talos01
          containers:
          - name: uptime-robot-heartbeat-talos-master-01
            image: simcax/alpine-wget:3.15.4
            imagePullPolicy: IfNotPresent
            command:
            - wget
            - --spider
            - https://heartbeat.uptimerobot.com/<uptime-robot-heartbeat-random-uid>
          restartPolicy: OnFailure

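A couple of things of note:
- I stuck with best practices and had the jobs created in a namespace of their own - crons
- The ttlSecondsAfterFinished setting makes sure the jobs get cleaned up once they are done
- The nodeName field pins the pod to talos01, so the heartbeat is only sent when that specific node can run it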
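Since the manifest above only covers the first master, each additional node needs its own copy with a different name, nodeName and heartbeat URL. As a rough sketch, a small shell function could render those copies. The function name `render_cronjob` is my own, and reusing the CronJob name as the container name is a simplification rather than what the manifest above does.

```shell
#!/bin/sh
# Hypothetical helper: print a CronJob manifest for one node.
# Arguments: <cronjob-name> <node-name> <heartbeat-url>
render_cronjob() {
  name=$1
  node=$2
  url=$3
  # Unquoted heredoc: only the ${...} variables are expanded,
  # the cron schedule is printed literally.
  cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  namespace: crons
  name: ${name}
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 30
      template:
        spec:
          nodeName: ${node}
          containers:
          - name: ${name}
            image: simcax/alpine-wget:3.15.4
            imagePullPolicy: IfNotPresent
            command:
            - wget
            - --spider
            - ${url}
          restartPolicy: OnFailure
EOF
}
```

The output can be redirected to a file or piped straight into the cluster with `kubectl apply -f -`, for example `render_cronjob uptime-robot-master-02 talos02 "$HEARTBEAT_URL" | kubectl apply -f -`, where talos02 and HEARTBEAT_URL are placeholders for a second node and its own heartbeat URL.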

And that was it - now I have a monitor that tells me not only that the nodes are up, but also that they are schedulable!
