Hybrid cluster playbook

Install and join process for the physical node.

Physical worker node (bare-metal)

To support more demanding applications within your cluster — such as PostgreSQL databases, GitLab, or CPU-based LLMs — your worker node will need more memory than the standard 4 GB.
A minimum of 16 GB RAM is required, though 64 GB or more is recommended if you plan to run LLMs on the CPU.

If you are running Parallels on a Mac Studio, allocate additional memory to the virtual worker node now.
Otherwise, consider using a physical (bare-metal) worker node instead.

A spare desktop or old computer is often sufficient, provided it has at least 16 GB of RAM.
This system will run the same Ubuntu Server version used in your virtual environment, but on Intel or AMD architecture rather than Apple Silicon.


Preparation

  • Download the correct server ISO image for your architecture
    • I have an old Intel i7 desktop, which requires the amd64 ISO
    • Open a browser and navigate to: Ubuntu 24.04.3 LTS
    • The default download link is for amd64 systems: amd64 ISO
  • Create a bootable USB flash drive
    • As I am using macOS, I will use an application called balenaEtcher to create the bootable USB (a command-line alternative using dd is sketched after this list)
    • If you are installing onto a physical server with LOM (Lights Out Management), a USB drive is not required. Simply mount the ISO via the LOM, select it as a one-off boot device and restart the server

  • Insert the USB into a port on the physical machine
  • Connect a keyboard, mouse and monitor
  • Power on, pressing the appropriate key to enter the boot menu, where you will select the USB and then continue to boot
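
If you prefer the command line to balenaEtcher, the USB can also be written with dd from macOS. The sketch below makes two assumptions: that the ISO was saved to ~/Downloads with its default filename, and that the USB stick shows up as /dev/disk4 (confirm the identifier with diskutil list before writing, as dd will overwrite whatever device you point it at).

  # Identify the USB stick (assumed here to be /dev/disk4 - adjust to match your system)
  diskutil list

  # Unmount (not eject) the device so it can be written to
  diskutil unmountDisk /dev/disk4

  # Write the ISO to the raw device; status=progress can be dropped on older macOS versions
  sudo dd if=~/Downloads/ubuntu-24.04.3-live-server-amd64.iso of=/dev/rdisk4 bs=4m status=progress

  # Eject once dd has finished
  diskutil eject /dev/disk4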

Install Ubuntu Server on Bare Metal

The Ubuntu tutorial can be found here: Tutorial

  • Follow Stages 1 to 6 in the tutorial
  • Stage 7: make a note of the ethernet adapter and the DHCP-assigned IP address (assuming your router provides DHCP). A static IP address will be configured manually in a later step
    • My physical node has an interface of eno1 and a DHCP address of 192.168.30.76
  • Stage 8: guided storage configuration, select: Use an entire disk
  • Stage 10: confirm partitions. You may have multiple storage devices available; one will be the USB drive. In my case I have two NVMe drives installed and one is allocated to an older Ubuntu install (ubuntu-vg)

  • Do NOT press Done until you have configured your chosen storage device to use its full size. Note how the size of my ubuntu-vg is 1.816T but the new format will only use 100G of it
    • Move the cursor to the Used Devices section, select ubuntu-lv and edit it
    • Change the size to the maximum, in my case 1.816T, and save
    • Now you can proceed to the next stage after confirming your changes (if you miss this, the volume can still be grown later - see the sketch after this list)

  • Stage 12: set up a profile. I am going to use the same name (dev) and password as the virtual template, the only difference being the server name

    • As the role of this server will be a worker node in my dev environment, I will choose: dev-w-p1
  • The ISO differs from the tutorial at this point

    • Install Ubuntu Pro - select: No
    • SSH configuration - select the tick box next to Install OpenSSH server
    • Featured server snaps - do not select any
  • Stage 13: Install software. The install will now commence automatically

  • Stage 14: Install complete. Select Reboot Now and remove the USB

  • If all is well the server console will display: Ubuntu 24.04.3 LTS dev-w-p1 tty1

  • Press return for the login prompt

  • Confirm SSH from your workstation to the new server: ssh dev@192.168.30.76
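
If you only notice after the install that the root filesystem was left at the default 100G, the logical volume can still be grown without reinstalling. This is a minimal sketch assuming the installer's default ubuntu-vg/ubuntu-lv naming and an ext4 root filesystem:

  # Show free space in the volume group and the current logical volume size
  sudo vgs
  sudo lvs

  # Grow the root logical volume into all remaining free space
  sudo lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv

  # Grow the ext4 filesystem to match the new volume size
  sudo resize2fs /dev/ubuntu-vg/ubuntu-lv

  # Confirm the new size of /
  df -h /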


Setup physical worker node

The worker node was previously defined:

Purpose          | IP Address      | Hostname | FQDN/notes
Subnet           | 192.168.30.0/24 |          | Defines local network range
Default Gateway  | 192.168.30.254  |          | Router or gateway for external access
FQDN             |                 |          | reids.net.au
Cluster Name     | 192.168.30.200  | dev      | dev.reids.net.au
Worker Node (p1) | 192.168.30.207  | dev-w-p1 | dev-w-p1.reids.net.au

Before we disconnect the keyboard, mouse and monitor and manage the server through SSH, we need to follow a similar process to the one used for the template.

  • ssh using username: dev and password

    • Start interactive shell as root: sudo -i

    • Update packages: apt update

    • Upgrade packages: apt upgrade

    • Install additional packages: apt install net-tools traceroute ntp locate

    • Check ethernet name and IP address: ifconfig

    • Run the following command to edit the Netplan config:

      vi /etc/netplan/50-cloud-init.yaml
    • Then change the contents of the file to your IP and gateway; the ethernet adapter in my physical server is: eno1

    • From:

      network:
        version: 2
        ethernets:
          eno1:
            dhcp4: true
    • To:

      network:
        version: 2
        ethernets:
          eno1:
            addresses:
              - '192.168.30.207/24'
            routes:
              - to: 'default'
                via: '192.168.30.254'
    • Apply the changes: netplan apply
    • At this point the SSH session will stop responding as the IP address has changed

    • Add the DNS name dev-w-p1 to your router, then SSH to the server using its new name and log in: ssh dev@dev-w-p1.reids.net.au (a quick verification sketch follows this list)

    • Upon success the keyboard, mouse and monitor can be disconnected from the server
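
Once reconnected over the new address, a quick check confirms the static configuration survived the netplan change (the interface name eno1 and the addresses below are the values used in this example; substitute your own):

  # The interface should hold the static address rather than a DHCP lease
  ip -4 addr show eno1

  # The default route should point at the gateway set in netplan
  ip route

  # The FQDN should resolve to the new static address once the router DNS entry exists
  ping -c 3 dev-w-p1.reids.net.au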


Server preparation continued from SSH session

  • Connect to the server in a new terminal session: ssh dev@dev-w-p1.reids.net.au
  • Login with username: dev and password
  • Switch to root user: sudo -i
  • Check routing table: ip route
  • Check ping to google: ping 8.8.8.8
  • Check time and date: timedatectl
  • Check swap usage: swapon
  • Disable swap for the current session (it would otherwise return after a reboot): swapoff -a
  • Disable swap permanently
    vi /etc/fstab
  • Comment out the swap line
    # /swap.img	none	swap	sw	0	0
  • Prepare DNS - router provides DNS
    mkdir /etc/systemd/resolved.conf.d
    vi /etc/systemd/resolved.conf.d/dns_servers.conf
    [Resolve]
    DNS=192.168.30.254
    Domains=~.
  • Enable and restart the resolver service
    systemctl enable systemd-resolved
    systemctl restart systemd-resolved
  • Confirm local name resolution: ping lab.reids.net.au
  • Confirm external name resolution: ping www.google.com.au
  • Configure NTP (the ntp package is now provided by NTPsec)
    vi /etc/ntpsec/ntp.conf
  • Add local NTP server and comment out the defaults - my router provides NTP
    pool 192.168.30.254 iburst
    # pool 0.ubuntu.pool.ntp.org iburst
    # pool 1.ubuntu.pool.ntp.org iburst
    # pool 2.ubuntu.pool.ntp.org iburst
    # pool 3.ubuntu.pool.ntp.org iburst
    # server ntp.ubuntu.com
  • Restart NTP services for changes to take effect: service ntp restart
  • Check that NTP services are being picked up: ntpq -p
  • Set timezone: timedatectl set-timezone Australia/Perth
  • Check time and date: timedatectl
  • Update database for locate package: updatedb
  • Add additional user: adduser andy
  • Add new user to sudo list: usermod -aG sudo andy
  • Disable IPv6
    vi /etc/default/grub
  • Change the default
    GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 ipv6.disable=1 quiet splash"
  • Update grub for changes to take effect: update-grub
  • Disable cloud-init: touch /etc/cloud/cloud-init.disabled
  • Install the NFS client package: apt install nfs-common -y
  • Reboot the server to ensure all changes have taken effect: reboot (a post-reboot verification sketch follows this list)
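
After the reboot, it is worth confirming the changes above survived the restart before joining the node to the cluster. A quick verification sketch, using the same router address and packages configured earlier:

  # Swap should be gone: swapon prints nothing and free reports 0B of swap
  swapon --show
  free -h

  # systemd-resolved should list the router (192.168.30.254) as the DNS server
  resolvectl status

  # NTP should be syncing against the router; timedatectl should show the correct timezone
  ntpq -p
  timedatectl

  # IPv6 should be disabled, so this should print no addresses
  ip -6 addr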

Add new server to k8s cluster as a worker node

  • New Node: Process to deploy additional node. A new physical worker node (dev-w-p1) is to be added to the cluster
    • Ensure the following steps have been completed before running the playbook
      • Prepare the node to be added: Completed
      • Power on the ansible server: dev-ansible and login as dev
      • Activate the virtual environment
        source venv/bin/activate
      • Add the new node to the ansible server host file
        sudo vi /etc/hosts
        192.168.30.200 dev dev.reids.net.au
        192.168.30.201 dev-lb-v1 dev-lb-v1.reids.net.au
        192.168.30.203 dev-m-v1 dev-m-v1.reids.net.au
        192.168.30.206 dev-w-v1 dev-w-v1.reids.net.au
        192.168.30.207 dev-w-p1 dev-w-p1.reids.net.au
      • Deploy the ssh-key to the node, and test ssh to it
        cd ~/.ssh/
        ssh-copy-id dev-w-p1
      • Confirm that the new node allows an SSH connection from the ansible server: ssh dev-w-p1
      • Update the inventory/devcluster/inventory.ini with the role of the new node
        vi ~/kubespray-devcluster/kubespray/inventory/devcluster/inventory.ini
        [kube_control_plane]
        dev-m-v1 ansible_host=192.168.30.203 etcd_member_name=etcd1

        [etcd:children]
        kube_control_plane

        [kube_node]
        dev-w-v1 ansible_host=192.168.30.206
        dev-w-p1 ansible_host=192.168.30.207
    • Change to the kubespray directory
      cd ~/kubespray-devcluster/kubespray/
    • Refresh facts for all hosts before limiting. Running facts.yml ensures all nodes (old + new) have up-to-date facts cached:
      ansible-playbook -i inventory/devcluster/inventory.ini playbooks/facts.yml -b -u dev -K
      PLAY RECAP ********************************************************************************************************************************
      dev-m-v1 : ok=25 changed=1 unreachable=0 failed=0 skipped=21 rescued=0 ignored=0
      dev-w-p1 : ok=25 changed=3 unreachable=0 failed=0 skipped=21 rescued=0 ignored=0
      dev-w-v1 : ok=37 changed=1 unreachable=0 failed=0 skipped=27 rescued=0 ignored=0

      Sunday 05 October 2025 05:10:01 +0000 (0:00:00.783) 0:00:11.993 ********
      ===============================================================================
      system_packages : Manage packages -------------------------------------------------------------------------------------------------- 5.88s
      Gather necessary facts (hardware) -------------------------------------------------------------------------------------------------- 0.78s
      bootstrap_os : Fetch /etc/os-release ----------------------------------------------------------------------------------------------- 0.67s
      system_packages : Gather OS information -------------------------------------------------------------------------------------------- 0.61s
      bootstrap_os : Assign inventory name to unconfigured hostnames (non-CoreOS, non-Flatcar, Suse and ClearLinux, non-Fedora) ---------- 0.46s
      network_facts : Gather ansible_default_ipv6 ---------------------------------------------------------------------------------------- 0.44s
      Gather minimal facts --------------------------------------------------------------------------------------------------------------- 0.35s
      network_facts : Gather ansible_default_ipv4 ---------------------------------------------------------------------------------------- 0.35s
      Gather necessary facts (network) --------------------------------------------------------------------------------------------------- 0.33s
      bootstrap_os : Gather facts -------------------------------------------------------------------------------------------------------- 0.32s
      bootstrap_os : Create remote_tmp for it is used by another module ------------------------------------------------------------------ 0.27s
      bootstrap_os : Ensure bash_completion.d folder exists ------------------------------------------------------------------------------ 0.21s
      bootstrap_os : Check if bootstrap is needed ---------------------------------------------------------------------------------------- 0.12s
      bootstrap_os : Check http::proxy in apt configuration files ------------------------------------------------------------------------ 0.10s
      bootstrap_os : Check https::proxy in apt configuration files ----------------------------------------------------------------------- 0.08s
      bootstrap_os : Include vars -------------------------------------------------------------------------------------------------------- 0.05s
      bootstrap_os : Include tasks ------------------------------------------------------------------------------------------------------- 0.05s
      dynamic_groups : Match needed groups by their old names or definition -------------------------------------------------------------- 0.05s
      Check that python netaddr is installed --------------------------------------------------------------------------------------------- 0.05s
      validate_inventory : Check that kube_pods_subnet does not collide with kube_service_addresses -------------------------------------- 0.03s
    • Run the scale playbook (a join-verification sketch follows this list):
      ansible-playbook -i inventory/devcluster/inventory.ini scale.yml -b -u dev -K --limit dev-w-p1
      PLAY RECAP ********************************************************************************************************************************
      dev-w-p1 : ok=378 changed=77 unreachable=0 failed=0 skipped=639 rescued=0 ignored=0

      Sunday 05 October 2025 05:16:04 +0000 (0:00:00.020) 0:05:00.164 ********
      ===============================================================================
      download : Download_container | Download image if required ------------------------------------------------------------------------ 41.86s
      download : Download_file | Download item ------------------------------------------------------------------------------------------ 29.30s
      download : Download_container | Download image if required ------------------------------------------------------------------------ 28.09s
      download : Download_file | Download item ------------------------------------------------------------------------------------------ 26.84s
      download : Download_container | Download image if required ------------------------------------------------------------------------ 23.36s
      download : Download_file | Download item ------------------------------------------------------------------------------------------ 11.61s
      download : Download_file | Download item ------------------------------------------------------------------------------------------ 10.60s
      network_plugin/calico : Wait for calico kubeconfig to be created ------------------------------------------------------------------- 7.31s
      container-engine/containerd : Download_file | Download item ------------------------------------------------------------------------ 7.08s
      download : Download_container | Download image if required ------------------------------------------------------------------------- 5.02s
      container-engine/crictl : Download_file | Download item ---------------------------------------------------------------------------- 4.57s
      download : Download_container | Download image if required ------------------------------------------------------------------------- 3.83s
      container-engine/nerdctl : Download_file | Download item --------------------------------------------------------------------------- 3.73s
      container-engine/runc : Download_file | Download item ------------------------------------------------------------------------------ 3.68s
      container-engine/validate-container-engine : Populate service facts ---------------------------------------------------------------- 3.31s
      network_plugin/cni : CNI | Copy cni plugins ---------------------------------------------------------------------------------------- 2.41s
      container-engine/containerd : Containerd | Unpack containerd archive --------------------------------------------------------------- 2.35s
      container-engine/crictl : Extract_file | Unpacking archive ------------------------------------------------------------------------- 2.31s
      container-engine/nerdctl : Extract_file | Unpacking archive ------------------------------------------------------------------------ 1.93s
      kubernetes/kubeadm : Join to cluster if needed ------------------------------------------------------------------------------------- 1.76s
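
With the scale playbook complete, the join can be confirmed from the control-plane node (or any machine with a kubeconfig for the cluster). The worker role label in the last command is optional and purely cosmetic; Kubespray does not apply it for you:

  # The new node should appear and reach Ready once its CNI pods are up
  kubectl get nodes -o wide

  # The calico and kube-proxy pods scheduled on the new node should be Running
  kubectl get pods -n kube-system -o wide | grep dev-w-p1

  # Optional: label the node so ROLES no longer shows <none>
  kubectl label node dev-w-p1 node-role.kubernetes.io/worker=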

Kubespray remove virtual worker node

Now that the physical worker node is operational, we can remove the virtual worker node from the cluster. It is currently running 11 pods, so the node will need to be drained first; the remove-node playbook below takes care of the cordon and drain, and the equivalent manual commands are sketched below for reference.
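
For reference, the cordon and drain performed by the remove-node playbook can also be done by hand; a minimal manual equivalent, run from a host with cluster-admin access, would look like this (only needed if you ever drain a node outside of Kubespray):

  # Stop new pods being scheduled onto the node
  kubectl cordon dev-w-v1

  # Evict the running pods; daemonsets are left in place and emptyDir data is discarded
  kubectl drain dev-w-v1 --ignore-daemonsets --delete-emptydir-data

  # Confirm the evicted workloads have been rescheduled onto the remaining worker
  kubectl get pods -A -o wide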

  • Remove Node: Process to remove a single node
    • Power on the ansible server: dev-ansible and login as dev

    • Activate the virtual environment

      source venv/bin/activate
    • Change to the kubespray directory

      cd ~/kubespray-devcluster/kubespray/
    • Run the remove node playbook to remove the node from the cluster (cordon, drain, delete node, etc):

      ansible-playbook -i inventory/devcluster/inventory.ini remove-node.yml -b -u dev -K -e "node=dev-w-v1"
      • Type yes when prompted to confirm execution
        PLAY RECAP ****************************************************************************************************************************************************
        dev-m-v1 : ok=23 changed=1 unreachable=0 failed=0 skipped=23 rescued=0 ignored=0
        dev-w-p1 : ok=23 changed=1 unreachable=0 failed=0 skipped=23 rescued=0 ignored=0
        dev-w-v1 : ok=73 changed=19 unreachable=0 failed=0 skipped=43 rescued=0 ignored=1
        localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

        Sunday 05 October 2025 06:01:45 +0000 (0:00:00.161) 0:00:58.494 ********
        ===============================================================================
        remove_node/pre_remove : Remove-node | Drain node except daemonsets resource -------------------------------------------------------------------------- 18.21s
        Confirm Execution ------------------------------------------------------------------------------------------------------------------------------------- 12.29s
        reset : Reset | delete some files and directories ------------------------------------------------------------------------------------------------------ 8.08s
        Gather information about installed services ------------------------------------------------------------------------------------------------------------ 2.39s
        system_packages : Manage packages ---------------------------------------------------------------------------------------------------------------------- 1.18s
        reset : Reset | remove containerd binary files --------------------------------------------------------------------------------------------------------- 1.03s
        reset : Reset | stop services -------------------------------------------------------------------------------------------------------------------------- 0.83s
        reset : Reset | remove services ------------------------------------------------------------------------------------------------------------------------ 0.82s
        Gather necessary facts (hardware) ---------------------------------------------------------------------------------------------------------------------- 0.78s
        reset : Reset | stop all cri pods ---------------------------------------------------------------------------------------------------------------------- 0.75s
        reset : Reset | stop containerd and etcd services ------------------------------------------------------------------------------------------------------ 0.72s
        reset : Gather active network services ----------------------------------------------------------------------------------------------------------------- 0.71s
        bootstrap_os : Assign inventory name to unconfigured hostnames (non-CoreOS, non-Flatcar, Suse and ClearLinux, non-Fedora) ------------------------------ 0.52s
        reset : Reset | force remove all cri pods -------------------------------------------------------------------------------------------------------------- 0.50s
        Gather necessary facts (network) ----------------------------------------------------------------------------------------------------------------------- 0.49s
        reset : Flush iptables --------------------------------------------------------------------------------------------------------------------------------- 0.46s
        reset : Reset | systemctl daemon-reload ---------------------------------------------------------------------------------------------------------------- 0.45s
        system_packages : Gather OS information ---------------------------------------------------------------------------------------------------------------- 0.43s
        bootstrap_os : Create remote_tmp for it is used by another module -------------------------------------------------------------------------------------- 0.38s
        reset : Set IPv4 iptables default policies to ACCEPT --------------------------------------------------------------------------------------------------- 0.38s
    • The next step is optional: run it if you plan to reuse the node; otherwise you can simply shut down and delete the VM, or redeploy it from an ISO.

      • DANGEROUS action. Wipe the node you removed (cleans kubelet/containerd configs, iptables, CNI, etc). Important - when resetting just one node, always use both the limit and the variable
        ansible-playbook -i inventory/devcluster/inventory.ini reset.yml -b -u dev -K --limit dev-w-v1 -e reset_nodes=dev-w-v1 -e reset_confirmation=yes
  • Final cleanup
    • Update inventory: remove dev-w-v1 from inventory.ini
      [kube_control_plane]
      dev-m-v1 ansible_host=192.168.30.203 etcd_member_name=etcd1

      [etcd:children]
      kube_control_plane

      [kube_node]
      dev-w-p1 ansible_host=192.168.30.207
    • Remove the removed node's host key from known_hosts
      ssh-keygen -f "/home/dev/.ssh/known_hosts" -R "dev-w-v1"
      # Host dev-w-v1 found: line 8
      /home/dev/.ssh/known_hosts updated.
      Original contents retained as /home/dev/.ssh/known_hosts.old
    • Remove from host file: sudo vi /etc/hosts
      192.168.30.200 dev dev.reids.net.au
      192.168.30.201 dev-lb-v1 dev-lb-v1.reids.net.au
      192.168.30.203 dev-m-v1 dev-m-v1.reids.net.au
      192.168.30.207 dev-w-p1 dev-w-p1.reids.net.au
    • Verify removal:
      kubectl get nodes
      kubectl get csr | grep -i dev-w-v1 || true
      NAME       STATUS   ROLES           AGE   VERSION
      dev-m-v1   Ready    control-plane   13m   v1.32.9
      dev-w-p1   Ready    <none>          12m   v1.32.9
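
Finally, it is worth confirming that the pods which were running on dev-w-v1 have all come back up on the remaining nodes. Anything stuck in Pending usually points to something the physical worker is missing, such as a node label or storage that only existed on the old VM:

  # Nothing should still reference the removed node
  kubectl get pods -A -o wide | grep dev-w-v1 || true

  # List any pods that are not yet Running or Completed
  kubectl get pods -A --field-selector status.phase!=Running,status.phase!=Succeeded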