Understanding systemd at startup on Linux

Exploring Linux startup with systemd

More on sysadmins

Before you can observe the startup sequence, you need to do a couple of things to make the boot and startup sequences open and visible. Normally, most distributions use a startup animation or splash screen to hide the detailed messages that would otherwise be displayed during a Linux host’s startup and shutdown. This is called the Plymouth boot screen on Red Hat-based distros. Those hidden messages can provide a great deal of information about startup and shutdown to a sysadmin looking for information to troubleshoot a bug or to just learn about the startup sequence. You can change this using the GRUB (Grand Unified Boot Loader) configuration.

The main GRUB configuration file is /boot/grub2/grub.cfg, but, because this file can be overwritten when the kernel version is updated, you do not want to change it. Instead, modify the /etc/default/grub file, which is used to modify the default settings of grub.cfg.

Start by looking at the current, unmodified version of the /etc/default/grub file:

[root@testvm1 ~]# cd /etc/default ; cat grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora_testvm1-swap rd.lvm.
lv=fedora_testvm1/root rd.lvm.lv=fedora_testvm1/swap rd.lvm.lv=fedora_
testvm1/usr rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
[root@testvm1 default]#

Chapter 6 of the GRUB documentation contains a list of all the possible entries in the /etc/default/grub file, but I focus on the following:

  • I change GRUB_TIMEOUT, the number of seconds for the GRUB menu countdown, from five to 10 to give a bit more time to respond to the GRUB menu before the countdown hits zero.
  • I delete the last two parameters on GRUB_CMDLINE_LINUX, which lists the command-line parameters that are passed to the kernel at boot time. One of these parameters, rhgb stands for Red Hat Graphical Boot, and it displays the little Fedora icon animation during the kernel initialization instead of showing boot-time messages. The other, the quiet parameter, prevents displaying the startup messages that document the progress of the startup and any errors that occur. I delete both rhgb and quiet because sysadmins need to see these messages. If something goes wrong during boot, the messages displayed on the screen can point to the cause of the problem.

After you make these changes, your GRUB file will look like:

[root@testvm1 default]# cat grub
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora_testvm1-swap rd.lvm.
lv=fedora_testvm1/root rd.lvm.lv=fedora_testvm1/swap rd.lvm.lv=fedora_
testvm1/usr"
GRUB_DISABLE_RECOVERY="false"
[root@testvm1 default]#

The grub2-mkconfig program generates the grub.cfg configuration file using the contents of the /etc/default/grub file to modify some of the default GRUB settings. The grub2-mkconfig program sends its output to STDOUT. It has a -o option that allows you to specify a file to send the datastream to, but it is just as easy to use redirection. Run the following command to update the /boot/grub2/grub.cfg configuration file:

[root@testvm1 grub2]# grub2-mkconfig > /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.18.9-200.fc28.x86_64
Found initrd image: /boot/initramfs-4.18.9-200.fc28.x86_64.img
Found linux image: /boot/vmlinuz-4.17.14-202.fc28.x86_64
Found initrd image: /boot/initramfs-4.17.14-202.fc28.x86_64.img
Found linux image: /boot/vmlinuz-4.16.3-301.fc28.x86_64
Found initrd image: /boot/initramfs-4.16.3-301.fc28.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-7f12524278bd40e9b10a085bc82dc504
Found initrd image: /boot/initramfs-0-rescue-7f12524278bd40e9b10a085bc82dc504.img
done
[root@testvm1 grub2]#

Reboot your test system to view the startup messages that would otherwise be hidden behind the Plymouth boot animation. But what if you need to view the startup messages and have not disabled the Plymouth boot animation? Or you have, but the messages stream by too fast to read? (Which they do.)

There are a couple of options, and both involve log files and systemd journals—which are your friends. You can use the less command to view the contents of the /var/log/messages file. This file contains boot and startup messages as well as messages generated by the operating system during normal operation. You can also use the journalctl command without any options to view the systemd journal, which contains essentially the same information:

[root@testvm1 grub2]# journalctl
-- Logs begin at Sat 2020-01-11 21:48:08 EST, end at Fri 2020-04-03 08:54:30 EDT. --
Jan 11 21:48:08 f31vm.both.org kernel: Linux version 5.3.7-301.fc31.x86_64 ([email protected]) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Mon Oct >
Jan 11 21:48:08 f31vm.both.org kernel: Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.3.7-301.fc31.x86_64 root=/dev/mapper/VG01-root ro resume=/dev/mapper/VG01-swap rd.lvm.lv=VG01/root rd>
Jan 11 21:48:08 f31vm.both.org kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Jan 11 21:48:08 f31vm.both.org kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Jan 11 21:48:08 f31vm.both.org kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Jan 11 21:48:08 f31vm.both.org kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Jan 11 21:48:08 f31vm.both.org kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Jan 11 21:48:08 f31vm.both.org kernel: BIOS-provided physical RAM map:
Jan 11 21:48:08 f31vm.both.org kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
Jan 11 21:48:08 f31vm.both.org kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
Jan 11 21:48:08 f31vm.both.org kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
Jan 11 21:48:08 f31vm.both.org kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000dffeffff] usable
Jan 11 21:48:08 f31vm.both.org kernel: BIOS-e820: [mem 0x00000000dfff0000-0x00000000dfffffff] ACPI data
Jan 11 21:48:08 f31vm.both.org kernel: BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
Jan 11 21:48:08 f31vm.both.org kernel: BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
Jan 11 21:48:08 f31vm.both.org kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
Jan 11 21:48:08 f31vm.both.org kernel: BIOS-e820: [mem 0x0000000100000000-0x000000041fffffff] usable
Jan 11 21:48:08 f31vm.both.org kernel: NX (Execute Disable) protection: active
Jan 11 21:48:08 f31vm.both.org kernel: SMBIOS 2.5 present.
Jan 11 21:48:08 f31vm.both.org kernel: DMI: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Jan 11 21:48:08 f31vm.both.org kernel: Hypervisor detected: KVM
Jan 11 21:48:08 f31vm.both.org kernel: kvm-clock: Using msrs 4b564d01 and 4b564d00
Jan 11 21:48:08 f31vm.both.org kernel: kvm-clock: cpu 0, msr 30ae01001, primary cpu clock
Jan 11 21:48:08 f31vm.both.org kernel: kvm-clock: using sched offset of 8250734066 cycles
Jan 11 21:48:08 f31vm.both.org kernel: clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
Jan 11 21:48:08 f31vm.both.org kernel: tsc: Detected 2807.992 MHz processor
Jan 11 21:48:08 f31vm.both.org kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Jan 11 21:48:08 f31vm.both.org kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
<snip>

I truncated this datastream because it can be hundreds of thousands or even millions of lines long. (The journal listing on my primary workstation is 1,188,482 lines long.) Be sure to try this on your test system. If it has been running for some time—even if it has been rebooted many times—huge amounts of data will be displayed. Explore this journal data because it contains a lot of information that can be very useful when doing problem determination. Knowing what this data looks like for a normal boot and startup can help you locate problems when they occur.

I will discuss systemd journals, the journalctl command, and how to sort through all of that data to find what you want in more detail in a future article in this series.

After GRUB loads the kernel into memory, it must first extract itself from the compressed version of the file before it can perform any useful work. After the kernel has extracted itself and started running, it loads systemd and turns control over to it.

This is the end of the boot process. At this point, the Linux kernel and systemd are running but unable to perform any productive tasks for the end user because nothing else is running, there’s no shell to provide a command line, no background processes to manage the network or other communication links, and nothing that enables the computer to perform any productive function.

Systemd can now load the functional units required to bring the system up to a selected target run state.

Targets

A systemd target represents a Linux system’s current or desired run state. Much like SystemV start scripts, targets define the services that must be present for the system to run and be active in that state. Figure 1 shows the possible run-state targets of a Linux system using systemd. As seen in the first article of this series and in the systemd bootup man page (man bootup), there are other intermediate targets that are required to enable various necessary services. These can include swap.target, timers.target, local-fs.target, and more. Some targets (like basic.target) are used as checkpoints to ensure that all the required services are up and running before moving on to the next-higher level target.

Unless otherwise changed at boot time in the GRUB menu, systemd always starts the default.target. The default.target file is a symbolic link to the true target file. For a desktop workstation, this is typically going to be the graphical.target, which is equivalent to runlevel 5 in SystemV. For a server, the default is more likely to be the multi-user.target, which is like runlevel 3 in SystemV. The emergency.target file is similar to single-user mode. Targets and services are systemd units.

The following table, which I included in the previous article in this series, compares the systemd targets with the old SystemV startup runlevels. The systemd target aliases are provided by systemd for backward compatibility. The target aliases allow scripts—and sysadmins—to use SystemV commands like init 3 to change runlevels. Of course, the SystemV commands are forwarded to systemd for interpretation and execution.

systemd targetsSystemV runleveltarget aliasesDescription
default.target  This target is always aliased with a symbolic link to either multi-user.target or graphical.target. systemd always uses the default.target to start the system. The default.target should never be aliased to halt.target, poweroff.target, or reboot.target.
graphical.target5runlevel5.targetMulti-user.target with a GUI
 4runlevel4.targetUnused. Runlevel 4 was identical to runlevel 3 in the SystemV world. This target could be created and customized to start local services without changing the default multi-user.target.
multi-user.target3runlevel3.targetAll services running, but command-line interface (CLI) only
 2runlevel2.targetMulti-user, without NFS, but all other non-GUI services running
rescue.target1runlevel1.targetA basic system, including mounting the filesystems with only the most basic services running and a rescue shell on the main console
emergency.targetS Single-user mode—no services are running; filesystems are not mounted. This is the most basic level of operation with only an emergency shell running on the main console for the user to interact with the system.
halt.target  Halts the system without powering it down
reboot.target6runlevel6.targetReboot
poweroff.target0runlevel0.targetHalts the system and turns the power off

Fig. 1: Comparison of SystemV runlevels with systemd targets and target aliases.

Each target has a set of dependencies described in its configuration file. systemd starts the required dependencies, which are the services required to run the Linux host at a specific level of functionality. When all of the dependencies listed in the target configuration files are loaded and running, the system is running at that target level. If you want, you can review the systemd startup sequence and runtime targets in the first article in this series, Learning to love systemd.

Exploring the current target

Many Linux distributions default to installing a GUI desktop interface so that the installed systems can be used as workstations. I always install from a Fedora Live boot USB drive with an Xfce or LXDE desktop. Even when I’m installing a server or other infrastructure type of host (such as the ones I use for routers and firewalls), I use one of these installations that installs a GUI desktop.

I could install a server without a desktop (and that would be typical for data centers), but that does not meet my needs. It is not that I need the GUI desktop itself, but the LXDE installation includes many of the other tools I use that are not in a default server installation. This means less work for me after the initial installation.

But just because I have a GUI desktop does not mean it makes sense to use it. I have a 16-port KVM that I can use to access the KVM interfaces of most of my Linux systems, but the vast majority of my interaction with them is via a remote SSH connection from my primary workstation. This way is more secure and uses fewer system resources to run multi-user.target compared to graphical.target.

To begin, check the default target to verify that it is the graphical.target:

[root@testvm1 ~]# systemctl get-default
graphical.target
[root@testvm1 ~]#

Now verify the currently running target. It should be the same as the default target. You can still use the old method, which displays the old SystemV runlevels. Note that the previous runlevel is on the left; it is N (which means None), indicating that the runlevel has not changed since the host was booted. The number 5 indicates the current target, as defined in the old SystemV terminology:

[root@testvm1 ~]# runlevel
N 5
[root@testvm1 ~]#

Note that the runlevel man page indicates that runlevels are obsolete and provides a conversion table.

You can also use the systemd method. There is no one-line answer here, but it does provide the answer in systemd terms:

[root@testvm1 ~]# systemctl list-units --type target
UNIT                   LOAD   ACTIVE SUB    DESCRIPTION                
basic.target           loaded active active Basic System               
cryptsetup.target      loaded active active Local Encrypted Volumes    
getty.target           loaded active active Login Prompts              
graphical.target       loaded active active Graphical Interface        
local-fs-pre.target    loaded active active Local File Systems (Pre)   
local-fs.target        loaded active active Local File Systems         
multi-user.target      loaded active active Multi-User System          
network-online.target  loaded active active Network is Online          
network.target         loaded active active Network                    
nfs-client.target      loaded active active NFS client services        
nss-user-lookup.target loaded active active User and Group Name Lookups
paths.target           loaded active active Paths                      
remote-fs-pre.target   loaded active active Remote File Systems (Pre)  
remote-fs.target       loaded active active Remote File Systems        
rpc_pipefs.target      loaded active active rpc_pipefs.target          
slices.target          loaded active active Slices                     
sockets.target         loaded active active Sockets                    
sshd-keygen.target     loaded active active sshd-keygen.target         
swap.target            loaded active active Swap                       
sysinit.target         loaded active active System Initialization      
timers.target          loaded active active Timers                     

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

21 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

This shows all of the currently loaded and active targets. You can also see the graphical.target and the multi-user.target. The multi-user.target is required before the graphical.target can be loaded. In this example, the graphical.target is active.

Switching to a different target

Making the switch to the multi-user.target is easy:

[root@testvm1 ~]# systemctl isolate multi-user.target

The display should now change from the GUI desktop or login screen to a virtual console. Log in and list the currently active systemd units to verify that graphical.target is no longer running:

[root@testvm1 ~]# systemctl list-units --type target

Be sure to use the runlevel command to verify that it shows both previous and current “runlevels”:

[root@testvm1 ~]# runlevel
5 3

Changing the default target

Now, change the default target to the multi-user.target so that it will always boot into the multi-user.target for a console command-line interface rather than a GUI desktop interface. As the root user on your test host, change to the directory where the systemd configuration is maintained and do a quick listing:

[root@testvm1 ~]# cd /etc/systemd/system/ ; ll
drwxr-xr-x. 2 root root 4096 Apr 25  2018  basic.target.wants
<snip>
lrwxrwxrwx. 1 root root   36 Aug 13 16:23  default.target -> /lib/systemd/system/graphical.target
lrwxrwxrwx. 1 root root   39 Apr 25  2018  display-manager.service -> /usr/lib/systemd/system/lightdm.service
drwxr-xr-x. 2 root root 4096 Apr 25  2018  getty.target.wants
drwxr-xr-x. 2 root root 4096 Aug 18 10:16  graphical.target.wants
drwxr-xr-x. 2 root root 4096 Apr 25  2018  local-fs.target.wants
drwxr-xr-x. 2 root root 4096 Oct 30 16:54  multi-user.target.wants
<snip>
[root@testvm1 system]#

I shortened this listing to highlight a few important things that will help explain how systemd manages the boot process. You should be able to see the entire list of directories and links on your virtual machine.

The default.target entry is a symbolic link (symlink, soft link) to the directory /lib/systemd/system/graphical.target. List that directory to see what else is there:

[root@testvm1 system]# ll /lib/systemd/system/ | less

You should see files, directories, and more links in this listing, but look specifically for multi-user.target and graphical.target. Now display the contents of default.target, which is a link to /lib/systemd/system/graphical.target:

[root@testvm1 system]# cat default.target 
#  SPDX-License-Identifier: LGPL-2.1+
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Graphical Interface
Documentation=man:systemd.special(7)
Requires=multi-user.target
Wants=display-manager.service
Conflicts=rescue.service rescue.target
After=multi-user.target rescue.service rescue.target display-manager.service
AllowIsolate=yes
[root@testvm1 system]#

This link to the graphical.target file describes all of the prerequisites and requirements that the graphical user interface requires. I will explore at least some of these options in the next article in this series.

To enable the host to boot to multi-user mode, you need to delete the existing link and create a new one that points to the correct target. Make the PWD /etc/systemd/system, if it is not already:

[root@testvm1 system]# rm -f default.target 
[root@testvm1 system]# ln -s /lib/systemd/system/multi-user.target default.target

List the default.target link to verify that it links to the correct file:

[root@testvm1 system]# ll default.target 
lrwxrwxrwx 1 root root 37 Nov 28 16:08 default.target -> /lib/systemd/system/multi-user.target
[root@testvm1 system]#

If your link does not look exactly like this, delete it and try again. List the content of the default.target link:

[root@testvm1 system]# cat default.target 
#  SPDX-License-Identifier: LGPL-2.1+
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Multi-User System
Documentation=man:systemd.special(7)
Requires=basic.target
Conflicts=rescue.service rescue.target
After=basic.target rescue.service rescue.target
AllowIsolate=yes
[root@testvm1 system]#

The default.target—which is really a link to the multi-user.target at this point—now has different requirements in the [Unit] section. It does not require the graphical display manager.

Reboot. Your virtual machine should boot to the console login for virtual console 1, which is identified on the display as tty1. Now that you know how to change the default target, change it back to the graphical.target using a command designed for the purpose.

First, check the current default target:

[root@testvm1 ~]# systemctl get-default 
multi-user.target
[root@testvm1 ~]# systemctl set-default graphical.target
Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/graphical.target.
[root@testvm1 ~]#

Enter the following command to go directly to the graphical.target and the display manager login page without having to reboot:

[root@testvm1 system]# systemctl isolate default.target

I do not know why the term “isolate” was chosen for this sub-command by systemd’s developers. My research indicates that it may refer to running the specified target but “isolating” and terminating all other targets that are not required to support the target. However, the effect is to switch targets from one run target to another—in this case, from the multi-user target to the graphical target. The command above is equivalent to the old init 5 command in SystemV start scripts and the init program.

Log into the GUI desktop, and verify that it is working as it should.

Summing up

This article explored the Linux systemd startup sequence and started to explore two important systemd tools, systemctl and journalctl. It also explained how to switch from one target to another and to change the default target.

The next article in this series will create a new systemd unit and configure it to run during startup. It will also look at some of the configuration options that help determine where in the sequence a particular unit will start, for example, after networking is up and running.

How to Install Kubernetes Cluster on Ubuntu 22.04 with ZFS

Are you looking for an easy guide on how to install Kubernetes Cluster on Ubuntu 22.04 (Jammy Jellyfish)?

The step-by-step guide on this page will show you how to install Kubernetes cluster on Ubuntu 22.04 using Kubeadm command step by step.

Kubernetes is a free and open-source container orchestration tool, it also known as k8s. With the help of Kubernetes, we can achieve automated deployment, scaling and management of containerized application.

A Kubernetes cluster consists of worker nodes on which application workload is deployed and a set up master nodes which are used to manage worker nodes and pods in the cluster.

In this guide, we are using one master node and two worker nodes. Following are system requirements on each node,

  • Minimal install Ubuntu 22.04
  • Minimum 2GB RAM or more
  • Minimum 2 CPU cores / or 2 vCPU
  • 20 GB free disk space on /var or more
  • Sudo user with admin rights
  • Internet connectivity on each node

Lab Setup

  • Master Node:  192.168.1.173 – k8smaster.example.net
  • First Worker Node:  192.168.1.174 – k8sworker1.example.net
  • Second Worker Node:  192.168.1.175 – k8sworker2.example.net

Without any delay, let’s jump into the installation steps of Kubernetes cluster

Step 1) Set hostname and add entries in the hosts file

Login to to master node and set hostname using hostnamectl command,

sudo hostnamectl set-hostname "k8smaster.example.net"

On the worker nodes, run

sudo hostnamectl set-hostname "k8sworker1.example.net"
sudo hostnamectl set-hostname "k8sworker2.example.net"

Add the following entries in /etc/hosts file on each node

192.168.1.173   k8smaster.example.net k8smaster
192.168.1.174   k8sworker1.example.net k8sworker1
192.168.1.175   k8sworker2.example.net k8sworker2

Step 2) Disable swap & add kernel settings

Execute beneath swapoff and sed command to disable swap. Make sure to run the following commands on all the nodes.

sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

Load the following kernel modules on all the nodes,

sudo tee /etc/modules-load.d/containerd.conf <<EOF
br_netfilter
EOF
sudo modprobe br_netfilter

Set the following Kernel parameters for Kubernetes, run beneath tee command

sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF 

Reload the above changes, run

sudo sysctl --system

Step 3) Install containerd run time

In this guide, we are using containerd run time for our Kubernetes cluster. So, to install containerd, first install its dependencies.

sudo apt install -y curl gnupg2 software-properties-common apt-transport-https ca-certificates

Enable docker repository

sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmour -o /etc/apt/trusted.gpg.d/docker.gpg
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

Now, run following apt command to install containerd

sudo apt update
sudo apt install -y containerd.io

Configure containerd so that it starts using systemd as cgroup.

containerd config default | sudo tee /etc/containerd/config.toml >/dev/null 2>&1
sudo sed -i 's/SystemdCgroup \= false/SystemdCgroup \= true/g' /etc/containerd/config.toml
sudo sed -i 's/snapshotter \= "overlayfs"/snapshotter \= "zfs"/g' /etc/containerd/config.toml

You will now need to create a zpool to use as the snapshotter for containerd. If you create this in the default path everything should work with the config created above, but you might need to set the path for the zfs snapshotter if you want a different path.

sudo zfs create -o mountpoint=/var/lib/containerd/io.containerd.snapshotter.v1.zfs <your zfs pool>/containerd

Restart and enable containerd service

sudo systemctl restart containerd
sudo systemctl enable containerd

Step 4) Add apt repository for Kubernetes

Execute following commands to add apt repository for Kubernetes

sudo curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmour -o /etc/apt/trusted.gpg.d/google.gpg
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"

Note: At time of writing this guide, Xenial is the latest Kubernetes repository but when repository is available for Ubuntu 22.04 (Jammy Jellyfish) then you need replace xenial word with ‘jammy’ in ‘apt-add-repository’ command.

Step 5) Install Kubernetes components Kubectl, kubeadm & kubelet

Install Kubernetes components like kubectl, kubelet and Kubeadm utility on all the nodes. Run following set of commands,

sudo apt update
sudo apt install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

Step 6) Initialize Kubernetes cluster with Kubeadm command

Now, we are all set to initialize Kubernetes cluster. Run the following Kubeadm command from the master node only.

sudo kubeadm init --control-plane-endpoint=k8smaster.example.net

Output of above command should end with something like the following,

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.0.0.42:6443 --token vt4ua6.23wer232423134 \
        --discovery-token-ca-cert-hash sha256:3a2c36feedd14cff3ae835abcdefgesadf235adca0369534e938ccb307ba5

As the output above confirms that control-plane has been initialize successfully. In output also we are getting set of commands for interacting the cluster and also the command for worker node to join the cluster.

So, to start interacting with cluster, run following commands from the master node,

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Now, try to run following kubectl commands to view cluster and node status

kubectl cluster-info
kubectl get nodes

Output,

user@server:~ $ kubectl cluster-info
Kubernetes control plane is running at https://10.0.0.42:6443
CoreDNS is running at https://10.0.0.42:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
user@server:~ $ kubectl get nodes
NAME         STATUS   ROLES           AGE    VERSION
k8smaster   Ready    control-plane   153m   v1.26.1

If you only want to have one node you can run the following to allow scheduling on the master

kubectl taint node k8smaster node-role.kubernetes.io/master:NoSchedule-
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all  node-role.kubernetes.io/control-plane-

Join both the worker nodes to the cluster, command is already there is output, just copy paste on the worker nodes,

sudo kubeadm join k8smaster.example.net:6443 --token vt4ua6.23wer232423134 \
   --discovery-token-ca-cert-hash sha256:3a2c36feedd14cff3ae835abcdefgesadf235adca0369534e938ccb307ba5

Output from both the worker nodes,

Check the nodes status from master node using kubectl command,

kubectl get nodes
Node-Status-K8s-Before-CNI

As we can see nodes status is ‘NotReady’, so to make it active. We must install CNI (Container Network Interface) or network add-on plugins like Calico, Flannel and Weave-net.

Step 6) Install Calico Pod Network Add-on

Run following curl and kubectl command to install Calico network plugin from the master node,

curl https://projectcalico.docs.tigera.io/manifests/calico.yaml -O
kubectl apply -f calico.yaml

Output of above commands would look like below,

Install-Calico-Network-Add-on-k8s

Verify the status of pods in kube-system namespace,

kubectl get pods -n kube-system

Output,

Kube-System-Pods-after-calico-installation

Perfect, check the nodes status as well.

kubectl get nodes
Nodes-Status-after-Calico-Network-Add-on

Great, above confirms that nodes are active node. Now, we can say that our Kubernetes cluster is functional.

Step 7) Test Kubernetes Installation

To test Kubernetes installation, let’s try to deploy nginx based application and try to access it.

kubectl create deployment nginx-app --image=nginx --replicas=2

Check the status of nginx-app deployment

kubectl get deployment nginx-app
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
nginx-app   2/2     2            2           68s

Expose the deployment as NodePort,

kubectl expose deployment nginx-app --type=NodePort --port=80
service/nginx-app exposed

Run following commands to view service status

kubectl get svc nginx-app
kubectl describe svc nginx-app

Output of above commands,

Deployment-Service-Status-k8s

Use following command to access nginx based application,

curl http://<woker-node-ip-addres>:31246
curl http://192.168.1.174:31246

Output,

Curl-Command-Access-Nginx-Kubernetes

Great, above output confirms that nginx based application is accessible.

That’s all from this guide, I hope you have found this guide useful. Most of this post comes from https://www.linuxtechi.com/install-kubernetes-on-ubuntu-22-04/ with modifications to work with ZFS.

Resolving Oracle Cloud “Out of Capacity” issue and getting free VPS with 4 ARM cores / 24GB of memory (using OCI CLI)

Very neat and useful configuration was recently announced at Oracle Cloud Infrastructure (OCI) blog as a part of Always Free tier. Unfortunately, as of July 2021, it’s still very complicated to launch an instance due to the “Out of Capacity” error. Here we’re solving that issue as Oracle constantly adds capacity from time to time.

Each tenancy gets the first 3,000 OCPU hours and 18,000 GB hours per month for free to create Ampere A1 Compute instances using the VM.Standard.A1.Flex shape (equivalent to 4 OCPUs and 24 GB of memory).

Starting from Oracle Cloud Infrastructure (OCI) CLI installation.

Quickstart

The installer script automatically installs the CLI and its dependencies, Python and virtualenv. Before running the…

docs.oracle.com

On a Mac Computer you can also install the OCI cli with Brew.

brew install oci-cli jq

Generating API key

After logging in to OCI Console, click profile icon and then “User Settings”

Go to Resources -> API keys, click “Add API Key” button

Add API Key

Make sure “Generate API Key Pair” radio button is selected, click “Download Private Key” and then “Add”.

Download Private Key

Copy the contents from textarea and save it to file with a name “config”. I put it together with *.pem file in newly created directory $HOME/.oci

That’s all about the API key generation part.

Setting up CLI

Specify config location

OCI_CLI_RC_FILE=$HOME/.oci/config

If you haven’t added OCI CLI binary to your PATH, run

alias oci="$HOME/bin/oci"

(or whatever path it was installed).

Set permissions for the private key

oci setup repair-file-permissions --file $HOME/.oci/oracleidentitycloudservice***.pem

Test the authentication (user value should be taken from textarea when generating API key):

oci iam user get --user-id ocid1.user.oc1..aaaaaaaaa***123

Output should be similar to:

{
  "data": {
    "capabilities": {
      "can-use-api-keys": true,
      "can-use-auth-tokens": true,
      "can-use-console-password": false,
      "can-use-customer-secret-keys": true,
      "can-use-db-credentials": true,
      "can-use-o-auth2-client-credentials": true,
      "can-use-smtp-credentials": true
    },
    "compartment-id": "ocid1.tenancy.oc1..aaaaaaaa***123",
    "db-user-name": null,
    "defined-tags": {
      "Oracle-Tags": {
        "CreatedBy": "scim-service",
        "CreatedOn": "2021-08-31T21:03:23.374Z"
      }
    },
    "description": "[email protected]",
    "email": null,
    "email-verified": true,
    "external-identifier": "123456789qwertyuiopas",
    "freeform-tags": {},
    "id": "ocid1.user.oc1..aaaaaaaaa***123",
    "identity-provider-id": "ocid1.saml2idp.oc1..aaaaaaaae***123",
    "inactive-status": null,
    "is-mfa-activated": false,
    "last-successful-login-time": null,
    "lifecycle-state": "ACTIVE",
    "name": "oracleidentitycloudservice/[email protected]",
    "previous-successful-login-time": null,
    "time-created": "2021-08-31T21:03:23.403000+00:00"
  },
  "etag": "121345678abcdefghijklmnop"
}

Acquiring launch instance params

We need to know which Availability Domain is always free. Click Oracle Cloud menu -> Compute -> Instances

Instances

Click “Create Instance” and notice which one has “Always Free Eligible” label in Placement Section. In our case it’s AD-2.

Almost every command needs compartment-id param to be set. Let’s save it to COMPARTMENT var (replace with your “tenancy” value from the config file) then save the following under ~/bin/launch-instance:

#!/bin/bash -x
SSH_PUB_KEY_FILE="$HOME/.ssh/id_rsa.pub"
SSH_KEY=$(cat $SSH_PUB_KEY_FILE)
OCI_CLI="/opt/homebrew/bin/oci"
JQ="/opt/homebrew/bin/jq"
FLAG_FILE="$HOME/.oci/success"
if [[ -f "$FLAG_FILE" ]]; then
  echo "Already deployed!"
  exit 0
fi
# ARM
SHAPE=VM.Standard.A1.Flex

COMPARTMENT=ocid1.tenancy.oc1..aaaaaaaa**123

# Setup the oci profile using the following command:
#   oci session authenticate --region us-ashburn-1
HC="TEST"
PROFILE=DEFAULT
DISPLAY_NAME="$USER-$( date -I )-$RANDOM"
mkdir -p $HOME/.oci/hosts/
INSTANCE_INFO_FILE="$HOME/.oci/hosts/$DISPLAY_NAME"
AUTH_PARAMS="--profile $PROFILE"
#AUTH_PARAMS="--profile $PROFILE --auth security_token"

AD=$($OCI_CLI iam availability-domain $AUTH_PARAMS list --compartment-id $COMPARTMENT | $JQ -r ".data| .[0].name")
if [[ $? != 0 ]]; then
   echo "Could not determine AD.  You might need to reauthenticate"
   echo "oci session authenticate --region us-ashburn-1 $AUTH_PARAMS"
   exit 1
fi

SUBNET=$($OCI_CLI network subnet $AUTH_PARAMS list --compartment-id $COMPARTMENT | $JQ -r ".data| .[0].id")
if [[ $? != 0 ]]; then
   echo "Could not determine Subnet"
   exit 1
fi

IMAGE=$($OCI_CLI compute image $AUTH_PARAMS list --compartment-id=$COMPARTMENT --shape=$SHAPE | $JQ -r '[ .data[] | select(."operating-system" == "Oracle Linux") | select(."operating-system-version"|startswith("8"))] | .[0].id')
if [[ $? != 0 ]]; then
   echo "Could not determine Image"
   exit 1
fi
# export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-bundle.crt
OCI_INFO=$($OCI_CLI $AUTH_PARAMS compute instance launch --shape $SHAPE \
   --availability-domain $AD \
   --compartment-id $COMPARTMENT \
   --image-id $IMAGE \
   --display-name $DISPLAY_NAME \
   --metadata "{ \"hostclass\": \"$HC\" }" \
   --subnet-id $SUBNET --shape-config "{ \"memoryInGBs\": 24.0, \"ocpus\": 4.0 }" \
   --ssh-authorized-keys-file $SSH_PUB_KEY_FILE \
)
if [[ $? != 0 ]]; then
   echo "Failed to deploy"
   exit 1
fi
echo $OCI_INFO > $INSTANCE_INFO_FILE
INSTANCE_ID=$($JQ -r '.data.id' < $INSTANCE_INFO_FILE)
if [[ -z $INSTANCE_ID ]]; then
   echo "Faild to read instance info from the file"
   exit 1
fi
while [[ -z "$INSTANCE_IP" ]]; do
  echo "Waiting 10s for the ip to be availible"
  sleep 10s
  INSTANCE_IP=$($OCI_CLI $AUTH_PARAMS compute instance list-vnics --instance-id $INSTANCE_ID | $JQ -r '.data[]."public-ip"')
done
if [[ ! -z $INSTANCE_IP ]]; then
echo "Updating the SSH config to include $DISPLAY_NAME"
cat >> ~/.ssh/config.d/custom < $FLAG_FILE

You can now setup crontab to run this script e.g. every minute by, saving this to $HOME/bin/launch-instance as a script file and making sure cron user is able to access private key. Some of the variables in the script might need to be updated to match your system. We won’t cover this part.

Output:

{
  "data": {
    "agent-config": {
      "are-all-plugins-disabled": false,
      "is-management-disabled": false,
      "is-monitoring-disabled": false,
      "plugins-config": null
    },
    "availability-config": {
      "is-live-migration-preferred": null,
      "recovery-action": "RESTORE_INSTANCE"
    },
    "availability-domain": "RCFH:US-ASHBURN-AD-1",
    "capacity-reservation-id": null,
    "compartment-id": "ocid1.compartment.oc1..aaaaaaaa***123",
    "dedicated-vm-host-id": null,
    "defined-tags": {},
    "display-name": "user-2023-01-30-22601",
    "extended-metadata": {},
    "fault-domain": "FAULT-DOMAIN-3",
    "freeform-tags": {},
    "id": "ocid1.instance.oc1.iad.adsfasfasdfasdfasdfasdf12323dfsag234",
    "image-id": "ocid1.image.oc1.iad.aaaaaaaa**123",
    "instance-options": {
      "are-legacy-imds-endpoints-disabled": false
    },
    "ipxe-script": null,
    "launch-mode": "PARAVIRTUALIZED",
    "launch-options": {
      "boot-volume-type": "PARAVIRTUALIZED",
      "firmware": "UEFI_64",
      "is-consistent-volume-naming-enabled": true,
      "is-pv-encryption-in-transit-enabled": false,
      "network-type": "PARAVIRTUALIZED",
      "remote-data-volume-type": "PARAVIRTUALIZED"
    },
    "lifecycle-state": "PROVISIONING",
    "metadata": {
      "hostclass": "Your-Hostclass",
      "ssh_authorized_keys": "ssh-rsa AAAAB3123432412343241234324123432412343241234324123432412343241234324123432412343241234324123432412343241234324123432412343241234324123432412343241234324/71ctthb1Ek= your-ssh-key"
    },
    "platform-config": null,
    "preemptible-instance-config": null,
    "region": "iad",
    "shape": "VM.Standard.A1.Flex",
    "shape-config": {
      "baseline-ocpu-utilization": null,
      "gpu-description": null,
      "gpus": 0,
      "local-disk-description": null,
      "local-disks": 0,
      "local-disks-total-size-in-gbs": null,
      "max-vnic-attachments": 2,
      "memory-in-gbs": 6.0,
      "networking-bandwidth-in-gbps": 1.0,
      "ocpus": 1.0,
      "processor-description": "3.0 GHz Ampere® Altra™"
    },
    "source-details": {
      "boot-volume-size-in-gbs": null,
      "boot-volume-vpus-per-gb": null,
      "image-id": "ocid1.image.oc1.iad.aaaaaaaas***123",
      "kms-key-id": null,
      "source-type": "image"
    },
    "system-tags": {},
    "time-created": "2023-01-30T19:13:44.584000+00:00",
    "time-maintenance-reboot-due": null
  },
  "etag": "123456789123456789123456789123456789123456789",
  "opc-work-request-id": "ocid1.coreservicesworkrequest.oc1.iad.abcd***123"
}

I believe it’s pretty safe to leave the cron running and check cloud console once per few days. Because when you’ll succeed, usually you won’t be able to create more instances than allowed — but start getting something like

{
    "code": "LimitExceeded",
    "message": "The following service limits were exceeded: standard-a1-memory-count, standard-a1-core-count. Request a service limit increase from the service limits page in the console. "
}

or (again)

{
    "code": "InternalError",
    "message": "Out of host capacity."
}

At least that’s how it worked for me. Just in case the script writes a file when it successfully deploys an instance. If the file is in place the script will not run again.

To verify the instance you can run the following.

oci compute instance list --compartment-id $C

You could also add something to check it’s output periodically to know when cron needs to be disabled. That’s not related to our issue here.

Assigning public IP address

We are not doing this during the command run due to the default limitation (2 ephemeral addresses per compartment). That’s how you can achieve this. When you’ll succeed with creating an instance, open OCI Console, go to Instance Details -> Resources -> Attached VNICs by selecting it’s name

VNICs

Then Resources -> IPv4 Addresses -> … -> Edit

IPv4 Addresses

Choose ephemeral and click “Update”

Edit IP address

Conclusion

That’s how you will login when instance will be created (notice opc default username)

ssh -i ~/.ssh/id_rsa [email protected]

If you didn’t assign public IP, you can still copy internal FQDN or private IP (10.x.x.x) from the instance details page and connect from your other instance in the same VNIC. e.g.

ssh -i ~/.ssh/id_rsa [email protected]

Thanks for reading!

Resize a disk in Linux

The following example shows the volumes on a Nitro-based instance:

[system-user ~]$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT 
sda       8:0    0   30G  0 disk
├─sda1    8:1    0  9.9G  0 part /
├─sda14   8:14   0    4M  0 part
└─sda15   8:15   0  106M  0 part /boot/efi
  • The root volume, /dev/sda, has a partition, /dev/sda1. While the size of the root volume reflects the new size, 30 GB, the size of the partition reflects the original size, 30 GB, and must be extended before you can extend the file system.

To extend the partition on the root volume, use the following growpart command. Notice that there is a space between the device name and the partition number.

[system-user ~]$ sudo growpart /dev/sda 1

You can verify that the partition reflects the increased volume size by using the lsblk command again.

[system-user ~]$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda       8:0    0   30G  0 disk 
├─sda1    8:1    0 29.9G  0 part /
├─sda14   8:14   0    4M  0 part 
└─sda15   8:15   0  106M  0 part /boot/efi

Extending the File System

Use a file system-specific command to resize each file system to the new volume capacity. For a file system other than the examples shown here, refer to the documentation for the file system for instructions.

Example: Extend an ext2, ext3, or ext4 file system

Use the df -h command to verify the size of the file system for each volume. In this example, /dev/sda1 reflects the original size of the volume, 10 GB.

[system-user ~]$ df -h
/dev/root       9.6G  3.9G  5.7G  41% /
...

Use the resize2fs command to extend the file system on each volume.

[system-user ~]$ sudo resize2fs /dev/sda1

You can verify that each file system reflects the increased volume size by using the df -h command again.

[system-user ~]$ df -h
/dev/root        29G  3.8G   26G  14% /
...

Copyright © 2018 tpmullan.com. All right reserved