Test Suite - KVM
Revision as of 19:46, 6 September 2022
KVM Test framework
Libreswan's test framework can be run using KVM guests and the kvm scripts. It is strongly recommended to run the test suite on a host machine that has a CPU with virtualisation instructions.
To access files on the host file system:
- Linux guests (Fedora) use the PLAN9 filesystem (9p)
- BSD guests (FreeBSD, NetBSD, OpenBSD) use NFS via the NAT interface
For an overview of the tests see Test_Suite
Preparing the host machine
Enable virtualization in the BIOS
Virtualization needs to be enabled by the BIOS during boot.
Add yourself to sudo
Some of the test scripts need to be run as root. The test environment assumes this can be done using sudo without a password, i.e., the following must not prompt:
sudo pwd
XXX: Surely qemu can be driven without root?
This is setup by adding an entry under /etc/sudoers.d/ specifying that your account does not need a password to become root:
echo "$(id -u -n) ALL=(ALL) NOPASSWD: ALL" | sudo dd of=/etc/sudoers.d/$(id -u -n)
Fight SELinux
SELinux blocks some actions that we need. We have not created any SELinux rules to avoid this. The options are:
- set SELinux to permissive (recommended)
sudo sed --in-place=.ORIG -e 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
sudo setenforce Permissive
- disable SELinux
sudo sed --in-place=.ORIG -e 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
sudo reboot
- (experimental) label source tree for SELinux
The source tree on the host is shared with the virtual machines. SELinux considers this a bug unless the tree is labelled with type svirt_image_t.
sudo dnf install policycoreutils-python-utils
sudo semanage fcontext -a -t svirt_image_t "$(pwd)"'(/.*)?'
sudo restorecon -vR "$(pwd)"
There may be other things that SELinux objects to.
Check that the host has enough entropy
As a rough guide run:
while true ; do cat /proc/sys/kernel/random/entropy_avail ; sleep 3 ; done
It should have values in the hundreds if not thousands. If it is in the units or tens, see Entropy matters.
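The loop above runs until interrupted. A minimal sketch that takes a fixed number of readings and labels each one is below. The threshold of 100 is an assumption derived from the "hundreds if not thousands" guidance, not a libreswan setting, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: sample the kernel's entropy estimate and label each reading.
# ENTROPY_THRESHOLD=100 is an assumed cut-off, not a libreswan setting.

ENTROPY_FILE=/proc/sys/kernel/random/entropy_avail
ENTROPY_THRESHOLD=100

# Print "low" or "ok" for a single reading.
entropy_level() {
    if [ "$1" -lt "$ENTROPY_THRESHOLD" ]; then
        echo low
    else
        echo ok
    fi
}

# Take SAMPLES readings (default 5), DELAY seconds apart (default 3).
check_entropy() {
    samples=${1:-5}
    delay=${2:-3}
    i=0
    while [ "$i" -lt "$samples" ]; do
        value=$(cat "$ENTROPY_FILE")
        echo "$value $(entropy_level "$value")"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$delay"
        fi
    done
}
```

Running check_entropy 5 3 on the host prints five labelled readings; consistently low values are the cue to read Entropy matters.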
Install Dependencies
Why | Fedora | Mint (debian) |
---|---|---|
Basics | sudo dnf install -y make git gitk | sudo apt-get install -y make make-doc git gitk |
Python | sudo dnf install -y python3-pexpect | sudo apt-get install python3-pexpect |
Virtualization | sudo dnf install -y qemu virt-install libvirt-daemon-kvm libvirt-daemon-qemu | sudo apt install -y qemu virtinst libvirt-clients libvirt-daemon libvirt-daemon-system libvirt-daemon-driver-qemu libosinfo-query qemu-system-x86? |
Boot CDs | sudo dnf install -y dvd+rw-tools | sudo apt-get install -y dvd+rw-tools |
Web pages | sudo dnf install -y jq nodejs-typescript | sudo apt-get install -y jq node-typescript |
NFS | ??? | sudo apt-get install -y nfs-kernel-server rpcbind |
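When setting up several hosts, the right column of the table can be picked automatically. The sketch below reads ID from /etc/os-release; the mapping of the IDs fedora, debian, ubuntu and linuxmint to dnf or apt-get is an assumption that mirrors the two columns above, and the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: choose the install command from the table above.

# Print the distribution ID from /etc/os-release (e.g. "fedora").
detect_os_id() {
    (. /etc/os-release && echo "$ID")
}

# Print the install command prefix for an OS ID, or fail if it is unknown.
pkg_install_cmd() {
    case $1 in
        fedora) echo "sudo dnf install -y" ;;
        debian|ubuntu|linuxmint) echo "sudo apt-get install -y" ;;
        *) echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}
```

For example, $(pkg_install_cmd "$(detect_os_id)") python3-pexpect installs the Python row on either kind of system.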
Enable libvirt
If you're switching from the old libvirtd see https://libvirt.org/daemons.html#switching-to-modular-daemons for how to shut down the old daemons.
Start the "collection of modular daemons that replace functionality previously provided by the monolithic libvirtd daemon":
for drv in qemu network nodedev nwfilter secret storage interface
do
    sudo systemctl unmask virt${drv}d.service
    sudo systemctl unmask virt${drv}d{,-ro,-admin}.socket
    sudo systemctl enable virt${drv}d.service
    sudo systemctl enable virt${drv}d{,-ro,-admin}.socket
done
for drv in qemu network nodedev nwfilter secret storage
do
    sudo systemctl start virt${drv}d{,-ro,-admin}.socket
done
There should be no errors or warnings.
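To confirm that the sockets started by the second loop are actually active, something like the following sketch can be used. It relies on systemctl is-active, which prints the state of a unit; the helper names are hypothetical. Unlike the loop above, it avoids bash brace expansion, so it also runs under a plain POSIX sh.

```shell
#!/bin/sh
# Hypothetical helpers for checking the modular libvirt daemons.

# Print the three socket units the loop above starts for one driver.
virt_sockets() {
    for suffix in "" -ro -admin; do
        printf 'virt%sd%s.socket\n' "$1" "$suffix"
    done
}

# Print the state of every socket, e.g. "virtqemud.socket  active".
check_virt_sockets() {
    for drv in qemu network nodedev nwfilter secret storage; do
        for unit in $(virt_sockets "$drv"); do
            printf '%-28s %s\n' "$unit" "$(systemctl is-active "$unit")"
        done
    done
}
```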
This code has bugs:
- the daemons shut down when idle; the work-around is to stop them shutting down by making sure the _ARGS settings in /etc/sysconfig are empty:
$ grep ARGS /etc/sysconfig/*virt*
/etc/sysconfig/virtnetworkd:VIRTNETWORKD_ARGS=""
/etc/sysconfig/virtqemud:VIRTQEMUD_ARGS=""
/etc/sysconfig/virtstoraged:VIRTSTORAGED_ARGS=""
- libvirtd deadlocks (fixed); there is no work-around
- things get slow; the work-around is to run sudo systemctl restart virtqemud
Add yourself to the KVM/QEMU group
You need to add yourself to the group that QEMU/KVM uses when writing to /var/lib/libvirt/qemu. On Fedora it is 'qemu', and on Debian it is 'kvm'. Something like:
sudo usermod -a -G $(stat --format %G /var/lib/libvirt/qemu) $(id -u -n)
After this you will need to log in again (or run sudo su - $(id -u -n)).
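Whether the change has taken effect can be checked with id. A sketch, with hypothetical helper names: id -nG USER reads the group database, so it shows the new group straight away, while id -nG with no argument shows the groups of the current login, which only change after logging in again.

```shell
#!/bin/sh
# Hypothetical helpers for checking group membership.

# Succeed if USER is in GROUP according to the group database.
user_in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Succeed if the current login session already has GROUP.
session_in_group() {
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

# The group that owns /var/lib/libvirt/qemu (qemu on Fedora, kvm on Debian).
libvirt_qemu_group() {
    stat --format %G /var/lib/libvirt/qemu
}
```

If user_in_group "$(id -un)" "$(libvirt_qemu_group)" succeeds but session_in_group "$(libvirt_qemu_group)" fails, the usermod worked and only the re-login is missing.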
Make certain that root can access the build
The path to your build needs to be accessible (executable) by root. Assuming things are under your home directory:
chmod a+x $HOME
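chmod a+x $HOME only fixes one directory. The sketch below walks from a directory up to / and reports the first directory missing the execute bit for others (the bit chmod a+x sets). It assumes GNU stat, and the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if every directory from DIR up to / has the
# "others" execute bit set, otherwise print the first one that does not.
path_is_traversable() {
    dir=$(cd "$1" && pwd -P) || return 1
    while [ "$dir" != "/" ]; do
        # GNU stat %A prints e.g. drwxr-x--x; the 10th character is the
        # others-execute bit ("x", or "t" when the sticky bit is also set).
        case $(stat -c %A "$dir") in
            ?????????[xt]) ;;
            *) echo "not traversable by others: $dir"; return 1 ;;
        esac
        dir=$(dirname "$dir")
    done
    return 0
}
```

For example, running path_is_traversable "$PWD" from the top of the source tree names any directory that still needs chmod a+x.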
Fix /var/lib/libvirt/qemu
Because our VMs don't run as qemu, /var/lib/libvirt/qemu needs to be made writable by the qemu group using chmod g+w. This needs to be repeated whenever the libvirtd package is updated on the system:
sudo chmod g+w /var/lib/libvirt/qemu
Arguably we should run libvirt as a normal user instead.
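Since the chmod has to be re-applied after package updates, a check that only runs it when needed might look like the following sketch (GNU stat assumed; helper names hypothetical):

```shell
#!/bin/sh
# Hypothetical helpers for keeping /var/lib/libvirt/qemu group-writable.

# Succeed if DIR has the group write bit set (what chmod g+w grants).
# GNU stat %A prints e.g. drwxrwxr-x; the 6th character is group write.
is_group_writable() {
    case $(stat -c %A "$1") in
        ?????w*) return 0 ;;
        *) return 1 ;;
    esac
}

# Re-apply the fix only if a package update has reset the permissions.
fix_qemu_dir() {
    d=/var/lib/libvirt/qemu
    is_group_writable "$d" || sudo chmod g+w "$d"
}
```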
Create /etc/modules-load.d/virtio.conf (obsolete since 2022 at least)
Several virtio modules need to be loaded into the host's kernel. This could be done by modprobe ahead of running any virtual machines but it is easier to install them whenever the host boots. This is arranged by listing the modules in a file within /etc/modules-load.d. The host must be rebooted for this to take effect.
sudo dd of=/etc/modules-load.d/virtio.conf <<EOF
virtio_blk
virtio-rng
virtio_console
virtio_net
virtio_scsi
virtio
virtio_balloon
virtio_input
virtio_pci
virtio_ring
9pnet_virtio
EOF
As of Fedora 28, several of these modules are built into the kernel and will not show up in /proc/modules (virtio, virtio_rng, virtio_pci, virtio_ring).
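The list mixes virtio-rng with underscore names: modprobe treats - and _ as equivalent, and /proc/modules always reports underscores. A sketch for checking which of the listed modules are loaded is below; the helper names are hypothetical and, as noted above, built-in modules are reported as not loaded.

```shell
#!/bin/sh
# Hypothetical helpers for checking the modules listed in virtio.conf.

# /proc/modules uses underscores, so map virtio-rng to virtio_rng.
normalize_module() {
    printf '%s\n' "$1" | sed 's/-/_/g'
}

# Succeed if the module is loaded (built-in modules are not listed).
module_loaded() {
    grep -q "^$(normalize_module "$1") " /proc/modules
}

# Report each module named in a modules-load.d file, skipping comments.
report_modules() {
    while read -r mod; do
        case $mod in ''|'#'*|';'*) continue ;; esac
        if module_loaded "$mod"; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded (may be built in)"
        fi
    done < "${1:-/etc/modules-load.d/virtio.conf}"
}
```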
Debian
On Debian-based systems (e.g., Linux Mint 20.3), the default python is too old. Fortunately python 3.9 is also available via:
sudo apt-get install python3.9
echo KVM_PYTHON=python3.9 >> Makefile.inc.local
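To decide whether the override is needed, the default python3 version can be compared against 3.9. A sketch, assuming GNU sort's -V (version sort) and -C (check quietly) options; treating 3.9 as the minimum is an assumption based on the paragraph above, not a documented libreswan requirement, and the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; 3.9 as the minimum version is an assumption.

# Succeed if version $1 >= version $2: "$2 then $1" must already be in
# version order, which GNU sort -V -C checks without printing anything.
version_ge() {
    printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

# Print the MAJOR.MINOR version of the given python interpreter.
python_version() {
    "$1" -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}
```

For example: version_ge "$(python_version python3)" 3.9 || echo KVM_PYTHON=python3.9 >> Makefile.inc.local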
BSD
Anyone?
Download and configure libreswan
Fetch Libreswan
The libreswan source tree includes all the components that are used on the host and inside the test VMs. To get the latest source code using git:
git clone https://github.com/libreswan/libreswan
cd libreswan
Create the Pool directory for storing VM disk images - $(KVM_POOLDIR)
The pool directory is used to store VM disk images and other configuration files. By default $(top_srcdir)/../pool is used (that is, adjacent to your source tree).
To change the location of the pool directory, set the KVM_POOLDIR make variable in Makefile.inc.local. For instance:
$ grep KVM_POOLDIR Makefile.inc.local
KVM_POOLDIR=/home/libreswan/pool
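Appending with echo ... >> Makefile.inc.local, as is done further down this page, can leave duplicate assignments behind. A sketch of a hypothetical helper that drops an existing VAR=value (or VAR = value) line and appends a fresh one is below; it leaves += and ?= lines alone.

```shell
#!/bin/sh
# Hypothetical helper: set VAR=VALUE in a make include file.
# Existing "VAR=..." / "VAR = ..." lines are dropped and one new line is
# appended; other lines (including "VAR +=" and "VAR ?=") are kept.
set_make_var() {
    file=$1 var=$2 value=$3
    touch "$file" || return 1
    tmp=$(mktemp) || return 1
    { grep -v "^$var[[:space:]]*=" "$file" || true; } > "$tmp"
    printf '%s=%s\n' "$var" "$value" >> "$tmp"
    # Copy back with cat so the original file's permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example: set_make_var Makefile.inc.local KVM_POOLDIR /home/libreswan/pool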
(optional) Use /tmp/pool (tmpfs) to store test VM disk images - $(KVM_LOCALDIR)
By default, all disk images are stored in $(KVM_POOLDIR) (see above): both the base VM disk image and the build and test VM disk images. Since only the base VM image needs long-term storage, $(KVM_LOCALDIR) can be used to specify that the build and test images are stored in /tmp:
$ grep KVM_LOCALDIR Makefile.inc.local
KVM_LOCALDIR=/tmp/pool
This has the advantage of eliminating physical disk I/O as a bottleneck when accessing VM disk images, but the disadvantage of needing to rebuild the images after a reboot.
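KVM_LOCALDIR only avoids disk I/O if /tmp really is RAM-backed, which is not the case on every distribution. A sketch using GNU stat -f to print the filesystem type (helper names hypothetical):

```shell
#!/bin/sh
# Hypothetical helpers: check that a directory lives on tmpfs.

# Print the filesystem type holding DIR (e.g. "tmpfs", "ext2/ext3").
fs_type() {
    stat -f -c %T "$1"
}

# Warn if DIR is not on tmpfs, i.e. images stored there still hit the disk.
warn_if_not_tmpfs() {
    if [ "$(fs_type "$1")" != tmpfs ]; then
        echo "warning: $1 is not on tmpfs ($(fs_type "$1"))"
    fi
}
```

For example: warn_if_not_tmpfs /tmp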
(optional) Run tests in parallel - $(KVM_PREFIXES)
By default only one test is run at a time. This can be changed using the $(KVM_PREFIXES) make variable, which provides a list of prefixes to be prepended to the test domain names, creating multiple test groups. The default value is:
KVM_PREFIXES=
which creates the build domains fedora-build, netbsd-build, et.al., and the test domains east, west, et.al. (i.e., after expansion east, west, et.al.).
To run tests in parallel, specify multiple prefixes. For instance two tests can be run in parallel by specifying:
KVM_PREFIXES= 1.
This will create the build domains fedora-build, netbsd-build, et.al., and the test domains east, west, et.al., and separately 1.east, 1.west, et.al.
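The naming described above can be illustrated with a small hypothetical helper: it simply prepends each prefix to each host name, with the empty default prefix passed as an empty argument. This is a sketch of the resulting names only; how the make variable itself represents the empty prefix is not shown here.

```shell
#!/bin/sh
# Hypothetical helper (not part of libreswan): print the test domain names
# produced by each prefix, one prefix group per line.
# Usage: domain_names "HOST..." PREFIX...
domain_names() {
    hosts=$1
    shift
    for prefix in "$@"; do
        line=
        for host in $hosts; do
            line="${line:+$line }$prefix$host"
        done
        echo "$line"
    done
}

domain_names "east west" "" 1.
```

With an empty prefix and "1." this prints east west on the first line and 1.east 1.west on the second.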
TODO: generate $(KVM_PREFIXES) from $(KVM_PREFIX) and $(KVM_WORKERS) so that the build domains are prefixed by $(KVM_PREFIX) and the test domains are prefixed by $(KVM_PREFIX), $(KVM_PREFIX)2, ..., $(KVM_PREFIX)$(KVM_WORKERS); only create the first test domains and then create the rest as runtime snapshots.
(optional) Parallel builds - $(KVM_WORKERS)
By default, build domains only have one virtual CPU. Since building is very CPU intensive, this can be increased using $(KVM_WORKERS):
KVM_WORKERS=2
In the past, because many tests were racy (results were sensitive to CPU load), KVM_WORKERS was used to throttle the number of domains being booted in parallel (booting is very CPU intensive). That is no longer true. See the notes under KVM_PREFIXES above.
(optional) Generate a web page of the test results
See the nightly test results for an example.
To create the web directory RESULTS/ and populate it with the current test results use:
make web
The files can then be viewed in a web browser using a file:// URL.
To disable web page generation, delete the directory RESULTS/.
Alternatively, a web server can be installed and configured:
sudo dnf install httpd
sudo mkdir /var/www/html/results/
sudo chown $(id -un) /var/www/html/results/
sudo chmod 755 /var/www/html/results/
sudo sh -c 'echo "AddType text/plain .diff" >/etc/httpd/conf.d/diff.conf'
# until next reboot
sudo firewall-cmd --add-service=http
sudo systemctl start httpd
# make it permanent
sudo firewall-cmd --permanent --add-service=http
sudo systemctl enable httpd
and then $(WEB_SUMMARYDIR) is used to specify that the web pages should be published under the server directory:
$ grep WEB_SUMMARYDIR Makefile.inc.local
WEB_SUMMARYDIR=/var/www/html/results
If you want it to be the main page of the website, you can create the file /var/www/html/index.html containing:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="REFRESH" content="0;url=/results/">
</head>
<body>
</body>
</html>
Running the testsuite
In the past, the testsuite was driven using make kvm-... commands. That's largely been replaced by the top-level wrapper script ./kvm which has several advantages over make:
- it is tab (file name) completion friendly
- it is shell script friendly
For the impatient: ./kvm install check
To build the VMs, and build and install (or update) libreswan, and then run the tests, use:
./kvm install check
Setting up ./kvm (tab completion)
If this:
complete -o filenames -C './kvm' ./kvm
is added to .bashrc then tab completion with ./kvm will include both commands and directories.
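To add the line only once, even if the setup is re-run, a hypothetical append_once helper can be used. It is a sketch that compares whole lines literally with grep -qxF.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line exists.
append_once() {
    touch "$2" || return 1
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}
```

For example: append_once "complete -o filenames -C './kvm' ./kvm" ~/.bashrc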
Running the testsuite
- ./kvm install
- update the KVMs ready for a new test run
- ./kvm check
- run the testsuite, previous results are saved in BACKUP/-date-
- ./kvm recheck
- run the testsuite, but skip tests that already passed
- ./kvm results
- list the results from the test run
- ./kvm diffs
- display differences between the test results and the expected results, exit non-zero if there are any
The operations can be combined on a single line:
./kvm install check recheck diff
and individual tests can be selected (see Running a Single Test, below):
./kvm install check diff testing/pluto/*ikev2*
To stop ./kvm use control-c.
Updating Certificates
The full testsuite requires a number of certificates. If they are not present, ./kvm check will automatically generate them using the build domain. Note that the certificates have a limited lifetime: should the test system detect out-of-date certificates, ./kvm check will barf.
To rebuild the certificates:
- ./kvm keys
can be used to force the generation of new certificates.
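To see ahead of time whether a certificate is about to expire, openssl x509 -checkend can be used. In the sketch below the helper name is hypothetical, and the certificate path is left as an argument since where ./kvm keys writes the certificates is not covered here. Note that -checkend exits 0 when the certificate will NOT expire within the given number of seconds, so the helper inverts it.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PEM certificate in FILE expires
# within DAYS days. openssl x509 -checkend N exits 0 when the certificate
# will not expire within N seconds, hence the "!". An unreadable or
# non-certificate file also makes openssl fail, so it counts as expiring.
cert_expires_within() {
    ! openssl x509 -checkend "$(( $2 * 86400 ))" -noout -in "$1" >/dev/null
}
```

For example, cert_expires_within CERT.pem 7 && ./kvm keys regenerates the certificates if CERT.pem expires within a week.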
Cleaning up (and general maintenance)
- ./kvm check-clean
- delete the test results
- ./kvm uninstall
- delete the KVM build and test domains (but don't touch the build tree or test results)
- ./kvm clean
- delete the test results, the KVM build and test domains, the build tree, and the certificates
- ./kvm purge
- also delete the test networks (is purge still useful?)
- ./kvm demolish
- also delete the KVM base domain that was used to create the other domains
- ./kvm upgrade
- delete all KVM build and test domains, and then upgrade and transmogrify the base domain ready for a fresh install
- ./kvm transmogrify
- run a fresh transmogrify on the base domain (the base domain is reverted to before the last transmogrify)
- ./kvm downgrade
- revert the base domain back to before it was upgraded (useful when debugging upgrade and transmogrify)
Shell and Console Access (Logging In)
There are several different ways to gain shell access to the domains.
Each method, depending on the situation, has both advantages and disadvantages. For instance:
- while ./kvm sh HOST provides quick access to the console, it doesn't support file copy
- while SSH takes more to set up, it supports things like proper terminal configuration and file copy
Serial Console access using ./kvm sh HOST (kvmsh.py)
./kvm sh HOST is a wrapper around "virsh" that automatically handles things like booting the machine, logging in, and correctly configuring the terminal. Its big advantage is that it always works. For instance:
$ ./testing/utils/kvmsh.py east
[...]
Escape character is ^]
[root@east ~]# printenv TERM
xterm
[root@east ~]# stty -a
...; rows 52; columns 185; ...
[root@east ~]#
The script "kvmsh.py" can also be used directly to invoke commands on a guest (this is how ./kvm install works):
$ ./testing/utils/kvmsh.py east ls
[root@east ~]# ls
anaconda-ks.cfg
When $(KVM_PREFIXES) contains multiple prefixes, ./kvm sh east always logs into the first prefix's domain.
Limitations:
- no file transfer but files can be accessed via /pool and /testing
Graphical Console access using virt-manager
"virt-manager", a GNOME tool, can be used to access individual domains.
While easy to use, it doesn't support cut/paste or mechanisms for copying files.
Shell access using SSH
While requiring more effort to set up, it provides full shell access to the domains.
Since you will be using ssh a lot to login to these machines, it is recommended to either put their names in /etc/hosts:
# /etc/hosts entries for libreswan test suite
192.1.2.45 west
192.1.2.23 east
192.0.3.254 north
192.1.3.209 road
192.1.2.254 nic
or add entries to .ssh/config such as:
Host west
    Hostname 192.1.2.45
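Rather than typing each entry by hand, the .ssh/config stanzas can be generated from the same address/name pairs used for /etc/hosts. A sketch with a hypothetical helper; the User root line is an assumption based on logging in as root below. It prints to standard output, so append the output to ~/.ssh/config once it looks right.

```shell
#!/bin/sh
# Hypothetical helper: turn "ADDRESS NAME" lines on stdin into
# ~/.ssh/config stanzas. "User root" is an assumption, not from libreswan.
ssh_config_entries() {
    while read -r addr name; do
        [ -n "$name" ] || continue
        printf 'Host %s\n    Hostname %s\n    User root\n\n' "$name" "$addr"
    done
}

ssh_config_entries <<'EOF'
192.1.2.45 west
192.1.2.23 east
192.0.3.254 north
192.1.3.209 road
192.1.2.254 nic
EOF
```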
If you wish to be able to ssh into all the VMs created without using a password, add your ssh public key to testing/baseconfigs/all/etc/ssh/authorized_keys. This file is installed as /root/.ssh/authorized_keys on all VMs.
Using ssh becomes easier if you are running ssh-agent (you probably are) and your public key is known to the virtual machine. This command, run on the host, installs your public key on the root account of the guest machines west. This assumes that west is up (it might not be, but you can put this off until you actually need ssh, at which time the machine would need to be up anyway). Remember that the root password on each guest machine is "swan".
ssh-copy-id root@west
You can use ssh-copy-id for any VM. Unfortunately, the key is forgotten when the VM is restarted.
Limitations:
- this only works with the default east, et.al. (it does not work with KVM_PREFIXES and/or multiple test directories)
kvm workflows
(Seeing as everyone has a "flow", why not kvm?) Here are some common workflows. The following commands are used:
- ./kvm modified
- list the test directories that have been modified
- ./kvm baseline
- compare test results against a baseline
- ./kvm patch
- update the expected test results
- ./kvm add
- git add the modified test results
- ./kvm status
- show the status of the currently running testsuite
- ./kvm kill
- kill the currently running testsuite
Running a single test
There are two ways to run an individual test:
- the test to run can be specified on the command line:
- ./kvm check testing/pluto/basic-pluto-01
- the test is implied when running kvm from a test directory:
- cd testing/pluto/basic-pluto-01
- ../../../kvm
But there's a catch! The behaviour is different to a normal test run.
When there are multiple tests, as each test finishes:
- pluto is stopped (via post-mortem.sh)
- the domain is shut down.
This is so that bugs in the shutdown code can be flushed out.
However, when there's only one test these steps are skipped:
- pluto is left running (post-mortem.sh is not run)
- the domain is not shut down
This is so that it is possible to login and look around after the test finishes (but it also means that bugs in shutdown code can be missed).
To override this behaviour, add:
KVMRUNNER_FLAGS += --run-post-mortem
to Makefile.inc.local.
Working on individual tests
The modified command can be used to limit the test run to just tests with modified files (according to git):
- ./kvm modified install check diff
- install libreswan and then run the testsuite against just the modified tests, display differences
- ./kvm modified recheck diff
- re-run the modified tests that are failing, display differences
- ./kvm modified patch add
- update the modified tests applying the latest output and add them to git
This workflow comes into its own when updating tests en masse using sed, for instance:
sed -i -e 's/PARENT_//' testing/pluto/*/*.console.txt
./kvm modified check
Checking for regressions
Start by setting up a base directory. Give the KVMs unique bN prefixes (only "b1" is needed, but we're in a hurry so add "b2 b3 b4"), use 4 workers and /tmp/pool for KVM disk images, and kick off a test run:
$ git clone https://github.com/libreswan/libreswan base
$ cd base
base$ # base - use bN as the prefix
base$ echo KVM_PREFIXES=b1 b2 b3 b4 >> Makefile.inc.local
base$ echo KVM_WORKERS=4 >> Makefile.inc.local
base$ echo KVM_LOCALDIR=/tmp/pool >> Makefile.inc.local
base$ mkdir -p ../pool
base$ nohup ./kvm install check &
base$ tail -f nohup.out
Next, set up a working directory. This time the KVMs are given unique wN prefixes, and KVM_BASELINE points back at base:
$ git clone https://github.com/libreswan/libreswan work
$ cd work
work$ # work - use wN as the prefix
work$ echo KVM_PREFIXES=w1 w2 w3 w4 >> Makefile.inc.local
work$ echo KVM_WORKERS=4 >> Makefile.inc.local
work$ echo KVM_BASELINE=../base >> Makefile.inc.local
work$ echo KVM_LOCALDIR=/tmp/pool >> Makefile.inc.local
work$ mkdir -p ../pool
Work then progresses in the work directory and, when ready, the test run is started (here in the background):
work$ ed programs/pluto/plutomain.c
/static bool selftest_only = false/
s/false/true/
w
q
work$ gmake && nohup ./kvm install check &
As the tests progress, the results can be monitored:
work$ ./kvm baseline results
testing/pluto/basic-pluto-01 failed east:baseline-passed,output-different west:baseline-passed,output-different
...
work$ ./kvm baseline diffs
testing/pluto/basic-pluto-01
+whack: Pluto is not running (no "/run/pluto/pluto.ctl")
Then the test run is aborted, the problem fixed and tested, and the test run restarted:
work$ ./kvm kill
work$ git checkout -- programs/pluto/plutomain.c
work$ ./kvm install check diff testing/pluto/basic-pluto-01
work$ nohup ./kvm recheck &
The output can be fine-tuned using baseline-failed (show differences when the baseline failed, ignoring passed and unresolved) and baseline-passed (show differences when the baseline passed, ignoring failed and unresolved).
To override the KVM_BASELINE make variable, use --baseline DIRECTORY
Tracking down regressions (using git bisect)
Let's assume that the test basic-pluto-01, which was working, is now failing.
The easy way
This workflow works best when the regression is recent (i.e., the last few commits) and nothing significant has happened in the meantime (for instance, os upgrade, test rename, ...).
The command ./kvm install check diff exits with git bisect friendly status codes, which means it can be combined with git bisect run to automate regression testing.
For instance:
git bisect start main ^<suspect-commit>
git bisect run ./kvm install check diff testing/pluto/basic-pluto-01
git bisect visualize
# finally
git bisect reset
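For reference, git bisect run interprets the command's exit status as documented in git-bisect(1): 0 means good, 125 means skip this commit, any other value from 1 to 127 means bad, and anything else aborts the bisect. The hypothetical helper below encodes that rule; the exact status codes ./kvm returns are not listed on this page.

```shell
#!/bin/sh
# Hypothetical helper: classify an exit status the way "git bisect run"
# does (per git-bisect(1)).
bisect_meaning() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo good
    elif [ "$code" -eq 125 ]; then
        echo skip
    elif [ "$code" -ge 1 ] && [ "$code" -le 127 ]; then
        echo bad
    else
        echo abort
    fi
}
```

For example, running ./kvm install check diff testing/pluto/basic-pluto-01; bisect_meaning $? shows how git bisect run would treat the current checkout.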
The hard way
This workflow works best when trying to track down a regression in an older version of libreswan.
Two repositories are used:
- repo-under-test
- this contains the sources that will be built and installed into the test domains; it is what git bisect will manipulate
- testbench
- this contains the test scripts used to test repo-under-test
Start by checking out the two repositories (existing repositories can also be used, carefully):
git clone ... /home/repo-under-test
git clone ... /home/testbench
Next, add the following to /home/testbench/Makefile.inc.local so that the /source directory used by testbench is pointing at repo-under-test:
# repo-under-test's sources are built
KVM_SOURCEDIR=/home/repo-under-test
# testbench's testing directory is used
#KVM_TESTINGDIR=/home/testbench/testing
Next, (re-)transmogrify the testbench so that, within the domains, /source points at repo-under-test:
cd /home/testbench
./kvm transmogrify
Finally run the tests:
cd /home/testbench
git -C /home/repo-under-test bisect start main ^<suspect-commit>
./kvm install check diff testing/pluto/basic-pluto-01
# based on output, pick one:
git -C /home/repo-under-test bisect {good,bad}
# might work:
git -C /home/repo-under-test bisect run sh -c 'cd /home/testbench && ./kvm install check diff testing/pluto/basic-pluto-01'
git -C /home/repo-under-test bisect visualize
# finally
git -C /home/repo-under-test bisect reset
KVM_TESTINGDIR can also be pointed at repo-under-test.
Controlling a test run remotely
Start the testsuite in the background:
nohup ./kvm install check &
To determine if the testsuite is still running:
./kvm status
and to stop the running testsuite:
./kvm kill
Debugging inside the VM (pluto on east)
Terminal 1 - east: log into east, start pluto, and attach gdb
./kvm sh east
east# cd /testing/pluto/basic-pluto-01
east# sh -x ./eastinit.sh
east# gdb /usr/local/libexec/ipsec/pluto $(pidof pluto)
(gdb) c
If pluto isn't running then gdb will complain with: --p requires an argument
Terminal 2 - west: log into west, start pluto and the test
./kvm sh west
west# cd /testing/pluto/basic-pluto-01
west# sh -x ./westinit.sh ; sh -x ./westrun.sh
When pluto crashes, gdb will show that and await commands. For example, the bt command will show a backtrace.
TODO:
- stop watchdog eventually killing pluto
- notes for west
Installing a custom Fedora kernel
Assuming the kernel RPMs are in the directory $(KVM_POOLDIR)/kernel-ipsec/ say, add the following to Makefile.inc.local:
KVM_FEDORA_KERNEL_RPMDIR = /pool/kernel-ipsec/
KVM_FEDORA_KERNEL_ARCH = x86_64
KVM_FEDORA_KERNEL_VERSION = -5.18.7-100.aiven_ipsec.fc35.$(KVM_FEDORA_KERNEL_ARCH).rpm
and then run:
./kvm upgrade-fedora
(Should this, like for NetBSD, be done during transmogrify?)
Installing a custom NetBSD kernel
Copy the kernel to:
$(KVM_POOLDIR)/$(KVM_PREFIX)netbsd-kernel
and then run:
./kvm transmogrify-netbsd