Test Suite: Difference between revisions

Revision as of 21:29, 5 July 2016

Libreswan comes with an extensive test suite, written mostly in Python, that uses KVM virtual machines and virtual networks. It has replaced the old UML test suite. Apart from KVM, the test suite uses libvirtd and qemu. It is strongly recommended to run the test suite natively on the OS (not in a VM itself) on a machine that has a CPU with virtualization instructions. The Plan 9 filesystem (9p) is used to mount host directories in the guests; NFS is avoided to prevent network lockups when an IPsec test case cripples the guest's networking.

Test Frameworks

This page describes Libreswan's old test framework. Two more experimental front-ends are also available (listed alphabetically).

Docker - see Test Suite - Docker in this Wiki

Instead of using virtual machines, this uses Docker instances.

More information is found in Test Suite - Docker in this Wiki

make kvm - see mk/README.md in the source tree

This is a rewrite of the two components of the existing test framework. Namely:

the scripts install.sh/uninstall.sh, used to create and delete virtual machines, are replaced by make rules

the script swantest, used to run tests on the virtual machines, is replaced by kvmrunner and make rules

However, this approach also has known limitations:

it doesn't generate pretty HTML pages (instead, the tool ./testing/utils/kvmresults.py provides ways to compare results)
it doesn't use tcpdump to capture network traffic (this is a to-do item)

Where applicable, the make kvm alternative is included below.

More information is found in mk/README.md in the source tree

Preparing the host machine

In the following it is assumed that your account is called "build".

Add Yourself to sudo

The test scripts rely on being able to use sudo without a password to gain root access. This is done by adding a no-password rule to /etc/sudoers.d/.

XXX: Surely qemu can be driven without root?

To set this up, either add your account to the wheel group and permit wheel to have no-password access:

$ su
# echo '%wheel	ALL=(ALL)	NOPASSWD: ALL' > /etc/sudoers.d/swantest
# chmod go= /etc/sudoers.d/swantest
# chown root:root /etc/sudoers.d/swantest
# usermod -a -G wheel build

or only enable no-password access for your account:

$ su
# echo 'build	ALL=(ALL)	NOPASSWD: ALL' > /etc/sudoers.d/swantest
# chmod go= /etc/sudoers.d/swantest
# chown root:root /etc/sudoers.d/swantest
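Both variants write a single rule to a drop-in file. The sketch below wraps that step in a shell function; the name write_nopasswd_rule is hypothetical and not part of libreswan. It writes the rule for a user (or a %group), then makes the file read-only. Run it from a root shell so the file is created owned by root, which sudo insists on.

```shell
# write_nopasswd_rule: hypothetical helper, not part of libreswan.
# WHO is a user name (e.g. build) or a group (e.g. %wheel); FILE is the
# sudoers drop-in to create.  Run as root so FILE ends up owned by root.
write_nopasswd_rule() {
    local who=$1 file=$2
    # same rule as above: WHO may run anything as root without a password
    printf '%s\tALL=(ALL)\tNOPASSWD: ALL\n' "$who" > "$file" || return 1
    # sudoers files should not be writable (mode 0440 is the usual choice)
    chmod 0440 "$file"
}
```

For example, "write_nopasswd_rule build /etc/sudoers.d/swantest" followed by "visudo -c -f /etc/sudoers.d/swantest" to have visudo check the syntax before relying on it.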

Disable SELinux

SELinux blocks some actions that we need, and we have not created any SELinux rules to allow them.

Either set it to permissive:

sudo sed --in-place=.ORIG -e 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
sudo setenforce Permissive

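Or disable it (SELINUX=disabled only takes effect after a reboot; until then, setenforce switches the running system to permissive):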

sudo sed --in-place=.ORIG -e 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
sudo setenforce Permissive
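The two sed commands differ only in the mode they write. As a sketch, they can be folded into one function that also rejects typos; set_selinux_config_mode is a hypothetical name, and it relies on GNU sed's --in-place=SUFFIX option exactly as the commands above do.

```shell
# set_selinux_config_mode: hypothetical helper wrapping the sed commands above.
# MODE must be enforcing, permissive or disabled; CONFIG defaults to
# /etc/selinux/config.  A backup is kept as CONFIG.ORIG.
set_selinux_config_mode() {
    local mode=$1 config=${2:-/etc/selinux/config}
    case $mode in
        enforcing|permissive|disabled) ;;
        *) echo "unknown SELinux mode: $mode" >&2; return 1 ;;
    esac
    # "^SELINUX=" does not match the separate SELINUXTYPE= line
    sed --in-place=.ORIG -e "s/^SELINUX=.*/SELINUX=$mode/" "$config"
}
```

Run it from a root shell, then still use "setenforce Permissive" to change the running system.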

Install Required Dependencies

Now we are ready to install the various components of libvirtd, qemu and kvm and then start the libvirtd service.

Strictly speaking, not even virt-manager is required.

On Fedora 24:

$ sudo dnf -y install virt-manager virt-install \
python3-pexpect \
python3-setproctitle diffstat

On Debian?

Install Utilities (Optional)

Various tools are used or convenient to have when running tests:

Packages to install on Fedora

sudo dnf -y install git tcpdump expect python-setproctitle python-ujson pyOpenSSL

Packages to install on Ubuntu

sudo apt-get install python-pexpect git tcpdump expect python-setproctitle python-ujson \
        python3-pexpect python3-setproctitle

Setting Users and Groups

You need to add yourself to the qemu group; afterwards, /etc/group should contain a line like:

$ grep qemu /etc/group
qemu:x:107:build

This can be done with the command:

$ sudo usermod -a -G qemu build

The following may be out-of-date, and is kept for reference:

Nothing apart from the system services requires root access. However, the user you are using must be allowed to run various commands as root via sudo. Additionally, libvirt assumes the VMs are running under the qemu uid, but because we want to share files between host and guests using the 9p filesystem, we want the VMs to run under our own uid. The easiest way to accomplish all of this is to add your user (for example the username "build") to the kvm, qemu and wheel groups. These are the changed lines from /etc/group:

wheel:x:10:root,build
kvm:x:36:root,qemu,build
qemu:x:107:root,qemu,build

Commands to effect this:

sudo usermod -a -G wheel,kvm,qemu root
sudo usermod -a -G wheel,kvm,qemu build
sudo usermod -a -G kvm,qemu qemu

You will need to re-login for this to take effect.
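Before logging out and back in, the group file itself can be checked. The sketch below is a hypothetical helper (user_in_group is not part of libreswan) that looks for a user in a group's member list in any /etc/group-format file. It only sees supplementary membership, i.e., the fourth field; "id -nG build" shows what a fresh login will actually get.

```shell
# user_in_group: hypothetical helper.  Succeeds when USER appears in the
# comma-separated member list (4th field) of GROUP in FILE (default /etc/group).
user_in_group() {
    local user=$1 group=$2 file=${3:-/etc/group}
    awk -F: -v u="$user" -v g="$group" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == u) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

For instance: for g in wheel kvm qemu; do user_in_group build $g || echo "build is not in $g"; done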

Fix /var/lib/libvirt/qemu

sudo chmod g+w /var/lib/libvirt/qemu

Ensure that the host has enough entropy

Entropy matters

With KVM, a guest system uses entropy from the host through the kernel module "virtio_rng" in the guest's kernel. We normally configure the test domains with it. http://wiki.qemu-project.org/Features-Done/VirtIORNG
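One way to confirm inside a guest that the virtio RNG is in use is to read the kernel's hw_random sysfs node, which names the currently selected hardware RNG. The function below is a hypothetical sketch (uses_virtio_rng is not part of libreswan), assuming the standard /sys/class/misc/hw_random/rng_current path.

```shell
# uses_virtio_rng: hypothetical check, run inside a guest, that the current
# hardware RNG is virtio_rng.  FILE defaults to the hw_random sysfs node.
uses_virtio_rng() {
    local file=${1:-/sys/class/misc/hw_random/rng_current}
    [ -r "$file" ] || return 1
    # the node holds a device name such as "virtio_rng.0"
    case $(cat "$file") in
        virtio_rng*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, on east: uses_virtio_rng || echo "virtio_rng is not the current hw RNG"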

tcpdump permissions on the Host (optional)

XXX: Only swantest uses this, and having swantest use sudo would be better.

getent group tcpdump || sudo groupadd tcpdump
#add build to group tcpdump
sudo usermod --append -G tcpdump build 
ls -lt /sbin/tcpdump
sudo chown root:tcpdump /sbin/tcpdump
sudo setcap "CAP_NET_RAW+eip" /sbin/tcpdump

# check tcpdump group users
getent group tcpdump
tcpdump:x:72:build

#when the installation is complete the following should work
tcpdump -i swan12

The libreswan source tree includes all the components that are used on the host and inside the test VMs. To get the latest source code using git:

git clone https://github.com/libreswan/libreswan
cd libreswan

Create the Pool directory

XXX: Default the pool directory to /home/$USER/pool?

The pool directory is used to store KVM disk images and other configuration files. Its location is determined by the KVM_POOLDIR make variable, which needs to be defined and set to an existing directory.

To do this, create a Makefile.inc.local file at the top of your source tree and define KVM_POOLDIR. For instance:

$ grep KVM_POOLDIR Makefile.inc.local
KVM_POOLDIR=/home/libreswan/pool
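Since the directory must already exist, creating it and recording it can be done in one step. The sketch below uses a hypothetical function, set_kvm_pooldir (not provided by libreswan). It replaces any earlier KVM_POOLDIR line and keeps other settings in Makefile.inc.local (such as KVM_WORKERS or KVM_PREFIX) untouched.

```shell
# set_kvm_pooldir: hypothetical helper.  Creates POOLDIR if needed and
# records it as KVM_POOLDIR in SRCDIR/Makefile.inc.local.
set_kvm_pooldir() {
    local srcdir=$1 pooldir=$2
    local local_mk=$srcdir/Makefile.inc.local
    mkdir -p "$pooldir" || return 1
    touch "$local_mk" || return 1
    # drop any previous definition, keep everything else, append the new one
    grep -v '^KVM_POOLDIR=' "$local_mk" > "$local_mk.tmp"
    echo "KVM_POOLDIR=$pooldir" >> "$local_mk.tmp"
    mv "$local_mk.tmp" "$local_mk"
}
```

From the top of the source tree: set_kvm_pooldir . /home/build/pool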

For the Impatient: Run The Testsuite Using "make kvm-install kvm-test"

If you're impatient, and want to use "make kvm" then do the following:

Install (or update) libreswan (if needed this will create the test domains):

make kvm-install

Run the testsuite:

make kvm-test

From there:

re-run any tests that failed: make kvm-check
delete the build directory: make kvm-clean
delete the test results: make kvm-test-clean
delete the test machines (ready for the next run): make kvm-uninstall
run a subset of tests: make kvm-test KVM_TESTS=testing/pluto/basic-pluto-*
re-analyze the test results: ./testing/utils/kvmresults.py testing/pluto
compare the results with a baseline: ./testing/utils/kvmresults.py testing/pluto ../baseline/testing/pluto
log into the domain east: make kvmsh-east
display a crib sheet: make kvm-help

Creating the Test Domains (and Destroying them)

Creating the Test Domains

First create the base domain and its network:

make install-kvm-base-network install-kvm-base-domain

and then, using that, create the test domains and their networks:

make install-kvm-test-networks install-kvm-test-domains

Details (somewhat out-of-date):

This file contains various environment variables used for creating and running the tests. In the example version, the KVMPREFIX= is set to the home directory of the user "build". The POOLSPACE= is where all the VM images will be stored. There should be at least 16GB of free disk space in the pool/ directory. You can change the OSTYPE= if you prefer to use ubuntu guests over fedora guests. We recommend that the host and guest run the same OS - it makes things like running gdb on the host for core dumps created in the guests much easier. The OSMEDIA= can be changed to point to a local distribution mirror.

First, a new VM is added to the system called "fedorabase" (or "ubuntubase"). This is an automated minimal install using kickstart. In the "post install" phase of the anaconda installer, this VM runs a "yum update" to ensure we have the latest versions of all packages. In that %post phase we also install various packages that we need to run the tests. This can result in the installer spending a very long time in the "post install" phase. During this time, the VM displays no progress bar. Just be patient.

Once the VM is fully installed, the disk image is converted to QCOW and copied for each test VM: west, east, north, road and nic. A few virtual networks are created to hook up the VMs in isolation. These virtual networks have names like "192_1_2_0" and use bridge interface names like "swan12". Finally, the actual VMs are added to the system's libvirt/KVM system.

Deleting the Test Domains

The test domains and base domain are removed separately.

First the test domains:

make uninstall-kvm-test-domains uninstall-kvm-test-networks

then the base domain:

make uninstall-kvm-base-domain uninstall-kvm-base-network

Installing Libreswan on the VMs

Either:

make kvm-install

Or:

make UPDATEONLY=1 check

Running the testsuite

Generating Certificates

The full testsuite requires a number of certificates. The virtual domains are configured for this purpose. Just use:

make kvm-keys

Alternatively, the certificates can be generated on the local machine:

cd testing/x509
./dist_certs.py

(In order to run dist_certs.py, your pyOpenSSL version needs to support creating SHA1 CRLs. A patch for this can be found at https://github.com/pyca/pyopenssl/pull/161 )

Run the testsuite

To run all test cases (which includes compiling and installing libreswan on all VMs, and running the non-VM based test cases), run:

make check UPDATE=1

or:

make kvm-install kvm-test

Stopping pluto tests (gracefully)

If you used "make kvm-test", type control-C.

If you used "make check": the tests run for a long time (for example, on one of our machines they currently take 10 hours). To stop a test run between individual pluto tests, create a file to indicate this:

touch testing/pluto/stop-tests-now

Be sure to remove the file afterwards.

Shell and Console Access (Logging In)

There are several different ways to gain shell access to the domains.

Each method, depending on the situation, has both advantages and disadvantages. For instance:

while virt-manager and virsh provide quick access to the console, their functionality is limited
while SSH takes more to set up but then supports things like proper terminal configuration and file copy

Graphical Console access using virt-manager

"virt-manager", a gnome tool can be used to access individual domains.

While easy to use, it doesn't support cut/paste or mechanisms for copying files.

Serial Console access using "virsh"

This provides text access to the domain's serial console. Things like cut/paste work but not much else. You need to remember to boot a domain before connecting. For instance:

$ sudo virsh console east

Serial Console access using "kvmsh.py" / "make kvmsh-HOST"

"kvmsh.py" is a wrapper around "virsh" that automatically handles things like booting the machine and logging in.

$ ./testing/utils/kvmsh.py east
Escape character is ^]
[root@east ~]#

"kvmsh.py" can also be used to script remote commands (for instance, it is used to run "make" on the build domain):

$ ./testing/utils/kvmsh.py east ls
[root@east ~]# ls
anaconda-ks.cfg

Finally, "make kvmsh-HOST" provides a shortcut for the above; and if you're using multiple build trees (see further down), it will connect to the DOMAIN that corresponds to HOST. For instance, notice how the domain "a.east" is passed to kvmsh.py below:

$ make kvmsh-east
/home/libreswan/pools/testing/utils/kvmsh.py --output ++compile-log.txt --chdir . a.east
Escape character is ^]
[root@east source]#

Limitations:

no file transfer
the terminal settings are messed up (must be a way to fix this, perhaps "tset"?)

Run an individual test (or tests)

All the test cases involving VMs are located in the libreswan directory under testing/pluto/ . The most basic test case is called basic-pluto-01. Each test case consists of a few files:

description.txt to explain what this test case actually tests
ipsec.conf files - the one for host west is called west.conf. These can also include configuration files for strongswan or racoon2 for interop testing
ipsec.secrets files - only if non-default configurations are used; these also use the host syntax, e.g. west.secrets, east.secrets.
An init.sh file for each VM that needs to start (eg westinit.sh, eastinit.sh, etc)
One run.sh file for the host that is the initiator (eg westrun.sh)
Known good (sanitized) output for each VM (eg west.console.txt, east.console.txt)
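When writing a new test case, the list above can double as a checklist. The function below is a hypothetical sketch (check_test_dir is not part of libreswan) that reports which of the core pieces are missing from a test directory. Its "exactly one *run.sh" rule follows the description above and may be stricter than some real tests.

```shell
# check_test_dir: hypothetical sanity check for a testing/pluto/<test>
# directory.  Prints each missing piece; returns non-zero if any is missing.
check_test_dir() {
    local dir=$1 missing=0
    [ -f "$dir/description.txt" ] || { echo "$dir: no description.txt"; missing=1; }
    # an unmatched glob stays literal, so -e fails when nothing matched
    set -- "$dir"/*init.sh
    [ -e "$1" ] || { echo "$dir: no *init.sh"; missing=1; }
    set -- "$dir"/*run.sh
    [ "$#" -eq 1 ] && [ -e "$1" ] || { echo "$dir: expected exactly one *run.sh"; missing=1; }
    set -- "$dir"/*.console.txt
    [ -e "$1" ] || { echo "$dir: no *.console.txt"; missing=1; }
    return $missing
}
```

For example: check_test_dir testing/pluto/basic-pluto-01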
testparams.sh if there are any non-default test parameters

You can run this test case by issuing the following command on the host:

Either:

cd testing/pluto/basic-pluto-01/
../../utils/swantest

or:

make kvm-test KVM_TESTS=testing/pluto/basic-pluto-01/

multiple tests can be selected with:

make kvm-test KVM_TESTS=testing/pluto/basic-pluto-*

Once the testrun has completed, you will see an OUTPUT/ directory in the test case directory:

$ ls OUTPUT/
east.console.diff  east.console.verbose.txt  RESULT       west.console.txt          west.pluto.log
east.console.txt   east.pluto.log            swan12.pcap  west.console.diff  west.console.verbose.txt

RESULT is a text file (whose format is sure to change in the next few months) stating whether the test succeeded or failed.
The diff files show the differences between this testrun and the last known good output.
Each VM's serial (sanitized) console log (eg west.console.txt)
Each VM's unsanitized verbose console output (eg west.console.verbose.txt)
A network capture from the bridge device (eg swan12.pcap)
Each VM's pluto log, created with plutodebug=all (eg west.pluto.log)
Any core dumps generated if a pluto daemon crashed
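Because the RESULT format may change, a quick way to spot failing hosts across many test runs is to look for non-empty console diffs instead. The function below is a hypothetical sketch (list_console_diffs is not part of libreswan); it assumes, per the list above, that an empty .console.diff means the sanitized output matched the known-good copy.

```shell
# list_console_diffs: hypothetical helper.  For each test directory given,
# prints every OUTPUT/*.console.diff that is non-empty.
list_console_diffs() {
    local dir f
    for dir in "$@"; do
        for f in "$dir"/OUTPUT/*.console.diff; do
            # -s: file exists and has a size greater than zero
            [ -s "$f" ] && echo "$f"
        done
    done
    return 0
}
```

For example: list_console_diffs testing/pluto/basic-pluto-*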

Shell access using SSH

While requiring slightly more effort to set up, it provides full shell access to the domains.

Since you will be using ssh a lot to login to these machines, it is recommended to either put their names in /etc/hosts:

# /etc/hosts entries for libreswan test suite
192.1.2.45 west
192.1.2.23 east
192.0.3.254 north
192.1.3.209 road
192.1.2.254 nic

or add entries to .ssh/config such as:

Host west
        Hostname 192.1.2.45
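The ~/.ssh/config entries for all five hosts can be generated rather than typed. The function below is a hypothetical sketch (print_ssh_config is not part of libreswan) using the addresses from the /etc/hosts list above. The "User root" line is an assumption, based on the root logins used throughout this page.

```shell
# print_ssh_config: hypothetical helper printing ~/.ssh/config entries for
# the test hosts.  "User root" is an assumption (this page logs in as root).
print_ssh_config() {
    while read -r name addr; do
        printf 'Host %s\n\tHostname %s\n\tUser root\n' "$name" "$addr"
    done <<'EOF'
west 192.1.2.45
east 192.1.2.23
north 192.0.3.254
road 192.1.3.209
nic 192.1.2.254
EOF
}
```

For example: print_ssh_config >> ~/.ssh/config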

If you wish to be able to ssh into all the VMs created without using a password, add your ssh public key to testing/baseconfigs/all/etc/ssh/authorized_keys. This file is installed as /root/.ssh/authorized_keys on all VMs

Using ssh becomes easier if you are running ssh-agent (you probably are) and your public key is known to the virtual machine. This command, run on the host, installs your public key on the root account of the guest machines west. This assumes that west is up (it might not be, but you can put this off until you actually need ssh, at which time the machine would need to be up anyway). Remember that the root password on each guest machine is "swan".

ssh-copy-id root@west

You can use ssh-copy-id for any VM. Unfortunately, the key is forgotten when the VM is restarted.

Logging in using make / kvmsh.py / virsh

If you only need to take a quick look around then "make kvmsh-<name>" is easier. However, if you're planning on doing more significant work, and require a proper terminal (TERMCAP), then the effort of setting up SSH becomes worthwhile.

$ make kvmsh-east
Escape character is ^]
[root@east source]#

The underlying kvmsh.py script can also be used to run individual shell commands. For instance, to run "ls" on "east" in the current directory (".") use:

$ ./testing/utils/kvmsh.py --chdir . east ls -l

Diagnosing inside the VM

Once a test run has completed, the VMs shut down the ipsec subsystem. You can use ssh to login as root on any host (password "swan") and rerun the testcase manually. This gives you a chance to repeat a crasher while using gdb. You need three terminals to do this.

Terminal 1: prepare west

ssh root@west
cd /testing/pluto/basic-pluto-01
sh ./westinit.sh

Terminal 2: prepare east

ssh root@east
cd /testing/pluto/basic-pluto-01
sh ./eastinit.sh

Terminal 3: gdb

This assumes that initialization worked and pluto hasn't yet crashed. Pick the side you wish to gdb, ssh in, and start gdb

ssh root@eastORwest
gdb -p `pidof pluto`
gdb> cont

If pluto wasn't running, gdb would complain: --p requires an argument

When pluto crashes, gdb will show that and await commands. For example, the bt command will show a backtrace.

Terminal 1: start the test


sh ./westrun.sh

/root/.gdbinit

To get rid of the warning "warning: File "/testing/pluto/ikev2-dpd-01/.gdbinit" auto-loading has been declined by your `auto-load safe-path'", add the following line to /root/.gdbinit:

echo "set auto-load safe-path /" >> /root/.gdbinit
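Running that echo twice adds the line twice. A hypothetical helper, append_once (not part of libreswan), only appends when the exact line is not already present, so it can be repeated safely, e.g. from a setup script.

```shell
# append_once: hypothetical helper.  Appends LINE to FILE unless FILE already
# contains exactly that line; creates FILE if it does not exist.
append_once() {
    local line=$1 file=$2
    touch "$file" || return 1
    # -x: whole-line match, -F: fixed string (no regex interpretation)
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example: append_once 'set auto-load safe-path /' /root/.gdbinit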

Diagnosing inside the VM (alternative version)

Once a testrun has completed, the VMs shut down the ipsec subsystem. You can use ssh to login as root on any host (password "swan") and rerun the testcase manually. This gives you a chance to repeat a crasher while using gdb:

ssh root@east
ipsec setup start
pidof pluto
cd /source/OBJ*
gdb programs/pluto/pluto
gdb> attach <pid>
gdb> cont

In another window, prepare west:

ssh root@west
cd /testing/pluto/basic-pluto-01
sh ./westinit.sh

In still another window, you can login to east and re-trigger the failure. You can either use the root command history using the arrow keys to start ipsec and load the right connection, or you can re-run the "eastinit.sh" file:

ssh root@east
cd /testing/pluto/basic-pluto-01
sh ./eastinit.sh

In the west window, you can either continue with running "westrun.sh" or you can look at westrun.sh and issue the commands manually.

Updating the VMs

Sometimes you want to update a VM's system or add a package to assist with debugging. This requires an internet connection. While the VMs are completely isolated, the "nic" VM can be configured to give internet access to the machines:

ssh root@nic
ifup eth3
iptables -I POSTROUTING -t nat -o eth3 -j MASQUERADE
route add default gw 192.168.234.1  # may be needed
exit

On the other VMs, change the nameserver entry in /etc/resolv.conf to point to a valid resolver (eg 8.8.8.8 or 193.110.157.123) and the VM will have full internet connectivity.
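The resolver change can be scripted on each VM. The function below is a hypothetical sketch (set_nameserver is not part of libreswan). It replaces every nameserver line with a single new one and keeps other lines such as search or options. It copies the result back with cat rather than mv so that a symlinked resolv.conf keeps its link.

```shell
# set_nameserver: hypothetical helper.  Replaces all "nameserver" lines in
# FILE (default /etc/resolv.conf) with a single "nameserver ADDR" line.
set_nameserver() {
    local addr=$1 file=${2:-/etc/resolv.conf}
    # keep every non-nameserver line, then add the new resolver
    { grep -v '^nameserver' "$file"; echo "nameserver $addr"; } > "$file.new" || return 1
    # write back in place (preserves a symlink) and clean up
    cat "$file.new" > "$file" && rm -f "$file.new"
}
```

For example, as root on west: set_nameserver 8.8.8.8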

The /testing/guestbin directory

The guestbin directory contains scripts that are used within the VMs only.

swan-transmogrify

When the VMs were installed, an XML configuration file from testing/libvirt/vm/ was used to configure each VM with the right disks, mounts and network cards. Each VM mounts the libreswan directory as /source and the libreswan/testing/ directory as /testing. This makes the /testing/guestbin/ directory available on the VMs. At boot, the VMs run /testing/guestbin/swan-transmogrify. This Python script compares the MAC address of eth0 with the list of known MAC addresses from the XML files. By identifying the MAC, it knows which identity (west, east, etc) it should take on. Files are copied from /testing/baseconfigs/ into the VM's /etc directory and the network service is restarted.
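The MAC-to-identity lookup can be illustrated in a few lines of shell. This is only a sketch of the idea (the real swan-transmogrify is Python), and identity_for_mac is a hypothetical name. It assumes one XML file per VM, named after the VM (e.g. east.xml), with the MAC appearing in a libvirt <mac address='...'/> element.

```shell
# identity_for_mac: hypothetical sketch of the swan-transmogrify lookup.
# Prints the base name (minus .xml) of the first file in XMLDIR that
# mentions MAC; assumes one file per VM, named after the VM.
identity_for_mac() {
    local mac=$1 xmldir=$2 f
    for f in "$xmldir"/*; do
        [ -f "$f" ] || continue
        # MACs may differ in case, so match case-insensitively as a fixed string
        if grep -qiF -- "$mac" "$f"; then
            basename "$f" .xml
            return 0
        fi
    done
    return 1
}
```

Inside a guest this might be called as: identity_for_mac "$(cat /sys/class/net/eth0/address)" /testing/libvirt/vm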

swan-build, swan-install, swan-update

These commands are used to build, install or build+install (update) the libreswan userland and kernel code

swan-prep

This command is run as the first command of each test case to set up the host. It copies the required files from /testing/baseconfigs/ and the specific test case files onto the VM test machine. It does not start libreswan. That is done in the "init.sh" script.

The swan-prep command takes two options. The --x509 option is required to copy in all the required certificates and update the NSS database. The --46 / --6 option is used to give the host IPv4 and/or IPv6 connectivity. By default, hosts only get IPv4 connectivity, as this reduces the noise captured with tcpdump.

fipson and fipsoff

These are used to fake a kernel into FIPS mode, which is required for some of the tests.

Various notes

Currently, only one test can run at a time.
You can peek at the guests using virt-manager or you can ssh into the test machines from the host.
ssh may be slow to prompt for the password. If so, start up the vm "nic"
On VMs use only one CPU core. Multiple CPUs may cause pexpect to mangle output.
2014 Mar: DHR needed to do the following to make things work each time he rebooted the host

 $ sudo setenforce Permissive
 $ ls -ld /var/lib/libvirt/qemu
drwxr-x---. 6 qemu qemu 4096 Mar 14 01:23 /var/lib/libvirt/qemu
 $ sudo chmod g+w /var/lib/libvirt/qemu
 $ ( cd testing/libvirt/net ; for i in * ; do sudo virsh net-start $i ; done ; )

to make the SELinux enforcement change persist across host reboots, edit /etc/selinux/config
to remove "169.254.0.0/16 dev eth0 scope link metric 1002" from the "ipsec status" output:

 echo 'NOZEROCONF=1' >> /etc/sysconfig/network

Need Strongswan 5.3.2 or later

The baseline Strongswan needed for our interop tests is 5.3.2. This isn't part of Fedora or RHEL/CentOS at this time (2015 September).

Ask Paul for a pointer to the required RPM files.

Strongswan depends on libtspi.so.1, which is provided by the trousers package:

 
sudo yum install trousers
sudo rpm -ev  strongswan
sudo rpm -ev strongswan-libipsec
sudo rpm -i strongswan-5.2.0-4.fc20.x86_64.rpm

To update to a newer version, place the rpm in the source tree on the host machine. This avoids needing to connect the guests to the internet. Then start up all the machines, wait until they are booted, and update the Strongswan package on each machine. (DHR doesn't know which machines actually need a Strongswan.)

for vm in west east north road ; do sudo virsh start $vm; done
# wait for booting to finish
for vm in west east north road ; do ssh root@$vm 'rpm -Uv /source/strongswan-5.3.2-1.0.lsw.fc21.x86_64.rpm' ; done
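To confirm that a guest meets the 5.3.2 baseline, the installed version can be compared with a version-aware sort. The function below is a hypothetical sketch (version_at_least is not part of libreswan) and relies on sort -V, which GNU coreutils provides.

```shell
# version_at_least: hypothetical helper.  Succeeds when version $1 is greater
# than or equal to version $2 (e.g. 5.10.0 >= 5.3.2), using sort -V ordering.
version_at_least() {
    # the smaller of the two sorts first; $1 >= $2 iff that smaller one is $2
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

On a guest, for example: version_at_least "$(rpm -q --qf '%{VERSION}\n' strongswan)" 5.3.2 || echo "strongswan is too old"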

To improve

install and remove RPM using swantest + make rpm support
add summarizing script that generate html/json to git repo
coredump. It has been a mystery :) systemd or some daemon appears to block coredumps on the Fedora 20 systems.
when running multiple tests from TESTLIST, shut down the hosts before copying the OUTPUT dir. This way we get leak detection info. However, for single test runs do not shut down.

IPv6 tests

IPv6 test cases seem to work better when IPv6 is disabled on the KVM bridge interfaces the VMs use. The bridges are named swanXX and their config files are, e.g., /etc/libvirt/qemu/networks/192_0_1.xml. Remove the following line from it, then reboot or restart libvirt.

libvirt/qemu/networks/192_0_1.xml 

<ip family="ipv6" address="2001:db8:0:1::253" prefix="64"/>

ifconfig swan01 should then show no IPv6 address at all, neither fe80:: nor any other. After that, the v6 test cases should work.
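Editing each network file by hand is repetitive. The function below is a hypothetical sketch (drop_ipv6_from_net_xml is not part of libreswan) that deletes the single-line, self-closing <ip family="ipv6" .../> element shown above and keeps a .ORIG backup. It does not handle multi-line <ip> elements. Using "virsh net-edit" is the libvirt-supported alternative to editing the file directly.

```shell
# drop_ipv6_from_net_xml: hypothetical helper.  Deletes single-line
# <ip family="ipv6" .../> elements (either quote style) from a libvirt
# network XML FILE, keeping FILE.ORIG as a backup (GNU sed).
drop_ipv6_from_net_xml() {
    sed --in-place=.ORIG -e "/<ip family=[\"']ipv6[\"'].*\/>/d" "$1"
}
```

For example, as root: drop_ipv6_from_net_xml /etc/libvirt/qemu/networks/192_0_1.xml, then restart libvirt as described above.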

Please give me feedback if this hack works for you. I shall try to add more info about this.

Sanitizers

summarize output from tcpdump
count established IKE, ESP and AH states (there is a count at the end of "ipsec status", but it is not accurate: it counts instantiated connections as loaded)

dpd ping sanitizer. DPD tests have unpredictable packet loss for ping.

view results over http

THIS DOES NOT WORK without CSS, Javascript, and Python scripts that are not yet distributed.

Setup httpd (Apache web server):

sudo systemctl enable httpd
sudo systemctl start httpd
sudo ln -s /home/build/results /var/www/html/
sudo sh -c 'echo "AddType text/plain .diff" >/etc/httpd/conf.d/diff.conf'

To view the results, use http://localhost/results.

how to get graphs like http://blueswan.phenome.nl/results/

You need a bit of JavaScript magic. If time permits, Antony will attach a tarball for that.

Speeding up "make kvm-test" by running things in parallel

Internally kvmrunner.py has two work queues:

a pool of reboot threads used to reboot domains

- the default is 1 thread, limiting things to one reboot at a time
- KVM_WORKERS specifies the number of reboot threads, and hence, the boot parallelism
- increasing this allows more domains to be rebooted in parallel
- however, increasing this consumes more CPU resources

a pool of test threads that run tests, each assigned a set of domains with a unique prefix

- the default is no prefix, limiting things to a single global domain pool
- KVM_PREFIX specifies the domain prefixes to use, and hence, the test parallelism
- increasing this allows more tests to be run in parallel
- however, increasing this consumes more memory and context switch resources

The test threads then use the reboot pool as follows:

get the next test
submit required domains to reboot pool
wait for domains to reboot
run test
repeat

By adjusting KVM_WORKERS and KVM_PREFIX it is possible to:

speed up test runs
run independent testsuites in parallel

The reboot thread pool - make KVM_WORKERS=...

Booting the domains is the most CPU intensive part of running a test, and trying to perform too many reboots in parallel will simply bog down the machine to the point where tests time out and interactive performance becomes useless. To avoid this risk, "make kvm-test" defaults to using a reboot thread pool of size one.

To increase the size of the reboot thread pool set KVM_WORKERS. For instance:

$ grep KVM_WORKERS Makefile.inc.local
KVM_WORKERS=2
$ make kvm-install kvm-test
[...]
runner 0.019: using a pool of 2 worker threads to reboot domains
[...]
runner basic-pluto-01 0.647/0.601: 0 shutdown/reboot jobs ahead of us in the queue
runner basic-pluto-01 0.647/0.601: submitting shutdown jobs for unused domains: road nic north
runner basic-pluto-01 0.653/0.607: submitting boot-and-login jobs for test domains: east west
runner basic-pluto-01 0.654/0.608: submitted 5 jobs; currently 3 jobs pending
[...]
runner basic-pluto-01 28.585/28.539: domains started after 28 seconds

Only if your machine has lots of cores should you consider adjusting this in Makefile.inc.local. As a rule of thumb, number-of-cores/2 is a good starting point.
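The rule of thumb can be computed from the machine's core count. The function below is a hypothetical sketch (suggest_kvm_workers is not part of libreswan): it halves the core count, never going below one, and uses nproc when no count is given.

```shell
# suggest_kvm_workers: hypothetical helper applying the rule of thumb
# KVM_WORKERS = number-of-cores / 2, with a minimum of one.
suggest_kvm_workers() {
    local cores=${1:-$(nproc)}
    local workers=$((cores / 2))
    [ "$workers" -ge 1 ] || workers=1
    echo "$workers"
}
```

For example, on a 4-core machine "suggest_kvm_workers" prints 2, which could be recorded with: echo "KVM_WORKERS=$(suggest_kvm_workers)" >> Makefile.inc.local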

The tests thread pool - make KVM_PREFIX=...

Note: this is more experimental; for instance:

stopping parallel tests requires multiple control-Cs
since the domains and networks get unique names, things like "ssh east" don't apply; use "make kvmsh-<prefix><domain>" or "sudo virsh console <prefix><domain>".

The make variable KVM_PREFIX (see kvmrunner.py --prefix, set in Makefile.inc.local) can be used to both name and increase the size of the test domains pool. For instance:

$ grep KVM_PREFIX Makefile.inc.local
KVM_PREFIX=a.
$ make kvm-install
[...]
$ make kvm-test
[...]
runner 0.018: using the serial test processor and domain prefix 'a.'
[...]
a.runner basic-pluto-01 0.574: submitting boot-and-login jobs for test domains: a.west a.east

Consequently it becomes possible to:

test multiple directories in parallel - assign unique KVM_PREFIX names to each
run multiple tests in parallel - assign multiple KVM_PREFIX names to a single test directory

Similarly:

$ grep KVM_PREFIX Makefile.inc.local
KVM_PREFIX=a. b.
$ make kvm-install
[...]
$ make kvm-test
[...]
runner 0.019: using the parallel test processor and domain prefixes ['a.', 'b.']
[...]
b.runner basic-pluto-02 0.632/0.596: submitting boot-and-login jobs for test domains: b.west b.east
[...]
a.runner basic-pluto-01 0.769/0.731: submitting boot-and-login jobs for test domains: a.west a.east

creates and uses two dedicated domain/network groups (a.east ..., and b.east ...) allowing tests to be run in parallel.
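The naming scheme is simply prefix plus host name, as in a.east and b.east above. The function below is a hypothetical sketch (prefixed_domains is not part of libreswan) that lists the expected test domain names for a set of prefixes. It assumes the five test hosts named on this page; any additional domains (for example a build domain) are not covered.

```shell
# prefixed_domains: hypothetical helper listing the expected test domain names
# for each KVM_PREFIX value, assuming the five test hosts named on this page.
prefixed_domains() {
    local prefix host
    for prefix in "$@"; do
        for host in west east north road nic; do
            echo "$prefix$host"
        done
    done
}
```

For example, to see the state of every domain in both groups: for d in $(prefixed_domains a. b.); do sudo virsh domstate "$d"; done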

Finally, to get rid of all the domains use:

$ make kvm-uninstall

or even:

$ make KVM_PREFIX=b. kvm-uninstall

The default is to use just one domain group. Two domain groups (e.g., KVM_PREFIX=a. b.) seem to give the best results. Unfortunately, because the prefixes need to be both short and unique, it isn't easy to provide a default KVM_PREFIX using something like the directory name.

Recommendations

The test system:

4-core 64-bit intel
plenty of ram
see mk/perf.sh

Increasing the number of parallel tests, for a given number of reboot threads:

having #cores/2 reboot threads has the greatest impact
having more than #cores reboot threads seems to slow things down

Increasing the number of reboots, for a given number of test threads:

adding a second test thread has a far greater impact than adding a second reboot thread - contrast top lines
adding a third and even fourth test thread - i.e., up to #cores - still improves things

Finally here's some ASCII art showing what happens to the failure rate when the KVM_PREFIX is set so big that the reboot thread pool is kept 100% busy:

                  Fails  Reboots  Time
     ************  127      1     6:35  ****************************************
   **************  135      2     3:33  *********************
  ***************  151      3     3:12  *******************
  ***************  154      4     3:01  ******************

Notice how having more than #cores/2 KVM_WORKERS (here 2) has little benefit and failures edge upwards.
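The diminishing returns are easier to see as speedups relative to the single-reboot-thread run. The awk-based function below is a hypothetical sketch (speedup_table is not part of libreswan) that converts the m:ss times from the table into seconds and divides the first row's time by each row's.

```shell
# speedup_table: hypothetical helper.  Reads "reboots m:ss" lines and prints
# each run time in seconds plus its speedup relative to the first line.
speedup_table() {
    awk '
        {
            split($2, t, ":")
            secs = t[1] * 60 + t[2]
            if (NR == 1) base = secs
            printf "reboots=%s time=%ds speedup=%.2f\n", $1, secs, base / secs
        }
    '
}
```

Feeding it the four rows of the table, printf '1 6:35\n2 3:33\n3 3:12\n4 3:01\n' | speedup_table, prints:

reboots=1 time=395s speedup=1.00
reboots=2 time=213s speedup=1.85
reboots=3 time=192s speedup=2.06
reboots=4 time=181s speedup=2.18

Going from 2 to 4 reboot threads buys only about 18% more, while the failure count rises from 135 to 154.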
