== Nightly Test Results ==

Libreswan's testsuite is run nightly.  The results are published [https://testing.libreswan.org/ here], with the most recent run [https://testing.libreswan.org/current here].  The tests are categorized as:

* good: these tests are expected to pass (unfortunately, some still have timing problems and occasionally fail)
* wip: these tests require further work; for instance, the result may not be deterministic, or the bug they demonstrate hasn't yet been fixed
* skiptest: these tests require manual intervention to run

To run tests locally, read on.

== Introduction ==

Libreswan comes with an extensive test suite, written mostly in Python, that uses KVM virtual machines and virtual networks.  It has replaced the old UML test suite.

Apart from KVM, the test suite uses libvirtd and qemu.  It is strongly recommended to run the test suite natively on the OS (not itself inside a VM) on a machine whose CPU has virtualization instructions.  The Plan 9 filesystem (9p) is used to mount host directories in the guests; NFS is avoided to prevent network lockups when an IPsec test case cripples the guest's networking.

{{ ambox | nocat=true | type=important | text = libvirt 0.9.11 and qemu 1.0 or better are required. RHEL does not support a writable 9p filesystem, so the recommended host/guest OS is Fedora 22 }}
 
== Test Frameworks ==

This page describes Libreswan's old test framework.  Two more experimental front-ends are available (listed alphabetically).

=== Docker ===

Instead of using virtual machines, this uses Docker instances.

More information can be found at [[Test Suite - Docker]] in this Wiki.

=== make kvm ===

This is a rewrite of the existing test framework.  Namely:

* the scripts install.sh/uninstall.sh, used to create and delete virtual machines, are replaced by make rules

* the script swantest, used to run tests on the virtual machines, is replaced by kvmrunner and make rules

However, this approach also has known limitations:

* it doesn't generate pretty HTML pages; see testing/web for scripts that can do that

* it doesn't use tcpdump to capture network traffic; the --tcpdump option is still experimental

Where applicable, the KVM alternative is included below.
 
== Preparing the host machine ==
 
In the following it is assumed that your account is called "build".
 
=== Add Yourself to sudo ===
 
The test scripts rely on being able to use sudo without a password to gain root access.  This is done by adding a no-password rule to /etc/sudoers.d/.
 
XXX: Surely qemu can be driven without root?
 
To set this up, add your account to the wheel group and permit wheel to have no-password access. Issue the following commands as root:
 
<pre>
echo '%wheel ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/swantest
chmod go= /etc/sudoers.d/swantest
chown root.root /etc/sudoers.d/swantest
usermod -a -G wheel build
</pre>
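To verify that password-less sudo works (you may need to log out and back in for the group change to take effect), a quick check:

<pre>
sudo -n true && echo "password-less sudo works"
</pre>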
 
=== Disable SELinux ===
 
SELinux blocks some actions that the test suite needs, and we have not written SELinux policy rules to allow them.
 
Either set it to permissive:
 
<pre>
sudo sed --in-place=.ORIG -e 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
sudo setenforce Permissive
</pre>
 
Or disable it entirely (note that setenforce cannot disable SELinux at runtime; it drops to permissive mode until the config change takes effect at the next reboot):

<pre>
sudo sed --in-place=.ORIG -e 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
sudo setenforce Permissive
</pre>
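Either way, getenforce should now report Permissive:

<pre>
getenforce
</pre>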
 
=== Install Required Dependencies ===
 
Now we are ready to install the various components of libvirtd, qemu and kvm and then start the libvirtd service.
 
Even virt-manager isn't strictly required, though it is convenient to have.
 
On Fedora 24:
 
<pre>
sudo dnf -y install qemu virt-manager virt-install libvirt-daemon-kvm libvirt-daemon-qemu \
python3-pexpect \
python3-setproctitle diffstat
</pre>
 
On Debian or Ubuntu, the package names differ.
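The following sketch is a best guess at the equivalent set; the names may need adjusting for your release:

<pre>
sudo apt-get install qemu-kvm libvirt-daemon-system libvirt-clients virtinst \
        python3-pexpect python3-setproctitle diffstat
</pre>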
 
=== Install Utilities (Optional) ===
 
Various tools are used or convenient to have when running tests:
 
Packages to install on Fedora
 
<pre>
sudo dnf -y install git tcpdump expect python-setproctitle python-ujson pyOpenSSL
</pre>
 
Packages to install on Ubuntu
 
<pre>
apt-get install python-pexpect git tcpdump  expect python-setproctitle python-ujson \
        python3-pexpect python3-setproctitle
</pre>
 
{{ ambox | nocat=true | type=important | text = do not install strongswan-libipsec because you won't be able to run non-NAT strongswan tests! }}
 
=== Setting Users and Groups ===
 
You need to add yourself to the qemu group using:
 
<pre>
usermod -a -G qemu build
</pre>
 
The following may be out-of-date, and is kept for reference:
 
Nothing apart from the system services requires root access. However, the user you are using must be allowed to run various commands as root via sudo. Additionally, libvirt assumes the VMs run under the qemu uid, but because we want to share files between host and guests using the 9p filesystem, we want the VMs to run under our own uid. The easiest way to accomplish all of this is to add your user (for example "build") to the kvm, qemu and wheel groups. These are the changed lines in /etc/group:
 
<pre>
wheel:x:10:root,build
kvm:x:36:root,qemu,build
qemu:x:107:root,qemu,build
</pre>
 
Commands to effect this:
<pre>
sudo usermod -a -G wheel,kvm,qemu root
sudo usermod -a -G wheel,kvm,qemu build
sudo usermod -a -G kvm,qemu qemu
</pre>
 
You will need to re-login for this to take effect.
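After logging back in, you can verify the group membership:

<pre>
id build
</pre>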
 
=== Fix /var/lib/libvirt/qemu ===
 
{{ ambox | nocat=true | type=important | text = Because our VMs don't run as qemu, /var/lib/libvirt/qemu needs to be changed using chmod g+w to make it writable for the qemu group. This needs to be repeated if the libvirtd package is updated on the system }}
 
<pre>
sudo chmod g+w /var/lib/libvirt/qemu
</pre>
 
=== Ensure the Host has Enough Entropy ===
 
[[Entropy matters]]
 
With KVM, guest systems use entropy from the host through the kernel module "virtio_rng" in the guest's kernel. On the host, create the file /etc/modules-load.d/virtio.conf:
 
<pre>
# /etc/modules-load.d/virtio.conf
virtio_blk
virtio-rng
virtio_console
virtio_net
virtio_scsi
virtio
virtio_balloon
virtio_input
virtio_pci
virtio_ring
9pnet_virtio
</pre>
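After a reboot (or a manual modprobe), you can check that the modules are loaded and that the host has entropy available:

<pre>
lsmod | grep virtio
cat /proc/sys/kernel/random/entropy_avail
</pre>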
 
=== tcpdump permissions on the Host (optional) ===
 
XXX: Only swantest uses this, and having swantest use sudo would be better.
 
The experimental kvmrunner --tcpdump option does not require this configuration change.
 
<pre>
getent group tcpdump || sudo groupadd tcpdump
#add build to group tcpdump
sudo usermod --append -G tcpdump build
ls -lt /sbin/tcpdump
sudo chown root:tcpdump /sbin/tcpdump
sudo setcap "CAP_NET_RAW+eip" /sbin/tcpdump
 
# check tcpdump group users
getent group tcpdump
tcpdump:x:72:build
 
#when the installation is complete the following should work
tcpdump -i swan12
</pre>
 
=== Get the Libreswan Source Code ===

The libreswan source tree includes all the components that are used on the host and inside the test VMs. To get the latest source code using git:
 
<pre>
git clone https://github.com/libreswan/libreswan
cd libreswan
</pre>
 
=== Create the Pool directory - KVM_POOLDIR ===
 
The pool directory is used to store KVM disk images and other configuration files.  By default $(top_srcdir)/../pool is used (that is, adjacent to your source tree).  If the directory does not exist it will need to be created.
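For instance, to create it in the default location, run the following from the top of the source tree:

<pre>
mkdir -p ../pool
</pre>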
 
To change the location of the pool directory, set the KVM_POOLDIR make variable in Makefile.inc.local.  For instance:
 
<pre>
$ grep KVM_POOLDIR Makefile.inc.local
KVM_POOLDIR=/home/libreswan/pool
</pre>
 
== Set up KVM and run the Testsuite (for the impatient) ==
 
If you're impatient and just want to run the testsuite using KVM, then:
 
* install (or update) libreswan (if needed this will create the test domains):
: <tt>make kvm-install</tt>
* run the testsuite:
: <tt>make kvm-test</tt>
* list the kvm make targets:
: <tt>make kvm-help</tt>
 
== Running tests ==

The libreswan tests, in testing/pluto, can be run using several different mechanisms:

{| class="wikitable"
|+ Test Frameworks
! Framework
! Speed
! Host OS
! Guest OS
! initsystem testing (systemd, rc.d, ...)
! Post-mortem
! Interop testing
! Notes
|- style="vertical-align:top;"
| [[Test Suite - KVM | KVM]]
| slower
| Fedora, Debian <br>(BSD anyone?)
| Alpine, Fedora, FreeBSD, NetBSD, OpenBSD, Debian
| yes
| shutdown, core, leaks, refcnt, selinux
| strongswan (Linux, FreeBSD), iked (OpenBSD), racoon (NetBSD), racoon2 (NetBSD)
| gold standard <br> ideal for BSD builds <br> ideal for testing custom kernels <br> used by the [https://testing.libreswan.org Testing] machine <br> requires 9p (virtio anyone?)
|- style="vertical-align:top;"
| [[Test Suite - Namespace | Namespaces]]
| fast
| Linux
| uses host's libreswan, kernel, and utilities
| no
| core, leaks
| strongswan (Linux)?
| ideal for quick tests <br> requires libreswan to be built/installed on the host <br> requires all dependencies to be installed on the host <br> test results sensitive to differing kernel and utilities
|- style="vertical-align:top;"
| [[Test Suite - Docker | Docker]]
|
| Linux
| uses host's kernel <br> uses distro's utilities
| ?
| ?
| ?
| ideal for cross-linux builds (CentOS 6, 7, 8, Fedora 28 - rawhide, Debian, Ubuntu) <br> sensitive to differing kernel and utilities
|}

== Manually Creating and Destroying Test Domains ==

This section describes how to create the test domains manually.  Normally <tt>make kvm-install</tt> would be used.

* create the Base domain (and network):
: <tt>make install-kvm-base-domain</tt>
* create the Clone domain from the base domain:
: <tt>make install-kvm-clone-domain</tt>
* create the Test domains (and networks) from the clone domains:
: <tt>make install-kvm-test-domains</tt>

and the reverse:

* destroy the test domains and networks:
: <tt>make kvm-uninstall</tt>
* destroy the clone domain:
: <tt>make kvm-uninstall-clones</tt>
* destroy everything:
: <tt>make kvm-uninstall-base uninstall-kvm-default-network</tt>


=== Details (very out-of-date) ===

First, a new VM called "fedorabase" (or "ubuntubase") is added to the system. This is an automated minimal install using kickstart. In the "post install" phase of the anaconda installer, this VM runs a "yum update" to ensure we have the latest versions of all packages. In that %post phase we also install the various packages needed to run the tests. This can result in the installer spending a very long time in the "post install" phase, during which the VM displays no progress bar. Just be patient.

Once the VM is fully installed, the disk image is converted to QCOW and copied for each test VM: west, east, north, road and nic. A few virtual networks are created to hook up the VMs in isolation. These virtual networks have names like "192_1_2_0" and use bridge interfaces with names like "swan12". Finally, the actual VMs are added to the system's libvirt/KVM setup.
 
== Installing Libreswan on the VMs ==
 
Either:
 
<pre>
make kvm-install
</pre>
 
Or:
 
<pre>
make UPDATEONLY=1 check
</pre>
 
{{ ambox | nocat=true | type=important | text = The directories /source and /testing inside any VM are automatically mounted from the host's libreswan directory. If you need to switch to another build then the test domains should be rebuilt, viz: make kvm-uninstall kvm-install }}
 
== Running the testsuite ==
 
=== Generating Certificates ===
 
The full testsuite requires a number of certificates.  The virtual domains are configured for this purpose.  Just use:
 
<pre>
make kvm-keys
</pre>
 
Alternatively, the certificates can be generated on the local machine:
 
<pre>
cd testing/x509
./dist_certs.py
</pre>
 
( ''Before pyOpenSSL version 0.15 you couldn't run dist_certs.py without a patch to support creating SHA1 CRLs.
A patch for this can be found at'' https://github.com/pyca/pyopenssl/pull/161 )
 
=== Run the testsuite ===
 
To run all test cases (which includes compiling libreswan and installing it on all VMs, plus the non-VM based test cases), run:
 
<pre>
make kvm-install kvm-test
</pre>
 
or:
 
<pre>
make check UPDATE=1
</pre>
 
=== Stopping pluto tests (gracefully) ===
 
If you used "make kvm-test", type control-C; possibly repeatedly.
 
If you used "make check", try:
 
The tests run for a long time.  For example, on one of our machines they currently take 10 hours.  If you want to stop a test run between individual pluto tests, you can create a file to indicate this:
<pre>
touch testing/pluto/stop-tests-now
</pre>
Be sure to remove the file afterwards.
 
== Shell and Console Access (Logging In) ==
 
There are several different ways to gain shell access to the domains.
 
Each method, depending on the situation, has both advantages and disadvantages.  For instance:
 
* while virt-manager and virsh provide quick access to the console, their functionality is limited
* while SSH takes more to set up, it supports things like proper terminal configuration and file copy
 
=== Graphical Console access using virt-manager ===
 
"virt-manager", a gnome tool can be used to access individual domains.
 
While easy to use, it doesn't support cut/paste or mechanisms for copying files.
 
=== Serial Console access using "virsh" ===
 
This provides text access to the domain's serial console.  Things like cut/paste work, but not much else.  You need to remember to boot a domain before connecting.  For instance, this sequence starts the domain, connects to its console, and then sets up TERM et al.:
 
<pre>
$ printenv TERM
TERM=xterm
$ stty -a
... rows 52; columns 185; ...
$ sudo virsh start east
[...]
$ sudo virsh console east
[...]
Username: root
Password: swan
[root@east ~]# export TERM=...
[root@east ~]# stty rows <see-above> columns <see-above>
[root@east ~]# ...
</pre>
 
=== Serial Console access using "kvmsh.py" / "make kvmsh-HOST" ===
 
"kvmsh", is a wrapper around "virsh".  It automatically handles things like booting the machine, logging in, and correctly configuring the terminal:
 
<pre>
$ ./testing/utils/kvmsh.py east
[...]
Escape character is ^]
[root@east ~]# printenv TERM
xterm
[root@east ~]# stty -a
...; rows 52; columns 185; ...
[root@east ~]#
</pre>
 
"kvmsh.py" can also be used to script remote commands (for instance, it is used to run "make" on the build domain):
 
<pre>
$ ./testing/utils/kvmsh.py east ls
[root@east ~]# ls
anaconda-ks.cfg
</pre>
 
Finally, "make kvmsh-HOST" provides a short cut for the above; and if your using multiple build trees (see further down), it will connect to the DOMAIN that corresponds to HOST.  For instance, notice how the domain "a.east" is passed to kvmsh.py in the below:
 
<pre>
$ make kvmsh-east
/home/libreswan/pools/testing/utils/kvmsh.py --output ++compile-log.txt --chdir . a.east
Escape character is ^]
[root@east source]#
</pre>
 
Limitations:
 
* no file transfer
 
=== Shell access using SSH ===
 
While SSH requires slightly more effort to set up, it provides full shell access to the domains.
 
Since you will be using ssh a lot to log in to these machines, it is recommended to either put their names in /etc/hosts:
 
<pre>
# /etc/hosts entries for libreswan test suite
192.1.2.45 west
192.1.2.23 east
192.0.3.254 north
192.1.3.209 road
192.1.2.254 nic
</pre>
 
or add entries to .ssh/config such as:
 
<pre>
Host west
        Hostname 192.1.2.45
</pre>
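For instance, covering all of the test hosts (using the addresses from the /etc/hosts list above):

<pre>
Host west
        Hostname 192.1.2.45
Host east
        Hostname 192.1.2.23
Host north
        Hostname 192.0.3.254
Host road
        Hostname 192.1.3.209
Host nic
        Hostname 192.1.2.254
</pre>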
 
If you wish to be able to ssh into all the VMs created without using a password, add your ssh public key to '''testing/baseconfigs/all/etc/ssh/authorized_keys'''. This file is installed as /root/.ssh/authorized_keys on all VMs.
 
Using ssh becomes easier if you are running ssh-agent (you probably are) and your public key is known to the virtual machine.  This command, run on the host, installs your public key on the root account of the guest machine west.  It assumes that west is up (it might not be, but you can put this off until you actually need ssh, at which time the machine would need to be up anyway).  Remember that the root password on each guest machine is "swan".
<pre>
ssh-copy-id root@west
</pre>
You can use ssh-copy-id for any VM. Unfortunately, the key is forgotten when the VM is restarted.
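To make your key permanent, append it to the baseconfigs file mentioned above (a sketch, run on the host from the top of the source tree; adjust the key filename to match yours):

<pre>
cat ~/.ssh/id_rsa.pub >> testing/baseconfigs/all/etc/ssh/authorized_keys
</pre>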
 
== Run an individual test (or tests) ==
 
All the test cases involving VMs are located in the libreswan directory under testing/pluto/. The most basic test case is called basic-pluto-01. Each test case consists of a few files:

* description.txt to explain what this test case actually tests
* ipsec.conf files - the one for host west is called west.conf. This can also include configuration files for strongswan or racoon2 for interop testing
* ipsec.secret files - if non-default configurations are used. These also use the host syntax, eg west.secrets, east.secrets.
* An init.sh file for each VM that needs to start (eg westinit.sh, eastinit.sh, etc)
* One run.sh file for the host that is the initiator (eg westrun.sh)
* Known good (sanitized) output for each VM (eg west.console.txt, east.console.txt)
* testparams.sh if there are any non-default test parameters

You can run a test case by issuing the following commands on the host.

Either:

<pre>
cd testing/pluto/basic-pluto-01/
../../utils/swantest
</pre>

or:

<pre>
make kvm-test KVM_TESTS=testing/pluto/basic-pluto-01/
</pre>

Multiple tests can be selected with:

<pre>
make kvm-test KVM_TESTS=testing/pluto/basic-pluto-*
</pre>

Once the test run has completed, you will see an OUTPUT/ directory in the test case directory:

<pre>
$ ls OUTPUT/
east.console.diff  east.console.verbose.txt  RESULT       west.console.txt   west.pluto.log
east.console.txt   east.pluto.log            swan12.pcap  west.console.diff  west.console.verbose.txt
</pre>

It contains:

* RESULT, a text file (whose format is sure to change in the next few months) stating whether the test succeeded or failed
* the diff files, showing the differences between this test run and the last known good output
* each VM's sanitized serial console log (eg west.console.txt)
* each VM's unsanitized verbose console output (eg west.console.verbose.txt)
* a network capture from the bridge device (eg swan12.pcap)
* each VM's pluto log, created with plutodebug=all (eg west.pluto.log)
* any core dumps generated if a pluto daemon crashed


== How tests work ==

The testing/ directory is laid out as follows:

; testing/baseconfigs/
:  configuration files installed on guest machines
; testing/guestbin/
:  shell scripts used by tests, and run on the guest
; testing/linux-system-roles.vpn/
:  ???
; testing/packaging/
:  ???
; testing/pluto/TESTLIST
:  list of tests, and their expected outcome
; testing/pluto/*/
:  individual test directories
; testing/programs/
:  executables used by tests, and run on the guest
; testing/sanitizers/
:  filters for cleaning up the test output
; testing/utils/
:  test drivers and other host tools
; testing/x509/
:  certificates; the scripts are run on a guest

== Diagnosing inside the VM ==


=== Method 1 ===

Once a test run has completed, the VMs shut down the ipsec subsystem. You can use ssh to log in as root on any host (password "swan") and rerun the test case manually. This gives you a chance to reproduce a crash while using gdb.  You need three terminals to do this.

Terminal 1: prepare west
<pre>
ssh root@west
cd /testing/pluto/basic-pluto-01
sh ./westinit.sh
</pre>


Terminal 2: prepare east
<pre>
ssh root@east
cd /testing/pluto/basic-pluto-01
sh ./eastinit.sh
</pre>

Terminal 3: gdb

This assumes that initialization worked and pluto hasn't yet crashed.

Pick the side you wish to debug, ssh in, and attach gdb:
<pre>
ssh root@eastORwest
gdb -p `pidof pluto`
gdb> cont
</pre>
If pluto wasn't running, gdb would complain: ''<code>--p requires an argument</code>''

When pluto crashes, gdb will show that and await commands. For example, the bt command will show a backtrace.

Terminal 1: start the test
<pre>
sh ./westrun.sh
</pre>


=== Method 2 ===

Once a test run has completed, the VMs shut down the ipsec subsystem. You can use ssh to log in as root on any host (password "swan") and rerun the test case manually. This gives you a chance to reproduce a crash while using gdb:
<pre>
ssh root@east
ipsec setup start
pidof pluto
cd /source/OBJ*
gdb programs/pluto/pluto
gdb> attach <pid>
gdb> cont
</pre>

In another window, prepare west:

<pre>
ssh root@west
cd /testing/pluto/basic-pluto-01
sh ./westinit.sh
</pre>

In still another window, you can log in to east and re-trigger the failure. You can either use the root command history (via the arrow keys) to start ipsec and load the right connection, or you can re-run the "eastinit.sh" file:

<pre>
ssh root@east
cd /testing/pluto/basic-pluto-01
sh ./eastinit.sh
</pre>
 
In the west window, you can either continue with running "westrun.sh" or you can look at westrun.sh and issue the commands manually.
 
=== Using kvmrunner.py ===

First, run the test up to the run script:
 
<pre>
$ ./testing/utils/kvmrunner.py --stop-at westrun.sh testing/pluto/basic-pluto-01
...
runner basic-pluto-01 54.670/54.647: stopping test run at (before executing) script westrun.sh
...
$
</pre>
 
Then, using a text console, start gdb and run the main part of the test:
 
<pre>
$ ./testing/utils/kvmsh.py west
[west]# pgrep pluto
<pid>
[west]# gdb -p $(pgrep pluto)
(gdb) shell ./westrun.sh &
(gdb) cont
</pre>
 
I suspect your build will also need to disable systemd and watchdog timers.
 
=== /root/.gdbinit ===
 
If you want to get rid of the warning "warning: File "/testing/pluto/ikev2-dpd-01/.gdbinit" auto-loading has been declined by your `auto-load safe-path'"
 
<pre>
echo "set auto-load safe-path /" >> /root/.gdbinit
</pre>
 
== Updating one or more VMs ==
 
Sometimes you want to update the system on one or more VMs, for instance to add additional packages.
 
=== Updating all the VMs ===
 
If all the VMs need updating then it is easier to just modify the base VM and then re-create the clones:
 
# delete all the copies of the base VM:
#: <tt>$ make uninstall-kvm-clones</tt>
# update the base:
#: <tt>$ ./testing/utils/kvmsh.py swanfedorabase dnf install new-package</tt>
# re-clone the test domains and install:
#: <tt>$ make kvm-install</tt>
 
=== Update a single VM ===
 
This requires an internet connection. While the VMs are completely isolated, the "nic" VM can be configured to give internet access to the machines:
 
<pre>
ssh root@nic
ifup eth3
iptables -I POSTROUTING -t nat -o eth3 -j MASQUERADE
route add default gw 192.168.234.1  # may be needed
exit
</pre>
 
On the other VMs, change the nameserver entry in /etc/resolv.conf to point to a valid resolver (eg 8.8.8.8 or 193.110.157.123) and the VM will have full internet connectivity.
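For instance, on each VM (a sketch; substitute whichever resolver you prefer):

<pre>
echo 'nameserver 8.8.8.8' > /etc/resolv.conf
</pre>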
 
{{ ambox | nocat=true | type=important | text = Do not enable eth3 on "nic" per default, as it will affect the actual test cases that are run. }}
 
== The /testing/guestbin directory ==
 
The guestbin directory contains scripts that are used within the VMs only.
 
=== swan-transmogrify ===
 
When the VMs were installed, an XML configuration file from testing/libvirt/vm/ was used to configure each VM with the right disks, mounts and nic cards. Each VM mounts the libreswan directory as /source and the libreswan/testing/ directory as /testing . This makes the /testing/guestbin/ directory available on the VMs. At boot, the VMs run /testing/guestbin/swan-transmogrify. This python script compares the MAC address of eth0 with the list of known MAC addresses from the XML files. By identifying the MAC address, it knows which identity (west, east, etc.) it should take on. Files are copied from /testing/baseconfigs/ into the VM's /etc directory and the network service is restarted.
 
=== swan-build, swan-install, swan-update ===

These commands are used to build, install, or build+install (update) the libreswan userland and kernel code.
 
=== swan-prep ===
 
This command is run as the first command of each test case to set up the host. It copies the required files from /testing/baseconfigs/ and the specific test case files onto the VM test machine. It does not start libreswan; that is done in the "init.sh" script.

The swan-prep command takes two options.
The --x509 option is required to copy in all the required certificates and update the NSS database.
The --46 / --6 option is used to give the host IPv4 and/or IPv6 connectivity. By default, hosts only get IPv4 connectivity, as this reduces the noise captured with tcpdump.
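For example, an X.509 test's init script would typically start with something like:

<pre>
/testing/guestbin/swan-prep --x509
</pre>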
 
=== fipson and fipsoff ===
 
These are used to fake a kernel into FIPS mode, which is required for some of the tests.
 
 
== Various notes ==
 
* Currently, only one test can run at a time.
* You can peek at the guests using virt-manager or you can ssh into the test machines from the host.
* ssh may be slow to prompt for the password.  If so, start up the VM "nic".
* Give the VMs only one CPU core each. Multiple CPUs may cause pexpect to mangle output.
* 2014 Mar: DHR needed to do the following to make things work each time he rebooted the host
<pre>
$ sudo setenforce Permissive
$ ls -ld /var/lib/libvirt/qemu
drwxr-x---. 6 qemu qemu 4096 Mar 14 01:23 /var/lib/libvirt/qemu
$ sudo chmod g+w /var/lib/libvirt/qemu
$ ( cd testing/libvirt/net ; for i in * ; do sudo virsh net-start $i ; done ; )
</pre>
* to make the SELinux enforcement change persist across host reboots, edit /etc/selinux/config
* to remove "169.254.0.0/16 dev eth0  scope link  metric 1002" from "ipsec status output"
<pre> echo 'NOZEROCONF=1' >> /etc/sysconfig/network </pre>
 
=== Need Strongswan 5.3.2 or later ===
The baseline Strongswan needed for our interop tests is 5.3.2.  This isn't part of Fedora or RHEL/CentOS at this time (September 2015).
 
Ask Paul for a pointer to the required RPM files.
 
Strongswan has a dependency on libtspi.so.1:
<pre>
sudo yum install trousers
sudo rpm -ev  strongswan
sudo rpm -ev strongswan-libipsec
sudo rpm -i strongswan-5.2.0-4.fc20.x86_64.rpm
</pre>
 
To update to a newer version, place the RPM in the source tree on the host machine.  This avoids needing to connect the guests to the internet.  Then start up all the machines, wait until they have booted, and update the Strongswan package on each machine.  (DHR doesn't know which machines actually need Strongswan.)
<pre>
for vm in west east north road ; do sudo virsh start $vm; done
# wait for booting to finish
for vm in west east north road ; do ssh root@$vm 'rpm -Uv /source/strongswan-5.3.2-1.0.lsw.fc21.x86_64.rpm' ; done
</pre>
 
== To improve ==
* install and remove RPMs using swantest + make rpm support
* add a summarizing script that generates html/json to the git repo
* coredumps: they have been a mystery :) systemd or some daemon appears to block coredumps on the Fedora 20 systems
* when running multiple tests from TESTLIST, shut down the hosts before copying the OUTPUT dir (this way we get leak detection info); however, for single test runs, do not shut down
 
== IPv6 tests ==
IPv6 test cases seem to work better when IPv6 is disabled on the KVM bridge interfaces the VMs use. The bridges are swanXX and their config files are /etc/libvirt/qemu/networks/192_0_1.xml etc. Remove the following line from each config file, then reboot or restart libvirt:

<pre>
<ip family="ipv6" address="2001:db8:0:1::253" prefix="64"/>
</pre>
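One way to apply the change without rebooting the host (a sketch, assuming the network is named 192_0_1):

<pre>
sudo virsh net-destroy 192_0_1
sudo virsh net-start 192_0_1
</pre>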
 
Afterwards, ifconfig swan01 should show no IPv6 address - no fe80:: or any other v6 address. Then the v6 test cases should work.

Please give feedback if this hack works for you. I shall try to add more info about this.
 
== Sanitizers ==
* summarize output from tcpdump
* count established IKE, ESP and AH states (there is a count at the end of "ipsec status" that is not accurate: it counts instantiated connections as loaded)

* dpd ping sanitizer: DPD tests have unpredictable packet loss for ping
 
== Publishing Results on the web: http://testing.libreswan.org/results/ ==
 
This is experimental and uses:
 
* CSS
* javascript
 
Two scripts are available:
 
* <tt>testing/web/setup.sh</tt>
: sets up the directory <tt>~/results</tt> adding any dependencies
* <tt>testing/web/publish.sh</tt>
: runs the testsuite and then copies the results to <tt>~/results</tt>
 
To view this, use file:///.
 
To get this working with httpd (Apache web server):
 
<pre>
sudo systemctl enable httpd
sudo systemctl start httpd
sudo ln -s ~/results /var/www/html/
sudo sh -c 'echo "AddType text/plain .diff" >/etc/httpd/conf.d/diff.conf'
</pre>
 
To view the results, use http://localhost/results.
 
== Speeding up "make kvm-test" by running things in parallel ==
 
Internally kvmrunner.py has two work queues:
 
* a pool of reboot threads; each thread reboots one domain at a time
* a pool of test threads; each thread runs one test at a time using domains with a unique prefix
 
The test threads use the reboot thread pool as follows:
 
* get the next test
* submit required domains to reboot pool
* wait for domains to reboot
* run test
* repeat
 
By adjusting KVM_WORKERS and KVM_PREFIXES it is possible to:
 
* speed up test runs
* run independent testsuites in parallel
 
=== The reboot thread pool - make KVM_WORKERS=... ===
 
Booting the domains is the most CPU intensive part of running a test, and trying to perform too many reboots in parallel will bog down the machine to the point where tests time out and interactive performance becomes hopeless.  For this reason a pre-sized pool of reboot threads is used to reboot domains:
 
* the default is 1 reboot thread limiting things to one domain reboot at a time
* KVM_WORKERS specifies the number of reboot threads, and hence, the reboot parallelism
* increasing this allows more domains to be rebooted in parallel
* however, increasing this consumes more CPU resources
 
To increase the size of the reboot thread pool set KVM_WORKERS.  For instance:
 
<pre>
$ grep KVM_WORKERS Makefile.inc.local
KVM_WORKERS=2
$ make kvm-install kvm-test
[...]
runner 0.019: using a pool of 2 worker threads to reboot domains
[...]
runner basic-pluto-01 0.647/0.601: 0 shutdown/reboot jobs ahead of us in the queue
runner basic-pluto-01 0.647/0.601: submitting shutdown jobs for unused domains: road nic north
runner basic-pluto-01 0.653/0.607: submitting boot-and-login jobs for test domains: east west
runner basic-pluto-01 0.654/0.608: submitted 5 jobs; currently 3 jobs pending
[...]
runner basic-pluto-01 28.585/28.539: domains started after 28 seconds
</pre>
 
Only if your machine has lots of cores should you consider adjusting this in Makefile.inc.local.
 
=== The tests thread pool - make KVM_PREFIXES=... ===
 
Note that this is still somewhat experimental and has limitations:
 
* stopping parallel tests requires multiple control-c's
* since the duplicate domains have the same IP address, things like "ssh east" don't apply; use "make kvmsh-<prefix><domain>", "sudo virsh console <prefix><domain>" or "./testing/utils/kvmsh.py <prefix><domain>".
 
Tests spend a lot of their time waiting for timeouts or slow tasks to complete.  So that tests can be run in parallel, KVM_PREFIX provides a list of prefixes to add to the host names, forming unique domain groups that can each be used to run tests:
 
* the default is no prefix limiting things to a single global domain pool
* KVM_PREFIXES specifies the domain prefixes to use, and hence, the test parallelism
* increasing this allows more tests to be run in parallel
* however, increasing this consumes more memory and context switch resources
 
For instance, setting KVM_PREFIXES in Makefile.inc.local to specify a unique set of domains for this directory:
 
<pre>
$ grep KVM_PREFIX Makefile.inc.local
KVM_PREFIX=a.
$ make kvm-install
[...]
$ make kvm-test
[...]
runner 0.018: using the serial test processor and domain prefix 'a.'
[...]
a.runner basic-pluto-01 0.574: submitting boot-and-login jobs for test domains: a.west a.east
</pre>
 
And setting KVM_PREFIXES in Makefile.inc.local to specify two prefixes and, consequently, run two tests in parallel:
 
<pre>
$ grep KVM_PREFIX Makefile.inc.local
KVM_PREFIX=a. b.
$ make kvm-install
[...]
$ make kvm-test
[...]
runner 0.019: using the parallel test processor and domain prefixes ['a.', 'b.']
[...]
b.runner basic-pluto-02 0.632/0.596: submitting boot-and-login jobs for test domains: b.west b.east
[...]
a.runner basic-pluto-01 0.769/0.731: submitting boot-and-login jobs for test domains: a.west a.east
</pre>
 
creates and uses two dedicated domain/network groups (a.east ..., and b.east ...).
 
Finally, to get rid of all the domains use:
 
<pre>
$ make kvm-uninstall
</pre>
 
or even:
 
<pre>
$ make KVM_PREFIX=b. kvm-uninstall
</pre>
 
Two domain groups (e.g., KVM_PREFIX=a. b.) seem to give the best results.
 
=== Recommendations ===
 
==== Some Analysis ====
 
The test system:
 
* 4-core 64-bit intel
* plenty of ram
* the file mk/perf.sh
 
Increasing the number of parallel tests, for a given number of reboot threads:
 
[[File:tests-vs-reboots.png]]
 
* having #cores/2 reboot threads has the greatest impact
* having more than #cores reboot threads seems to slow things down
 
Increasing the number of reboots, for a given number of test threads:
 
[[File:reboots-vs-tests.png]]
 
* adding a second test thread has a far greater impact than adding a second reboot thread - contrast top lines
* adding a third and even fourth test thread - i.e., up to #cores - still improves things
 
Finally here's some ASCII art showing what happens to the failure rate when the KVM_PREFIX is set so big that the reboot thread pool is kept 100% busy:
 
<pre>
                  Fails  Reboots  Time
    ************  127      1    6:35  ****************************************
  **************  135      2    3:33  *********************
  ***************  151      3    3:12  *******************
  ***************  154      4    3:01  ******************
</pre>
 
Notice how having more than #cores/2 KVM_WORKERS (here 2) has little benefit and failures edge upwards.
 
==== Desktop Development Directory ====
 
* reduce build/install time - use only one prefix
* reduce single-test time - boot domains in parallel
* use the non-prefix domains east et al. so it is easy to access the test domains using tools like ssh
 
Let's assume 4 cores:
 
<pre>
KVM_WORKERS=2
KVM_PREFIX=''
</pre>
 
You could also add a second prefix, viz:
 
<pre>
KVM_PREFIX= '' a.
</pre>
 
but that, unfortunately, slows down the build/install time.
 
==== Desktop Baseline Directory ====
 
* do not overload the desktop - reduce CPU load by booting sequentially
* reduce total testsuite time - run tests in parallel
* keep separate to development directory above
 
Let's assume 4 cores:
 
<pre>
KVM_WORKERS=1
KVM_PREFIX= b1. b2.
</pre>
 
==== Dedicated Test Server ====
 
* minimize total testsuite time
* maximize CPU use
* assume only testsuite running
 
Assuming 4 cores:
 
<pre>
KVM_WORKERS=2
KVM_PREFIX= '' t1. t2. t3.
</pre>

== Network Diagram ==

* interface-0 (eth0, vio0, vioif0) is connected to SWANDEFAULT, which has a NAT gateway to the internet
** the exceptions are the Linux test domains EAST, WEST, ROAD and NORTH; should they?
** the BSD domains always bring up interface-0 so that /pool, /source, and /testing can be NFS mounted
** NIC needs to run DHCP on eth0 manually; how?
** transmogrify does not try to modify interface-0 (SWANDEFAULT), as doing so would break established network sessions such as NFS
* the interface names do not have a consistent order (see the comment above about Fedora's interface-0 not pointing at SWANDEFAULT)
** Fedora has ethN
** OpenBSD has vioN (different order)
** NetBSD has vioifN (different order)

 LEFT                                                              RIGHT
 
 192.0.3.0/24 -------------------------------------+-- 2001:db8:0:3::/64   (198.18.33)
                                                   |
                                           2001:db8:0:3::254
                                              192.0.3.254(eth0)
                 ROAD                            NORTH
              192.1.3.209(eth0)             192.1.3.33(eth1)
           2001:db8:1:3::209             2001:db8:1:3::33
                   |                               |
 192.1.3.0/24 -----+----------------+--------------+-- 2001:db8:1:3::/64   (198.18.3)
                                    |
                            2001:db8:1:3::254
                               192.1.3.254(eth2)
                                   NIC---swandefault(0)
                             192.1.2.254(eth1)
                            2001:db8:1:2::254
                                    |
 192.1.2.0/24 -----------------+----+----------------------+----- 2001:db8:1:2::/64 (198.18.2)
                               |                           |
                       2001:db8:1:2::45            2001:db8:1:2::23
                      (eth1)192.1.2.45            (eth1)192.1.2.23
                              WEST                        EAST
                      (eth0)192.0.1.254           (eth0)192.0.2.254
                       2001:db8:0:1::254           2001:db8:0:2::254
                               |                           |
                               |                           |
                               |                           |
                               |                       TEST-NET-1
                               |       192.0.2.0/24 -------+---+-- 2001:db8:0:2::/64   (198.18.23)
                               |                               |
                               |                     2001:db8:0:2::12/64
                               |                        192.0.2.12/24(if1)
                               |                          ${OS}RISE--198.18.12.12/24
                               |                       2001:db8:1::12/64
                               |                         198.18.1.12/24(if1)
                               |                               |
 192.0.1.0/24 -------------+---+------ 2001:db8:0:1::/64       |       (198.18.45)
                           |                                   |
                   2001:db8:0:1::15/64                         |
                       192.0.1.15/24(if1)                      |
     192.18.15.15/24--${OS}SET                                 |
                    2001:db8:1::15/64                          |
                      198.18.1.15/24(if1)                      |
                           |                                   |
 198.18.1.0/24 ------------+-----------------------------------+----- 2001:db8:1::/64

== Problems with the existing Network ==

The current network has a number of limitations.  This section identifies those problems and proposes changes to address them:

* the gateway had only 128 DHCP addresses
* public networks are being used: 192.0.1.0/24 is owned by elevatedcomputing.com; 192.1.2.0/24 and 192.1.3.0/24 are owned by raytheon.com
* 192.168.234.0/24 (the gateway) is reserved for private-use networks and is known to clash with Toronto airport
* using public networks means that they can't be used in documentation
* can't test two hosts where each is behind a gateway VPN

See https://www.rfc-editor.org/rfc/rfc5737 https://www.rfc-editor.org/rfc/rfc3849 https://www.rfc-editor.org/rfc/rfc6890

The suggestion is:

* use the benchmarking network 198.18.0.0/15
* reserve 198.19/16 for the gateway
* revive [sun]RISE (behind EAST) and [sun]SET (behind WEST)
* reserve a number N for each machine / network: EAST: 23; WEST: 45; RISE: 123; SET: 145; NORTH: 33; NIC: 254
* use 198.18.2N.N/24 for IPsec interfaces
* use 192.18.1N.1N/24 for RISE and SET

The proposed network looks like this:

 LEFT                                                               RIGHT
 
 198.18.254.0/24 ----------------------------------+-------------- 2001:db8:254::/64
                                                   |
                                           2001:db8:254::254
                                              198.18.254.254(eth0)
                      ROAD                       NORTH
                  198.18.3.209(eth0)         198.18.3.33(eth1)
                2001:db8:3::209            2001:db8:3::33
                        |                          |
 198.18.3.0/24 ---------+----------------+---------+---------------- 2001:db8:3::/64
                                         |
                                 2001:db8:3::254
                                    198.18.3.254(eth2)
                                        NIC---swandefault(0)
                                   198.18.2.254(eth1)
                                 2001:db8:2::254
                                         |
                                         |
 198.18.2.0/24 ----------------+---------+-----------------+-------- 2001:db8:2::/64
                               |                           |
                       2001:db8:2::45              2001:db8:2::23
                         198.18.2.45(eth1)           198.18.2.23(eth1)
             192.18.45.45/24--WEST                        EAST--198.18.23.23/24
                          192.0.1.45(eth0/2)          192.0.2.23(eth0/2)
                      2001:db8:0:1::45            2001:db8:0:2::23
                               |                           |
                               |                           |
                               |                       TEST-NET-1
                               |       192.0.2.0/24 -------+---+-- 2001:db8:0:2::/64
                               |                               |
                               |                     2001:db8:0:2::12/64
                               |                        192.0.2.12/24(if1)
                               |                          ${OS}RISE--198.18.12.12/24
                               |                       2001:db8:1::12/64
                               |                         198.18.1.12/24(if1)
                               |                               |
 192.0.1.0/24 -------------+---+------ 2001:db8:0:1::/64       |
                           |                                   |
                  2001:db8:0:1::15/64                          |
                      192.0.1.15/24(if1)                       |
   192.18.15.15/24--${OS}SET                                   |
                   2001:db8:1::15/64                           |
                     198.18.1.15/24(if1)                       |
                           |                                   |
 198.18.1.0/24 ------------+-----------------------------------+----- 2001:db8:1::/64

=== Hand Sketch of Current Network ===

[[File:networksketch.png]]

=== Original Network Diagram ===

[[File:testnet.png]]