Revision as of 16:08, 25 October 2018

Software - what is run by each test

Boot the VMs

Before a test can be run all the VMs are (re)booted. Consequently one obvious way to speed up testing is to reduce the amount of time it takes to boot:

make the boot faster - it should be around 1s

boot several machines in parallel - however booting is CPU intensive (see below for analysis)

To determine where a VM is spending its time during boot, use systemd-analyze blame (do several runs, the very first boot does extra configuration so is always be slower):

$ ./testing/utils/kvmsh.py --boot cold w.east systemd-analyze blame
virsh 0.01: starting domain
virsh 15.05: 'login: ' matched after 13.7 seconds
virsh 15.05: waiting 10 seconds for login prompt; 5 seconds for password prompt; 5 seconds for shell prompt
virsh 15.05: 'login: ' matched after 0.0 seconds
virsh 15.09: We're in after 0.5 seconds!

[root@east ~]# systemd-analyze blame
          7.576s network.service
           843ms dev-vda1.device
           781ms initrd-switch-root.service
           776ms systemd-vconsole-setup.service
           679ms sssd.service
           526ms systemd-udev-trigger.service
           235ms sysroot.mount
           180ms dev-mqueue.mount
           174ms dracut-cmdline.service
           170ms kmod-static-nodes.service
           162ms systemd-journald.service
           159ms sys-kernel-debug.mount
           152ms dev-hugepages.mount
           151ms user@0.service
           144ms systemd-tmpfiles-setup-dev.service
           144ms systemd-tmpfiles-setup.service
           143ms systemd-remount-fs.service
           137ms testing.mount
           135ms fedora-readonly.service
           135ms systemd-modules-load.service
           127ms source.mount
           119ms systemd-journal-flush.service
           107ms initrd-parse-etc.service
            ....

Run the Test Scripts

To establish a baseline, enumcheck-01, which pretty much nothing, takes ~2s to run the test scripts once things are booted:

w.runner enumcheck-01 32.08/32.05: start running scripts west:west.sh west:final.sh at 2018-10-24 22:00:44.706355
...
w.runner enumcheck-01 34.03/34.00: stop running scripts west:west.sh west:final.sh after 1.5 seconds

everything else is slower.

To get a list of script times:

$ awk '/: stop running scripts/ { print $3, $(NF-1) }' testing/pluto/*/OUTPUT/debug.log | sort -k2nr | head -5
newoe-05-hold-pass 295.6
newoe-04-pass-pass 226.7
ikev2-01-fallback-ikev1 212.5
newoe-10-expire-inactive-ike 205.6
ikev2-32-nat-rw-rekey 205.4

which can then be turned into a histogram:

sleep

ping

timeout

Perform Post-mortem

This seems to be in the noise vis:

m1.runner ipsec-hostkey-ckaid-01 12:44:50.01: start post-mortem ipsec-hostkey-ckaid-01 (test 725 of 739) at 2018-10-25 09:40:49.748041
m1.runner ipsec-hostkey-ckaid-01 12:44:50.03: ****** ipsec-hostkey-ckaid-01 (test 725 of 739) passed ******
m1.runner ipsec-hostkey-ckaid-01 12:44:50.03: stop post-mortem ipsec-hostkey-ckaid-01 (test 725 of 739) after 0.2 seconds

KVM Hardware

What the test runs on

Disk I/O

Something goes here?

Memory

How much is needed?

CPU

Anything Here? Allowing use of HOST's h/w accelerators?

Docker Hardware

?

Case Study: speeding up "make kvm-test" by running things in parallel =

Internally kvmrunner.py has two work queues:

a pool of reboot threads; each thread reboots one domain at a time
a pool of test threads; each thread runs one test at a time using domains with a unique prefix

The test threads uses the reboot thread pool as follows:

get the next test
submit required domains to reboot pool
wait for domains to reboot
run test
repeat

My adjusting KVM_WORKERS and KVM_PREFIXES it is possible:

speed up test runs
run independent testsuites in parallel

The reboot thread pool - make KVM_WORKERS=...

Booting the domains is the most CPU intensive part of running a test, and trying to perform too many reboots in parallel will bog down the machine to the point where tests time out and interactive performance becomes hopeless. For this reason a pre-sized pool of reboot threads is used to reboot domains:

the default is 1 reboot thread limiting things to one domain reboot at a time
KVM_WORKERS specifies the number of reboot threads, and hence, the reboot parallelism
increasing this allows more domains to be rebooted in parallel
however, increasing this consumes more CPU resources

To increase the size of the reboot thread pool set KVM_WORKERS. For instance:

$ grep KVM_WORKERS Makefile.inc.local
KVM_WORKERS=2
$ make kvm-install kvm-test
[...]
runner 0.019: using a pool of 2 worker threads to reboot domains
[...]
runner basic-pluto-01 0.647/0.601: 0 shutdown/reboot jobs ahead of us in the queue
runner basic-pluto-01 0.647/0.601: submitting shutdown jobs for unused domains: road nic north
runner basic-pluto-01 0.653/0.607: submitting boot-and-login jobs for test domains: east west
runner basic-pluto-01 0.654/0.608: submitted 5 jobs; currently 3 jobs pending
[...]
runner basic-pluto-01 28.585/28.539: domains started after 28 seconds

Only if your machine has lots of cores should you consider adjusting this in Makefile.inc.local.

The tests thread pool - make KVM_PREFIXES=...

Note that this is still somewhat experimental and has limitations:

stopping parallel tests requires multiple control-c's
since the duplicate domains have the same IP address, things like "ssh east" don't apply; use "make kvmsh-<prefix><domain>" or "sudo virsh console <prefix><domain" or "./testing/utils/kvmsh.py <prefix><domain>".

Tests spend a lot of their time waiting for timeouts or slow tasks to complete. So that tests can be run in parallel the KVM_PREFIX provides a list of prefixes to add to the host names forming unique domain groups that can each be used to run tests:

the default is no prefix limiting things to a single global domain pool
KVM_PREFIXES specifies the domain prefixes to use, and hence, the test parallelism
increasing this allows more tests to be run in parallel
however, increasing this consumes more memory and context switch resources

For instance, setting KVM_PREFIXES in Makefile.inc.local to specify a unique set of domains for this directory:

$ grep KVM_PREFIX Makefile.inc.local
KVM_PREFIX=a.
$ make kvm-install
[...]
$ make kvm-test
[...]
runner 0.018: using the serial test processor and domain prefix 'a.'
[...]
a.runner basic-pluto-01 0.574: submitting boot-and-login jobs for test domains: a.west a.east

And setting KVM_PREFIXES in Makefile.inc.local to specify two prefixes and, consequently, run two tests in parallel:

$ grep KVM_PREFIX Makefile.inc.local
KVM_PREFIX=a. b.
$ make kvm-install
[...]
$ make kvm-test
[...]
runner 0.019: using the parallel test processor and domain prefixes ['a.', 'b.']
[...]
b.runner basic-pluto-02 0.632/0.596: submitting boot-and-login jobs for test domains: b.west b.east
[...]
a.runner basic-pluto-01 0.769/0.731: submitting boot-and-login jobs for test domains: a.west a.east

creates and uses two dedicated domain/network groups (a.east ..., and b.east ...).

Finally, to get rid of all the domains use:

$ make kvm-uninstall

or even:

$ make KVM_PREFIX=b. kvm-uninstall

Two domain groups (e.x., KVM_PREFIX=a. b.) seems to give the best results.

Recommendations

Some Analysis

The test system:

4-core 64-bit intel
plenty of ram
the file mk/perf.sh

Increasing the number of parallel tests, for a given number of reboot threads:

having #cores/2 reboot threads has the greatest impact
having more than #cores reboot threads seems to slow things down

Increasing the number of reboots, for a given number of test threads:

adding a second test thread has a far greater impact than adding a second reboot thread - contrast top lines
adding a third and even fourth test thread - i.e., up to #cores - still improves things

Finally here's some ASCII art showing what happens to the failure rate when the KVM_PREFIX is set so big that the reboot thread pool is kept 100% busy:

                  Fails  Reboots  Time
     ************  127      1     6:35  ****************************************
   **************  135      2     3:33  *********************
  ***************  151      3     3:12  *******************
  ***************  154      4     3:01  ******************

Notice how having more than #cores/2 KVM_WORKERS (here 2) has little benefit and failures edge upwards.

Desktop Development Directory

reduce build/install time - use only one prefix
reduce single-test time - boot domains in parallel
use the non-prefix domains east et.al. so it is easy to access the test domains using tools like ssh

Lets assume 4 cores:

KVM_WORKERS=2
KVM_PREFIX=''

You could also add a second prefix vis:

KVM_PREFIX= '' a.

but that, unfortunately, slows down the the build/install time.

Desktop Baseline Directory

do not overload the desktop - reduce CPU load by booting sequentially
reduce total testsuite time - run tests in parallel
keep separate to development directory above

Lets assume 4 cores

KVM_WORKERS=1
KVM_PREFIX= b1. b2.

Dedicated Test Server

minimize total testsuite time
maximize CPU use
assume only testsuite running

Assuming 4 cores:

* KVM_WORKERS=2
* KVM_PREFIX= '' t1. t2. t3.

@@ Line 1: / Line 1: @@
-== Boot Time ==
+= Software - what is run by each test =
+== Boot the VMs ==
 Before a test can be run all the VMs are (re)booted.  Consequently one obvious way to speed up testing is to reduce the amount of time it takes to boot:
@@ Line 44: / Line 46: @@
 </pre>
-== Test Time ==
+== Run the Test Scripts ==
-First to establish a baseline, <tt>enumcheck-01</tt>, which pretty much nothing, takes ~2s to run the test scripts once things are booted:
+To establish a baseline, <tt>enumcheck-01</tt>, which pretty much nothing, takes ~2s to run the test scripts once things are booted:
 <pre>
@@ Line 52: / Line 54: @@
 ...
 w.runner enumcheck-01 34.03/34.00: stop running scripts west:west.sh west:final.sh after 1.5 seconds
-...
-w.runner enumcheck-01 34.04: stop processing test enumcheck-01 (test 1 of 739) after 34.1 seconds
 </pre>
 everything else is slower.
-== Disk Performance ==
+To get a list of script times:
+<pre>
+$ awk '/: stop running scripts/ { print $3, $(NF-1) }' testing/pluto/*/OUTPUT/debug.log | sort -k2nr | head -5
+newoe-05-hold-pass 295.6
+newoe-04-pass-pass 226.7
+ikev2-01-fallback-ikev1 212.5
+newoe-10-expire-inactive-ike 205.6
+ikev2-32-nat-rw-rekey 205.4
+</pre>
+which can then be turned into a histogram:
+[[File:test-script-time-histogram.jpg]]
+=== sleep ===
+=== ping ===
+=== timeout ===
+== Perform Post-mortem ==
+This seems to be in the noise vis:
+<pre>
+m1.runner ipsec-hostkey-ckaid-01 12:44:50.01: start post-mortem ipsec-hostkey-ckaid-01 (test 725 of 739) at 2018-10-25 09:40:49.748041
+m1.runner ipsec-hostkey-ckaid-01 12:44:50.03: ****** ipsec-hostkey-ckaid-01 (test 725 of 739) passed ******
+m1.runner ipsec-hostkey-ckaid-01 12:44:50.03: stop post-mortem ipsec-hostkey-ckaid-01 (test 725 of 739) after 0.2 seconds
+</pre>
+= KVM Hardware =
+What the test runs on
+== Disk I/O ==
 Something goes here?
-== Speeding up "make kvm-test" by running things in parallel ==
+== Memory ==
+How much is needed?
+== CPU ==
+Anything Here?  Allowing use of HOST's h/w accelerators?
+= Docker Hardware =
+?
+= Case Study: speeding up "make kvm-test" by running things in parallel ==
 Internally kvmrunner.py has two work queues:
@@ Line 82: / Line 127: @@
 * run independent testsuites in parallel
-=== The reboot thread pool - make KVM_WORKERS=... ===
+== The reboot thread pool - make KVM_WORKERS=... ==
 Booting the domains is the most CPU intensive part of running a test, and trying to perform too many reboots in parallel will bog down the machine to the point where tests time out and interactive performance becomes hopeless.  For this reason a pre-sized pool of reboot threads is used to reboot domains:
@@ Line 110: / Line 155: @@
 Only if your machine has lots of cores should you consider adjusting this in Makefile.inc.local.
-=== The tests thread pool - make KVM_PREFIXES=... ===
+== The tests thread pool - make KVM_PREFIXES=... ==
 Note that this is still somewhat experimental and has limitations:
@@ Line 170: / Line 215: @@
 Two domain groups (e.x., KVM_PREFIX=a. b.) seems to give the best results.
-=== Recommendations ===
+== Recommendations ==
-==== Some Analysis ====
+=== Some Analysis ===
 The test system:
@@ Line 206: / Line 251: @@
 Notice how having more than #cores/2 KVM_WORKERS (here 2) has little benefit and failures edge upwards.
-==== Desktop Development Directory ====
+=== Desktop Development Directory ===
 * reduce build/install time - use only one prefix
@@ Line 227: / Line 272: @@
 but that, unfortunately, slows down the the build/install time.
-==== Desktop Baseline Directory ====
+=== Desktop Baseline Directory ===
 * do not overload the desktop - reduce CPU load by booting sequentially
@@ Line 238: / Line 283: @@
 * KVM_PREFIX= b1. b2.
-==== Dedicated Test Server ====
+=== Dedicated Test Server ===
 * minimize total testsuite time

Test Suite - Performance: Difference between revisions

Revision as of 16:08, 25 October 2018

Contents

Software - what is run by each test

Boot the VMs

Run the Test Scripts

sleep

ping

timeout

Perform Post-mortem

KVM Hardware

Disk I/O

Memory

CPU

Docker Hardware

Case Study: speeding up "make kvm-test" by running things in parallel =

The reboot thread pool - make KVM_WORKERS=...

The tests thread pool - make KVM_PREFIXES=...

Recommendations

Some Analysis

Desktop Development Directory

Desktop Baseline Directory

Dedicated Test Server

Navigation menu

Test Suite - Performance: Difference between revisions

Revision as of 16:08, 25 October 2018

Software - what is run by each test

Boot the VMs

Run the Test Scripts

sleep

ping

timeout

Perform Post-mortem

KVM Hardware

Disk I/O

Memory

CPU

Docker Hardware

Case Study: speeding up "make kvm-test" by running things in parallel =

The reboot thread pool - make KVM_WORKERS=...

The tests thread pool - make KVM_PREFIXES=...

Recommendations

Some Analysis

Desktop Development Directory

Desktop Baseline Directory

Dedicated Test Server

Navigation menu

Search