Revision as of 18:06, 10 August 2020

Software - what is run by each test

Boot the VMs

Before a test can be run all the VMs are (re)booted. Consequently one obvious way to speed up testing is to reduce the amount of time it takes to boot:

make the boot faster - it should be around 1s

boot several machines in parallel - however booting is CPU intensive (see below for analysis)

To determine where a VM is spending its time during boot, use systemd-analyze blame (do several runs; the very first boot does extra configuration so is always slower):

$ date
Mon 10 Aug 2020 11:04:03 AM EDT
[cagney@bernard wip-logging]$ ./testing/utils/kvmsh.py --boot cold l.east 'systemd-analyze time ; systemd-analyze critical-chain ; systemd-analyze blame'
...
[root@east ~]# systemd-analyze time ; systemd-analyze critical-chain ; systemd-analyze blame
Startup finished in 1.319s (kernel) + 2.141s (initrd) + 4.130s (userspace) = 7.592s 
multi-user.target reached after 4.087s in userspace
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

multi-user.target @4.087s
└─plymouth-quit.service @4.056s +27ms
  └─systemd-user-sessions.service @3.999s +41ms
    └─network.target @3.988s
      └─systemd-networkd.service @1.315s +180ms
        └─systemd-udevd.service @1.093s +207ms
          └─systemd-tmpfiles-setup-dev.service @965ms +102ms
            └─kmod-static-nodes.service @689ms +207ms
              └─systemd-journald.socket
                └─system.slice
                  └─-.slice
2.485s systemd-networkd-wait-online.service
 663ms initrd-switch-root.service          
 529ms systemd-vconsole-setup.service      
 483ms sssd.service                        
 422ms systemd-udev-trigger.service        
 275ms sysroot.mount                       
 216ms systemd-logind.service              
 211ms modprobe@drm.service                
 207ms systemd-udevd.service               
 207ms kmod-static-nodes.service           
 201ms dev-mqueue.mount                    
 200ms sys-kernel-debug.mount              
 199ms sys-kernel-tracing.mount            
 196ms tmp.mount                           
 195ms dev-hugepages.mount                 
 191ms systemd-homed.service               
 182ms systemd-journald.service            
 181ms initrd-parse-etc.service            
 180ms systemd-networkd.service            
 175ms systemd-modules-load.service        
 175ms systemd-tmpfiles-setup.service      
 162ms systemd-journal-flush.service       
 159ms systemd-remount-fs.service          
 140ms systemd-repart.service              
 130ms user@0.service                      
 128ms auditd.service                      
 128ms chronyd.service                     
 116ms source.mount                        
 114ms testing.mount                       
 102ms systemd-tmpfiles-setup-dev.service  
  96ms dbus-broker.service                 
  70ms initrd-cleanup.service              
  62ms sshd.service                        
  52ms systemd-sysctl.service              
  46ms systemd-userdbd.service             
  41ms systemd-user-sessions.service       
  38ms systemd-random-seed.service         
  31ms plymouth-read-write.service         
  31ms systemd-fsck-root.service           
  30ms dracut-shutdown.service             
  27ms plymouth-quit.service               
  27ms plymouth-switch-root.service        
  20ms user-runtime-dir@0.service          
  17ms systemd-update-utmp.service         
  16ms plymouth-quit-wait.service          
  16ms systemd-update-utmp-runlevel.service
  15ms initrd-udevadm-cleanup-db.service   
   9ms sys-kernel-config.mount
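
To compare several boots it can help to turn the blame listing into numbers. The following is a minimal sketch, not part of the test suite: parse_blame is a hypothetical helper, and it assumes every duration is a single value ending in "ms" or "s" as in the output above (longer boots, where systemd prints compound durations, are not handled).

```python
import re

# Matches blame lines such as " 663ms initrd-switch-root.service".
# Assumption: one number followed by "ms" or "s", then the unit name.
_BLAME_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)(ms|s)\s+(\S+)\s*$")


def parse_blame(text):
    """Return a list of (unit, seconds) pairs found in blame output."""
    units = []
    for line in text.splitlines():
        match = _BLAME_LINE.match(line)
        if not match:
            continue  # skip prompts, critical-chain lines, and so on
        value, suffix, unit = match.groups()
        seconds = float(value) / 1000 if suffix == "ms" else float(value)
        units.append((unit, seconds))
    return units


sample = """\
2.485s systemd-networkd-wait-online.service
 663ms initrd-switch-root.service
 529ms systemd-vconsole-setup.service
"""

blame = parse_blame(sample)
slowest_unit, slowest_seconds = max(blame, key=lambda pair: pair[1])
print(slowest_unit, slowest_seconds)
```

In the run above the slowest unit is systemd-networkd-wait-online.service, which alone accounts for more than half of the 4.130s spent in userspace.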

Run the Test Scripts

To establish a baseline, enumcheck-01, which does pretty much nothing, takes ~2s to run the test scripts once things are booted:

w.runner enumcheck-01 32.08/32.05: start running scripts west:west.sh west:final.sh at 2018-10-24 22:00:44.706355
...
w.runner enumcheck-01 34.03/34.00: stop running scripts west:west.sh west:final.sh after 1.5 seconds

Everything else is slower.

To get a list of script times:

$ awk '/: stop running scripts/ { print $3, $(NF-1) }' testing/pluto/*/OUTPUT/debug.log | sort -k2nr | head -5
newoe-05-hold-pass 295.6
newoe-04-pass-pass 226.7
ikev2-01-fallback-ikev1 212.5
newoe-10-expire-inactive-ike 205.6
ikev2-32-nat-rw-rekey 205.4

which can then be turned into a histogram:
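
One way to do that is sketched below. It is only an illustration: it takes the "name seconds" pairs printed by the awk command above as input, and the 50-second bucket width and the helper names (histogram, render) are arbitrary choices, not anything from the test suite.

```python
from collections import Counter


def histogram(times, bucket=50):
    """Count script times (in seconds) into fixed-width buckets.

    The result maps each bucket's lower bound to the number of tests in it.
    """
    counts = Counter(int(t // bucket) * bucket for t in times)
    return dict(sorted(counts.items()))


def render(counts, bucket=50):
    """Draw one row of asterisks per bucket."""
    return "\n".join(
        f"{low:4d}-{low + bucket:<4d} {'*' * n}" for low, n in counts.items()
    )


# The five slowest tests from the listing above.
sample = """\
newoe-05-hold-pass 295.6
newoe-04-pass-pass 226.7
ikev2-01-fallback-ikev1 212.5
newoe-10-expire-inactive-ike 205.6
ikev2-32-nat-rw-rekey 205.4
"""

times = [float(line.split()[1]) for line in sample.splitlines()]
counts = histogram(times)
print(render(counts))
```

Feeding it the full awk output, rather than just the top five, gives the distribution for the whole test suite.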

sleep

ping

timeout

Perform Post-mortem

This seems to be in the noise, viz:

m1.runner ipsec-hostkey-ckaid-01 12:44:50.01: start post-mortem ipsec-hostkey-ckaid-01 (test 725 of 739) at 2018-10-25 09:40:49.748041
m1.runner ipsec-hostkey-ckaid-01 12:44:50.03: ****** ipsec-hostkey-ckaid-01 (test 725 of 739) passed ******
m1.runner ipsec-hostkey-ckaid-01 12:44:50.03: stop post-mortem ipsec-hostkey-ckaid-01 (test 725 of 739) after 0.2 seconds

KVM Hardware

What the test runs on

Disk I/O

Something goes here?

Memory

How much is needed?

CPU

Anything Here? Allowing use of HOST's h/w accelerators?

Docker Hardware

?

Tuning kvm performance

Internally kvmrunner.py has two work queues:

a pool of reboot threads; each thread reboots one domain at a time
a pool of test threads; each thread runs one test at a time using domains with a unique prefix

The test threads use the reboot thread pool as follows:

get the next test
submit required domains to reboot pool
wait for domains to reboot
run test
repeat

By adjusting KVM_WORKERS and KVM_PREFIXES it is possible to:

speed up test runs
run independent testsuites in parallel

By adjusting KVM_LOCALDIR it is possible to:

use a faster disk or even tmpfs (on /tmp)

KVM_WORKERS=... -- the number of test domains (machines) booted in parallel

Booting the domains is the most CPU intensive part of running a test, and trying to perform too many reboots in parallel will bog down the machine to the point where tests time out and interactive performance becomes hopeless. For this reason a pre-sized pool of reboot threads is used to reboot domains:

the default is 1 reboot thread limiting things to one domain reboot at a time
KVM_WORKERS specifies the number of reboot threads, and hence, the reboot parallelism
increasing this allows more domains to be rebooted in parallel
however, increasing this consumes more CPU resources

To increase the size of the reboot thread pool set KVM_WORKERS. For instance:

$ grep KVM_WORKERS Makefile.inc.local
KVM_WORKERS=2
$ make kvm-install kvm-test
[...]
runner 0.019: using a pool of 2 worker threads to reboot domains
[...]
runner basic-pluto-01 0.647/0.601: 0 shutdown/reboot jobs ahead of us in the queue
runner basic-pluto-01 0.647/0.601: submitting shutdown jobs for unused domains: road nic north
runner basic-pluto-01 0.653/0.607: submitting boot-and-login jobs for test domains: east west
runner basic-pluto-01 0.654/0.608: submitted 5 jobs; currently 3 jobs pending
[...]
runner basic-pluto-01 28.585/28.539: domains started after 28 seconds

Only if your machine has lots of cores should you consider adjusting this in Makefile.inc.local.

KVM_PREFIXES=... -- create a pool of test domains (machines)

Tests spend a lot of their time waiting for timeouts or slow tasks to complete. So that tests can be run in parallel, KVM_PREFIXES provides a list of prefixes to add to the host names, forming unique domain groups that can each be used to run tests:

the default is no prefix limiting things to a single global domain pool
KVM_PREFIXES specifies the domain prefixes to use, and hence, the test parallelism
increasing this allows more tests to be run in parallel
however, increasing this consumes more memory and context switch resources

For instance, setting KVM_PREFIXES in Makefile.inc.local to specify a unique set of domains for this directory:

$ grep KVM_PREFIX Makefile.inc.local
KVM_PREFIX=a.
$ make kvm-install
[...]
$ make kvm-test
[...]
runner 0.018: using the serial test processor and domain prefix 'a.'
[...]
a.runner basic-pluto-01 0.574: submitting boot-and-login jobs for test domains: a.west a.east

And setting KVM_PREFIXES in Makefile.inc.local to specify two prefixes and, consequently, run two tests in parallel:

$ grep KVM_PREFIX Makefile.inc.local
KVM_PREFIX=a. b.
$ make kvm-install
[...]
$ make kvm-test
[...]
runner 0.019: using the parallel test processor and domain prefixes ['a.', 'b.']
[...]
b.runner basic-pluto-02 0.632/0.596: submitting boot-and-login jobs for test domains: b.west b.east
[...]
a.runner basic-pluto-01 0.769/0.731: submitting boot-and-login jobs for test domains: a.west a.east

This creates and uses two dedicated domain/network groups (a.east ..., and b.east ...).

Finally, to get rid of all the domains use:

$ make kvm-uninstall

or even:

$ make KVM_PREFIX=b. kvm-uninstall

Two domain groups (e.g., KVM_PREFIX=a. b.) seem to give the best results.

Note that this is still somewhat experimental and has limitations:

stopping parallel tests requires multiple control-c's
since the duplicate domains have the same IP address, things like "ssh east" don't apply; use "make kvmsh-<prefix><domain>" or "sudo virsh console <prefix><domain>" or "./testing/utils/kvmsh.py <prefix><domain>".

KVM_LOCALDIR=/tmp/pool -- the directory containing the test domain (machine) disks

To reduce disk I/O, it is possible to store the test domain disks in RAM using tmpfs and /tmp.

(Graph: what happens when the option is set.)
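
Setting KVM_LOCALDIR to a directory under /tmp only helps when /tmp really is a tmpfs mount, which is not the case on every system. The sketch below is one way to check; filesystem_type is a hypothetical helper that works on text in the Linux /proc/mounts format, and the sample mount table is made up for illustration.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` uses the /proc/mounts layout: device, mount point,
    filesystem type, options, and two numeric fields per line.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Prefer the longest matching mount point; on ties, the later line.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Made-up sample; on Linux the real table can be read from /proc/mounts.
sample_mounts = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

print(filesystem_type("/tmp/pool", sample_mounts))
```

If the result is not tmpfs, the domain disks still live on a real disk and the setting will not reduce disk I/O.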

Recommendations

Some Analysis

The test system:

4-core 64-bit Intel
plenty of RAM
the file mk/perf.sh

Increasing the number of parallel tests, for a given number of reboot threads:

having #cores/2 reboot threads has the greatest impact
having more than #cores reboot threads seems to slow things down

Increasing the number of reboots, for a given number of test threads:

adding a second test thread has a far greater impact than adding a second reboot thread - contrast top lines
adding a third and even fourth test thread - i.e., up to #cores - still improves things

Finally, here's some ASCII art showing what happens to the failure rate when KVM_PREFIX lists so many prefixes that the reboot thread pool is kept 100% busy:

                  Fails  Reboots  Time
     ************  127      1     6:35  ****************************************
   **************  135      2     3:33  *********************
  ***************  151      3     3:12  *******************
  ***************  154      4     3:01  ******************

Notice how having more than #cores/2 KVM_WORKERS (here 2) has little benefit and failures edge upwards.
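
The numbers in the table can be reduced to a speed-up and a failure delta relative to the single-reboot run. This is plain arithmetic on the figures above; the only assumption is that the Time column reads as two colon-separated fields (the ratios come out the same whether they are hours:minutes or minutes:seconds).

```python
def to_units(text):
    """Convert 'A:BB' into a single count of the smaller unit."""
    major, minor = text.split(":")
    return int(major) * 60 + int(minor)


# (reboot threads, failures, time) copied from the table above.
rows = [
    (1, 127, "6:35"),
    (2, 135, "3:33"),
    (3, 151, "3:12"),
    (4, 154, "3:01"),
]

base_fails = rows[0][1]
base_time = to_units(rows[0][2])

# (reboot threads, speed-up over one thread, extra failures over one thread)
summary = [
    (reboots, round(base_time / to_units(time), 2), fails - base_fails)
    for reboots, fails, time in rows
]

for reboots, speedup, extra_fails in summary:
    print(f"{reboots} reboot threads: {speedup}x faster, {extra_fails:+d} failures")
```

Going from one to two reboot threads gives a speed-up of about 1.85 for 8 extra failures; going from two to four only moves the speed-up to about 2.18 while adding a further 19 failures.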

Desktop Development Directory

reduce build/install time - use only one prefix
reduce single-test time - boot domains in parallel
use the non-prefix domains east et al. so it is easy to access the test domains using tools like ssh

Let's assume 4 cores:

KVM_WORKERS=2
KVM_PREFIX=''

You could also add a second prefix, viz:

KVM_PREFIX= '' a.

but that, unfortunately, slows down the build/install time.

Desktop Baseline Directory

do not overload the desktop - reduce CPU load by booting sequentially
reduce total testsuite time - run tests in parallel
keep separate from the development directory above

Let's assume 4 cores:

KVM_WORKERS=1
KVM_PREFIX= b1. b2.

Dedicated Test Server

minimize total testsuite time
maximize CPU use
assume only testsuite running

Assuming 4 cores:

KVM_WORKERS=2
KVM_PREFIX= '' t1. t2. t3.
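
The three recommendations can be captured in a small helper. This is only a sketch: suggest is a hypothetical function, and the way it scales with the core count (#cores/2 reboot threads, up to #cores test threads on a dedicated server) is an assumption extrapolated from the analysis above. Only the 4-core values come from this page.

```python
def suggest(profile, cores=4):
    """Return Makefile.inc.local lines for one of the three profiles.

    Scaling beyond 4 cores is an assumption, not a measured result.
    """
    half = max(1, cores // 2)
    if profile == "development":
        # Boot domains in parallel; use only the unprefixed domains.
        workers, prefixes = half, ["''"]
    elif profile == "baseline":
        # Boot sequentially to spare the desktop; run two tests in parallel.
        workers, prefixes = 1, ["b1.", "b2."]
    elif profile == "server":
        # Keep the CPUs busy: one domain group per core.
        workers = half
        prefixes = ["''"] + [f"t{n}." for n in range(1, cores)]
    else:
        raise ValueError(f"unknown profile: {profile}")
    return [f"KVM_WORKERS={workers}", "KVM_PREFIX= " + " ".join(prefixes)]


for profile in ("development", "baseline", "server"):
    print(profile, suggest(profile))
```

For 4 cores this reproduces the settings listed in the three sections above.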

Test Suite - Performance: Difference between revisions
