Benchmarking and Performance testing

From Libreswan

Latest revision as of 17:02, 19 August 2016

The performance of an IPsec system depends on the CPU, RAM, NICs, switches, kernel and configuration.

Important: All tests were performed using a network MTU setting of 9000 unless otherwise noted. This is crucial when using 10GigE cards!

Note that the NIC settings and the Replay Protection setting (replay-window=) can greatly influence performance. It might be useful to disable Replay Protection using replay-window=0 or to set the window to a very large value (e.g. 2048).


The Alteeve Niche's Anvil RN2-M2 platform

Hardware used for this testing was supplied by Alteeve Niche's.

The platform is based on a set of Fujitsu RX300 S8 servers (specification). The machines have a number of Intel Corporation 82599ES 10-Gigabit cards that are bonded. All NICs are connected to a set of Brocade ICX6610-24 switches. We picked one bonded pair of 10Gbps interfaces, bond1, for our IPsec tests. The Anvil comes with an 8 core Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz with AES-NI support. The MTU was left at the default setting of 9000. The kernel used was 2.6.32-504.1.3.el6.x86_64.

IPsec performance measured with iperf

iperf used with default settings

  • 9.78 Gbits/sec unencrypted without IPsec
  • 5.25 Gbits/sec IPsec AES_GCM128 (esp=aes_gcm128-null)
  • 1.78 Gbits/sec IPsec NULL-SHA1 (esp=null-sha1)
  • 1.19 Gbits/sec IPsec NULL-AES_XCBC (esp=null-aes_xcbc)
  • 1.39 Gbits/sec IPsec AES128-SHA1 (esp=aes128-sha1)
  • 1.27 Gbits/sec IPsec AES256-SHA1 (esp=aes256-sha1)
  • 904 Mbits/sec IPsec AES256-AES_XCBC (esp=aes256-aes_xcbc)
  • 197 Mbits/sec IPsec 3DES-SHA1 (esp=3des-sha1)
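To make the cost of each ESP proposal easier to compare, the figures above can be expressed as a fraction of the unencrypted baseline. The following is a minimal sketch; the numbers are copied from the list, while the function and variable names are illustrative and not part of any tool used in the tests.

```python
# Throughput figures copied from the list above, in Gbits/sec.
BASELINE_GBITS = 9.78  # unencrypted, without IPsec

ipsec_gbits = {
    "aes_gcm128-null": 5.25,
    "null-sha1": 1.78,
    "aes128-sha1": 1.39,
    "aes256-sha1": 1.27,
    "null-aes_xcbc": 1.19,
    "aes256-aes_xcbc": 0.904,
    "3des-sha1": 0.197,
}


def percent_of_baseline(gbits, baseline=BASELINE_GBITS):
    """Return a throughput as a percentage of the unencrypted baseline."""
    return 100.0 * gbits / baseline


# Print the proposals from fastest to slowest.
for esp, gbits in sorted(ipsec_gbits.items(), key=lambda kv: kv[1], reverse=True):
    print(f"esp={esp:<16} {gbits:6.3f} Gbits/sec  "
          f"{percent_of_baseline(gbits):5.1f}% of plaintext")
```

On these numbers AES_GCM128 keeps a little over half of the plaintext rate, while every other proposal stays below a fifth of it.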

We did some additional tests, but those are less accurate. Using protoport= we could set up multiple IPsec SAs (in the hope that the load would be distributed better) or run encrypted and unencrypted streams side by side.

  • two streams, one plaintext and one encrypted: 8.64 Gbits/sec plaintext plus 1.24 Gbits/sec AES256-SHA1
  • two streams, both AES256-SHA1: 819 Mbits/sec plus 615 Mbits/sec (possibly was aes128)
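As a rough check on whether a second SA helped, the two-stream totals can be compared with the single-SA AES256-SHA1 result. This is only a sketch of the arithmetic; the figures are taken from the lists above (with the same caveat that the second test may have used aes128), and the helper names are made up for the example.

```python
# Figures copied from the lists above, in Mbits/sec.
SINGLE_SA_AES256_SHA1 = 1270.0        # one SA, esp=aes256-sha1
TWO_SA_AES256_SHA1 = (819.0, 615.0)   # two SAs, as reported (possibly aes128)
MIXED_STREAMS = (8640.0, 1240.0)      # plaintext stream + AES256-SHA1 stream


def aggregate(streams):
    """Total throughput of concurrently running streams."""
    return sum(streams)


def gain_percent(new, old):
    """Relative change of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


two_sa_total = aggregate(TWO_SA_AES256_SHA1)
print(f"two SAs combined: {two_sa_total:.0f} Mbits/sec "
      f"({gain_percent(two_sa_total, SINGLE_SA_AES256_SHA1):+.1f}% vs one SA)")
print(f"plaintext + IPsec combined: {aggregate(MIXED_STREAMS):.0f} Mbits/sec")
```

Given how rough these measurements are, the small gain from the second SA should be read as "no clear improvement" rather than as a precise figure.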


CPU/crypto performance measured with openssl

(AES-NI disabling done via export OPENSSL_ia32cap=~0x200000200000000)

Without AES-NI, no multi: openssl speed -evp aes-256-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-256-cbc     241508.56k    266220.03k    273663.06k    276314.11k    275479.81k

With AES-NI, no multi: openssl speed -evp aes-256-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-256-cbc     502470.66k    528580.69k    532890.45k    535901.87k    536368.47k

Without AES-NI, no multi: openssl speed -evp aes-128-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     320425.43k    366515.97k    377561.00k    383643.99k    383777.51k

With AES-NI, no multi: openssl speed -evp aes-128-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     688604.26k    732936.83k    742459.28k    748241.92k    748756.99k

Without AES-NI, using all cores: openssl speed -multi 8 -evp aes-256-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
evp            3729202.24k   4009617.79k   4053305.43k   4065434.97k   4068764.33k

With AES-NI, using all cores: openssl speed -multi 8 -evp aes-128-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
evp            5033772.55k   5494390.59k   5632183.30k   5668856.15k   5679707.48k
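The openssl speed figures are reported in thousands of bytes per second, so they are not directly comparable with the iperf results in Gbits/sec. A small sketch of the conversion, using the 8192-byte column of the single-process runs above; the function names are illustrative only.

```python
def openssl_k_to_gbits(k_value):
    """Convert an `openssl speed` figure (1000s of bytes/sec) to Gbits/sec."""
    bytes_per_sec = k_value * 1000
    return bytes_per_sec * 8 / 1e9


def aesni_speedup(with_aesni_k, without_aesni_k):
    """Ratio of the AES-NI result to the software-only result."""
    return with_aesni_k / without_aesni_k


# 8192-byte column, single process, copied from the results above.
single_process = {
    "aes-256-cbc": {"without": 275479.81, "with": 536368.47},
    "aes-128-cbc": {"without": 383777.51, "with": 748756.99},
}

for cipher, k in single_process.items():
    print(f"{cipher}: {openssl_k_to_gbits(k['without']):.2f} Gbits/sec without AES-NI, "
          f"{openssl_k_to_gbits(k['with']):.2f} Gbits/sec with AES-NI "
          f"({aesni_speedup(k['with'], k['without']):.2f}x)")
```

Keep in mind that openssl speed measures userland crypto on one buffer size at a time, so these values are an upper bound on raw cipher speed rather than a prediction of kernel IPsec throughput.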

NIC settings

# ethtool eth1
Settings for eth1:
	Supported ports: [ FIBRE ]
	Supported link modes:   10000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: No
	Advertised link modes:  10000baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Speed: 10000Mb/s
	Duplex: Full
	Port: Other
	PHYAD: 0
	Transceiver: external
	Auto-negotiation: off
	Supports Wake-on: umbg
	Wake-on: g
	Current message level: 0x00000007 (7)
			       drv probe link
	Link detected: yes

# ethtool -k eth1
Features for eth1:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-unneeded: off
	tx-checksum-ip-generic: off
	tx-checksum-ipv6: on
	tx-checksum-fcoe-crc: on [fixed]
	tx-checksum-sctp: on [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
loopback: off [fixed]

IBM x3550m4

Specifications from IBM: http://www-03.ibm.com/systems/x/hardware/rack/x3550m4/specs.html

  • 12x Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
  • 32GB RAM
  • Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01) cross cabled using ixgbe eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
  • MTU set to 9000 unless specified otherwise
  • RHEL 6.6 running 2.6.32-504.el6.x86_64
  • AESNI supported and used for all IPsec operations

IPsec performance measured with iperf

iperf used with default settings

  • 9.41 Gbits/sec unencrypted without IPsec
  • 4.03 Gbits/sec IPsec AES_GCM128 (esp=aes_gcm128-null)
  • 903 Mbits/sec IPsec AES_GCM128 (esp=aes_gcm128-null) at MTU 1500
  • 1.26 Gbits/sec IPsec NULL-SHA1 (esp=null-sha1)
  • 733 Mbits/sec IPsec NULL-AES_XCBC (esp=null-aes_xcbc)
  • 643 Mbits/sec IPsec AES128-SHA1 (esp=aes128-sha1) at MTU 1500
  • 935 Mbits/sec IPsec AES128-SHA1 (esp=aes128-sha1)
  • 870 Mbits/sec IPsec AES256-SHA1 (esp=aes256-sha1)
  • 656 Mbits/sec IPsec AES256-AES_XCBC (esp=aes256-aes_xcbc)
  • 127 Mbits/sec IPsec 3DES-SHA1 (esp=3des-sha1)
  • 1.10 Gbits/sec IPsec AES128_CTR-SHA1 (esp=aes_ctr128-sha1)
  • 919 Mbits/sec IPsec AES256_CTR-SHA1 (esp=aes_ctr256-sha1)
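The MTU 1500 results above hint that the per-packet cost, not only the per-byte crypto cost, limits throughput. A rough way to see this is to estimate packets per second for both MTU settings. The sketch below assumes every packet is exactly MTU-sized and treats the iperf rate as the on-wire rate; both are simplifications, and the function name is made up for the example.

```python
def approx_packets_per_sec(bits_per_sec, mtu_bytes):
    """Estimate packets/sec, assuming every packet is exactly MTU-sized."""
    return bits_per_sec / (mtu_bytes * 8)


# (esp proposal, MTU) -> throughput in bits/sec, copied from the list above.
measurements = {
    ("aes_gcm128-null", 9000): 4.03e9,
    ("aes_gcm128-null", 1500): 903e6,
    ("aes128-sha1", 9000): 935e6,
    ("aes128-sha1", 1500): 643e6,
}

for (esp, mtu), bps in measurements.items():
    pps = approx_packets_per_sec(bps, mtu)
    print(f"esp={esp:<16} MTU {mtu:>4}: {bps / 1e9:5.3f} Gbits/sec, ~{pps:,.0f} packets/sec")
```

Under these assumptions the MTU 1500 runs push more packets per second than the MTU 9000 runs while moving far fewer bits, which is consistent with the advice to use jumbo frames on 10GigE links.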

CPU/crypto performance measured with openssl

(AES-NI disabling done via export OPENSSL_ia32cap=~0x200000200000000)

Without AES-NI, no multi: openssl speed -evp aes-256-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-256-cbc     181371.98k    202129.30k    207514.37k    208667.99k    210778.24k

With AES-NI, no multi: openssl speed -evp aes-256-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-256-cbc     369217.05k    390857.40k    393860.01k    394961.58k    395264.00k

Without AES-NI, no multi: openssl speed -evp aes-128-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     231156.81k    275887.45k    285929.05k    289998.17k    293098.25k

With AES-NI, no multi: openssl speed -evp aes-128-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc     506361.18k    542297.64k    549176.92k    551389.53k    553905.23k

Without AES-NI, using all cores: openssl speed -multi 11 -evp aes-256-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
evp            1101164.44k   1207916.80k   1230362.03k   1242882.62k   1242842.52k

With AES-NI, using all cores: openssl speed -multi 11 -evp aes-128-cbc

type              16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
evp            3918149.13k   5065989.57k   5471001.60k   5583504.38k   5609387.35k
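The -multi 11 runs can be compared with the matching single-process runs to see how well the crypto scales across processes. A short sketch using the 8192-byte column above; the helper names are illustrative, and only the two cipher/AES-NI combinations that have both a single and a multi result are included.

```python
PROCESSES = 11  # value passed to `openssl speed -multi` above


def scaling_factor(multi_k, single_k):
    """How many times faster the multi-process run was than one process."""
    return multi_k / single_k


def per_process_efficiency(multi_k, single_k, processes=PROCESSES):
    """Scaling factor as a fraction of the ideal (linear) speedup."""
    return scaling_factor(multi_k, single_k) / processes


# (label, multi-process result, single-process result), 8192-byte column.
runs = [
    ("aes-256-cbc without AES-NI", 1242842.52, 210778.24),
    ("aes-128-cbc with AES-NI", 5609387.35, 553905.23),
]

for label, multi_k, single_k in runs:
    print(f"{label}: {scaling_factor(multi_k, single_k):.2f}x over one process, "
          f"{100 * per_process_efficiency(multi_k, single_k):.0f}% of linear scaling")
```

The software-only run scales noticeably worse than the AES-NI run here; the page does not investigate why, so treat the comparison as descriptive only.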

NIC settings

# ethtool eth0
Settings for eth0:
	Supported ports: [ TP ]
	Supported link modes:   100baseT/Full 
	                        1000baseT/Full 
	                        10000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Advertised link modes:  100baseT/Full 
	                        1000baseT/Full 
	                        10000baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Speed: 10000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: external
	Auto-negotiation: on
	MDI-X: Unknown
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x00000007 (7)
			       drv probe link
	Link detected: yes

x86_64 NUMA Xeon with Intel QuickAssist PCIe

This RHEL7 Xeon system has 6 Xeon E5-2630 CPUs @ 2.60GHz. The NIC is a 10Gbps Intel 82599ES with 6 RSS channels (ixgbe). The interesting bit about this system is that it uses an Intel QuickAssist PCIe crypto accelerator card (https://www-ssl.intel.com/content/www/us/en/ethernet-products/gigabit-server-adapters/quickassist-adapter-for-servers.html). This device shows up in lspci as "Intel Corporation Coleto Creek PCIe Endpoint". The kernel modules required for this card are icp_qat_netkey.ko and icp_qa_al.ko.

The system seems to max out at about 7 Gbps of IPsec traffic using AES_CBC. The accelerator does not support AES_GCM, so using AES_GCM caused a reduction in performance. The test used between 10 and 20 IPsec SAs at once. Without the QuickAssist card, the performance is only about half, around 3 Gbps.

We noticed that only two CPUs are loaded, and that the load does not move onto further CPUs. The XFRM crypto implementation uses a single workqueue for encryption and a single workqueue for decryption, which results in two CPUs being pinned on SoftIRQ processing. Therefore, adding more IPsec SAs to distribute the crypto load over the other CPUs has no effect: the limitation is in the decapsulation, which for a single IPsec SA is always limited to a single CPU.

The pcrypt kernel module adds more work queues distributed over more CPUs, but does not actually improve the performance. The problem is that many packets then arrive out of order, and with IPsec replay protection using a standard replay-window this actually reduces the overall throughput. (It also seems that Linux currently does not allow setting a replay-window larger than 32.)
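The interaction between reordering and the replay window can be illustrated with a small model. The sketch below implements a simplified sliding-window anti-replay check in the style of RFC 4303; it is not the Linux XFRM code, and the reordering pattern (sequence numbers delivered in reversed blocks of 64) is an artificial assumption chosen only to make the effect visible. A window size of 0 is treated as "replay protection disabled", matching replay-window=0.

```python
class ReplayWindow:
    """Simplified anti-replay sliding window (no extended sequence numbers)."""

    def __init__(self, size):
        self.size = size
        self.top = 0      # highest sequence number accepted so far
        self.bitmap = 0   # bit i set => sequence number (top - i) was accepted

    def accept(self, seq):
        if self.size == 0:
            return True   # replay protection disabled
        if seq > self.top:
            # Packet advances the window: shift the bitmap and mark the new top.
            shift = seq - self.top
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= self.size:
            return False  # older than the window: dropped
        if self.bitmap & (1 << offset):
            return False  # already seen: dropped as a replay
        self.bitmap |= 1 << offset
        return True


def reversed_blocks(total, block):
    """Sequence numbers 1..total, delivered in reversed blocks of `block`."""
    order = []
    for start in range(1, total + 1, block):
        end = min(start + block - 1, total)
        order.extend(range(end, start - 1, -1))
    return order


def count_accepted(arrivals, window_size):
    window = ReplayWindow(window_size)
    return sum(1 for seq in arrivals if window.accept(seq))


arrivals = reversed_blocks(640, 64)
for size in (32, 64, 2048, 0):
    print(f"replay-window={size:<4} accepted {count_accepted(arrivals, size)} of {len(arrivals)}")
```

In this model every packet that arrives more than a window's width behind the newest one is discarded, so a 32-packet window loses a large share of heavily reordered traffic while a larger window (or none) accepts it all. For TCP traffic such drops would presumably show up as retransmissions, which fits the reduced throughput described above.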