Cryptographic Acceleration

From Libreswan

Warning: The pcrypt module is VERY UNSTABLE. Please be careful. If you have stability tips, please let us know at swan-dev@lists.libreswan.org


Revision as of 00:31, 11 July 2017

OCF Hardware Crypto Acceleration

In general, cryptographic acceleration is mostly useful for embedded and low-power systems. High-end systems tend to have a CPU that is fast enough, especially when using AES_GCM on a multi-CPU system with AES-NI support. However, if you are trying to reach 5 Gbps or more of throughput, you might want to add a dedicated crypto acceleration card (such as the Intel QuickAssist PCIe card) on top of the CPUs.
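
Whether a gateway's CPU offers AES-NI can be checked from userland before deciding on extra hardware. A minimal sketch, assuming a Linux x86 system where the instruction set is reported as the "aes" word in the "flags" lines of /proc/cpuinfo; has_aesni is a hypothetical helper name:

```shell
# Hypothetical helper: report whether the CPU advertises AES-NI.
# Assumes Linux/x86, where AES-NI shows up as the "aes" word in the
# "flags" line of /proc/cpuinfo. The optional argument lets the check
# run against a saved copy of that file.
has_aesni() {
    cpuinfo=${1:-/proc/cpuinfo}
    # -w matches "aes" as a whole word, so flags such as "vaes" alone
    # do not count.
    if grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -qw aes; then
        echo "AES-NI: yes ($(grep -c '^processor' "$cpuinfo") CPUs)"
    else
        echo "AES-NI: no"
        return 1
    fi
}
```

Running has_aesni on the gateway prints the result together with the number of logical CPUs. Non-x86 CPUs report their features differently, so there this check will simply answer no.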


There are a few methods for crypto hardware acceleration. The most complete one is the Open Cryptographic Framework ("OCF"), a port of the OpenBSD code. A newer, more native implementation is the Linux kernel CryptoAPI async interface. The latter is still extremely limited and does not have as many drivers as OCF, but it just works without needing to do anything on the system. There is also the pcrypt kernel module, which uses multiple CPUs for parallel processing.

Libreswan supports OCF with both the KLIPS IPsec stack and the XFRM/NETKEY stack. The latter needs the OCF cryptosoft.ko kernel module.

On an IPsec gateway, almost all the encryption is done inside the kernel, when performing the actual IPsec encryption and decryption. The IKE negotiations also use encryption, but since they only involve a few packets per hour, there is not much point in accelerating them. Whatever userland acceleration is available is enabled automatically by the NSS crypto library when supported. Sending IKE packets to the kernel accelerator for encryption/decryption is not worth the overhead and might actually slow things down.


OCF: Kernel and Userland crypto acceleration

OCF provides kernel crypto acceleration for IPsec (ESP and AH) as well as userland crypto acceleration via the /dev/crypto interface. Libreswan has removed OCF support for IKE, as the overhead is simply not worth the effort, even on embedded systems. It also required OCF patches to the TTY subsystem and thus an entire kernel recompile. If only kernel-level acceleration is needed, OCF can be built as a kernel module without recompiling the entire base kernel. However, the userland /dev/crypto interface can still be used by other userland applications that use OpenSSL. For example, if the system also runs OpenVPN, it is very useful to have OCF userland support in OpenSSL.

For more information about OCF, see ocf-linux.sourceforge.net. To get an idea of the achievable acceleration, see the OCF benchmarks.

Supported hardware via OCF

  • Safenet SafeXcel 1741 and SafeXcel 1142
  • Intel IXP465, IXP425 and IXP422
  • Freescale SEC (Talitos) (this is also the bsec driver used on Linksys WRT54g, AsusWL500g)
  • PA Semi PWRficient DMA Crypto Engine
  • Intel EP80579 (Intel QuickAssist enabled EP80579 Integrated Processor Product Line)
  • PMC Sierra MSP-8520 (requires vendor-supplied source code for OCF)
  • Cavium Octeon (requires vendor-supplied source code for OCF)
  • Hifn 7951 and Hifn 7956

Supported hardware via native kernel

The OCF subsystem can interface with the native Linux kernel crypto acceleration system. It does so via the OCF cryptosoft kernel module.

  • VIA Padlock (via cryptosoft)
  • AMD Geode LX (via cryptosoft)
  • SMP (multi-core) support (via cryptosoft)

Supported algorithms and ciphers with libreswan

  • 3DES and 1DES
  • AES
  • SHA1
  • MD5

Note that SHA2 support for OCF with KLIPS is missing from the list!

Note that not all hardware implementations support all these algorithms and ciphers. Libreswan no longer supports 1DES because it is too insecure.

SMP support

The Linux NETKEY/XFRM native IPsec stack does not load-balance a single IPsec SA over multiple CPU cores. Using the OCF cryptosoft driver, the processing of a single IPsec SA can be spread over multiple CPUs.

Building libreswan with OCF support

Some OCF kernel builds are made available at libreswan.org, usually in the form of source and binary .deb or .rpm packages. The kernel packages provided support both kernel and userland OCF acceleration. You will also find patched OpenSSL packages there, which can be used in combination with OpenVPN to support hardware acceleration in userland.


Building kernel-only OCF support as a module for the running kernel

tar zxf ocf-linux-20120127.tar.gz
cd ocf-linux-20120127/ocf
make ocf_modules
sudo make ocf_install
OCF_DIR=`pwd`

You can test the OCF acceleration using a special benchmark module called ocf-bench. This kernel module performs the benchmark when it is modprobe'd into the kernel, and then deliberately reports a (fake) error so that it unloads itself again:

modprobe ocf
modprobe cryptosoft
modprobe ocf-bench
dmesg | tail -5

You should see something along the lines of:

[  583.128741] OCF: 45133 requests of 1488 bytes in 251 jiffies (535.122 Mbps)
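
The Mbps figure in that line is the data volume divided by the elapsed time: requests * request size * 8 bits, over jiffies / HZ seconds. A minimal sketch that recomputes it from a saved dmesg line, assuming a kernel tick rate of HZ=250 (which is consistent with the sample line above; check CONFIG_HZ for your kernel); ocf_bench_mbps is a hypothetical helper name:

```shell
# Hypothetical helper: recompute ocf-bench throughput from its dmesg line.
# HZ (kernel ticks per second) is an assumption; pass your kernel's
# value as the second argument. Default 250.
ocf_bench_mbps() {
    hz=${2:-250}
    # Expects a line such as:
    # OCF: 45133 requests of 1488 bytes in 251 jiffies (535.122 Mbps)
    printf '%s\n' "$1" | awk -v hz="$hz" '
        /OCF: [0-9]+ requests of [0-9]+ bytes in [0-9]+ jiffies/ {
            for (i = 2; i <= NF; i++) {
                if ($i == "requests") reqs = $(i - 1)
                if ($i == "bytes")    size = $(i - 1)
                if ($i == "jiffies")  jif  = $(i - 1)
            }
            # bits transferred / seconds elapsed, in megabits per second
            printf "%.3f\n", reqs * size * 8 / (jif / hz) / 1000000
        }'
}
```

For the sample line above it prints 535.123; the last digit may differ from the kernel's own figure because of rounding.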

Building KLIPS with OCF support

To build libreswan KLIPS with OCF support, use the following instead of "make module":

make KBUILD_EXTRA_SYMBOLS=$OCF_DIR/Module.symvers \
     MODULE_DEF_INCLUDE=`pwd`/packaging/ocf/config-all.hmodules \
     MODULE_DEFCONFIG=`pwd`/packaging/ocf/defconfig \
     module
sudo make KBUILD_EXTRA_SYMBOLS=$OCF_DIR/Module.symvers \
     MODULE_DEF_INCLUDE=`pwd`/packaging/ocf/config-all.hmodules \
     MODULE_DEFCONFIG=`pwd`/packaging/ocf/defconfig \
     minstall


Building userland OCF support for IKE

OCF support for IKE has been removed, as these days it is faster to use the CPU's own crypto instructions directly from userland. Support for those instructions comes via the NSS library that libreswan uses, which has the advantage of not requiring a kernel patch.


Generate the kernel patches and apply the appropriate one

If you want OCF random number support, you cannot just build OCF as a module; you also need to patch the kernel.

cd ocf
make patch

This will provide four files:

  • linux-2.4-ocf.patch
  • linux-2.6-ocf.patch
  • linux-3.1-ocf.patch
  • ocf-linux-base.patch

Pick the patch that matches your kernel version and apply it to your kernel source tree. For example, for a 3.1 kernel:

cd linux-3.1
patch -p1 < linux-3.1-ocf.patch

To compile userland applications with OCF support, the cryptodev.h file needs to be installed on the system, for example as /usr/include/crypto/cryptodev.h.
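
A minimal sketch of that installation step. It assumes, hypothetically, that cryptodev.h sits at the top of the OCF source directory ($OCF_DIR, as set in the module build steps above) and relies on install -D to create the target directory; install_cryptodev_h and the DESTDIR staging root are illustrative names:

```shell
# Sketch: install OCF's cryptodev.h where userland builds expect it.
# Assumptions (hypothetical, adjust to your tree): the header lives at
# $OCF_DIR/cryptodev.h unless a path is given, and DESTDIR is an
# optional staging root (empty means the live system).
install_cryptodev_h() {
    src=${1:-${OCF_DIR:-.}/cryptodev.h}
    dest=${DESTDIR:-}/usr/include/crypto/cryptodev.h
    if [ ! -f "$src" ]; then
        echo "cryptodev.h not found at $src" >&2
        return 1
    fi
    # -D creates /usr/include/crypto if it does not exist yet
    install -D -m 0644 "$src" "$dest" && echo "installed $dest"
}
```

Run it as root for the live system, or with DESTDIR set when staging files for a package.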


The OCF source code also comes with OpenSSL patches; please see the OCF source for further instructions.

How to load the OCF modules into the kernel

Libreswan comes with the _stackmanager script that loads all kernel modules and sets various parameters. These include all the native CryptoAPI acceleration modules. It does not auto-detect OCF support on disk, so before starting _stackmanager, ensure that the system has loaded the OCF core kernel module:

modprobe ocf

Libreswan will then detect OCF support and load the userland (cryptodev) and software (cryptosoft) driver modules.

To use an OCF hardware driver, make sure you also load the appropriate driver module, e.g. one of:

modprobe safe
modprobe hifn7751
modprobe ixp4xx
...
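
The ordering above (OCF core first, hardware driver as needed, then _stackmanager) can be wrapped in a small function. A sketch under stated assumptions: a module counts as loaded when its directory exists under /sys/module, where dashes in module names appear as underscores; load_ocf and the SYSFS/MODPROBE overrides are hypothetical names used for illustration and testing:

```shell
# Sketch: load the OCF core module plus any hardware drivers named as
# arguments, skipping modules that are already present.
# SYSFS and MODPROBE are hypothetical overrides for testing.
load_ocf() {
    root=${SYSFS:-/sys}
    modprobe_cmd=${MODPROBE:-modprobe}
    for m in ocf "$@"; do
        # /sys/module uses underscores where module names have dashes
        if [ -d "$root/module/$(echo "$m" | tr '-' '_')" ]; then
            echo "$m already loaded"
        else
            $modprobe_cmd "$m" || return 1
        fi
    done
}
```

For example, run load_ocf safe before starting libreswan, so that _stackmanager finds OCF already in place.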

Debugging OCF

To enable debugging (which will ruin your acceleration gains!) you can issue some of the following commands based on your hardware/software:

    echo 1 > /sys/module/ocf/parameters/crypto_debug
    echo 1 > /sys/module/cryptodev/parameters/cryptodev_debug
    echo 1 > /sys/module/cryptosoft/parameters/swcr_debug
    echo 1 > /sys/module/hifn7751/parameters/hifn_debug
    echo 1 > /sys/module/safe/parameters/safe_debug
    echo 1 > /sys/module/ixp4xx/parameters/ixp_debug
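
A small sketch that flips all of the debug switches listed above at once, touching only the parameter files of modules that are actually loaded. It assumes (not confirmed on this page) that writing 0 turns a switch off again; ocf_debug and the SYSFS override are hypothetical names:

```shell
# Sketch: toggle the OCF debug parameters listed above in one go.
# Only parameter files that exist and are writable are touched.
# Assumption: 0 disables debugging again. SYSFS is a test override.
ocf_debug() {
    case $1 in
        on)  val=1 ;;
        off) val=0 ;;
        *)   echo "usage: ocf_debug on|off" >&2; return 1 ;;
    esac
    root=${SYSFS:-/sys}
    for p in ocf/parameters/crypto_debug \
             cryptodev/parameters/cryptodev_debug \
             cryptosoft/parameters/swcr_debug \
             hifn7751/parameters/hifn_debug \
             safe/parameters/safe_debug \
             ixp4xx/parameters/ixp_debug; do
        f=$root/module/$p
        if [ -w "$f" ]; then
            echo "$val" > "$f" && echo "$p=$val"
        fi
    done
}
```

Remember to run ocf_debug off once you are done, since debugging output costs throughput.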

OCF benchmarking

The ocf-bench driver accepts the following parameters:

  • request_q_len - maximum number of outstanding requests to OCF
  • request_num - run for at least this many requests
  • request_size - size of each request (a multiple of 16 bytes is recommended)
  • request_batch - enable OCF request batching
  • request_cbimm - enable OCF immediate callback on completion

An example benchmark use:

modprobe ocf-bench request_size=1024 request_cbimm=0
dmesg |tail -5
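
Because ocf-bench runs its benchmark on every modprobe, comparing request sizes is a simple loop. A sketch, assuming the result line starts with "OCF:" as in the sample output earlier and using only the request_size and request_cbimm parameters listed above; ocf_bench_sweep and DRY_RUN are hypothetical names:

```shell
# Sketch: run ocf-bench once per request size and print its result line.
# ocf-bench deliberately fails to load after benchmarking (so that it
# unloads itself), hence the ignored modprobe exit status.
# Set DRY_RUN to any value to only print the modprobe commands.
ocf_bench_sweep() {
    for size in "$@"; do
        case $size in
            ''|*[!0-9]*) echo "not a byte count: $size" >&2; return 1 ;;
        esac
        if [ $((size % 16)) -ne 0 ]; then
            echo "warning: $size is not a multiple of 16" >&2
        fi
        if [ -n "${DRY_RUN:-}" ]; then
            echo "modprobe ocf-bench request_size=$size request_cbimm=0"
            continue
        fi
        modprobe ocf-bench request_size="$size" request_cbimm=0 || :
        dmesg | grep 'OCF:' | tail -n 1
    done
}
```

For example, ocf_bench_sweep 64 256 1024 1488 run as root prints one result line per size.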

OCF KLIPS tuning

The following parameters are managed by _stackmanager but can be changed to suit your specific needs, based on the hardware capabilities of your platform:

  • /sys/module/ocf/parameters/crypto_q_max
  • /sys/module/ipsec/parameters/ipsec_irs_cache_allocated_max
  • /sys/module/ipsec/parameters/ipsec_ixs_cache_allocated_max

Additional parameters that can be tuned manually:

  • /sys/module/ocf/parameters/crypto_all_kqblocked
  • /sys/module/ocf/parameters/crypto_verbose
  • /sys/module/ocf/parameters/crypto_debug
  • /sys/module/ocf/parameters/crypto_q_cnt
  • /sys/module/ocf/parameters/crypto_userasymcrypto
  • /sys/module/ocf/parameters/crypto_all_qblocked
  • /sys/module/ocf/parameters/crypto_usercrypto
  • /sys/module/ocf/parameters/crypto_max_loopcount
  • /sys/module/ocf/parameters/crypto_devallowsoft


  • /sys/module/ipsec/parameters/ipsec_ocf_batch
  • /sys/module/ipsec/parameters/ipsec_ocf_cbimm
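
Before changing any of these, it helps to record the current values. A minimal read-only sketch that prints every crypto_* parameter of the ocf module and every ipsec_* parameter of the KLIPS ipsec module, whichever of them exist and are readable; ocf_show_tuning and the SYSFS override are hypothetical names:

```shell
# Sketch: print the current OCF and KLIPS tuning parameters.
# Read-only; unreadable or missing parameter files are skipped.
# SYSFS is a hypothetical override for testing.
ocf_show_tuning() {
    root=${SYSFS:-/sys}
    for f in "$root"/module/ocf/parameters/crypto_* \
             "$root"/module/ipsec/parameters/ipsec_*; do
        # an unmatched glob stays literal and fails this test
        [ -r "$f" ] || continue
        printf '%s = %s\n' "${f#"$root"/module/}" "$(cat "$f")"
    done
}
```

Saving its output (for example ocf_show_tuning > tuning-before.txt) makes it easy to compare settings after a benchmark run.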

ipsec_ocf_cbimm

When enabled, the OCF layer calls back immediately on completion, rather than calling back from a work queue (softirq) context. Callbacks must be written very carefully and be re-entrant to use this mode. KLIPS is typically safe to use in this mode.

ipsec_ocf_batch

Instruct OCF to batch requests if possible. Typically this should be enabled.
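
Both switches live under /sys/module/ipsec/parameters. A minimal sketch of setting them at runtime, assuming (not confirmed on this page) that the files are writable while KLIPS is loaded and that they take 0 or 1; klips_ocf_set and the SYSFS override are hypothetical names:

```shell
# Sketch: set ipsec_ocf_batch or ipsec_ocf_cbimm on a loaded KLIPS module.
# Assumptions: the parameters are writable at runtime and accept 0/1.
# SYSFS is a hypothetical override for testing.
klips_ocf_set() {
    root=${SYSFS:-/sys}
    name=$1 val=$2
    case $name in
        ipsec_ocf_batch|ipsec_ocf_cbimm) ;;
        *) echo "unknown parameter: $name" >&2; return 1 ;;
    esac
    case $val in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    f=$root/module/ipsec/parameters/$name
    if [ ! -w "$f" ]; then
        echo "$f is not writable (is KLIPS loaded?)" >&2
        return 1
    fi
    echo "$val" > "$f"
}
```

For example, klips_ocf_set ipsec_ocf_batch 1 enables request batching.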


The pcrypt kernel module

The pcrypt and tcrypt kernel modules allow the Linux kernel CryptoAPI to spread the crypto load of a single IPsec SA over multiple CPUs. If your VPN server has many hundreds of IPsec connections, these will already be spread out over the CPUs and pcrypt does not gain you much. However, if you have only one or a few IPsec tunnels that run at high capacity, the pcrypt module might help you make better use of all your available CPU power. Also, if you want to increase throughput, it is very important to pick a fast ESP algorithm. Currently, AES_GCM outperforms everything (including esp=null-md5!), so that is the algorithm to use with pcrypt.


The documentation of the pcrypt and tcrypt modules is very limited. The pcrypt module has to be enabled per algorithm:

modprobe pcrypt
modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
# optionally accelerate aes-sha1 and aes-sha2
modprobe tcrypt alg="pcrypt(authenc(hmac(sha1),cbc(aes)))" type=3
modprobe tcrypt alg="pcrypt(authenc(hmac(sha256),cbc(aes)))" type=3
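
One way to confirm that these commands actually instantiated pcrypt is to look at /proc/crypto. A sketch, assuming the usual /proc/crypto layout of "name" and "driver" lines, and that a pcrypt wrapper keeps the underlying algorithm name while registering a driver name of the form pcrypt(...); pcrypt_active is a hypothetical helper name:

```shell
# Sketch: list pcrypt instances registered with the kernel CryptoAPI.
# Assumption: a pcrypt wrapper keeps the algorithm name (for example
# rfc4106(gcm(aes))) but registers a driver named "pcrypt(<driver>)".
# Returns non-zero when no pcrypt instance is present.
pcrypt_active() {
    grep -E '^driver[[:space:]]*: pcrypt\(' "${1:-/proc/crypto}"
}
```

For example, pcrypt_active || echo "no pcrypt instance loaded" after the modprobe commands.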

Note that it has been reported that a system may crash if these commands are run before any IPsec tunnel has been established. Conversely, other reports say that these commands must be run before starting any tunnels, or the system crashes. It has also been reported that after running the modprobe commands, the IPsec tunnel should be re-established to activate the acceleration.

Using multiple CPUs can cause packets to be encrypted out of order, and the receiver's replay protection may then drop them. It is therefore important to either increase replay-window to a value of 64 or 128, or to disable replay protection by setting replay-window to 0.
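
In libreswan, replay-window is a per-connection ipsec.conf setting. A sketch that prints a small conn section setting only that option, so it can be pulled into an existing tunnel with also=; the conn name is a placeholder, replay_window_conn is a hypothetical name, and the value check only allows the three options suggested above:

```shell
# Sketch: print a partial libreswan conn section that only sets
# replay-window, for inclusion into a real tunnel via also=.
# 64 or 128 widens the window; 0 disables replay protection.
replay_window_conn() {
    name=$1
    window=${2:-128}
    case $window in
        0|64|128) ;;
        *) echo "replay-window should be 0, 64 or 128, got: $window" >&2
           return 1 ;;
    esac
    printf 'conn %s\n\treplay-window=%s\n' "$name" "$window"
}
```

For example, replay_window_conn replay-128 128 > /etc/ipsec.d/replay-128.conf, followed by also=replay-128 in the tunnel's own conn section, assuming your ipsec.conf includes /etc/ipsec.d/*.conf. Re-establish the tunnel afterwards so the new SA picks up the setting.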

There is a tool called crconf that is supposed to make this easier, but it does not seem to work for those who have tried to use it.