XFRM pCPU


Goal: scalable IPsec throughput with multiple CPUs (no HW offload)

The idea, called per-CPU SA in the outgoing direction, was discussed at the Linux IPsec workshop 2019 in Prague. During the following days a small group of people worked on a prototype spanning user space (IKE, Libreswan) and the Linux kernel (XFRM). Libreswan calls the option "clones"; in the kernel it is called pCPU. These names may change.

Results

The test results show the aggregated throughput increasing linearly with the number of CPUs. We tested using Mellanox CX4 NICs, which support RSS for ESP. Clear-text traffic was generated by a hardware traffic generator, forwarded through the IPsec gateways, and received back in clear text on the traffic generator.

The initial numbers are 6-7 Gbps per CPU; with 3 flows we see about 17-18 Gbps.

How to test this

Libreswan source with pCPU support is in the branch clones-3:

git clone --single-branch --branch clones-3 https://github.com/antonyantony/libreswan

Sample config ipsec.conf:

conn westnet-eastnet
        rightid=@east
        leftid=@west
        left=192.1.2.45
        right=192.1.2.23
        rightsubnet=192.0.2.0/24
        leftsubnet=192.0.1.0/24
        authby=secret
        clones=2
        auto=add
        nic-offload=no

ipsec auto --up westnet-eastnet
taskset 0x1 ping -n -c 2 -I 192.0.1.254 192.0.2.254
taskset 0x2 ping -n -c 2 -I 192.0.1.254 192.0.2.254

ipsec trafficstatus

ipsec whack --trafficstatus
006 #2: "westnet-eastnet-0", type=ESP, add_time=1234567890, inBytes=0, outBytes=0, id='@east'
006 #4: "westnet-eastnet-1", type=ESP, add_time=1234567890, inBytes=168, outBytes=168, id='@east'
006 #3: "westnet-eastnet-2", type=ESP, add_time=1234567890, inBytes=168, outBytes=168, id='@east'

NOTE: both sub SAs #3 and #4 have outgoing traffic on them, because each ping was pinned to a different CPU.

Kernel source is in the branch pcpu-2:

git clone -b pcpu-2 https://github.com/antonyantony/linux

Kernel / xfrm plans

  • Release private branch on Steffen's repository to get wider testing.
  • Kernel support for rekey. One could rekey in any order: either the head SA or a sub SA.
  • One main difference: installing a new sub SA on rekey also deletes the old sub SA; Libreswan should not try to delete it.
  • Ben would like to add a feature to bind a sub SA to a head SA when add_sa() is called?
  • Seems to need the latest iproute2, otherwise "ip x s" may loop.


Libreswan Plans

  • Currently supports clones=n. Both sides should have the same number.
  • Support for asymmetric configurations, e.g. 8 clones on the initiator and 4 on the responder.
  • Fix rekey: we should not delete a sub SA, only the head SA.
  • Fix bugs in ipsec auto --down and delete.
  • Don't allow a clone instance on its own to be added, deleted, or downed via the unaliased name.
  • Test interop with versions that do not support clones. Ideally we should detect this and not install clones; it could be that we will install clones and the last one would be used.

nCPU < nSAs

Let's say there are 4 CPUs and the number of clones configured is 8, because the other end has 8 CPUs, and the 4-CPU side is the initiator. The head SA's list only has 4 slots for sub SAs. As I understand the RFC, per Tero, when an initiator sends a request to set up an SA, which is bidirectional, the initiator is committing to receive on that SA. If the 4-CPU side's IKE daemon installs 8 receive SAs and 4 send SAs, everything should work.
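To make this concrete, here is a minimal C sketch of how the outgoing side can be pictured: a head SA carries one sub-SA slot per local CPU, and the output path picks the slot for the CPU the sending task runs on. The structure and function names are illustrative assumptions only, not the actual kernel code.

 /* Illustrative sketch only -- not the real kernel data structures.
  * A head SA keeps one sub-SA slot per local CPU; the output path
  * picks the sub SA for the CPU the sender runs on and falls back
  * to the head SA when the slot is empty.
  */
 #define NR_LOCAL_CPUS 4                 /* the 4-CPU side in the example */
 
 struct sa {
         unsigned int spi;               /* SPI of this SA */
 };
 
 struct head_sa {
         struct sa self;                 /* the head SA itself */
         struct sa *sub[NR_LOCAL_CPUS];  /* per-CPU send slots */
 };
 
 /* Pick the SA used for transmitting on the given CPU. */
 static struct sa *select_tx_sa(struct head_sa *head, unsigned int cpu)
 {
         struct sa *sub = head->sub[cpu % NR_LOCAL_CPUS];
 
         return sub ? sub : &head->self; /* fall back to the head SA */
 }

With only 4 send slots, the 4-CPU initiator still installs all 8 receive SAs it committed to; the extra receive SAs simply never get a local send slot.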


Additional XFRM flags and attributes when adding an SA to the Linux kernel

You need extra flags and attributes for XFRM_MSG_NEWSA and XFRM_MSG_UPDSA, and for XFRM_MSG_GETSA, when dealing with outgoing SAs.

XFRM_MSG_NEWSA | XFRM_MSG_UPDSA

Both the head SA and the sub SAs need extra attributes:

  • For the head SA, set XFRMA_SA_EXTRA_FLAGS to XFRM_SA_PCPU_HEAD.
  • For a sub SA, set XFRMA_SA_EXTRA_FLAGS to XFRM_SA_PCPU_SUB and XFRMA_SA_PCPU to <sub-sa-id>. Sub SA IDs are u32 values starting from 0.

The XFRM_MSG_GETSA call only changes for sub SAs:

  • For a sub SA, set XFRMA_SA_EXTRA_FLAGS to XFRM_SA_PCPU_SUB and XFRMA_SA_PCPU to <sub-sa-id>.
  • Also set XFRMA_SRCADDR to the source address.
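As an illustration, here is a minimal C sketch of appending these attributes to an XFRM netlink request. The helper functions and the XFRM_SA_PCPU_HEAD / XFRM_SA_PCPU_SUB / XFRMA_SA_PCPU values are assumptions here: they come from the pcpu-2 prototype branch, not from the mainline uapi headers, so placeholders are defined locally. XFRMA_SA_EXTRA_FLAGS itself is a standard u32 attribute.

 /* Sketch: attaching the pCPU attributes to an XFRM netlink request.
  * The XFRM_SA_PCPU_* flag values and the XFRMA_SA_PCPU attribute id
  * below are placeholders; take the real ones from the pcpu-2 branch.
  */
 #include <string.h>
 #include <stdint.h>
 #include <linux/netlink.h>
 #include <linux/xfrm.h>
 
 #define XFRM_SA_PCPU_HEAD 0x10           /* placeholder value */
 #define XFRM_SA_PCPU_SUB  0x20           /* placeholder value */
 #define XFRMA_SA_PCPU     (__XFRMA_MAX)  /* placeholder attribute id */
 
 /* Append one netlink attribute to message n (buffer of maxlen bytes). */
 static int add_attr(struct nlmsghdr *n, size_t maxlen, uint16_t type,
                     const void *data, uint16_t len)
 {
         struct nlattr *a;
 
         if (NLMSG_ALIGN(n->nlmsg_len) + NLA_HDRLEN + NLA_ALIGN(len) > maxlen)
                 return -1;
         a = (struct nlattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
         a->nla_type = type;
         a->nla_len = NLA_HDRLEN + len;
         memcpy((char *)a + NLA_HDRLEN, data, len);
         n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + NLA_HDRLEN + NLA_ALIGN(len);
         return 0;
 }
 
 /* Head SA (XFRM_MSG_NEWSA/XFRM_MSG_UPDSA): only the extra flag. */
 static int mark_head_sa(struct nlmsghdr *n, size_t maxlen)
 {
         uint32_t flags = XFRM_SA_PCPU_HEAD;
 
         return add_attr(n, maxlen, XFRMA_SA_EXTRA_FLAGS, &flags, sizeof(flags));
 }
 
 /* Sub SA (NEWSA/UPDSA/GETSA): extra flag plus the sub-SA id. */
 static int mark_sub_sa(struct nlmsghdr *n, size_t maxlen, uint32_t sub_id)
 {
         uint32_t flags = XFRM_SA_PCPU_SUB;
 
         if (add_attr(n, maxlen, XFRMA_SA_EXTRA_FLAGS, &flags, sizeof(flags)))
                 return -1;
         return add_attr(n, maxlen, XFRMA_SA_PCPU, &sub_id, sizeof(sub_id));
 }

For XFRM_MSG_GETSA on a sub SA, XFRMA_SRCADDR also needs to be appended, as noted above.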

What kind of workload is supported

Can we distribute a 4-tuple workload?

Yes. The application on the sender side must run on the right CPU, e.g. using something like "taskset 0x1 ping -n -c 2 -I 192.0.1.254 192.0.2.254", numactl, or similar.

Receiver side RSS support

To get this working you need Receive Side Scaling (RSS). The receiver NIC should be able to steer different flows, based on SPI, into separate queues; otherwise the receiver seems to get overwhelmed. We used a Mellanox CX4 to test. Some cards we initially tested did not seem to support RSS for ESP flows, only for TCP and UDP. While figuring out RSS for these cards we tried a slightly different approach: with ESP-in-UDP encapsulation, along with the ESP-in-UDP GRO patches, we could see the flows getting distributed on the receiver.

RSS Commands

Enable GRO and it should work. Ideally you should be able to run the following:

 ethtool -N <nic> rx-flow-hash esp4 sdfn

Another argument is that even if the NIC is agnostic of ESP, 16 bits of the SPI of an ESP-in-UDP packet are aligned with the UDP port numbers and should provide enough entropy:

 ethtool -N eno2 rx-flow-hash udp4 sdfn