FAQ: Difference between revisions

From Libreswan

Revision as of 02:44, 19 March 2015

general

( we will sort this in categories once we have more )

Which IKE Exchange modes does libreswan support?

The IANA Registry lists all official Exchange Modes. There are a few IKEv1 Modes that are very common despite never having gotten past the draft stage.

Supported:

Not supported


Should I use the NETKEY or KLIPS IPsec stack with libreswan?

At this point we recommend using the NETKEY stack for most deployments. If you are using an embedded platform with a cryptographic hardware offload device, it might be better to use KLIPS.

The NETKEY IPsec stack requires no kernel recompiles on most Linux distributions, so it is the easiest stack to use in most standard deployments. It supports a larger selection of cryptographic algorithms, including the IPsec Suite B algorithms AES CTR, AES GCM and SHA2. It does cause a little additional delay with on-demand IPsec tunnels because it does not implement first+last packet caching. NETKEY supports OCF only using the cryptosoft driver, and lacks native driver support for most cryptographic hardware cards. NETKEY also does not distribute the load of a single IPsec SA over different CPUs. NETKEY has support for Linux VTI for IPsec SA reference tracking.

The KLIPS IPsec stack offers easier debugging with tcpdump and easier iptables firewall rules due to its use of separate ipsecX interfaces. It also plays a little nicer with on-demand tunneling, as it will hold on to the first+last packet sent while the tunnel is being set up, and will release those packets once the IPsec tunnel is established. KLIPS distributes the load of a single IPsec SA over multiple CPUs. It supports all OCF hardware devices when compiled with OCF support. The MAST variant of KLIPS uses IPsec SAref for IPsec SA reference tracking and is also used for L2TP/IPsec deployments requiring SAref tracking, although it is recommended to use VPN_server_for_remote_clients_using_IKEv1_XAUTH instead of L2TP/IPsec.

Is Libreswan vulnerable to the OpenSSL "Heartbleed" exploit?

No, see Libreswan_and_Heartbleed

Is Libreswan vulnerable to bash CVE-2014-6271 or CVE-2014-7169?

No, libreswan is not vulnerable

Libreswan sanitizes strings that may come from the network, such as the XAUTH username, domain and DNS servers, by passing them through the filter functions remove_metachar() and cisco_stringify() before assigning them to environment variables that are passed to the updown scripts that invoke bash. These filters remove dangerous characters, including the ' character needed for these bash exploits.

Is Libreswan vulnerable to NSS CVE-2014-1568 RSA Signature Forgery?

Yes, please upgrade NSS to one of 3.17.1, 3.16.2.1 or 3.16.5.

This only affects libreswan when using X.509 certificates. Raw RSA keys using leftrsasigkey/rightrsasigkey are not affected. Connections using authby=secret (PSK) are also not affected.

See Mozilla Foundation Security Advisory 2014-73

configurations

My ssh sessions hang or connectivity is very slow

This could be an MTU issue. The overhead of IPsec encryption (and possibly ESPinUDP encapsulation) yields a slightly smaller usable packet size, which can cause problems. A good sign of an MTU problem is that you can log in remotely over the IPsec tunnel using ssh, but issuing "ls -l /usr" causes the session to hang. Try adjusting the MSS with:

iptables -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

If that does not help, try hardcoding it yourself:

iptables -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1380

If these settings don't help, adding mtu=1420 to the connection might work, although it will affect all traffic that the connection covers.

As a final alternative, you can try lowering the MTU on the internal interface of your IPsec server so that local PMTU discovery already yields 1440, e.g. using ip link set dev eth1 mtu 1440. This will affect not only packets for the VPN tunnel, but all packets received and sent on that interface. Only use this as a last resort.
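The relation between an MTU and the MSS value passed to --set-mss can be sketched as simple arithmetic. This is a minimal sketch assuming IPv4 and TCP headers without options (20 bytes each); mss_for_mtu is a hypothetical helper name, not something libreswan or iptables provides. Under that assumption, the 1380 used above is what an MTU of 1420 would give.

```shell
#!/bin/sh
# Sketch: derive a TCP MSS clamp value from an MTU.
# Assumes IPv4 with no IP options and no TCP options.
# mss_for_mtu is a hypothetical helper name.
mss_for_mtu() {
    mtu=$1
    # 20 bytes IPv4 header + 20 bytes TCP header
    echo $((mtu - 40))
}

mss_for_mtu 1420   # prints 1380
mss_for_mtu 1440   # prints 1400
```

On Linux with iputils ping, something like ping -M do -s 1392 followed by the remote address sends unfragmentable 1420-byte IPv4 packets (1392 bytes of payload plus 28 bytes of ICMP and IP headers), which may help confirm what size actually passes through the tunnel.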

Using auto=route slows down TCP establishments when using XFRM

(also known as rhbz#1010347 )

This should be fixed on recent kernels (3.x) and has been backported to some older kernels (notably RHEL 6.6).

The issue: the ESP packets sometimes arrive very late or do not arrive at all. This is most noticeable after restarting the IPsec daemon.

The problem as explained by Herbert Xu:

Your first TCP SYN packet triggers the IPsec lookup, however, the packet itself is dropped. TCP then retransmits but it only gets through after the IPsec SAs are fully instated, resulting in the delay.

What happens in some kernels is that the IPsec trigger occurs in a sleepable context, which means that the sending process will wait for the IPsec SAs to be installed before sending the first SYN. However, this was never meant to be a complete solution to supporting auto=route as it relies on the fact that there must be some sleepable context prior to the SYN packet being sent.

Evidently this is no longer the case for some kernels. Going forward I suggest two courses of action:

1) Do not rely on auto=route. Instead use auto=start and ensure that you synchronously wait for the SAs to complete. For example, ipsec auto --up foo will bring foo up synchronously, while ipsec auto --asynchronous --up foo will not wait and thus may fail.

2) I will take this issue to the IPsec maintainer and the network maintainer to see if we can make adjustments to allow at least the TCP connection case to work with auto=route. However, there is no guarantee that this will be done as we may not be able to insert the requisite sleepable context into the general network stack just so that IPsec auto=route can work.

Longer term for auto=route to be properly supported someone needs to implement packet queueing on larval SAs.

Possible work around:

        echo 0 > /proc/sys/net/core/xfrm_larval_drop
        echo 3 > /proc/sys/net/ipv4/tcp_syn_retries

This means that the first retransmit of the SYN packet (+1s) should make it through, rather than the current behaviour where only the fourth retransmit (+15s) makes it through.

Note that this workaround causes a regression in the ability of connect() to return immediately on a non-blocking socket with an appropriate POSIX-compliant errno, which is why the workaround also sets the TCP SYN retry count to 3.
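The echo commands above only last until the next reboot. To keep the workaround, the same settings could be written as /etc/sysctl.conf entries. The sketch below only prints the equivalent lines and changes nothing on the system; proc_to_sysctl_key is a hypothetical helper name, and the simple path-to-key conversion assumes no path component contains a dot.

```shell
#!/bin/sh
# Hypothetical helper: turn a /proc/sys path into the dotted key
# used in sysctl.conf.
proc_to_sysctl_key() {
    # Strip the /proc/sys/ prefix, then replace each "/" with ".".
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

# Print sysctl.conf-style lines for the two settings from the workaround.
printf '%s = 0\n' "$(proc_to_sysctl_key /proc/sys/net/core/xfrm_larval_drop)"
printf '%s = 3\n' "$(proc_to_sysctl_key /proc/sys/net/ipv4/tcp_syn_retries)"
```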

When using hundreds of tunnels on a xen based cloud system like AWS, a fraction of tunnels fail regularly

This is a known issue that could be a problem of the aesni kernel module in combination with the xen hypervisor. Try unloading the aesni.ko kernel module on the xen server. If you can confirm this fixes your issue (we cannot change the AWS servers), please email the swan-dev list with a confirmation.

My XAUTH authentication via PAM always claims the password is incorrect on CentOS 6

This is an odd bug (feature?) that shows up when you have disabled SELinux in /etc/sysconfig/selinux. Running SELinux in permissive (or enforcing) mode seems to resolve this.

Why is it recommended to disable send_redirects in /proc/sys/net ?

Let's say you have a VPN server in a cloud that you use with your phone. Your phone will set up an IPsec VPN, and all its traffic is encrypted and sent to the cloud instance, which decrypts it and sends it on to the internet using SNAT. Replies it receives are encrypted and sent to your phone.

Your phone will send the VPN server an encrypted packet. The server receives it on eth0 (its only interface!) and decrypts it. The decrypted packet is then ready to get routed. The server looks up which interface it should send the packet out on, and finds that it is destined to go out eth0. Since the packet came in via eth0 and would go out via eth0, the server concludes there clearly must be a better path not involving itself, and sends an ICMP redirect. It has no idea the packet arrived encrypted and got decrypted.

This is why we recommend disabling "send_redirects" in /etc/sysctl.conf using

net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
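To check the result, the per-interface values can be inspected directly. As far as we understand the kernel's behaviour, "default" only seeds interfaces created later, and redirects remain enabled on an interface if either its own value or the "all" value is non-zero, so an interface that already existed may still report 1. The following is a minimal sketch; list_enabled is a hypothetical helper name, and the optional second argument exists only so that it can be pointed at a test directory.

```shell
#!/bin/sh
# Hypothetical helper: list interfaces whose setting is still non-zero.
# $1 = setting name (for example send_redirects)
# $2 = conf directory (defaults to /proc/sys/net/ipv4/conf)
list_enabled() {
    key=$1
    dir=${2:-/proc/sys/net/ipv4/conf}
    for f in "$dir"/*/"$key"; do
        # Skip unreadable entries (and the unexpanded glob if nothing matched).
        [ -r "$f" ] || continue
        value=$(cat "$f")
        if [ "$value" != "0" ]; then
            # Print the interface name (the directory holding the file).
            basename "$(dirname "$f")"
        fi
    done
}

# Prints one interface name per line; no output means all are disabled.
list_enabled send_redirects
```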

Why is it recommended to disable rp_filter in /proc/sys/net ?

The kernel has a notion of which interface a packet came from and where it will go to, and it determines whether the path through the machine makes sense based on the IP addresses it sees. If 10.0.2.0/24 lives on eth0 and 1.2.3.4 is the address on eth1, which has the default route, then rp_filter will automatically block a packet from 10.0.2.1 coming in on eth1. The rp_filter code is an implementation of RFC 3704 (https://tools.ietf.org/html/rfc3704). Of course, you should also have firewall rules on the machine that block these packets, as well as firewall rules on the router in front of the machine.

The problem with IPsec appears when you hand out a 10.0.2.13 address, like via XAUTH/IPsec. A packet with IP a.b.c.d comes in on eth1 for 1.2.3.4, which passes rp_filter, then gets decrypted to 10.0.2.13. Now the packet is still seen as coming from eth1, so rp_filter will drop the packet as 10.0.2.0/24 packets are only expected to originate from eth0.

This is why we recommend disabling "rp_filter" in /etc/sysctl.conf using

net.ipv4.conf.default.rp_filter = 0

A network restart or reboot might be necessary for this entry to be picked up. As a one-shot way of disabling it for all interfaces, you can use:

for i in /proc/sys/net/ipv4/conf/*; do echo 0 > "$i/rp_filter"; done

Common error messages

ERROR: asynchronous network error report on eth0 (sport=4500) for message to xx.xx.xxx.xxx port 4500, complainant yy.yy.yyy.yyy: Message too long [errno 90, origin ICMP type 3 code 4 (not authenticated)]

These errors are often intermittent, because they depend on the application data that is getting encrypted. Your NAT'ed IPsec tunnel is using ESPinUDP, and the additional UDP header caused some of your packets to be too big. See "My ssh sessions hang or connectivity is very slow" above and try lowering your MTU. Use a very small MTU like 1300 or 1200 for confirmation, then raise it to the highest value that works reliably for you.

ERROR: asynchronous network error report on eth0 (sport=4500) for message to xx.xx.xxx.xxx port 4500, complainant yy.yy.yyy.yyy: No route to host [errno 113, origin ICMP type 3 code 1(not authenticated)]

These errors often happen 15 minutes after the tunnel was successfully established. Most likely the tunnel was idle and the NAT router removed the NAT mapping, or the NAT router rebooted and lost state. It no longer knows which client to send the packet to. Ensure your connection uses nat-keepalive=yes. Possibly decrease the global keep-alive= value to send more frequent keep-alive packets. Alternatively, enable DPD on the connection to cause some regular traffic on idle tunnels.

ERROR: asynchronous network error report on eth0 (sport=500) for message to xx.xx.xxx.xxx port 500, complainant yy.yy.yyy.yyy: Connection refused [errno 111, origin ICMP type 3 code 3 (not authenticated)]

This error means the other end is not (or no longer) running an IKE daemon. Ensure the IKE daemon is running on the remote system. If you see this error during a negotiation, it could be that the remote IKE daemon crashed or stopped listening. On Mac OS X, if the IKE daemon is not allowed to read the proper X.509 certificate, it will only realize this partway into the IKE negotiation and terminate, resulting in this error.

error: ignoring informational payload, type NO_PROPOSAL_CHOSEN msgid=00000000

This error means exactly what it says. The IKE proposal(s) sent to the server were rejected. This means there is a configuration mismatch between libreswan and the remote IPsec server. Usually the mismatch is in the ike= or esp= (phase2alg=) setting, but other options could also be wrong, such as authby=, pfs= or aggrmode=.

ssh gives error: Corrupted MAC on input. Disconnecting: Packet corrupt

This usually indicates MTU issues. You can try lowering the mtu using the mtu= option or by changing the actual mtu on the proper interface on the libreswan server. This error is known to happen on Amazon EC2 AMI types that use PV (xen) instances. Switching to Amazon HVM instances seems to resolve the problem on AWS.

Using aes_gcm or aes_ctr results in ERROR: netlink response for Add SA esp.XXXXXXXX@IPADDRESS included errno 22: Invalid argument

This usually indicates that the ESP algorithm selected using the phase2alg= (esp=) line is not available in the kernel, which in turn usually points to a kernel bug.

Linux kernels up to 3.2.x have a bug in the aesni-intel driver on x86_64. See rhbz#1176211. The AESNI hardware acceleration kernel module does not properly support 256 or 192 bit keys for AES_GCM. You can either switch to 128 bit keys, or blacklist or unload the aesni-intel kernel module. Another alternative is to switch from phase2alg=aes_gcm to phase2alg=aes, although that will cut the performance in half.

Linux kernels to date seem to have a bug in the aes_ctr code on the POWER8BE VM. Use phase2alg=aes there as well, to use AES_CBC.
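One way to see which algorithms the running kernel currently lists is to look at /proc/crypto. The sketch below is a rough check under stated assumptions: kernel_lists_alg is a hypothetical helper name, the algorithm names tried are the kernel's generic names rather than libreswan keywords, and an algorithm that is not listed may simply live in a module that has not been loaded yet. A listed algorithm can also still be buggy, as described above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named algorithm is listed in a
# /proc/crypto-style file.
# $1 = algorithm name, $2 = file to read (defaults to /proc/crypto)
kernel_lists_alg() {
    alg=$1
    file=${2:-/proc/crypto}
    [ -r "$file" ] || return 1
    # Entries look like "name         : gcm(aes)"; compare the value exactly.
    awk -F': ' -v want="$alg" '
        $1 ~ /^name[ \t]*$/ && $2 == want { found = 1 }
        END { exit !found }
    ' "$file"
}

for alg in "gcm(aes)" "ctr(aes)" "cbc(aes)"; do
    if kernel_lists_alg "$alg"; then
        echo "$alg: listed"
    else
        echo "$alg: not listed (may still be loadable as a module)"
    fi
done
```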

Can't find the private key from the NSS CERT (err -8177)

With the old libreswan-3.8, /etc/ipsec.d/nsspassword contains just the password. In later libreswan versions, you must prefix it with the NSS token name. So to specify the password "secret", use:

NSS Certificate DB:secret
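The entry format above can be produced with a small helper. This is only a sketch: nss_password_line is a hypothetical name, the token name is the one shown above, and the install step is left as a comment because it needs root and overwrites the existing file.

```shell
#!/bin/sh
# Hypothetical helper: format one nsspassword entry using the
# "NSS Certificate DB" prefix shown above.
nss_password_line() {
    printf 'NSS Certificate DB:%s\n' "$1"
}

nss_password_line "secret"   # prints: NSS Certificate DB:secret

# To install it (as root), something like the following could be used:
#   umask 077
#   nss_password_line "secret" > /etc/ipsec.d/nsspassword
```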

old problems fixed in newer releases

Module unloading error on shutdown or restart: Module esp4 is in use

ERROR: Module xfrm4_mode_tunnel is in use
ERROR: Module esp4 is in use
FAILURE to unload NETKEY esp4/esp6 module

This has been fixed in libreswan-3.9. Please upgrade.

IPv6 tunnel works manually but fails after a machine reboots

When one machine reboots and loses state, the other machine still has an encryption policy for the rebooted machine and will insist on receiving only encrypted packets. After a reboot, the rebooted host cannot yet send encrypted packets. For that reason, an "IKE hole" is present in the host's kernel: UDP 500 and UDP 4500 packets for IKE are allowed in plaintext even if an encryption policy is active for that host. On at least the Linux kernel, that hole does not include ipv6-icmp Neighbour Discovery packets, which are sent as unicast replies from the host that did not reboot to the just-rebooted host. You can see this in "ipsec status" as:

000 Shunt list:
000 000 2620:52:0:ab0:42f2:e9ff:fe09:a16c/128:136 -58-> 2620:52:0:ab0:ca1f:66ff:fef1:c74c/128:0 => %hold 0 %acquire-netlink

Note protocol 58 (ipv6-icmp)

A workaround is to add the following connection:

conn v6neighbour-hole
        left=::1
        leftsubnet=::0/0
        leftprotoport=58/0
        rightprotoport=58/34816
        rightsubnet=::0/0
        right=::0
        connaddrfamily=ipv6
        authby=never
        type=passthrough
        auto=route
        priority=1

(If you wonder where 34816 comes from, please see the leftprotoport= entry of the ipsec.conf man page.)
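As a quick illustration of that value: as we read the man page, for ICMP the "port" carries the ICMP type in its upper 8 bits and the ICMP code in its lower 8 bits. The shunt entry above shows 136, which is the ICMPv6 Neighbour Advertisement type. The sketch below does the arithmetic; icmp_protoport is a hypothetical helper name, not a libreswan command.

```shell
#!/bin/sh
# Hypothetical helper: encode an ICMP type and code as the "port"
# part of a protoport value.
icmp_protoport() {
    icmp_type=$1
    icmp_code=$2
    # The type goes in the high byte, the code in the low byte.
    echo $(( (icmp_type << 8) | icmp_code ))
}

icmp_protoport 136 0   # Neighbour Advertisement, code 0: prints 34816
```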