Namespace-Magic
Namespaces have been around for a long time; however, they still feel like magic. So I started this page in 2019 to capture the magic. As time passes it may not be magic anymore, or may even become obsolete. An early attempt in Libreswan with Paul.
FAQ
How to detect from inside whether you are in a namespace
* one way seems to be to look at eth0: inside a namespace the interface shows up as "eth1@if107", while on a KVM guest it is a plain "eth0:"
<pre>
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:9e:81:71 brd ff:ff:ff:ff:ff:ff
</pre>
* How to find a veth's peer inside a namespace from the host: look at link-netns
<pre>
on the host, "ip link" output:
107: hweste164512@if106: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brswan12-64512 state UP mode DEFAULT group default qlen 1000
    link/ether 4a:34:cd:0e:0c:13 brd ff:ff:ff:ff:ff:ff link-netns west-ikev2-03-basic-rawrsa

from inside the namespace:
106: eth1@if107: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 02:10:c8:8e:d2:7e brd ff:ff:ff:ff:ff:ff link-netnsid 0
</pre>
From the host you get the namespace name: "link-netns west-ikev2-03-basic-rawrsa". To tell exactly which interface is which, note the pairing in "ip link": "106: eth1@if107" inside the namespace and "107: hweste164512@if106" on the host; the @ifNN suffix points at the peer's index.
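As a small convenience, here is a hedged shell sketch of the same lookup: a veth exposes its peer's ifindex in /sys/class/net/&lt;dev&gt;/iflink, so the number can be matched mechanically instead of eyeballing the @ifNN suffix. The namespace, device name, and index below are only examples taken from the output above.
<pre>
# inside the namespace: print the peer's ifindex (107 in the example above)
ip netns exec west-ikev2-03-basic-rawrsa cat /sys/class/net/eth1/iflink

# on the host: find which interface carries that index
ip -o link show | awk -F': ' '$1 == 107 {print $2}'
</pre>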
route cache filling up
<pre>
[2936616.607520] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
[2936616.607908] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
[2936616.609100] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.

sysctl -a | grep "route.max_size"
net.ipv4.route.max_size = 2147483647
</pre>
Google suggests flushing the route cache. That does not seem to help. I think routes are stuck inside namespaces.
<pre>
ip route flush cache
ip route show cache
</pre>
"ip route show cache" is empty. "ip -all netns exec ip link show" shows weird errors:
<pre>
ip -all netns exec ip link show

netns: east-ikev2-mobike-06
setting the network namespace "east-ikev2-mobike-06" failed: Invalid argument
</pre>
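If routes really are stuck inside namespaces, flushing only the host cache would not be enough. A minimal sketch of what I mean, plus a check of the IPv6 side of the sysctl (the host output above only shows the IPv4 limit); the loop is an illustration, not something the test runner does:
<pre>
# flush the route cache inside every named namespace, not just on the host
for ns in $(ip netns list | awk '{print $1}'); do
    ip netns exec "$ns" ip route flush cache
done

# the kernel message mentions both families; check the IPv6 limit too
sysctl net.ipv6.route.max_size
</pre>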
iptables fails: -w 60 seems to help
<pre>
sudo /usr/bin/nsenter --mount=/run/mountns/west-nstest-4 --net=/run/netns/west-nstest-4 --uts=/run/utsns/west-nstest-4 /bin/bash -c 'cd /testing/pluto/nstest-4;iptables -I INPUT -m policy --dir in --pol ipsec -j ACCEPT '

Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
</pre>
I tried putting the following in /root/.bashrc:
<pre>
alias iptables="iptables --wait 60 --wait-interval=100000"
</pre>
That seems to have reduced the failure rate from 50% to 2% when running 500 tests, 10 tests in parallel. Changing to --wait 120 still causes a 1.3% error rate. Going above 120 seconds would skew tests; they usually time out in 120 seconds.
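In case the alias is not picked up by every invocation, an alternative is a small retry wrapper around iptables. This is only a sketch of the idea (the attempt count and sleep are arbitrary), not what the tests currently use:
<pre>
# hypothetical wrapper: retry while another process holds the xtables lock
ipt_retry() {
    local n=0
    until iptables -w 10 "$@"; do
        n=$((n + 1))
        [ "$n" -ge 12 ] && return 1
        sleep 5
    done
}
</pre>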
would this work on foo 7/CentOS7: not yet, util-linux is too old
unshare and/or nsenter do not support the --mount[=file] option.
There seem to be some options; compare the two versions:
<pre>
fedora 28
unshare -V
unshare from util-linux 2.32.1

-m, --mount[=file]
       Unshare the mount namespace. If file is specified, then a persistent namespace is created by a bind mount

---- old one: foo 7 -----
unshare -V
unshare from util-linux 2.23.2

-m, --mount
       Unshare the mount namespace.
</pre>
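A rough way to probe this at runtime instead of hard-coding distro or util-linux versions is to check whether the installed unshare advertises the optional file argument. Parsing --help is crude and the exact help text differs between versions, so treat this purely as a sketch:
<pre>
# crude feature probe: does unshare's help mention an optional file argument?
if unshare --help 2>&1 | grep -q 'mount\[='; then
    echo "unshare supports persistent mount namespaces (--mount=<file>)"
else
    echo "util-linux too old for --mount=<file>"
fi
</pre>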
outstanding issues
reduce the use of iptables
This would go in steps. First make sure swan-prep creates the LOGDROP target only when a test needs it; grep the test scripts for LOGDROP. So mostly it will only run on the initiator.
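Something like the following is what I have in mind for deciding whether a given test actually needs LOGDROP; the test name and directory layout are only examples:
<pre>
# only create the LOGDROP target when the test's scripts mention it
if grep -rq LOGDROP testing/pluto/basic-pluto-01/; then
    echo "this test needs LOGDROP"
fi
</pre>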
"Error: Peer netns reference is invalid." =
It seems that when the namespaces get mangled, any "ip" command will output this error. For now I am sanitizing it out of the output.
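The sanitizing is conceptually just stripping that one line from the captured output; a sketch (the file name is only an example, the real sanitizer lives in the test scripts):
<pre>
# drop the bogus ip error line from a captured log
sed -i '/Error: Peer netns reference is invalid\./d' captured-console.log
</pre>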
running the tests in a parallel worker pool
convert brctl to "ip link": done in python, 2019-07 (the mapping is sketched after this list)
support bind mount installation using RPM or make install-base
route cache filling up
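For reference, the brctl-to-"ip link" mapping that conversion relies on; the bridge and interface names are the ones from the example output above:
<pre>
# brctl addbr brswan12-64512              becomes:
ip link add name brswan12-64512 type bridge

# brctl addif brswan12-64512 hweste164512 becomes:
ip link set hweste164512 master brswan12-64512

# brctl delbr brswan12-64512              becomes:
ip link del brswan12-64512
</pre>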
python threads locking up
It seems that often the workers started by ProcessPoolExecutor lock up. They sit there doing nothing. I have been looking at them using gdb:
<pre>
gdb /usr/bin/python3.7 -p <pid>
(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib64/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib64/python3.7/multiprocessing/queues.py", line 93, in get
    with self._rlock:
  File "/usr/lib64/python3.7/concurrent/futures/process.py", line 226, in _process_worker
    call_item = call_queue.get(block=True)
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib64/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
    code = process_obj._bootstrap()
  File "/usr/lib64/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib64/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/usr/lib64/python3.7/concurrent/futures/process.py", line 593, in _adjust_process_count
    p.start()
  File "/usr/lib64/python3.7/concurrent/futures/process.py", line 569, in _start_queue_management_thread
    self._adjust_process_count()
  File "/usr/lib64/python3.7/concurrent/futures/process.py", line 615, in submit
    self._start_queue_management_thread()
  File "/home/build/libreswan/testing/utils/nsrun", line 3790, in run_ns_q_conc
    except KeyboardInterrupt:
  File "/home/build/libreswan/testing/utils/nsrun", line 3744, in do_test_list_new
    else:
  File "/home/build/libreswan/testing/utils/nsrun", line 3985, in main
    return
  File "/home/build/libreswan/testing/utils/nsrun", line 4069, in <module>
</pre>

gdb magic to get a python bt:
<pre>
dnf debuginfo-install python3
dnf debuginfo-install python3-3.7.3-3.fc29.x86_64
</pre>
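Once the debuginfo is installed, the same py-bt can be taken from every worker non-interactively, which makes it easier to see whether they are all stuck on the same lock. The parent pid below is only an example; gdb's py-bt command comes from the python-gdb extension that the debuginfo packages provide:
<pre>
# dump the python backtrace of every child of the nsrun process (pid 1234 is an example)
for pid in $(pgrep -P 1234); do
    echo "=== worker $pid ==="
    gdb -q /usr/bin/python3.7 -p "$pid" -batch -ex py-bt
done
</pre>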
good to know
nsenter alias/function
<pre>
NSENTER() {
    ns=$1
    nsargs="--mount=/run/mountns/${ns} --net=/run/netns/${ns} --uts=/run/utsns/${ns}"
    NSENTER_CMD="/usr/bin/nsenter ${nsargs} "
    sudo ${NSENTER_CMD} /bin/bash
}
# Then type NSENTER east-basic-pluto-01
</pre>
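A variant that runs a single command instead of dropping into an interactive shell can be handy in scripts. It is only a sketch built on the same /run/{mountns,netns,utsns} layout; the function name is made up:
<pre>
# run one command inside a test namespace, e.g.: NSRUN east-basic-pluto-01 ip -br addr
NSRUN() {
    ns=$1; shift
    sudo /usr/bin/nsenter \
        --mount=/run/mountns/${ns} --net=/run/netns/${ns} --uts=/run/utsns/${ns} \
        "$@"
}
</pre>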