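Linux Kernel NAT: Principles and Experiments
Overview
The Linux kernel provides layer-3 routing (disabled by default) and NAT. Both are used heavily in stock OpenStack networking: DVR (Distributed Virtual Router) on compute nodes is built from network namespaces plus kernel routing, while NAT is used in the IPv4 scenarios where cloud hosts reach the public network or are reached from it. This article uses hands-on experiments to explore these kernel networking features.
NAT (Network Address Translation)
Network address translation rewrites one IP address into another. The typical use is letting hosts on a private network reach the Internet. While IPv4 remains dominant, NAT is everywhere, from home routers to data-center firewalls. NAT eases IPv4 address exhaustion by mapping many internal host addresses onto a single public IP, multiplexing them with upper-layer protocol identifiers such as ports. It also hides the internal network layout, which adds a degree of security.
NAT comes in two forms, SNAT and DNAT, which translate the source address and the destination address respectively. SNAT is typically used when internal hosts access the public network (the internal host is the client), and DNAT when the public network accesses internal hosts (the internal host is the server). The experiments in this article cover SNAT.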
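As a rough illustration of the two directions, the sketch below prints the general shape of an SNAT rule and a DNAT rule as iptables command lines. The helper names snat_rule and dnat_rule are hypothetical and nothing is executed. The DNAT example (forwarding TCP port 80 to 172.16.0.101) is an assumption and is not part of this article's experiment.

```shell
# Hypothetical helpers: they only print an iptables command line.

# SNAT: rewrite the source address of packets leaving the private subnet.
snat_rule() {
    local subnet=$1 public_ip=$2
    echo "iptables -t nat -A POSTROUTING -s ${subnet} -j SNAT --to-source ${public_ip}"
}

# DNAT: rewrite the destination of TCP packets arriving at the public address.
dnat_rule() {
    local public_ip=$1 port=$2 private_ip=$3
    echo "iptables -t nat -A PREROUTING -d ${public_ip} -p tcp --dport ${port} -j DNAT --to-destination ${private_ip}:${port}"
}

snat_rule 172.16.0.0/24 203.0.1.1
dnat_rule 203.0.1.1 80 172.16.0.101
```

SNAT rules live in the POSTROUTING chain because the source is rewritten after the routing decision, while DNAT rules live in PREROUTING so that routing already sees the new destination.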
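Experiment topology
Network elements:
router: provides layer-3 routing and NAT, built from a Linux netns + kernel routing + Linux NAT
br0: access switch for the VMs, implemented as a Linux bridge
br1: access switch for the external network, implemented as a Linux bridge (the "external network" here is not the Internet, only a network outside the VMs' layer-2 network)
vm1, vm2: internal hosts, implemented as Linux network namespaces (only the network is simulated)
vm3: external host, also implemented as a Linux network namespace.
Note: all traffic in this experiment stays inside the experiment host and never leaves it.
Setting up the environment
Materials: one Linux host (physical or virtual)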
Create the router
# Create the router
[root@centos7 ~]# ip netns add router
# Start the router (enter the namespace and enable IP forwarding)
[root@centos7 ~]# netns=router; ip netns exec ${netns} bash --rcfile <(echo "PS1=\"${netns}$ \"")
router$ echo 1 > /proc/sys/net/ipv4/ip_forward
router$ ip link set lo up
# The new router has only the lo interface so far
router$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
# Leave the router
router$ exit
Create the access switches (br0 and br1)
# Create the bridges
[root@centos7 ~]# brctl addbr br0
[root@centos7 ~]# brctl addbr br1
# Bring the bridge devices up
[root@centos7 ~]# ip link set br0 up
[root@centos7 ~]# ip link set br1 up
# Show the bridges
[root@centos7 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.000000000000       no
br1             8000.000000000000       no
Connect the switches to the router
The switches and the router are connected with veth pairs.
# Create the veth pair between router and br0 (make the cable)
[root@centos7 ~]# ip link add veth-02464-a type veth peer name veth-02464-b
# Attach veth-02464-a to router, assign its IP address and bring it up (plug the cable into the router)
[root@centos7 ~]# ip link set veth-02464-a netns router
[root@centos7 ~]# ip netns exec router ip addr add 172.16.0.1/24 dev veth-02464-a
[root@centos7 ~]# ip netns exec router ip link set veth-02464-a up
[root@centos7 ~]# ip netns exec router ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
6: veth-02464-a@if5: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN qlen 1000
    link/ether 1e:bb:e5:8b:2a:e5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.16.0.1/24 scope global veth-02464-a
       valid_lft forever preferred_lft forever
# Attach veth-02464-b to bridge br0 and bring it up (plug the cable into the switch)
[root@centos7 ~]# brctl addif br0 veth-02464-b
[root@centos7 ~]# ip link set veth-02464-b up
[root@centos7 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.1ad92fc22b4e       no              veth-02464-b
br1             8000.000000000000       no
### ==============================
# Create the veth pair between router and br1 (make the cable)
[root@centos7 ~]# ip link add veth-29532-a type veth peer name veth-29532-b
# Attach veth-29532-a to router, assign its IP address and bring it up (plug the cable into the router)
[root@centos7 ~]# ip link set veth-29532-a netns router
[root@centos7 ~]# ip netns exec router ip addr add 203.0.1.1/24 dev veth-29532-a
[root@centos7 ~]# ip netns exec router ip link set veth-29532-a up
[root@centos7 ~]# ip netns exec router ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
6: veth-02464-a@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 1e:bb:e5:8b:2a:e5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.16.0.1/24 scope global veth-02464-a
       valid_lft forever preferred_lft forever
    inet6 fe80::1cbb:e5ff:fe8b:2ae5/64 scope link
       valid_lft forever preferred_lft forever
8: veth-29532-a@if7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN qlen 1000
    link/ether 66:4c:34:a2:fd:52 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 203.0.1.1/24 scope global veth-29532-a
       valid_lft forever preferred_lft forever
# Attach veth-29532-b to bridge br1 and bring it up (plug the cable into the switch)
[root@centos7 ~]# brctl addif br1 veth-29532-b
[root@centos7 ~]# ip link set veth-29532-b up
[root@centos7 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.1ad92fc22b4e       no              veth-02464-b
br1             8000.4e65a04ce970       no              veth-29532-b
# Check that the router's interfaces are configured and healthy (IP addresses set, interfaces UP)
[root@centos7 ~]# ip netns exec router ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
6: veth-02464-a@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 1e:bb:e5:8b:2a:e5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.16.0.1/24 scope global veth-02464-a
       valid_lft forever preferred_lft forever
8: veth-29532-a@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 66:4c:34:a2:fd:52 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 203.0.1.1/24 scope global veth-29532-a
       valid_lft forever preferred_lft forever
Create the two internal hosts
# Create vm1
[root@centos7 ~]# ip netns add vm1
# Create the veth pair
[root@centos7 ~]# ip link add veth-vm1-a type veth peer name veth-vm1-b
# Attach it to the bridge and bring the device up
[root@centos7 ~]# brctl addif br0 veth-vm1-a
[root@centos7 ~]# ip link set veth-vm1-a up
# Connect it to vm1
[root@centos7 ~]# ip link set veth-vm1-b netns vm1
# Enter vm1
[root@centos7 ~]# netns=vm1; ip netns exec ${netns} bash --rcfile <(echo "PS1=\"${netns}$ \"")
# Rename the interface
vm1$ ip link set veth-vm1-b name eth0
# Assign the IP address
vm1$ ip addr add 172.16.0.101/24 dev eth0
# Bring the network devices up
vm1$ ip link set lo up
vm1$ ip link set eth0 up
# Add the default route (eth0 must be up first, otherwise the gateway is unreachable)
vm1$ ip route add default via 172.16.0.1
# Show the interface state
vm1$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
9: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether a6:1d:db:6a:62:d7 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.16.0.101/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a41d:dbff:fe6a:62d7/64 scope link
       valid_lft forever preferred_lft forever
# Leave vm1
vm1$ exit
### ==============================
# Create vm2
[root@centos7 ~]# ip netns add vm2
# Create the veth pair
[root@centos7 ~]# ip link add veth-vm2-a type veth peer name veth-vm2-b
# Attach it to the bridge and bring the device up
[root@centos7 ~]# brctl addif br0 veth-vm2-a
[root@centos7 ~]# ip link set veth-vm2-a up
# Connect it to vm2
[root@centos7 ~]# ip link set veth-vm2-b netns vm2
# Enter vm2
[root@centos7 ~]# netns=vm2; ip netns exec ${netns} bash --rcfile <(echo "PS1=\"${netns}$ \"")
# Rename the interface
vm2$ ip link set veth-vm2-b name eth0
# Assign the IP address
vm2$ ip addr add 172.16.0.102/24 dev eth0
# Bring the network devices up
vm2$ ip link set lo up
vm2$ ip link set eth0 up
# Add the default route (eth0 must be up first, otherwise the gateway is unreachable)
vm2$ ip route add default via 172.16.0.1
# Show the interface state
vm2$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
11: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether d6:72:b3:bc:9d:77 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.16.0.102/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::d472:b3ff:febc:9d77/64 scope link
       valid_lft forever preferred_lft forever
# Leave vm2
vm2$ exit
Create the external host
# Create vm3
[root@centos7 ~]# ip netns add vm3
# Create the veth pair
[root@centos7 ~]# ip link add veth-vm3-a type veth peer name veth-vm3-b
# Attach it to the bridge and bring the device up
[root@centos7 ~]# brctl addif br1 veth-vm3-a
[root@centos7 ~]# ip link set veth-vm3-a up
# Connect it to vm3
[root@centos7 ~]# ip link set veth-vm3-b netns vm3
# Enter vm3
[root@centos7 ~]# netns=vm3; ip netns exec ${netns} bash --rcfile <(echo "PS1=\"${netns}$ \"")
# Rename the interface
vm3$ ip link set veth-vm3-b name eth0
# Assign the IP address
vm3$ ip addr add 203.0.1.101/24 dev eth0
# Bring the network devices up
vm3$ ip link set lo up
vm3$ ip link set eth0 up
# Show the interface state
vm3$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
13: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 3a:c2:a6:3e:ef:f2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 203.0.1.101/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::38c2:a6ff:fe3e:eff2/64 scope link
       valid_lft forever preferred_lft forever
# Leave vm3
vm3$ exit
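The host-creation steps are nearly identical for vm1, vm2 and vm3, so they can be condensed into one function. The sketch below is a hypothetical helper (add_host, run and the DRY_RUN switch are names invented here, not part of the original walkthrough). By default it only prints the commands; setting DRY_RUN=0 and running as root would execute them. It adds the default route only after eth0 is up, since the kernel needs a route to the gateway before it accepts the default route.

```shell
# Hypothetical helper condensing the per-host steps.
# DRY_RUN=1 (default): print the commands. DRY_RUN=0: execute them (needs root).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: add_host NAME BRIDGE CIDR [GATEWAY]
add_host() {
    local name=$1 bridge=$2 cidr=$3 gateway=${4:-}
    run ip netns add "$name"
    run ip link add "veth-${name}-a" type veth peer name "veth-${name}-b"
    run brctl addif "$bridge" "veth-${name}-a"
    run ip link set "veth-${name}-a" up
    run ip link set "veth-${name}-b" netns "$name"
    run ip netns exec "$name" ip link set "veth-${name}-b" name eth0
    run ip netns exec "$name" ip addr add "$cidr" dev eth0
    run ip netns exec "$name" ip link set lo up
    run ip netns exec "$name" ip link set eth0 up
    # The default route is optional: the external host has no gateway.
    if [ -n "$gateway" ]; then
        run ip netns exec "$name" ip route add default via "$gateway"
    fi
}

add_host vm1 br0 172.16.0.101/24 172.16.0.1
add_host vm2 br0 172.16.0.102/24 172.16.0.1
add_host vm3 br1 203.0.1.101/24
```

Printing first and executing later makes it easy to review the exact commands before they touch the host's network configuration.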
Connectivity check
# Enter vm1
[root@centos7 ~]# netns=vm1; ip netns exec ${netns} bash --rcfile <(echo "PS1=\"${netns}$ \"")
vm1$ ping 172.16.0.1
PING 172.16.0.1 (172.16.0.1) 56(84) bytes of data.
64 bytes from 172.16.0.1: icmp_seq=1 ttl=64 time=0.027 ms
vm1$ ping 172.16.0.102
PING 172.16.0.102 (172.16.0.102) 56(84) bytes of data.
64 bytes from 172.16.0.102: icmp_seq=1 ttl=64 time=0.052 ms
vm1$ ping 203.0.1.1
PING 203.0.1.1 (203.0.1.1) 56(84) bytes of data.
64 bytes from 203.0.1.1: icmp_seq=1 ttl=64 time=0.028 ms
# No reply from vm3: it has no route back to 172.16.0.0/24, and SNAT is not configured yet
vm1$ ping 203.0.1.101
PING 203.0.1.101 (203.0.1.101) 56(84) bytes of data.
# Enter vm2
[root@centos7 ~]# netns=vm2; ip netns exec ${netns} bash --rcfile <(echo "PS1=\"${netns}$ \"")
vm2$ ping 172.16.0.1
PING 172.16.0.1 (172.16.0.1) 56(84) bytes of data.
64 bytes from 172.16.0.1: icmp_seq=1 ttl=64 time=0.027 ms
vm2$ ping 172.16.0.101
PING 172.16.0.101 (172.16.0.101) 56(84) bytes of data.
64 bytes from 172.16.0.101: icmp_seq=1 ttl=64 time=0.028 ms
vm2$ ping 203.0.1.1
PING 203.0.1.1 (203.0.1.1) 56(84) bytes of data.
64 bytes from 203.0.1.1: icmp_seq=1 ttl=64 time=0.028 ms
# No reply from vm3, for the same reason as on vm1
vm2$ ping 203.0.1.101
PING 203.0.1.101 (203.0.1.101) 56(84) bytes of data.
# Enter vm3
[root@centos7 ~]# netns=vm3; ip netns exec ${netns} bash --rcfile <(echo "PS1=\"${netns}$ \"")
vm3$ ping 203.0.1.1
PING 203.0.1.1 (203.0.1.1) 56(84) bytes of data.
64 bytes from 203.0.1.1: icmp_seq=1 ttl=64 time=0.034 ms
SNAT experiment
ICMP
We use the nping tool because it can set the ICMP id, and we want to construct a case where two flows share the same ICMP id.
# Enter the router
[root@centos7 ~]# netns=router; ip netns exec ${netns} bash --rcfile <(echo "PS1=\"${netns}$ \"")
router$
# Configure SNAT on the router
router$ iptables -t nat -A POSTROUTING -s 172.16.0.0/24 -j SNAT --to-source 203.0.1.1
# Show the NAT table
router$ iptables -t nat -vL
...
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 SNAT       all  --  any    any     172.16.0.0/24        anywhere             to:203.0.1.1
# Start capturing packets on the router
router$ tcpdump -s 0 -i any -v -nn
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
# From vm1 and vm2 at the same time, nping the external host (203.0.1.101) with the same icmp-id, e.g. 1111
vm1$ nping --icmp-id 1111 -c 1 203.0.1.101
Starting Nping 0.7.70 ( https://nmap.org/nping ) at 2019-02-18 14:28 CST
SENT (0.0147s) ICMP [172.16.0.101 > 203.0.1.101 Echo request (type=8/code=0) id=1111 seq=1] IP [ttl=64 id=3611 iplen=28 ]
RCVD (0.0148s) ICMP [203.0.1.101 > 172.16.0.101 Echo reply (type=0/code=0) id=1111 seq=1] IP [ttl=63 id=5678 iplen=28 ]
Max rtt: 0.030ms | Min rtt: 0.030ms | Avg rtt: 0.030ms
Raw packets sent: 1 (28B) | Rcvd: 1 (28B) | Lost: 0 (0.00%)
Nping done: 1 IP address pinged in 1.03 seconds
# vm2
vm2$ nping --icmp-id 1111 -c 1 203.0.1.101
Starting Nping 0.7.70 ( https://nmap.org/nping ) at 2019-02-18 14:29 CST
SENT (0.0149s) ICMP [172.16.0.102 > 203.0.1.101 Echo request (type=8/code=0) id=1111 seq=1] IP [ttl=64 id=47859 iplen=28 ]
RCVD (0.0150s) ICMP [203.0.1.101 > 172.16.0.102 Echo reply (type=0/code=0) id=1111 seq=1] IP [ttl=63 id=8239 iplen=28 ]
Max rtt: 0.030ms | Min rtt: 0.030ms | Avg rtt: 0.030ms
Raw packets sent: 1 (28B) | Rcvd: 1 (28B) | Lost: 0 (0.00%)
Nping done: 1 IP address pinged in 1.03 seconds
# Connection tracking entries on the router
router$ cat /proc/net/nf_conntrack
ipv4     2 icmp     1 22 src=172.16.0.102 dst=203.0.1.101 type=8 code=0 id=1111 src=203.0.1.101 dst=203.0.1.1 type=0 code=0 id=0 mark=0 zone=0 use=2
ipv4     2 icmp     1 19 src=172.16.0.101 dst=203.0.1.101 type=8 code=0 id=1111 src=203.0.1.101 dst=203.0.1.1 type=0 code=0 id=1111 mark=0 zone=0 use=2
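In the conntrack entries above, vm1's flow keeps id 1111 in the reply direction while vm2's flow shows id 0, which suggests the second flow's ICMP id was rewritten so that the two flows stay distinguishable behind 203.0.1.1. The sketch below extracts that mapping from conntrack lines read on stdin. The function name icmp_id_map is hypothetical, and the parsing assumes the line layout shown in this article.

```shell
# Hypothetical helper: for each ICMP conntrack line on stdin, print
# "<inner source IP> <original id> -> <id seen in the reply direction>".
icmp_id_map() {
    awk '$3 == "icmp" {
        src = ""; n = 0
        for (i = 1; i <= NF; i++) {
            # The first src= belongs to the original (request) direction.
            if ($i ~ /^src=/ && src == "") src = substr($i, 5)
            # id= appears twice: request first, reply second.
            if ($i ~ /^id=/) id[++n] = substr($i, 4)
        }
        print src, id[1], "->", id[2]
    }'
}

icmp_id_map <<'EOF'
ipv4     2 icmp     1 22 src=172.16.0.102 dst=203.0.1.101 type=8 code=0 id=1111 src=203.0.1.101 dst=203.0.1.1 type=0 code=0 id=0 mark=0 zone=0 use=2
ipv4     2 icmp     1 19 src=172.16.0.101 dst=203.0.1.101 type=8 code=0 id=1111 src=203.0.1.101 dst=203.0.1.1 type=0 code=0 id=1111 mark=0 zone=0 use=2
EOF
```

On a live router the same function could be fed from /proc/net/nf_conntrack instead of the here-document.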
TCP
Start a listening service on vm3 with nc, then connect to it from vm1 and vm2 using the same source port.
# Start the listening service on vm3
vm3$ nc -klp 80
# Start capturing packets on the router
router$ tcpdump -s 0 -i any -v -nn
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
# From vm1, connect to port 80 on vm3 using source port 2222
vm1$ nc -p 2222 203.0.1.101 80
# From vm2, connect to port 80 on vm3 using source port 2222
vm2$ nc -p 2222 203.0.1.101 80
# Connection tracking entries on the router
router$ cat /proc/net/nf_conntrack
ipv4     2 tcp      6 431979 ESTABLISHED src=172.16.0.101 dst=203.0.1.101 sport=2222 dport=80 src=203.0.1.101 dst=203.0.1.1 sport=80 dport=2222 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 431985 ESTABLISHED src=172.16.0.102 dst=203.0.1.101 sport=2222 dport=80 src=203.0.1.101 dst=203.0.1.1 sport=80 dport=1024 [ASSURED] mark=0 zone=0 use=2
Because vm1 and vm2 used the same source port (2222), both flows would collide on 203.0.1.1:2222 after translation. To avoid that, when translating the second internal host (vm2) the kernel rewrote the source port to 1024 (2222 -> 1024), and it rewrites the port back on the returning replies (1024 -> 2222).
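The port rewrite can be read straight off the conntrack entries: the reply direction's dst and dport are the translated source address and port. The sketch below prints that mapping for TCP entries read on stdin. The function name tcp_snat_map is hypothetical, and the parsing assumes the line layout shown in this article.

```shell
# Hypothetical helper: for each TCP conntrack line on stdin, print
# "<inner src>:<inner sport> -> <translated src>:<translated sport>".
tcp_snat_map() {
    awk '$3 == "tcp" {
        ns = 0; nd = 0; nsp = 0; ndp = 0
        for (i = 1; i <= NF; i++) {
            # Each key appears twice: index 1 is the request, index 2 the reply.
            if ($i ~ /^src=/)   src[++ns]    = substr($i, 5)
            if ($i ~ /^dst=/)   dst[++nd]    = substr($i, 5)
            if ($i ~ /^sport=/) sport[++nsp] = substr($i, 7)
            if ($i ~ /^dport=/) dport[++ndp] = substr($i, 7)
        }
        # The reply is addressed to the translated source address and port.
        printf "%s:%s -> %s:%s\n", src[1], sport[1], dst[2], dport[2]
    }'
}

tcp_snat_map <<'EOF'
ipv4     2 tcp      6 431979 ESTABLISHED src=172.16.0.101 dst=203.0.1.101 sport=2222 dport=80 src=203.0.1.101 dst=203.0.1.1 sport=80 dport=2222 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 431985 ESTABLISHED src=172.16.0.102 dst=203.0.1.101 sport=2222 dport=80 src=203.0.1.101 dst=203.0.1.1 sport=80 dport=1024 [ASSURED] mark=0 zone=0 use=2
EOF
```

For the two entries above this shows vm1 keeping port 2222 and vm2 being moved to port 1024 on the shared address 203.0.1.1.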
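Linux SNAT summary
Linux tracks and identifies network connections by tuples. The usual network 5-tuple consists of source IP, source port, destination IP, destination port and protocol. This applies to the common transport-layer protocols TCP and UDP.
ICMP has no ports, so its tuple consists of source IP, destination IP, protocol, type, code and packet id.
Definition of the tuple in the kernel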
/*
 * include/net/netfilter/nf_conntrack_tuple.h
 *
 * A `tuple' is a structure containing the information to uniquely
 * identify a connection.  ie. if two packets have the same tuple, they
 * are in the same connection; if not, they are not.
 *
 * We divide the structure along "manipulatable" and
 * "non-manipulatable" lines, for the benefit of the NAT code.
 */

/* The protocol-specific manipulable parts of the tuple: always in
 * network order */
union nf_conntrack_man_proto {
	/* Add other protocols here. */
	__be16 all;

	struct {
		__be16 port;
	} tcp;
	struct {
		__be16 port;
	} udp;
	struct {
		__be16 id;
	} icmp;
	struct {
		__be16 port;
	} dccp;
	struct {
		__be16 port;
	} sctp;
	struct {
		__be16 key;	/* GRE key is 32bit, PPtP only uses 16bit */
	} gre;
};

/* The manipulable part of the tuple. */
struct nf_conntrack_man {
	union nf_inet_addr u3;
	union nf_conntrack_man_proto u;
	/* Layer 3 protocol */
	u_int16_t l3num;
};

/* This contains the information to distinguish a connection. */
struct nf_conntrack_tuple {
	struct nf_conntrack_man src;

	/* These are the parts of the tuple which are fixed. */
	struct {
		union nf_inet_addr u3;
		union {
			/* Add other protocols here. */
			__be16 all;

			struct {
				__be16 port;
			} tcp;
			struct {
				__be16 port;
			} udp;
			struct {
				u_int8_t type, code;
			} icmp;
			struct {
				__be16 port;
			} dccp;
			struct {
				__be16 port;
			} sctp;
			struct {
				__be16 key;
			} gre;
		} u;

		/* The protocol. */
		u_int8_t protonum;

		/* The direction (for tuplehash) */
		u_int8_t dir;
	} dst;
};
Viewing connection tracking entries
router$ cat /proc/net/nf_conntrack
ipv4     2 tcp      6 431995 ESTABLISHED src=172.16.0.101 dst=203.0.1.101 sport=2222 dport=80 src=203.0.1.101 dst=203.0.1.1 sport=80 dport=2222 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 431996 ESTABLISHED src=172.16.0.102 dst=203.0.1.101 sport=2222 dport=80 src=203.0.1.101 dst=203.0.1.1 sport=80 dport=1024 [ASSURED] mark=0 zone=0 use=2
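Record format
The fields are, in order: network-layer protocol name, network-layer protocol number, transport-layer protocol name, transport-layer protocol number, seconds remaining before the entry expires, and connection state (not present for every protocol). Everything after that is either key=value or a flag. A line contains at most two occurrences of the same key (such as src and dst): the first comes from the request, the second from the reply.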
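The positional fields described above can be pulled apart with a few lines of awk. This is a minimal sketch under the assumption that the sixth field is a state only when it is not a key=value pair, as in the ICMP entries, which carry no state. The function name conntrack_header is hypothetical.

```shell
# Hypothetical helper: summarize the leading positional fields of each
# conntrack line read on stdin.
conntrack_header() {
    awk '{
        # Protocols without a state (e.g. icmp) go straight to key=value pairs.
        state = ($6 ~ /=/) ? "-" : $6
        printf "l3=%s(%s) l4=%s(%s) ttl=%s state=%s\n", $1, $2, $3, $4, $5, state
    }'
}

conntrack_header <<'EOF'
ipv4     2 tcp      6 431995 ESTABLISHED src=172.16.0.101 dst=203.0.1.101 sport=2222 dport=80 src=203.0.1.101 dst=203.0.1.1 sport=80 dport=2222 [ASSURED] mark=0 zone=0 use=2
ipv4     2 icmp     1 22 src=172.16.0.102 dst=203.0.1.101 type=8 code=0 id=1111 src=203.0.1.101 dst=203.0.1.1 type=0 code=0 id=0 mark=0 zone=0 use=2
EOF
```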
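Flags
[ASSURED] A reply has been received and the connection is confirmed.
[UNREPLIED] No reply has been received; when the hash table is full, these connections are dropped first.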
Protocol numbers
See /etc/protocols.
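To turn a protocol number from a conntrack entry back into a name, the file can be searched by its second column. The sketch below assumes the usual "name number aliases" layout of /etc/protocols. The function name proto_name and its optional file argument are hypothetical.

```shell
# Hypothetical helper: print the protocol name for a protocol number.
# $1: protocol number, $2: protocols file (defaults to /etc/protocols).
proto_name() {
    awk -v num="$1" '$1 !~ /^#/ && $2 == num { print $1; exit }' "${2:-/etc/protocols}"
}

# Look up the transport-layer numbers seen in this article, if the file exists.
if [ -r /etc/protocols ]; then
    proto_name 1
    proto_name 6
fi
```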