Calico BGP Route Reflector in Practice
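1 Background
The Calico container networking component supports several backend modes: the overlay modes IPIP and VXLAN, and the underlay, pure-routing BGP mode.
Compared with an overlay network, an underlay network delivers higher data-plane forwarding performance. Pure-routing mode itself offers two approaches.

The first is Calico's BGP full mesh. It is only suitable for small Kubernetes clusters, because every node must peer with every other node. With n nodes that means n(n-1)/2 BGP sessions, and each new node adds a session to every existing node. The session count therefore grows quadratically with cluster size and consumes a large amount of network resources.

The second is Route Reflector mode, also called RR mode. One or more BGP speakers are designated as route reflectors, and every other speaker peers only with them. Each speaker still learns the routes of the whole network through its BGP session with a route reflector.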
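To make the scaling argument concrete, the sketch below counts BGP sessions under two simple assumptions. In a full mesh, every pair of n nodes needs one session. In RR mode, each of n clients peers with each of r reflectors, and the reflectors are fully meshed among themselves. The function names are illustrative. In the lab later in this article, each node actually peers with exactly one reflector: its leaf switch.

```shell
#!/usr/bin/env bash
# Sketch: BGP session counts for a full mesh vs. a route-reflector design.
# Assumptions: n BGP speakers; in RR mode each of the n clients peers with
# each of r reflectors, and the r reflectors are fully meshed.

# Full mesh: one session per pair of nodes -> n(n-1)/2
full_mesh_sessions() {
    local n=$1
    echo $(( n * (n - 1) / 2 ))
}

# Route reflector: n*r client sessions + r(r-1)/2 sessions among reflectors
rr_sessions() {
    local n=$1 r=$2
    echo $(( n * r + r * (r - 1) / 2 ))
}

for n in 4 50 500; do
    echo "nodes=$n full-mesh=$(full_mesh_sessions "$n") rr(2 reflectors)=$(rr_sessions "$n" 2)"
done
```

With 500 nodes, the full mesh needs 124,750 sessions versus 1,001 with two reflectors. For a 4-node cluster the full mesh is actually smaller (6 vs. 9), so the RR design only pays off as the cluster grows.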
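2 Calico BGP Route Reflector network architecture
This design leaves the data center's internal network topology unchanged. The access-layer (leaf) switches establish BGP sessions with the core-layer (spine) switches and reuse the routing policies already in place. Pod CIDRs are assigned according to each node's physical location, and every node advertises its Pod CIDRs to its access-layer switch over BGP, which gives network-wide reachability. The figure below illustrates this with a Leaf-Spine architecture.
Design principles:
- Each access-layer switch is Layer-2 connected to the nodes it serves, and together they form one AS. Every node runs a BGP service that advertises its own routes.
- Each core-layer switch and each access-layer switch occupies its own AS. Core and access switches are physically connected to each other and run BGP. The core switches learn the routes of the whole network, while each access switch learns the routes of the nodes directly attached to it.
- Pods on the same host communicate through the host's routing table (the Linux host acts as a router).
- Pods on different nodes in the same rack communicate through the ToR (leaf) switch.
- Pods in different racks communicate through the core switches.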
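The three forwarding rules above can be expressed as a small decision function. This sketch assumes the node names and rack labels used in the lab below: rack0 sits under leaf0 and rack1 under leaf1. pod_path and NODE_RACK are hypothetical names. The function only classifies which layer carries the traffic; it does not inspect real routes.

```shell
#!/usr/bin/env bash
# Sketch of the forwarding rules above. Assumption: the node-to-rack mapping
# matches the kind cluster built later (rack0 -> leaf0, rack1 -> leaf1).
declare -A NODE_RACK=(
    [calico-bgp-rr-control-plane]=rack0
    [calico-bgp-rr-worker]=rack0
    [calico-bgp-rr-worker2]=rack1
    [calico-bgp-rr-worker3]=rack1
)

# pod_path SRC_NODE DST_NODE -> layer that forwards pod-to-pod traffic
pod_path() {
    local src=$1 dst=$2
    local src_rack=${NODE_RACK[$src]:-} dst_rack=${NODE_RACK[$dst]:-}
    if [[ -z $src_rack || -z $dst_rack ]]; then
        echo "unknown"          # node not in the inventory
        return 1
    elif [[ $src == "$dst" ]]; then
        echo "host"             # same node: the Linux host routes locally
    elif [[ $src_rack == "$dst_rack" ]]; then
        echo "leaf"             # same rack: via the ToR (leaf) switch
    else
        echo "spine"            # different racks: via the core (spine) layer
    fi
}

pod_path calico-bgp-rr-worker calico-bgp-rr-worker
pod_path calico-bgp-rr-control-plane calico-bgp-rr-worker
pod_path calico-bgp-rr-worker calico-bgp-rr-worker3
```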
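3 Building a lab that simulates the production network
Prepare a machine running Ubuntu 22.04 (8 vCPUs and 16 GB of memory are sufficient) and install the following tools on it:
- Docker
- A Go development environment
- Kind (Kubernetes in Docker, a Kubernetes SIG project that runs Kubernetes nodes as Docker containers and is handy for quickly building test clusters). Installing kind with go install requires Go on the host; v0.20.0 is a suitable version.
- ContainerLab (a platform that builds virtual networks out of containers; with the VyOS image it can emulate switches and routers). Version v0.42.0 is recommended.
- kubectl and calicoctl, which the scripts below use; 1-setup-env.sh copies /usr/local/bin/calicoctl from the host into every node.
3.1 Building the Kubernetes cluster
Kubernetes version: 1.27.3
Cluster size: 1 control-plane (master) node and 3 worker nodes
The cluster setup script, 1-setup-env.sh: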
#!/bin/bash
date
set -v
# 1.prep noCNI env
cat <<EOF | kind create cluster --name=calico-bgp-rr --image=kindest/node:v1.27.3 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.244.0.0/16"
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.10
        node-labels: "rack=rack0"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.11
        node-labels: "rack=rack0"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.10
        node-labels: "rack=rack1"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.11
        node-labels: "rack=rack1"
EOF
# 2.remove taints
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/control-plane:NoSchedule-
kubectl get nodes -o wide
# 3. install tools
for i in $(docker ps -a --format "table {{.Names}}" |grep calico-bgp-rr)
do
echo $i
docker cp /usr/bin/ping $i:/usr/bin/ping
docker cp /usr/local/bin/calicoctl $i:/usr/local/bin/
# docker exec -it $i bash -c "apt-get -y update > /dev/null && apt-get -y install net-tools tcpdump lrzsz > /dev/null 2>&1"
done
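Run the script to create the cluster. Because no CNI plugin is installed yet, some Pods stay Pending and the nodes stay NotReady. This is expected, and it resolves once the Calico CNI plugin is installed later.
3.2 Creating the bridges
Create the bridges on the host. They connect the Kubernetes nodes created by kind to the switches built by containerlab.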
brctl addbr br-leaf0;ifconfig br-leaf0 up;brctl addbr br-leaf1;ifconfig br-leaf1 up
3.3 Building the Layer-3 switches with containerlab and configuring BGP
The containerlab topology script, 2-setup-clab.sh:
#!/bin/bash
set -v
# Write the topology file first; piping cat into "clab deploy" would race with the file write
cat <<EOF > clab.yaml
name: calico-bgp-rr
topology:
  nodes:
    spine0:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/spine0-boot.cfg:/opt/vyatta/etc/config/config.boot
    spine1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/spine1-boot.cfg:/opt/vyatta/etc/config/config.boot
    leaf0:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/leaf0-boot.cfg:/opt/vyatta/etc/config/config.boot
    leaf1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/leaf1-boot.cfg:/opt/vyatta/etc/config/config.boot
    br-leaf0:
      kind: bridge
    br-leaf1:
      kind: bridge
    server1:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-control-plane
      exec:
        - ip addr add 10.1.5.10/24 dev net0
        - ip route replace default via 10.1.5.1
    server2:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker
      exec:
        - ip addr add 10.1.5.11/24 dev net0
        - ip route replace default via 10.1.5.1
    server3:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker2
      exec:
        - ip addr add 10.1.8.10/24 dev net0
        - ip route replace default via 10.1.8.1
    server4:
      kind: linux
      image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
      network-mode: container:calico-bgp-rr-worker3
      exec:
        - ip addr add 10.1.8.11/24 dev net0
        - ip route replace default via 10.1.8.1
  links:
    - endpoints: ["br-leaf0:br-leaf0-net0", "server1:net0"]
    - endpoints: ["br-leaf0:br-leaf0-net1", "server2:net0"]
    - endpoints: ["br-leaf1:br-leaf1-net0", "server3:net0"]
    - endpoints: ["br-leaf1:br-leaf1-net1", "server4:net0"]
    - endpoints: ["leaf0:eth1", "spine0:eth1"]
    - endpoints: ["leaf0:eth2", "spine1:eth1"]
    - endpoints: ["leaf0:eth3", "br-leaf0:br-leaf0-net2"]
    - endpoints: ["leaf1:eth1", "spine0:eth2"]
    - endpoints: ["leaf1:eth2", "spine1:eth2"]
    - endpoints: ["leaf1:eth3", "br-leaf1:br-leaf1-net2"]
EOF
clab deploy -t clab.yaml
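containerlab builds the network topology successfully. The BGP configuration of the VyOS switches is listed at the end of this article.
3.4 Installing the Calico CNI plugin
The default Calico manifest installs Calico in IPIP mode, so IPIP must be turned off manually. With neither IPIP nor VXLAN encapsulation enabled, Calico routes Pod traffic with plain BGP.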
kubectl apply -f calico.yaml
#kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.23/manifests/calico.yaml
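The article does not show which part of calico.yaml was changed. What follows is a minimal sketch, assuming the v3.23 manifest.

In that manifest, the calico-node container's CALICO_IPV4POOL_IPIP environment variable defaults to "Always". Setting it, and CALICO_IPV4POOL_VXLAN, to "Never" before applying creates the default pool without encapsulation.

If the pool already exists, it can be switched with calicoctl instead, as in option 2 below. The values used there are assumptions to check against calicoctl get ippool -o yaml:
- default-ipv4-ippool is the name Calico gives its auto-created pool.
- cidr 10.244.0.0/16 matches the kind podSubnet above.
- blockSize 26 and natOutgoing: true mirror the default pool.

```shell
#!/bin/bash
# Option 1 (assumed manifest layout): before "kubectl apply -f calico.yaml",
# set these in the env section of the calico-node container:
#   - name: CALICO_IPV4POOL_IPIP
#     value: "Never"
#   - name: CALICO_IPV4POOL_VXLAN
#     value: "Never"
#
# Option 2: the default pool already exists with IPIP enabled. "calicoctl apply"
# replaces the whole spec, so keep cidr/blockSize/natOutgoing equal to the
# existing pool (inspect it with: calicoctl get ippool default-ipv4-ippool -o yaml).
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16
  blockSize: 26
  ipipMode: Never
  vxlanMode: Never
  natOutgoing: true
  nodeSelector: all()
EOF
```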
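After Calico is installed, the nodes establish BGP sessions with each other in a full mesh.
3.5 Enabling Calico BGP RR mode
A full mesh does not scale to large clusters, so we disable the BGP full mesh and use BGP route reflectors instead.
The script 3-disable-bgp-full-mesh.sh: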
#!/bin/bash
set -v
# 1. disable bgp fullmesh
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: BGPConfiguration
  metadata:
    name: default
  spec:
    logSeverityScreen: Info
    nodeToNodeMeshEnabled: false
kind: BGPConfigurationList
metadata:
EOF
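3.6 Configuring BGP RR peering for the Calico nodes
The Kubernetes nodes act as clients of the BGP route reflectors, so each node needs peer configuration that points at its route reflector in order to exchange routes. The script below assigns every node the AS of its rack. It then creates one BGPPeer per rack that points the rack's nodes at their leaf switch, which serves as the route reflector.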
#!/bin/bash
set -v
# 1.3. add BGP configuration (AS number, peering address) for each node
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  annotations:
  labels:
    rack: rack0
  name: calico-bgp-rr-control-plane
spec:
  addresses:
  - address: 10.1.5.10
    type: InternalIP
  bgp:
    asNumber: 65005
    ipv4Address: 10.1.5.10/24
  orchRefs:
  - nodeName: calico-bgp-rr-control-plane
    orchestrator: k8s
EOF
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack0
  name: calico-bgp-rr-worker
spec:
  addresses:
  - address: 10.1.5.11
    type: InternalIP
  bgp:
    asNumber: 65005
    ipv4Address: 10.1.5.11/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker
    orchestrator: k8s
EOF
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack1
  name: calico-bgp-rr-worker2
spec:
  addresses:
  - address: 10.1.8.10
    type: InternalIP
  bgp:
    asNumber: 65008
    ipv4Address: 10.1.8.10/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker2
    orchestrator: k8s
EOF
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  labels:
    rack: rack1
  name: calico-bgp-rr-worker3
spec:
  addresses:
  - address: 10.1.8.11
    type: InternalIP
  bgp:
    asNumber: 65008
    ipv4Address: 10.1.8.11/24
  orchRefs:
  - nodeName: calico-bgp-rr-worker3
    orchestrator: k8s
EOF
# 1.4. peer to leaf0 switch
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack0-to-leaf0
spec:
  peerIP: 10.1.5.1
  asNumber: 65005
  nodeSelector: rack == 'rack0'
EOF
# 1.5. peer to leaf1 switch
cat <<EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-to-leaf1
spec:
  peerIP: 10.1.8.1
  asNumber: 65008
  nodeSelector: rack == 'rack1'
EOF
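The four Node manifests and two BGPPeer manifests above differ only in a few values. As an illustration, the hypothetical helpers below generate them from a small inventory:
- node_bgp_patch produces a JSON patch for calicoctl patch node, an alternative to re-applying the full Node resource.
- bgp_peer_yaml prints the per-rack BGPPeer.

The script only prints its output. Piping it into calicoctl is left as a comment, on the assumption that you want to review the output before applying it.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers, not part of calicoctl): generate the per-rack
# BGP settings shown above instead of copying YAML by hand.

# node_bgp_patch AS IPV4_CIDR -> JSON patch for "calicoctl patch node NAME -p ..."
node_bgp_patch() {
    printf '{"spec":{"bgp":{"asNumber":"%s","ipv4Address":"%s"}}}\n' "$1" "$2"
}

# bgp_peer_yaml RACK LEAF PEER_IP AS -> BGPPeer pointing nodes labelled
# rack=RACK at their leaf switch (the route reflector)
bgp_peer_yaml() {
    local rack=$1 leaf=$2 peer_ip=$3 as=$4
    cat <<EOF
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: ${rack}-to-${leaf}
spec:
  peerIP: ${peer_ip}
  asNumber: ${as}
  nodeSelector: rack == '${rack}'
EOF
}

# Inventory from this lab: rack, leaf switch, leaf IP, AS number
while read -r rack leaf leaf_ip as; do
    bgp_peer_yaml "$rack" "$leaf" "$leaf_ip" "$as"   # | calicoctl apply -f -
    echo "---"
done <<'EOF'
rack0 leaf0 10.1.5.1 65005
rack1 leaf1 10.1.8.1 65008
EOF

# Print (not run) the patch command for one node
echo "calicoctl patch node calico-bgp-rr-worker -p '$(node_bgp_patch 65005 10.1.5.11/24)'"
```

Keeping the inventory in one place means that adding a rack is a one-line change rather than another copied block.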
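Log in to any node in the cluster and check the BGP status, for example with calicoctl node status. The sessions are no longer a BGP full mesh. The peer type "node specific" means the node is a route-reflector client. For a rack0 node, the peer (the route reflector) is 10.1.5.1, i.e. leaf0.
4 Verifying BGP by accessing Pods from outside the cluster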
4.1 Deploy a test workload
apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: app
  name: app
spec:
  #replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - image: swr.cn-north-4.myhuaweicloud.com/k8s-solution/nettool
        name: nettoolbox
---
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: NodePort
  selector:
    app: app
  ports:
  - name: app
    port: 8080
    targetPort: 80
    nodePort: 32000
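4.2 Check the routes on a cluster node
Log in to any cluster node and view its routing table with ip route. For example:
10.244.210.64/26 via 10.1.5.1 dev net0 proto bird
"proto bird" marks a route learned over BGP; BIRD is the BGP daemon that Calico runs on each node.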
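The /26 prefix in that route comes from Calico IPAM. IPAM splits the IP pool into fixed-size blocks (blockSize 26 by default for IPv4) and assigns blocks to nodes, and each node then advertises its blocks over BGP.

The sketch below assumes the 10.244.0.0/16 pool from the kind config and the default block size. It computes how many blocks the pool holds and which block a given Pod IP belongs to. The Pod IPs used are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: Calico IPAM block arithmetic.
# Assumptions: pool 10.244.0.0/16, blockSize 26 (Calico's IPv4 default).

# ip_to_int A.B.C.D -> 32-bit integer
ip_to_int() {
    local IFS=. a b c d
    read -r a b c d <<<"$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# int_to_ip N -> dotted quad
int_to_ip() {
    local n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# block_of IP PREFIXLEN -> enclosing block in CIDR form
block_of() {
    local ip=$1 len=$2 mask
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    echo "$(int_to_ip $(( $(ip_to_int "$ip") & mask )))/$len"
}

echo "blocks in a /16 pool: $(( 1 << (26 - 16) ))"
echo "addresses per /26 block: $(( 1 << (32 - 26) ))"
block_of 10.244.210.77 26
```

Running it prints 1024 blocks and 64 addresses per block, and places 10.244.210.77 in 10.244.210.64/26. That is the prefix seen in the route above.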
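4.3 Check BGP state and routes on the leaf0 switch
View the routing table (show ip route):
The leaf0 switch holds routes for the Pods in the Kubernetes cluster, which means it can reach those Pods.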
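A router chooses among such routes by longest prefix match: of all the prefixes that contain the destination, the most specific one wins. The sketch below replays that rule over the leaf0 prefixes and next hops quoted in this section.

Two parts are assumptions. The 0.0.0.0/0 route via eth0 is an assumed default, based on the leaf's NAT rule using eth0 as its outbound interface. The destination Pod IPs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: longest-prefix-match lookup over leaf0's routes.
# Prefixes/next hops are from the leaf0 output described in this section;
# the default route via eth0 is an assumption for illustration.
ROUTES=(
    "0.0.0.0/0 eth0"
    "10.1.5.0/24 eth3"
    "10.1.8.0/24 10.1.10.2,10.1.12.2"
    "10.244.81.64/26 10.1.5.10"
    "10.244.205.64/26 10.1.5.11"
    "10.244.210.64/26 10.1.10.2,10.1.12.2"
)

# ip_to_int A.B.C.D -> 32-bit integer
ip_to_int() {
    local IFS=. a b c d
    read -r a b c d <<<"$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# lookup IP -> next hop(s) of the most specific prefix containing IP
lookup() {
    local ip entry prefix nh len mask net best_len=-1 best_nh=""
    ip=$(ip_to_int "$1")
    for entry in "${ROUTES[@]}"; do
        read -r prefix nh <<<"$entry"
        len=${prefix#*/}
        mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
        net=$(ip_to_int "${prefix%/*}")
        # match = IP and prefix agree on the first len bits; keep the longest match
        if (( (ip & mask) == (net & mask) && len > best_len )); then
            best_len=$len
            best_nh=$nh
        fi
    done
    echo "$best_nh"
}

lookup 10.244.81.70     # Pod block behind 10.1.5.10 (rack0, iBGP)
lookup 10.244.210.70    # Pod block in rack1, reached through both spines
```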
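View the BGP table: show ip bgp
Two things stand out:
- Routes to 10.1.8.0/24, 10.244.192.0/26 and 10.244.210.64/26 each have two next hops: 10.1.12.2 (spine1) and 10.1.10.2 (spine0). These are eBGP routes installed with ECMP. The two paths arrive from different ASes (500 and 800), and the bestpath as-path multipath-relax setting in the switch configuration is what allows both to be used.
- Routes to 10.244.81.64/26 and 10.244.205.64/26 have the next hops 10.1.5.10 and 10.1.5.11 respectively. These are iBGP routes, learned from the route-reflector clients in leaf0's own AS.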
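Whether a route is iBGP or eBGP depends only on whether the neighbor it came from shares leaf0's AS. The sketch below classifies leaf0's four neighbors using the remote-as values from leaf0-boot.cfg.

It also adds an illustrative ECMP picker. With two equal-cost next hops, a router hashes each flow onto one of them, so all packets of one flow stay on one path. session_type and ecmp_pick are hypothetical names, and cksum is only a stand-in for the switch's real hash function.

```shell
#!/usr/bin/env bash
# Sketch: iBGP vs. eBGP classification and hash-based ECMP next-hop choice.

# session_type LOCAL_AS PEER_AS -> "ibgp" if both ends share an AS, else "ebgp"
session_type() {
    if [[ $1 == "$2" ]]; then echo ibgp; else echo ebgp; fi
}

# leaf0's neighbors and remote-as values, copied from leaf0-boot.cfg
LEAF0_AS=65005
while read -r neighbor remote_as; do
    echo "$neighbor AS$remote_as $(session_type "$LEAF0_AS" "$remote_as")"
done <<'EOF'
10.1.5.10 65005
10.1.5.11 65005
10.1.10.2 500
10.1.12.2 800
EOF

# ecmp_pick FLOW_KEY NEXTHOP... -> one next hop chosen by hashing the flow key.
# cksum stands in for the router's hash; the same flow always gets the same hop.
ecmp_pick() {
    local flow=$1 h
    shift
    local -a hops=("$@")
    h=$(printf '%s' "$flow" | cksum | cut -d' ' -f1)
    echo "${hops[h % ${#hops[@]}]}"
}

ecmp_pick "10.1.5.10:40000->10.244.210.70:80/tcp" 10.1.10.2 10.1.12.2
```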
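4.4 Connectivity tests
- Pod-to-Pod access within the cluster
- Accessing cluster Pods from a core switch
If the core switches also exchange routes with the public network over eBGP, traffic from the public network can reach Pods in the Kubernetes cluster as well.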
5 VyOS configuration files for the switches in containerlab
Save each file under ./startup-conf/. The topology in 2-setup-clab.sh mounts it into the VyOS container as /opt/vyatta/etc/config/config.boot.
- spine0-boot.cfg:
interfaces {
    ethernet eth1 {
        address 10.1.10.2/24
        duplex auto
        speed auto
    }
    ethernet eth2 {
        address 10.1.34.2/24
        duplex auto
        speed auto
    }
    loopback lo {
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.10.0/24 {
                }
                network 10.1.34.0/24 {
                }
            }
        }
        neighbor 10.1.10.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 65005
        }
        neighbor 10.1.34.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 65008
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
        }
        system-as 500
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name spine0
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}

// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317
- spine1-boot.cfg:
interfaces {
    ethernet eth1 {
        address "10.1.12.2/24"
        duplex "auto"
        mtu "9000"
        offload {
            gso {
            }
            sg {
            }
        }
        speed "auto"
    }
    ethernet eth2 {
        address "10.1.11.2/24"
        duplex "auto"
        mtu "9000"
        offload {
            gso {
            }
            sg {
            }
        }
        speed "auto"
    }
    loopback lo {
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.11.0/24 {
                }
                network 10.1.12.0/24 {
                }
            }
        }
        neighbor 10.1.11.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as "65008"
        }
        neighbor 10.1.12.1 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as "65005"
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax {
                    }
                }
            }
            router-id "10.1.8.1"
        }
        system-as "800"
    }
}
system {
    config-management {
        commit-revisions "100"
    }
    conntrack {
        modules {
            ftp {
            }
            h323 {
            }
            nfs {
            }
            pptp {
            }
            sip {
            }
            sqlnet {
            }
            tftp {
            }
        }
    }
    console {
        device ttyS0 {
            speed "9600"
        }
    }
    host-name "spine1"
    login {
        user vyos {
            authentication {
                encrypted-password "$6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/"
                plaintext-password ""
            }
        }
    }
    time-zone "UTC"
}

// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317
- leaf0-boot.cfg:
interfaces {
    ethernet eth1 {
        address 10.1.10.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth2 {
        address 10.1.12.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth3 {
        address 10.1.5.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 10.1.0.0/16
            }
            translation {
                address masquerade
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.5.0/24 {
                }
                network 10.1.10.0/24 {
                }
                network 10.1.12.0/24 {
                }
            }
        }
        neighbor 10.1.5.10 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65005
        }
        neighbor 10.1.5.11 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65005
        }
        neighbor 10.1.10.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 500
        }
        neighbor 10.1.12.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 800
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id 10.1.5.1
        }
        system-as 65005
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name leaf0
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}

// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317
- leaf1-boot.cfg:
interfaces {
    ethernet eth1 {
        address 10.1.34.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth2 {
        address 10.1.11.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    ethernet eth3 {
        address 10.1.8.1/24
        duplex auto
        mtu 9000
        speed auto
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 10.1.0.0/16
            }
            translation {
                address masquerade
            }
        }
    }
}
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                network 10.1.8.0/24 {
                }
                network 10.1.11.0/24 {
                }
                network 10.1.34.0/24 {
                }
            }
        }
        neighbor 10.1.8.10 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65008
        }
        neighbor 10.1.8.11 {
            address-family {
                ipv4-unicast {
                    nexthop-self {
                    }
                    route-reflector-client
                }
            }
            remote-as 65008
        }
        neighbor 10.1.11.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 800
        }
        neighbor 10.1.34.2 {
            address-family {
                ipv4-unicast {
                }
            }
            remote-as 500
        }
        parameters {
            bestpath {
                as-path {
                    multipath-relax
                }
            }
            router-id 10.1.8.1
        }
        system-as 65008
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 9600
        }
    }
    host-name leaf1
    login {
        user vyos {
            authentication {
                encrypted-password $6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/
                plaintext-password ""
            }
        }
    }
    time-zone UTC
}

// Warning: Do not remove the following line.
// vyos-config-version: "bgp@4:broadcast-relay@1:cluster@1:config-management@1:conntrack@3:conntrack-sync@2:container@1:dhcp-relay@2:dhcp-server@6:dhcpv6-server@1:dns-dynamic@1:dns-forwarding@4:firewall@10:flow-accounting@1:https@4:ids@1:interfaces@29:ipoe-server@1:ipsec@12:isis@3:l2tp@4:lldp@1:mdns@1:monitoring@1:nat@5:nat66@1:ntp@2:openconnect@2:ospf@2:policy@5:pppoe-server@6:pptp@2:qos@2:quagga@11:rip@1:rpki@1:salt@1:snmp@3:ssh@2:sstp@4:system@26:vrf@3:vrrp@3:vyos-accel-ppp@2:wanloadbalance@3:webproxy@2"
// Release version: 1.4-rolling-202307070317