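OA Office Network Outage: A Troubleshooting Case Study
I. Symptoms
One day the company's OA office network suddenly went down: endpoints could not authenticate and could not ping their gateway.
II. Network Topology
As shown in the figure above, office endpoints in each department connect mainly through Huawei S9306 floor switches, which uplink to two Cisco C6509 core switches in an "inverted triangle" Layer 2 design. The gateway is on an SRX3400 firewall.
III. Root Cause Analysis
1. While the fault was in progress, we logged in to core switch C6509-1 through the console port and checked its logs and spanning-tree state: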
SW1-C6509#sh spanning-tree bri
Hello Max Fwd
Vlan Bridge ID Time Age Dly Protocol
---------------- --------------------------------- ----- --- --- --------
VLAN0001 28673 (28672,1) 001f.9d01.6000 2 20 15 rstp
VLAN0002 28674 (28672,2) 001f.9d01.6000 2 20 15 rstp
VLAN0003 28675 (28672,3) 001f.9d01.6000 2 20 15 rstp
VLAN0004 28676 (28672,4) 001f.9d01.6000 2 20 15 rstp
VLAN0005 28677 (28672,5) 001f.9d01.6000 2 20 15 rstp
VLAN0006 28678 (28672,6) 001f.9d01.6000 2 20 15 rstp
VLAN0007 28679 (28672,7) 001f.9d01.6000 2 20 15 rstp
VLAN0008 28680 (28672,8) 001f.9d01.6000 2 20 15 rstp
VLAN0009 28681 (28672,9) 001f.9d01.6000 2 20 15 rstp
VLAN0010 28682 (28672,10) 001f.9d01.6000 2 20 15 rstp
VLAN0011 28683 (28672,11) 001f.9d01.6000 2 20 15 rstp
VLAN0012 28684 (28672,12) 001f.9d01.6000 2 20 15 rstp
VLAN0013 28685 (28672,13) 001f.9d01.6000 2 20 15 rstp
...
VLAN0025 28697 (28672,25) 001f.9d01.6000 2 20 15 rstp
VLAN0026 28698 (28672,26) 001f.9d01.6000 2 20 15 rstp
VLAN0027 28699 (28672,27) 001f.9d01.6000 2 20 15 rstp
VLAN0028 28700 (28672,28) 001f.9d01.6000 2 20 15 rstp
VLAN0030 28702 (28672,30) 001f.9d01.6000 2 20 15 rstp
VLAN0033 28705 (28672,33) 001f.9d01.6000 2 20 15 rstp
VLAN0035 28707 (28672,35) 001f.9d01.6000 2 20 15 rstp
VLAN0044 28716 (28672,44) 001f.9d01.6000 2 20 15 rstp
The RSTP root bridge is still C6509-2.
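Each Bridge ID in the output above appears as, for example, `28673 (28672,1)`. With the extended system ID in use, the advertised priority is the configured base priority (a multiple of 4096) plus the VLAN ID. The short Python sketch below reproduces that arithmetic; the helper name `bridge_priority` is hypothetical and only illustrates the calculation.

```python
def bridge_priority(base_priority: int, vlan_id: int) -> int:
    """Return the advertised bridge priority when the extended system ID is used.

    The base priority must be a multiple of 4096 (0..61440); the VLAN ID
    (1..4094) fills the low 12 bits and is simply added on top.
    """
    if base_priority % 4096 != 0 or not 0 <= base_priority <= 61440:
        raise ValueError("base priority must be a multiple of 4096 in 0..61440")
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    return base_priority + vlan_id


# Base priority 28672, as shown in the output for every VLAN.
for vlan in (1, 13, 44):
    print(f"VLAN{vlan:04d}", bridge_priority(28672, vlan))
```

This prints 28673, 28685 and 28716, matching the VLAN0001, VLAN0013 and VLAN0044 rows.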
SW1-C6509# sh spanning-tree root port
VLAN0001 Port-channel1
VLAN0002 Port-channel1
VLAN0003 Port-channel1
VLAN0004 Port-channel1
VLAN0005 Port-channel1
VLAN0006 Port-channel1
VLAN0007 Port-channel1
VLAN0008 Port-channel1
VLAN0009 Port-channel1
VLAN0010 Port-channel1
VLAN0011 Port-channel1
VLAN0012 Port-channel1
...
VLAN0026 Port-channel1
VLAN0027 Port-channel1
VLAN0028 Port-channel1
VLAN0030 Port-channel1
VLAN0033 Port-channel1
VLAN0035 Port-channel1
VLAN0044 Port-channel1
VLAN0048 Port-channel1
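To confirm that every VLAN on C6509-1 reaches the root through the same port, the rows can be grouped by root port. A minimal Python sketch, assuming the two-column layout shown above; it uses only an excerpt of the rows and skips the "..." elision line.

```python
from collections import defaultdict

# Excerpt of "sh spanning-tree root port" on C6509-1, including the elision line.
ROOT_PORT_OUTPUT = """\
VLAN0001 Port-channel1
VLAN0002 Port-channel1
...
VLAN0026 Port-channel1
VLAN0044 Port-channel1
VLAN0048 Port-channel1
"""


def group_vlans_by_root_port(text: str) -> dict[str, list[str]]:
    """Map each root port to the VLANs that use it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Valid rows look like "VLAN0001 Port-channel1"; anything else is ignored.
        if len(fields) == 2 and fields[0].startswith("VLAN"):
            vlan, port = fields
            groups[port].append(vlan)
    return dict(groups)


groups = group_vlans_by_root_port(ROOT_PORT_OUTPUT)
if len(groups) == 1:
    print("All VLANs use root port", next(iter(groups)))
else:
    print("VLANs use different root ports:", groups)
```

A single group means the switch sees one root bridge behind one port for all listed VLANs.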
Next, check the log on C6509-1:
SW1-C6509# sh logging | include SE
Sep 8 17:12:11.039 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:12:17.123 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:13:02.263 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:13:43.479 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:14:24.376 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:14:45.368 word: %BGP-3-NOTIFICATION: received from neighbor 192.168.239.153 4/0 (hold time expired) 0 bytes
Sep 8 17:14:45.372 word: %BGP-5-ADJCHANGE: neighbor 192.168.239.153 Down BGP protocol initialization
Sep 8 17:15:05.600 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:15:17.372 word: %BGP-5-ADJCHANGE: neighbor 192.168.239.153 Up
Sep 8 17:15:36.348 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
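The interconnect ports between the two C6509s, GigabitEthernet5/2 and GigabitEthernet6/2, keep logging storm-control drops, which means the peer switch is continuously sending broadcast traffic. In the same window, the BGP session with neighbor 192.168.239.153 went down on hold-time expiry at 17:14:45 and came back up at 17:15:17. To restore service as quickly as possible, we shut down the interconnect ports GigabitEthernet5/2 and GigabitEthernet6/2, but service still did not recover.
2. Because the broadcast traffic was coming from the other core switch, C6509-2, we logged in to C6509-2 through the console port and checked its log.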
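The timing relationship between the storm-control drops and the BGP flap can be checked directly from the C6509-1 log. The Python sketch below copies the log lines above, converts each timestamp to milliseconds since midnight, counts the storm-control messages before the BGP session went down, and computes how long the session was down. It shows correlation only, not causation.

```python
import re

# C6509-1 log lines copied from above.
SW1_LOG = """\
Sep 8 17:12:11.039 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:12:17.123 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:13:02.263 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:13:43.479 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:14:24.376 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:14:45.368 word: %BGP-3-NOTIFICATION: received from neighbor 192.168.239.153 4/0 (hold time expired) 0 bytes
Sep 8 17:14:45.372 word: %BGP-5-ADJCHANGE: neighbor 192.168.239.153 Down BGP protocol initialization
Sep 8 17:15:05.600 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep 8 17:15:17.372 word: %BGP-5-ADJCHANGE: neighbor 192.168.239.153 Up
Sep 8 17:15:36.348 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
"""

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})")


def to_ms(line: str) -> int:
    """Milliseconds since midnight for the first HH:MM:SS.mmm in a log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in: {line!r}")
    h, m, s, ms = map(int, match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


lines = SW1_LOG.splitlines()
storm = [to_ms(line) for line in lines if "PORTDROP" in line]
bgp_down = next(to_ms(line) for line in lines if "BGP-5-ADJCHANGE" in line and " Down " in line)
bgp_up = next(to_ms(line) for line in lines if "BGP-5-ADJCHANGE" in line and line.endswith(" Up"))

print("storm-control messages:", len(storm))
print("storm-control messages before BGP went down:", sum(t < bgp_down for t in storm))
print("BGP outage in seconds:", (bgp_up - bgp_down) / 1000)
```

Integer milliseconds are used instead of floating-point seconds so the outage duration comes out exact (32.0 s).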
SW2-C6509#sh logging | include Sep
Sep 8 17:10:46.168 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:10:54.716 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet6/2 dropped packets due to storm control
Sep 8 17:11:06.860 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:11:24.681 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet6/2 dropped packets due to storm control
Sep 8 17:11:38.133 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:11:58.897 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:12:09.461 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:12:17.989 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:12:30.461 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:12:40.725 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:12:51.377 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:13:01.637 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:13:12.021 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:13:22.641 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:13:29.413 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet6/2 dropped packets due to storm control
Sep 8 17:13:43.397 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:13:53.729 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:14:00.853 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
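Counting the storm-control messages per interface in the C6509-2 log above makes the imbalance between the downlink and the core interconnect explicit. A minimal Python sketch using `collections.Counter` over the copied log lines:

```python
import re
from collections import Counter

# Storm-control messages copied from the C6509-2 log above.
SW2_LOG = """\
Sep 8 17:10:46.168 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:10:54.716 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet6/2 dropped packets due to storm control
Sep 8 17:11:06.860 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:11:24.681 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet6/2 dropped packets due to storm control
Sep 8 17:11:38.133 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:11:58.897 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:12:09.461 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:12:17.989 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:12:30.461 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:12:40.725 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:12:51.377 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:13:01.637 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:13:12.021 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:13:22.641 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:13:29.413 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet6/2 dropped packets due to storm control
Sep 8 17:13:43.397 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:13:53.729 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
Sep 8 17:14:00.853 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control
"""

PORTDROP = re.compile(r"%PM_PLATFORM-5-PORTDROP: Port (\S+) dropped packets due to storm control")


def storm_drops_per_port(log: str) -> Counter:
    """Count storm-control drop messages per interface name."""
    return Counter(match.group(1) for match in PORTDROP.finditer(log))


counts = storm_drops_per_port(SW2_LOG)
for port, n in counts.most_common():
    print(port, n)
```

The downlink GigabitEthernet1/21 accounts for 15 of the 18 messages, against 3 for the interconnect GigabitEthernet6/2.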
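The C6509-2 log shows continuous storm-control drops on GigabitEthernet1/21, meaning this port is receiving a large volume of broadcast traffic. Check the port's configuration and traffic counters: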
SW2-C6509# sh run int gigabitEthernet 1/21
Building configuration...
Current configuration : 238 bytes
!
interface GigabitEthernet1/21
description to_4F_D7_HW9306_G3/0/14
switchport
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 3,10
switchport mode trunk
speed nonegotiate
storm-control broadcast level 1.00
end
SW2-C6509#sh int gigabitEthernet 1/21
GigabitEthernet1/21 is administratively down, line protocol is down (disabled)
Hardware is C6k 1000Mb 802.3, address is 001e.f7c9.9abc (bia 001e.f7c9.9abc)
Description: to_4F_D7_HW9306_G3/0/14
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 0/255, rxload 0/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is SX
input flow-control is off, output flow-control is off
Clock mode is auto
ARP type: ARPA, ARP Timeout 04:00:00
Last input 6w2d, output 00:00:06, output hang never
Last clearing of "show interface" counters never
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
1554681600 packets input, 187243649749 bytes, 0 no buffer
Received 1148254598 broadcasts (1138689465 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
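The port has received a very large number of broadcast packets. To restore service, we shut this port down (hence "administratively down" in the output above), and service recovered immediately.
3. Next, check the state of the access switch HW9306: its log and its spanning-tree status.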
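Some back-of-the-envelope numbers help put the configuration and counters above in context. On the C6509, `storm-control broadcast level 1.00` is a percentage of port bandwidth, so on a 1000 Mb/s port broadcast traffic above about 10 Mbit/s is dropped. The Python sketch below computes that threshold plus the broadcast share of input packets. Two caveats: the counters have never been cleared, so they cover the port's whole history rather than just the fault window, and IOS prints the multicast count in parentheses next to the broadcast count, so whether multicasts are included in that figure should be checked against platform documentation. Treat the ratio as a rough indicator only.

```python
# Values from "sh run" and "sh int gigabitEthernet 1/21" on C6509-2.
LINK_BPS = 1_000_000_000          # 1000Mb/s
STORM_LEVEL_PERCENT = 1.00        # storm-control broadcast level 1.00
PACKETS_INPUT = 1_554_681_600
BYTES_INPUT = 187_243_649_749
BROADCASTS = 1_148_254_598        # "Received ... broadcasts" counter

# Broadcast traffic above this rate is dropped by storm control.
threshold_bps = LINK_BPS * STORM_LEVEL_PERCENT / 100
# Share of all input packets reported as broadcasts (cumulative counters).
broadcast_share = BROADCASTS / PACKETS_INPUT
# Mean input frame size in bytes over the same cumulative period.
mean_frame_bytes = BYTES_INPUT / PACKETS_INPUT

print(f"storm-control threshold: {threshold_bps / 1e6:.0f} Mbit/s")
print(f"broadcast share of input packets: {broadcast_share:.1%}")
print(f"mean input frame size: {mean_frame_bytes:.0f} bytes")
```

This prints a 10 Mbit/s threshold, a broadcast share of about 73.9%, and a mean frame size of about 120 bytes.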
<SW1-HW_S9303>dis logbuffer
Sep 8 2016 18:11:38 SW1-HW_S9303 %%01IFNET/4/IF_STATE(l): Interface GigabitEthernet3/0/14 has turned into DOWN state.
Sep 8 2016 18:10:45 SW1-HW_S9303 %%01MSTP/4/ROOT_GUARD(l): MSTP process 0 Instance0's ROOT-Protection port GigabitEthernet3/0/14 received superior message!
Jul 26 2016 13:49:34 SW1-HW_S9303 %%01HWCM/4/TRAPLOG(l): OID 1.3.6.1.4.1.2011.6.10.2.1 configure changed. (EventIndex=15, CommandSource=1, ConfigSource=2, ConfigDestination=4)
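Apart from the root-protection alarm on GigabitEthernet3/0/14 (a superior BPDU was received) and that port going down, the HW9306 log contains almost nothing for the fault period, so we next check its spanning-tree state.
<SW1-HW_S9303>dis stp bri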
MSTID Port Role STP State Protection
0 GigabitEthernet1/0/0 DESI FORWARDING BPDU
0 GigabitEthernet1/0/1 DESI FORWARDING BPDU
0 GigabitEthernet1/0/2 DESI FORWARDING BPDU
0 GigabitEthernet3/0/12 DESI FORWARDING NONE
0 GigabitEthernet4/0/0 DESI FORWARDING BPDU
0 GigabitEthernet4/0/1 DESI FORWARDING BPDU
0 GigabitEthernet4/0/2 DESI FORWARDING BPDU
0 GigabitEthernet4/0/3 DESI FORWARDING BPDU
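Every port on the HW9306 is a designated port (DESI), which means the switch considers itself the root bridge. This is abnormal: the uplink to C6509-1, GigabitEthernet3/0/12, is also DESI, whereas it should have become the root port. The former uplink to C6509-2, GigabitEthernet3/0/14, is not listed because it is down.
4. To verify the spanning-tree configuration, check the configuration of the HW9306's two uplink ports to the C6509s.
1) Uplink to C6509-1: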
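The reasoning above, that a switch with no root port treats itself as the root bridge, can be expressed as a quick check over the `dis stp bri` table. A minimal Python sketch, assuming a single MSTP instance and the column layout shown above; the function names are hypothetical.

```python
# "dis stp bri" output captured on the HW switch during the fault (instance 0).
STP_BRIEF = """\
MSTID Port Role STP State Protection
0 GigabitEthernet1/0/0 DESI FORWARDING BPDU
0 GigabitEthernet1/0/1 DESI FORWARDING BPDU
0 GigabitEthernet1/0/2 DESI FORWARDING BPDU
0 GigabitEthernet3/0/12 DESI FORWARDING NONE
0 GigabitEthernet4/0/0 DESI FORWARDING BPDU
0 GigabitEthernet4/0/1 DESI FORWARDING BPDU
0 GigabitEthernet4/0/2 DESI FORWARDING BPDU
0 GigabitEthernet4/0/3 DESI FORWARDING BPDU
"""


def parse_stp_brief(text: str) -> dict[str, tuple[str, str]]:
    """Return {port: (role, state)}; the header row is skipped."""
    roles: dict[str, tuple[str, str]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric MSTID; the header starts with "MSTID".
        if len(fields) >= 4 and fields[0].isdigit():
            _mstid, port, role, state = fields[:4]
            roles[port] = (role, state)
    return roles


def acts_as_root(roles: dict[str, tuple[str, str]]) -> bool:
    """A bridge with active ports but no ROOT port treats itself as the root."""
    return bool(roles) and all(role != "ROOT" for role, _state in roles.values())


roles = parse_stp_brief(STP_BRIEF)
print(acts_as_root(roles))
print(roles["GigabitEthernet3/0/12"])
```

For the captured table this prints `True` and `('DESI', 'FORWARDING')`: no port holds the ROOT role, so the switch is acting as root.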
interface GigabitEthernet3/0/12
description to_6509-1_G1/2
port link-type trunk
port trunk allow-pass vlan 3 10
undo negotiation auto
2) Uplink to C6509-2:
interface GigabitEthernet3/0/14
description to_6509-2_G1/21
port link-type trunk
undo port trunk allow-pass vlan 1
port trunk allow-pass vlan 3 10
stp root-protection
undo negotiation auto
broadcast-suppression 10
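The uplink to C6509-2, GigabitEthernet3/0/14, has root protection (stp root-protection) enabled. Root protection is meant to keep the protected port a designated port, and normally all ports of a root bridge are designated ports, so the feature belongs on ports facing downstream devices. Enabling it on an uplink toward the root bridge, as here, is inappropriate.
With root protection on GigabitEthernet3/0/14, and with the other uplink to C6509-1, GigabitEthernet3/0/12, also a designated port, every port on the HW9306 ends up designated, which indirectly makes the HW9306 act as the root bridge.
Once the HW9306 acts as the root bridge, both of its uplinks to the C6509s are designated ports in the FORWARDING state. Together with the two C6509s, this inverted-triangle topology forms a network loop, and the resulting broadcast storm reaches the C6509 core switches and brings the entire office network down.
Based on the log analysis and reasoning above, we conclude that root protection on the S9306 access switch's uplink to C6509-2 indirectly made the S9306 the root bridge. Combined with the inverted-triangle uplinks to the C6509 core switches, this formed a loop and a heavy broadcast storm, which caused frequent HSRP active/standby switchovers between the two C6509 core switches and disrupted Layer 2 forwarding, ultimately bringing down the entire office network and interrupting service.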
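The ROOT_GUARD log entry earlier says the protected port "received superior message". In spanning tree, a BPDU is superior when its priority vector (root bridge ID, root path cost, sender bridge ID, sender port ID) compares lower, field by field. Per Huawei's documentation, a root-protected designated port that receives a superior BPDU is moved to the Discarding state. The Python sketch below models that comparison in simplified form (it ignores the receiving port ID and timers); all priorities, MAC addresses and port IDs are hypothetical.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    priority: int
    mac: int  # MAC address as an integer so comparisons are numeric


class PriorityVector(NamedTuple):
    # Field order matters: tuples compare field by field and lower wins.
    root_id: BridgeId
    root_path_cost: int
    sender_id: BridgeId
    sender_port: int


def mac_to_int(mac: str) -> int:
    """Convert 'xxxx.xxxx.xxxx', 'xx:xx:..' or 'xx-xx-..' notation to an integer."""
    return int(mac.replace(".", "").replace(":", "").replace("-", ""), 16)


def is_superior(received: PriorityVector, own: PriorityVector) -> bool:
    """A received BPDU is superior when its priority vector is lower."""
    return received < own


def root_protected_port_state(received: PriorityVector, own: PriorityVector) -> str:
    """Simplified root protection: a superior BPDU puts the port in DISCARDING."""
    return "DISCARDING" if is_superior(received, own) else "FORWARDING"


# Hypothetical values: a core switch at priority 24576 versus an access switch
# left at the default 32768 that currently believes it is the root.
core = BridgeId(24576, mac_to_int("0011.2233.4455"))
access = BridgeId(32768, mac_to_int("00aa.bbcc.ddee"))
from_core = PriorityVector(core, 0, core, 0x8015)
own_designated = PriorityVector(access, 0, access, 0x800E)

print(is_superior(from_core, own_designated))
print(root_protected_port_state(from_core, own_designated))
```

With these values the BPDU from the core switch is superior, so the modeled root-protected port goes to DISCARDING.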
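Why both uplinks forwarding produces a loop can be shown with a tiny graph model of the inverted triangle. The Python sketch below treats each switch as a node and each forwarding link as an edge, then uses union-find to detect a cycle. It is a simplification: the core interconnect is modeled as one link, and per-VLAN spanning-tree instances are ignored.

```python
def has_loop(nodes: list[str], forwarding_links: list[tuple[str, str]]) -> bool:
    """Return True if the forwarding links contain a cycle (union-find)."""
    parent = {node: node for node in nodes}

    def find(node: str) -> str:
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in forwarding_links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b are already connected: this link closes a loop
        parent[root_a] = root_b
    return False


NODES = ["C6509-1", "C6509-2", "S9306"]
CORE = ("C6509-1", "C6509-2")   # core interconnect
UP1 = ("S9306", "C6509-1")      # GigabitEthernet3/0/12
UP2 = ("S9306", "C6509-2")      # GigabitEthernet3/0/14

normal = [CORE, UP2]            # GigabitEthernet3/0/12 blocked
fault = [CORE, UP1, UP2]        # both uplinks designated and forwarding

print("normal state loops:", has_loop(NODES, normal))
print("fault state loops:", has_loop(NODES, fault))
```

With one uplink blocked the forwarding topology is a tree; with both forwarding, the three links form a cycle.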
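IV. Resolution
1. Remove the root-protection command from the HW S9306 uplink to C6509-2, GigabitEthernet3/0/14 (undo stp root-protection), then re-enable the C6509-2 downlink port to the HW S9306 (Gi1/21) and check the S9306's spanning-tree state. In the normal state, the uplink to C6509-2 (GigabitEthernet3/0/14) is the root port and the uplink to C6509-1 (GigabitEthernet3/0/12) is blocked.
2. In addition, to prevent loops caused by unauthorized devices being connected, manually shut down all unused ports on the HW S9306.
3. Configure all newly connected access ports on the S9306 as edge ports (stp edged-port enable).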
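The remediation steps above can be sketched in Python: one function checks the uplink roles expected after the fix, and another builds Huawei VRP interface commands for shutting unused ports and enabling edge ports. The expected role abbreviations (ROOT, ALTE) and states follow the `dis stp bri` format, with ALTE/DISCARDING assumed to be how the blocked uplink is displayed. The port names passed to `hardening_commands` are hypothetical examples; review any generated configuration before applying it.

```python
# Expected uplink roles on the HW switch after root protection is removed,
# with C6509-2 as the root bridge. Port names come from the configs above.
EXPECTED_UPLINKS = {
    "GigabitEthernet3/0/14": ("ROOT", "FORWARDING"),  # uplink to C6509-2
    "GigabitEthernet3/0/12": ("ALTE", "DISCARDING"),  # uplink to C6509-1, blocked
}


def check_uplinks(roles: dict[str, tuple[str, str]]) -> list[str]:
    """Compare observed (role, state) pairs with the expected uplink roles."""
    problems = []
    for port, expected in EXPECTED_UPLINKS.items():
        observed = roles.get(port)
        if observed != expected:
            problems.append(f"{port}: expected {expected}, observed {observed}")
    return problems


def hardening_commands(unused_ports: list[str], edge_ports: list[str]) -> str:
    """Build VRP interface commands for steps 2 and 3 of the resolution."""
    lines: list[str] = []
    for port in unused_ports:
        lines += [f"interface {port}", " shutdown", " quit"]
    for port in edge_ports:
        lines += [f"interface {port}", " stp edged-port enable", " quit"]
    return "\n".join(lines)


# Fault-time state from "dis stp bri": Gi3/0/12 was DESI and Gi3/0/14 was absent.
for problem in check_uplinks({"GigabitEthernet3/0/12": ("DESI", "FORWARDING")}):
    print(problem)

# Hypothetical port names, for illustration only.
print(hardening_commands(["GigabitEthernet4/0/10"], ["GigabitEthernet4/0/4"]))
```

The fault-time state fails both checks, while a table matching `EXPECTED_UPLINKS` returns no problems.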