Case Study: Analyzing and Resolving an OA Office Network Outage

highwin, posted 2019/01/27 20:08:37
[Abstract] A troubleshooting case I handled personally in a multi-vendor network, involving a Layer 2 spanning tree (STP) plus HSRP design.

I. Symptoms

One day the company's OA office network suddenly went down: terminals could not authenticate and could not ping their gateway.

II. Network Topology

[Figure: network topology diagram (QQ截图20190127182258.jpg)]

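As shown in the figure above, office terminals in each department uplink through the floor-level Huawei S9306 switches to two Cisco C6509 core switches, forming an "inverted triangle" Layer 2 topology. The gateway resides on the SRX3400 firewall.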

III. Root Cause Analysis

1. While the fault was in progress, I logged in to core switch C6509-1 over the console cable and checked its spanning tree state and logs:

SW1-C6509#sh spanning-tree bri

                                                   Hello  Max  Fwd

Vlan                         Bridge ID              Time  Age  Dly  Protocol

---------------- --------------------------------- -----  ---  ---  --------

VLAN0001            28673 (28672,1) 001f.9d01.6000    2    20   15  rstp       

VLAN0002            28674 (28672,2) 001f.9d01.6000    2    20   15  rstp       

VLAN0003            28675 (28672,3) 001f.9d01.6000    2    20   15  rstp       

VLAN0004            28676 (28672,4) 001f.9d01.6000    2    20   15  rstp       

VLAN0005            28677 (28672,5) 001f.9d01.6000    2    20   15  rstp       

VLAN0006            28678 (28672,6) 001f.9d01.6000    2    20   15  rstp       

VLAN0007            28679 (28672,7) 001f.9d01.6000    2    20   15  rstp       

VLAN0008            28680 (28672,8) 001f.9d01.6000    2    20   15  rstp       

VLAN0009            28681 (28672,9) 001f.9d01.6000    2    20   15  rstp       

VLAN0010           28682 (28672,10) 001f.9d01.6000    2    20   15  rstp       

VLAN0011           28683 (28672,11) 001f.9d01.6000    2    20   15  rstp       

VLAN0012           28684 (28672,12) 001f.9d01.6000    2    20   15  rstp        

VLAN0013           28685 (28672,13) 001f.9d01.6000    2    20   15  rstp       

......

VLAN0025           28697 (28672,25) 001f.9d01.6000    2    20   15  rstp       

VLAN0026           28698 (28672,26) 001f.9d01.6000    2    20   15  rstp       

VLAN0027           28699 (28672,27) 001f.9d01.6000    2    20   15  rstp       

VLAN0028           28700 (28672,28) 001f.9d01.6000    2    20   15  rstp       

VLAN0030           28702 (28672,30) 001f.9d01.6000    2    20   15  rstp       

VLAN0033           28705 (28672,33) 001f.9d01.6000    2    20   15  rstp       

VLAN0035           28707 (28672,35) 001f.9d01.6000    2    20   15  rstp       

VLAN0044           28716 (28672,44) 001f.9d01.6000    2    20   15  rstp       

This shows that the RSTP root bridge was still C6509-2.
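The Bridge ID column above prints each value as a total followed by a pair, for example 28673 (28672,1). As a minimal sketch, assuming the usual extended system ID layout (top 4 bits carry the configurable priority, low 12 bits carry the VLAN ID), the split can be reproduced as follows. The function name is made up for illustration.

```python
def split_bridge_priority(value: int) -> tuple:
    """Split a 16-bit bridge priority field into (base priority, extended system ID)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("bridge priority must fit in 16 bits")
    base_priority = value & 0xF000  # top 4 bits, always a multiple of 4096
    system_id_ext = value & 0x0FFF  # low 12 bits, the VLAN ID in per-VLAN spanning tree
    return base_priority, system_id_ext


# Values taken from the "sh spanning-tree bri" output above.
for shown in (28673, 28682, 28716):
    print(shown, split_bridge_priority(shown))
```

Every VLAN on C6509-1 shares the same base priority of 28672, which is lower than the 32768 default. That fits a switch deliberately configured as a backup root rather than the root itself.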

SW1-C6509# sh spanning-tree root port                                                                                          

VLAN0001         Port-channel1

VLAN0002         Port-channel1

VLAN0003         Port-channel1

VLAN0004         Port-channel1

VLAN0005         Port-channel1

VLAN0006         Port-channel1

VLAN0007         Port-channel1

VLAN0008         Port-channel1

VLAN0009         Port-channel1

VLAN0010         Port-channel1

VLAN0011         Port-channel1

VLAN0012         Port-channel1

......

VLAN0026         Port-channel1

VLAN0027         Port-channel1

VLAN0028         Port-channel1

VLAN0030         Port-channel1

VLAN0033         Port-channel1

VLAN0035         Port-channel1

VLAN0044         Port-channel1

VLAN0048         Port-channel1

Next, I examined the logs on C6509-1:

SW1-C6509#sh logging | include Sep

Sep  8 17:12:11.039 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control

Sep  8 17:12:17.123 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control

Sep  8 17:13:02.263 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control

Sep  8 17:13:43.479 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control

Sep  8 17:14:24.376 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control

Sep  8 17:14:45.368 word: %BGP-3-NOTIFICATION: received from neighbor 192.168.239.153 4/0 (hold time expired) 0 bytes

Sep  8 17:14:45.372 word: %BGP-5-ADJCHANGE: neighbor 192.168.239.153 Down BGP protocol initialization

Sep  8 17:15:05.600 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control

Sep  8 17:15:17.372 word: %BGP-5-ADJCHANGE: neighbor 192.168.239.153 Up

Sep  8 17:15:36.348 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control

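The interconnect ports between the two C6509s, GigabitEthernet5/2 and GigabitEthernet6/2, kept generating storm-control (broadcast suppression) log entries, which indicates that the peer device was continuously sending broadcast frames. To restore service quickly, I shut down interconnect ports GigabitEthernet5/2 and GigabitEthernet6/2, but testing showed that service had still not recovered.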
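When the log buffer is long, it helps to count storm-control drops per port instead of reading them by eye. The following is a minimal sketch that assumes the log has been saved as text and that the PORTDROP message format matches the lines shown above. The function name and the sample are illustrative only.

```python
import re
from collections import Counter

# Matches the storm-control drop message format seen in the log excerpt above.
PORTDROP = re.compile(
    r"%PM_PLATFORM-5-PORTDROP: Port (\S+) dropped packets due to storm control"
)


def count_storm_drops(log_text: str) -> Counter:
    """Count storm-control drop messages per interface name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PORTDROP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the C6509-1 log above; the BGP line is ignored.
sample = """\
Sep  8 17:14:24.376 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
Sep  8 17:14:45.372 word: %BGP-5-ADJCHANGE: neighbor 192.168.239.153 Down BGP protocol initialization
Sep  8 17:15:05.600 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet5/2 dropped packets due to storm control
"""

print(count_storm_drops(sample).most_common())
```

Sorting with most_common() puts the noisiest port first, which is usually the best place to continue the investigation.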

2. Because the broadcast frames were coming from the other core switch, C6509-2, I logged in to C6509-2 over the console cable and checked its logs.

SW2-C6509#sh logging | include Sep

Sep  8 17:10:46.168 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:10:54.716 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet6/2 dropped packets due to storm control

Sep  8 17:11:06.860 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:11:24.681 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet6/2 dropped packets due to storm control

Sep  8 17:11:38.133 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:11:58.897 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:12:09.461 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:12:17.989 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:12:30.461 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:12:40.725 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:12:51.377 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:13:01.637 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:13:12.021 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:13:22.641 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:13:29.413 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet6/2 dropped packets due to storm control

Sep  8 17:13:43.397 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:13:53.729 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

Sep  8 17:14:00.853 word: %PM_PLATFORM-5-PORTDROP: Port GigabitEthernet1/21 dropped packets due to storm control

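Port GigabitEthernet1/21 on C6509-2 kept logging storm-control drops, which indicates that it was receiving a large volume of broadcast frames. I checked the port's configuration and traffic status: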

SW2-C6509# sh run int gigabitEthernet 1/21

Building configuration...

Current configuration : 238 bytes

!

interface GigabitEthernet1/21

 description to_4F_D7_HW9306_G3/0/14

 switchport

 switchport trunk encapsulation dot1q

 switchport trunk allowed vlan 3,10

 switchport mode trunk

 speed nonegotiate

 storm-control broadcast level 1.00

end

SW2-C6509#sh int gigabitEthernet 1/21

GigabitEthernet1/21 is administratively down, line protocol is down (disabled)

  Hardware is C6k 1000Mb 802.3, address is 001e.f7c9.9abc (bia 001e.f7c9.9abc)

  Description: to_4F_D7_HW9306_G3/0/14

  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,

     reliability 255/255, txload 0/255, rxload 0/255

  Encapsulation ARPA, loopback not set

  Keepalive set (10 sec)

  Full-duplex, 1000Mb/s, media type is SX

  input flow-control is off, output flow-control is off

  Clock mode is auto

  ARP type: ARPA, ARP Timeout 04:00:00

  Last input 6w2d, output 00:00:06, output hang never

  Last clearing of "show interface" counters never

  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0

  Queueing strategy: fifo

  Output queue: 0/40 (size/max)

  5 minute input rate 0 bits/sec, 0 packets/sec

  5 minute output rate 0 bits/sec, 0 packets/sec

     1554681600 packets input, 187243649749 bytes, 0 no buffer

     Received 1148254598 broadcasts (1138689465 multicasts)

     0 runts, 0 giants, 0 throttles

     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

     0 watchdog, 0 multicast, 0 pause input

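The port had received a very large number of broadcast packets. To restore service quickly, I shut the port down, and testing showed that service recovered immediately.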
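To put a number on "very large", the input counters from the interface output can be turned into a share of total input. This is a rough sketch under two assumptions: the counter lines have the format shown above, and the counters are cumulative since they were last cleared ("never" here), so the share describes the port's whole history rather than the fault window alone. The function names are illustrative.

```python
import re


def parse_input_counters(show_int: str) -> dict:
    """Extract total input packets and the broadcast/multicast counters."""
    packets = re.search(r"(\d+) packets input", show_int)
    bcast = re.search(r"Received (\d+) broadcasts \((\d+) multicasts\)", show_int)
    if not packets or not bcast:
        raise ValueError("expected counter lines not found")
    return {
        "packets_input": int(packets.group(1)),
        "broadcasts": int(bcast.group(1)),
        "multicasts": int(bcast.group(2)),
    }


def broadcast_share(counters: dict) -> float:
    """Fraction of all input packets counted under 'Received ... broadcasts'."""
    return counters["broadcasts"] / counters["packets_input"]


# Counter lines copied from the GigabitEthernet1/21 output above.
sample = """\
     1554681600 packets input, 187243649749 bytes, 0 no buffer
     Received 1148254598 broadcasts (1138689465 multicasts)
"""

counters = parse_input_counters(sample)
print(f"{broadcast_share(counters):.1%} of input packets were counted as broadcast")
```

Roughly three quarters of everything this port ever received was counted as broadcast. Note also that the figure in parentheses is large, so much of that traffic may be multicast rather than pure broadcast.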

3. Next, I checked the access switch HW9306, reviewing its logs and spanning tree state.

<SW1-HW_S9303>dis logbuffer

Sep  8 2016 18:11:38 SW1-HW_S9303 %%01IFNET/4/IF_STATE(l): Interface GigabitEthernet3/0/14 has turned into DOWN state.

Sep  8 2016 18:10:45 SW1-HW_S9303 %%01MSTP/4/ROOT_GUARD(l): MSTP process 0 Instance0's ROOT-Protection port GigabitEthernet3/0/14 received superior message!

Jul 26 2016 13:49:34 SW1-HW_S9303 %%01HWCM/4/TRAPLOG(l): OID 1.3.6.1.4.1.2011.6.10.2.1 configure changed. (EventIndex=15, CommandSource=1, ConfigSource=2, ConfigDestination=4)

The HW9306 logged almost nothing during the fault window, so I went on to check its spanning tree state.

<SW1-HW_S9303>dis stp bri

 MSTID      Port                   Role  STP State     Protection

   0        GigabitEthernet1/0/0   DESI  FORWARDING      BPDU

   0        GigabitEthernet1/0/1   DESI  FORWARDING      BPDU

   0        GigabitEthernet1/0/2   DESI  FORWARDING      BPDU

   0        GigabitEthernet3/0/12  DESI  FORWARDING      NONE

   0        GigabitEthernet4/0/0   DESI  FORWARDING      BPDU

   0        GigabitEthernet4/0/1   DESI  FORWARDING      BPDU

   0        GigabitEthernet4/0/2   DESI  FORWARDING      BPDU

   0        GigabitEthernet4/0/3   DESI  FORWARDING      BPDU

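Every port on the HW9306 is in the DESI (designated port) role, which means the switch currently considers itself the root bridge. This is abnormal: GigabitEthernet3/0/12, the uplink to C6509-1, is also DESI, when it should have become a root port. GigabitEthernet3/0/14, the original uplink to C6509-2, is not listed because the link is down.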
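The "all ports are DESI" check can be automated when many access switches need to be audited. The sketch below assumes the five-column row layout of the stp brief output shown above. The function names and the "healthy" sample are hypothetical and only illustrate the contrast.

```python
def parse_stp_brief(output: str) -> list:
    """Parse data rows of the form: MSTID Port Role State Protection."""
    ports = []
    for line in output.splitlines():
        fields = line.split()
        # The header row has six words and no leading number, so it is skipped.
        if len(fields) == 5 and fields[0].isdigit():
            mstid, port, role, state, protection = fields
            ports.append({"mstid": int(mstid), "port": port, "role": role,
                          "state": state, "protection": protection})
    return ports


def looks_like_root(ports: list, mstid: int = 0) -> bool:
    """True if every listed port in the instance is a designated port."""
    roles = [p["role"] for p in ports if p["mstid"] == mstid]
    return bool(roles) and all(role == "DESI" for role in roles)


# Rows copied from the fault-time output above.
faulty = """\
 MSTID      Port                   Role  STP State     Protection
   0        GigabitEthernet1/0/0   DESI  FORWARDING      BPDU
   0        GigabitEthernet3/0/12  DESI  FORWARDING      NONE
"""

# Hypothetical healthy state: one uplink is the root port, the other is blocked.
healthy = """\
   0        GigabitEthernet1/0/0   DESI  FORWARDING      BPDU
   0        GigabitEthernet3/0/12  ALTE  DISCARDING      NONE
   0        GigabitEthernet3/0/14  ROOT  FORWARDING      NONE
"""

print(looks_like_root(parse_stp_brief(faulty)))
print(looks_like_root(parse_stp_brief(healthy)))
```

An access switch that reports itself as root is the signal to follow up on. The check says nothing about why it happened.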

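4. To verify whether the spanning tree configuration was correct, I checked the configuration of the HW9306's uplink ports to the two C6509s.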

1) Uplink port to C6509-1:

interface GigabitEthernet3/0/12

 description to_6509-1_G1/2

 port link-type trunk

 port trunk allow-pass vlan 3 10

 undo negotiation auto

2) Uplink port to C6509-2:

interface GigabitEthernet3/0/14

 description to_6509-2_G1/21

 port link-type trunk

 undo port trunk allow-pass vlan 1

 port trunk allow-pass vlan 3 10

 stp root-protection

 undo negotiation auto

 broadcast-suppression 10

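GigabitEthernet3/0/14, the HW9306 uplink to C6509-2, has root protection (stp root-protection) configured. The purpose of this feature is to ensure that a port with root protection enabled remains a designated port. Normally, all ports of a root bridge are designated ports. Configuring it on this uplink port is inappropriate.

With root protection configured on GigabitEthernet3/0/14, and with the other uplink, GigabitEthernet3/0/12 toward C6509-1, also a designated port, every port on the HW9306 is a designated port, which indirectly turns the HW9306 into a root bridge.

Once the HW9306 becomes a root bridge, both of its C6509 uplinks are designated ports in the FORWARDING state. Together with the two C6509s, the inverted-triangle topology then forms a network loop, and the resulting broadcast storm hits the C6509 core switches and brings the entire office network down.

Based on the log analysis and reasoning above, the conclusion is as follows. Root protection on the S9306 aggregation switch's uplink port to C6509-2 indirectly caused the S9306 to become a root bridge. This formed a loop in the "inverted triangle" topology with the C6509 core switches and produced a broadcast storm. The storm caused frequent HSRP active/standby switchovers between the two C6509s and disrupted Layer 2 forwarding, which ultimately took down the entire office network.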
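The loop argument can be checked with a small graph model. As a simplified sketch, treat each switch as a node and each forwarding link as an edge, with Port-channel1 between the two C6509s counted as one logical link. Spanning tree's job is to keep this graph free of cycles. The union-find helper below is illustrative and not taken from any vendor tool.

```python
def has_loop(forwarding_links) -> bool:
    """Return True if the undirected graph of forwarding links contains a cycle."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in forwarding_links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # both ends already connected: this link closes a loop
        parent[root_a] = root_b
    return False


core_link = ("C6509-1", "C6509-2")
uplink_1 = ("S9306", "C6509-1")  # GigabitEthernet3/0/12
uplink_2 = ("S9306", "C6509-2")  # GigabitEthernet3/0/14

# Fault case described above: both uplinks forwarding.
print(has_loop([core_link, uplink_1, uplink_2]))

# Expected healthy case: the uplink to C6509-1 is blocked.
print(has_loop([core_link, uplink_2]))
```

With all three links forwarding, the triangle closes and broadcasts can circulate indefinitely. Blocking any one link restores a loop-free tree.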

IV. Solution

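1. Remove the root protection command from GigabitEthernet3/0/14, the HW S9306 uplink to C6509-2. Then re-enable Gi1/21 on C6509-2, the downlink to the HW S9306, and check the spanning tree state on the HW S9306. Under normal conditions, GigabitEthernet3/0/14 (uplink to C6509-2) is the root port and GigabitEthernet3/0/12 (uplink to C6509-1) is a blocked port.

2. In addition, to prevent loops caused by unauthorized devices being connected, manually shut down all unused ports on the HW S9306.

3. Configure all newly connected ports on the S9306 as edge ports (edged-port).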


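[Copyright notice] This article is original content by a Huawei Cloud community user and may not be reproduced without permission. To reproduce it, please contact the original author for authorization. If you find content in this community that appears to be plagiarized, please report it by email with supporting evidence. Once verified, the community will promptly remove the infringing content. Report email: cloudbbs@huaweicloud.com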