Nexus 3500 Series Switch Platform System Health Check Process

Updated: February 14, 2014

Document ID: 116699

Topics

Introduction

Monitor CPU and Memory Usage

Check Hardware Diagnostics Status

View Hardware Profile

Active Buffer Monitoring

Monitor Interface Counters/Statistics

Monitor Control Plane Policing Statistics

Perform Bootflash File System Health Check

Collect System Cores and Process Logs

Related Information

Introduction

This document describes the general process that is used in order to perform a system health check on Cisco Nexus 3500 Series switch platforms that run Nexus Operating System (NX-OS) Release 6.0(2).

Monitor CPU and Memory Usage

In order to obtain an overview of the CPU and memory usage of the system, enter the show system resources command:

switch# show system resources 

Load average:   1 minute: 0.32   5 minutes: 0.13   15 minutes: 0.10
Processes   :   366 total, 2 running
CPU states  :   5.5% user,   12.0% kernel,   82.5% idle
        CPU0 states  :   10.0% user,   18.0% kernel,   72.0% idle
        CPU1 states  :   1.0% user,   6.0% kernel,   93.0% idle
Memory usage:   4117064K total,   2614356K used,   1502708K free
switch#

If you require more details about the processes that consume CPU cycles or memory, enter the show process cpu sort and show system internal kernel memory usage commands:

switch# show process cpu sort
PID    Runtime(ms)  Invoked   uSecs  1Sec    Process
-----  -----------  --------  -----  ------  -----------
 3239     55236684  24663045   2239    6.3%  mtc_usd
 3376          776      7007    110    2.7%  netstack
   15     26592500 178719270    148    0.9%  kacpid
 3441      4173060  29561656    141    0.9%  cfs
 3445      7646439   6391217   1196    0.9%  lacp
 3507     13646757  34821232    391    0.9%  hsrp_engine
    1        80564    596043    135    0.0%  init
    2            6       302     20    0.0%  kthreadd
    3         1064    110904      9    0.0%  migration/0
<snip>

switch# show system internal kernel memory usage 
MemTotal:      4117064 kB
MemFree:       1490120 kB
Buffers:           332 kB
Cached:        1437168 kB
ShmFS:         1432684 kB
Allowed:       1029266 Pages
Free:           372530 Pages
Available:      375551 Pages
SwapCached:          0 kB
Active:        1355724 kB
Inactive:       925400 kB
HighTotal:     2394400 kB
HighFree:       135804 kB
LowTotal:      1722664 kB
LowFree:       1354316 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:              12 kB
Writeback:           0 kB
AnonPages:      843624 kB
Mapped:         211144 kB
Slab:            98524 kB
SReclaimable:     7268 kB
SUnreclaim:      91256 kB
PageTables:      19604 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
WritebackTmp:        0 kB
CommitLimit:   2058532 kB
Committed_AS: 10544480 kB
VmallocTotal:   284664 kB
VmallocUsed:    174444 kB
VmallocChunk:   108732 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB
DirectMap4k:      2048 kB
DirectMap2M:   1787904 kB
switch#

The output shows that the High memory region is used by NX-OS, and the Low memory region is used by the kernel. The MemTotal and MemFree values show the total memory and the free memory that are available on the switch.
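
As a rough illustration of how the region values in this output can be read, the following Python sketch computes the used percentage of the whole memory and of the High and Low regions. The function names parse_meminfo and region_usage are hypothetical helpers, and the sample text is an abbreviated copy of the output shown above; nothing here is part of NX-OS.

```python
import re

# Abbreviated copy of the kernel memory usage output shown above.
SAMPLE = """\
MemTotal:      4117064 kB
MemFree:       1490120 kB
HighTotal:     2394400 kB
HighFree:       135804 kB
LowTotal:      1722664 kB
LowFree:       1354316 kB
"""

def parse_meminfo(text):
    """Return {field name: value in kB} for lines such as 'MemFree: 1490120 kB'."""
    fields = {}
    for match in re.finditer(r"^(\w+):\s+(\d+) kB", text, re.MULTILINE):
        fields[match.group(1)] = int(match.group(2))
    return fields

def region_usage(fields, region):
    """Percentage of a region ('Mem', 'High', or 'Low') that is in use."""
    total = fields[region + "Total"]
    free = fields[region + "Free"]
    return 100.0 * (total - free) / total

fields = parse_meminfo(SAMPLE)
for region in ("Mem", "High", "Low"):
    print(f"{region}: {region_usage(fields, region):.1f}% used")
```

With this sample, the High region is far more heavily used than the Low region, which is consistent with the description that NX-OS processes run in the High region.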

In order to generate memory-usage alerts, configure the switch similar to this:

switch(config)# system memory-thresholds minor 50 severe 70 critical 90

Note: In this document, the values 50, 70, and 90 are used only as examples; choose threshold limits based on your needs.
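
To see how the example thresholds relate to the show system resources output earlier in this section, here is a minimal Python sketch that computes the used-memory percentage and maps it to a level. The helper names and the use of a greater-than-or-equal comparison are assumptions for illustration; the document does not describe how NX-OS evaluates the thresholds internally.

```python
import re

def memory_used_percent(resources_output):
    """Extract total and used memory from the 'Memory usage:' line; return percent used."""
    match = re.search(r"Memory usage:\s+(\d+)K total,\s+(\d+)K used", resources_output)
    if match is None:
        raise ValueError("no 'Memory usage:' line found")
    total, used = int(match.group(1)), int(match.group(2))
    return 100.0 * used / total

def alert_level(percent, minor=50, severe=70, critical=90):
    """Map a usage percentage to the highest example threshold it has reached.

    The >= comparison is an assumption, not documented NX-OS behavior.
    """
    if percent >= critical:
        return "critical"
    if percent >= severe:
        return "severe"
    if percent >= minor:
        return "minor"
    return "ok"

# 'Memory usage:' line copied from the show system resources output above.
line = "Memory usage:   4117064K total,   2614356K used,   1502708K free"
percent = memory_used_percent(line)
print(f"{percent:.1f}% used -> {alert_level(percent)}")
```

With the sample line, usage is between the first and second example thresholds, so only the lowest level would be reached.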

Check Hardware Diagnostics Status

In order to view the hardware diagnostics status, enter the show diagnostic result module all command. Ensure that all of the tests pass, and that the Overall Diagnostic Result is PASS.

switch# show diagnostic result module all 

Current bootup diagnostic level: complete
Module 1: 48x10GE Supervisor  SerialNo : <serial #>
  Overall Diagnostic Result for Module 1 : PASS
  Diagnostic level at card bootup: complete

  Test results: (. = Pass, F = Fail, I = Incomplete, U = Untested, A = Abort)

     1) TestUSBFlash ------------------------> .
     2) TestSPROM ---------------------------> .
     3) TestPCIe ----------------------------> .
     4) TestLED -----------------------------> .
     5) TestOBFL ----------------------------> .
     6) TestNVRAM ---------------------------> .
     7) TestPowerSupply ---------------------> .
     8) TestTemperatureSensor ---------------> .
     9) TestFan -----------------------------> .
    10) TestVoltage -------------------------> .
    11) TestGPIO ----------------------------> .
    12) TestInbandPort ----------------------> .
    13) TestManagementPort ------------------> .
    14) TestMemory --------------------------> .
    15) TestForwardingEngine ----------------> .
<snip>
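
When the output is long, it can help to filter it for tests that did not pass. The following Python sketch is one way to do that, assuming the output has been captured as text. The function names are hypothetical, and the sample includes an invented failing test purely to exercise the filter; it is not taken from a real switch.

```python
import re

# Result codes from the 'Test results' legend in the output above.
RESULT_NAMES = {".": "Pass", "F": "Fail", "I": "Incomplete", "U": "Untested", "A": "Abort"}

# Hypothetical sample: the TestPCIe failure and overall FAIL are invented for illustration.
SAMPLE = """\
  Overall Diagnostic Result for Module 1 : FAIL
     1) TestUSBFlash ------------------------> .
     2) TestSPROM ---------------------------> .
     3) TestPCIe ----------------------------> F
"""

def non_passing_tests(output):
    """Return (test name, result) pairs for every test whose result code is not '.'."""
    failures = []
    pattern = r"^\s*\d+\)\s+(\w+)\s+-+>\s+(\S)\s*$"
    for match in re.finditer(pattern, output, re.MULTILINE):
        name, code = match.groups()
        if code != ".":
            failures.append((name, RESULT_NAMES.get(code, code)))
    return failures

def overall_results(output):
    """Return {module number: overall result string}."""
    pattern = r"Overall \w+ Result for Module (\d+)\s*:\s*(\w+)"
    return {int(m.group(1)): m.group(2) for m in re.finditer(pattern, output)}

print(overall_results(SAMPLE))
print(non_passing_tests(SAMPLE))
```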

View Hardware Profile

Enter the show hardware profile status command in order to check the current hardware profile that is configured on the switch, and the hardware table usage:

switch# show hardware profile status 
Hardware table usage:
Max Host Entries = 65535, Used = 341
Max Unicast LPM Entries = 24576, Used = 92
Max Multicast LPM Entries = 8192, Used (L2:L3) = 1836 (1:1835)
switch#

Ensure that the usage of the Host Entries and Unicast/Multicast Longest Prefix Match (LPM) Entries is within the specified limits.

Note: For optimal performance of the switch, it is important to choose the proper hardware profile template.

If you want the switch to generate a syslog at a specific threshold level, configure the switch similar to this:

switch(config)# hardware profile multicast syslog-threshold ?
  <1-100>  Percentage

switch(config)# hardware profile unicast syslog-threshold ?
  <1-100>  Percentage

Note: The default threshold value is 90 percent for both unicast and multicast.
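
To relate the table usage shown earlier in this section to a percentage threshold, the following Python sketch computes the used percentage of each hardware table and lists any table at or above a threshold. The function names are hypothetical, and treating the threshold as a greater-than-or-equal comparison is an assumption; the document does not state exactly when the switch raises the syslog.

```python
import re

# Table usage lines as shown in the hardware profile status output above.
SAMPLE = """\
Max Host Entries = 65535, Used = 341
Max Unicast LPM Entries = 24576, Used = 92
Max Multicast LPM Entries = 8192, Used (L2:L3) = 1836 (1:1835)
"""

def table_usage(output):
    """Return {table name: percent used} for each 'Max <name> = N, Used ... = M' line."""
    usage = {}
    pattern = r"^\s*Max (.+?) = (\d+), Used[^=]*= (\d+)"
    for match in re.finditer(pattern, output, re.MULTILINE):
        name = match.group(1)
        maximum, used = int(match.group(2)), int(match.group(3))
        usage[name] = 100.0 * used / maximum
    return usage

def over_threshold(usage, threshold=90):
    """Names of tables whose usage is at or above the threshold (assumed comparison)."""
    return [name for name, percent in usage.items() if percent >= threshold]

usage = table_usage(SAMPLE)
for name, percent in usage.items():
    print(f"{name}: {percent:.2f}% used")
print("At or above threshold:", over_threshold(usage))
```

In this sample, all three tables are well below the default 90 percent value.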

For more details, refer to the Configuring PIM Cisco article, which provides configuration details based on the license installed and the features enabled. Also, if you want to optimize the forwarding table, refer to the Cisco Nexus 3000 Series Switches: Understand, Configure and Tune the Forwarding Table Cisco article.

Active Buffer Monitoring

Active Buffer Monitoring (ABM) provides granular buffer occupancy data, which permits better insight into hot-spots of congestion. This feature supports two modes of operation: Unicast mode and Multicast mode.

In Unicast mode, ABM monitors and maintains the buffer usage data per buffer-block, and the unicast buffer utilization for all 48 ports. In Multicast mode, it monitors and maintains the buffer usage data per buffer-block, and the multicast buffer utilization per buffer-block.

Note: For more information, refer to the Cisco Nexus 3548 Active Buffer Monitoring Cisco article. Figure 4 of the article shows that the buffer usage peaked at 22:15:32 and lasted until 22:15:37. Also, the histogram provides evidence of sudden spikes in the usage and shows the speed at which the buffer drains. If there is a slow receiver (such as a 1-Gbps receiver among 10-Gbps receivers), then in order to avoid packet drops, you must include a configuration similar to this: hardware profile multicast slow-receiver port <x>.

Monitor Interface Counters/Statistics

In order to monitor traffic loss, enter the show interface ethernet x/y command. The output from this command provides basic traffic-rate information, and also port-level drops/errors.

switch# show interface eth1/10
Ethernet1/10 is up
 Dedicated Interface 
  Belongs to Po1
  Hardware: 100/1000/10000 Ethernet, address: 30f7.0d9c.3b51
  (bia 30f7.0d9c.3b51)
  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
  reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA
  Port mode is trunk
  full-duplex, 10 Gb/s, media type is 10G
  Beacon is turned off
  Input flow-control is off, output flow-control is off
  Rate mode is dedicated
  Switchport monitor is off 
  EtherType is 0x8100 
  Last link flapped 3d21h
  Last clearing of "show interface" counters never
  14766 interface resets
  30 seconds input rate 47240 bits/sec, 68 packets/sec
  30 seconds output rate 3120720 bits/sec, 3069 packets/sec
  Load-Interval #2: 5 minutes (300 seconds)
    input rate 50.18 Kbps, 52 pps; output rate 3.12 Mbps, 3.05 Kpps
  RX
    4485822 unicast packets  175312538 multicast packets  388443 broadcast
    packets
    180186040 input packets  9575683853 bytes
    0 jumbo packets  0 storm suppression bytes
    1 runts  0 giants  1 CRC  0 no buffer
    2 input error  0 short frame  0 overrun   0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  260503 input discard
    0 Rx pause
  TX
    159370439 unicast packets  6366799906 multicast packets  1111 broadcast
    packets
    6526171456 output packets  828646014117 bytes
    0 jumbo packets
    0 output errors  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble 0 output discard
    0 Tx pause

switch#
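
The counters of most interest in the output above are the input discard and output discard values. As a minimal sketch, assuming the interface output has been captured as text, the following Python snippet extracts both counters. The function name discard_counters is hypothetical, and the sample contains only the two relevant lines from the output.

```python
import re

def discard_counters(output):
    """Return the 'input discard' and 'output discard' counters from show interface text.

    A counter that is not found in the text is reported as None.
    """
    counters = {}
    for direction in ("input", "output"):
        match = re.search(rf"(\d+) {direction} discard", output)
        counters[direction] = int(match.group(1)) if match else None
    return counters

# The two relevant lines from the Ethernet1/10 output above.
SAMPLE = """\
    0 input with dribble  260503 input discard
    0 lost carrier  0 no carrier  0 babble 0 output discard
"""

counters = discard_counters(SAMPLE)
for direction, value in counters.items():
    if value:
        print(f"{direction} discards are non-zero: {value}")
```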

If the input or output discards show non-zero values, determine if the dropped packets are unicast and/or multicast:

switch# show queuing interface ethernet 1/10
Ethernet1/10 queuing information:
  TX Queuing
    qos-group  sched-type  oper-bandwidth
        0       WRR            100

  RX Queuing
    Multicast statistics:
        Mcast pkts dropped                      : 0
    Unicast statistics:
    qos-group 0
    HW MTU: 1500 (1500 configured)
    drop-type: drop, xon: 0, xoff: 0
    Statistics:
        Ucast pkts dropped                      : 0
switch#

The output indicates that the dropped traffic is not due to Quality of Service (QoS). Now you must check the hardware MAC address statistics:

switch# show hardware internal statistics device mac ?
  all         Show all stats
  congestion  Show congestion stats
  control     Show control stats
  errors      Show error stats
  lookup      Show lookup stats
  pktflow     Show packetflow stats
  qos         Show qos stats
  rates       Show packetflow stats
  snmp        Show snmp stats

When you troubleshoot traffic drops, the key options to check are congestion, errors, and qos. The pktflow option provides traffic statistics for the RX and TX directions, with specific packet-size ranges.

switch# show hardware internal statistics device mac errors port 10
|------------------------------------------------------------------------|
| Device: L2/L3 forwarding ASIC   Role:MAC                               |
|------------------------------------------------------------------------|
Instance:0
ID   Name                                          Value              Ports
--   ----                                          -----              -----
198  MTC_MB_CRC_ERR_CNT_PORT9                      0000000000000002   10 -
508  MTC_PP_CNT_PORT1_RCODE_CHAIN3                 0000000000000002   10 -
526  MTC_RW_EG_PORT1_EG_CLB_DROP_FCNT_CHAIN3       000000000054da5a   10 -
3616 MTC_NI515_P1_CNT_TX                           0000000000000bed   10 -
6495 TTOT_OCT                                      000000000005f341   10 -
7365 RTOT                                          0000000000000034   10 -
7366 RCRC                                          0000000000000001   10 -
7374 RUNT                                          0000000000000001   10 -
9511 ROCT                                          00000000000018b9   10 -
10678 PORT_EXCEPTION_ICBL_PKT_DROP                 000000000003f997   10 -

Note: The 0x3f997 hexadecimal value equals 260503 in decimal format.

switch# show interface eth1/10

Ethernet1/10 is up
<snip>
    0 input with dribble  260503 input discard
<snip>

In the output, the PORT_EXCEPTION_ICBL_PKT_DROP error message indicates that the traffic received on the port has a Dot1Q tag for a VLAN that is not enabled on the switch.
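
Because the MAC statistics are reported in hexadecimal while the interface counters are in decimal, a small conversion step makes the two easier to compare. The following Python sketch parses rows in the format shown above and converts the value column to decimal. The function name parse_counter_line is hypothetical, and the sketch assumes that every row has the ID, name, value, and port columns separated by whitespace.

```python
def parse_counter_line(line):
    """Split a counter row into (id, name, decimal value, port).

    Assumes whitespace-separated columns: ID, name, hexadecimal value, port.
    """
    fields = line.split()
    counter_id, name, hex_value, port = fields[0], fields[1], fields[2], fields[3]
    return int(counter_id), name, int(hex_value, 16), int(port)

# Two rows copied from the MAC error statistics output above.
rows = [
    "7366 RCRC                                          0000000000000001   10 -",
    "10678 PORT_EXCEPTION_ICBL_PKT_DROP                 000000000003f997   10 -",
]

for row in rows:
    counter_id, name, value, port = parse_counter_line(row)
    print(f"port {port}: {name} = {value}")
```

The converted PORT_EXCEPTION_ICBL_PKT_DROP value can then be compared directly with the input discard counter from the interface output.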

Here is another example, where the traffic drops are due to QoS:

switch# show interface ethernet 1/11

Ethernet1/11 is up
<snip>
  TX

<snip>
    0 output errors  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble 6153699 output discard
    0 Tx pause
switch#

switch# show queuing interface ethernet 1/11

Ethernet1/11 queuing information:
  TX Queuing
    qos-group  sched-type  oper-bandwidth
        0       WRR            100

  RX Queuing
    Multicast statistics:
        Mcast pkts dropped                      : 0
    Unicast statistics:
    qos-group 0
    HW MTU: 1500 (1500 configured)
    drop-type: drop, xon: 0, xoff: 0
    Statistics:
        Ucast pkts dropped                      : 6153699

Note: The output indicates that 6153699 packets were dropped in the Receive direction, which is misleading. Refer to Cisco bug ID CSCuj20713.

switch# show hardware internal statistics device mac all | i 11|Port

(result filtered for relevant port)
ID   Name           Value              Ports
<snip>
5596 TX_DROP        00000000005de5e3   11 -  <--- 6153699 Tx Drops in Hex
<snip>
10253 UC_DROP_VL0   00000000005de5e3   11 -  <--- Drops for QoS Group 0 in Hex
<snip>

In summary, here are the commands that are used in order to capture packet drops:

show interface ethernet x/y
show queuing interface ethernet x/y
show hardware internal statistics device mac errors port <port #>

Monitor Control Plane Policing Statistics

Control Plane Policing (CoPP) protects the control plane in order to ensure network stability. For additional details, refer to the Configuring Control Plane Policing Cisco article.

In order to view the CoPP statistics, enter the show policy-map interface control-plane command:

switch# show policy-map interface control-plane 

Control Plane
  service-policy  input: copp-system-policy

    class-map copp-s-ping (match-any)
      match access-group name copp-system-acl-ping
      police pps 100 , bc 0 packets
        HW Matched Packets   30
        SW Matched Packets   30
    class-map copp-s-l3destmiss (match-any)
      police pps 100 , bc 0 packets
        HW Matched Packets   76
        SW Matched Packets   74
    class-map copp-s-glean (match-any)
      police pps 500 , bc 0 packets
        HW Matched Packets   103088
        SW Matched Packets   51544
<snip>

In the output, the Hardware (HW) and Software (SW) Matched Packets for copp-s-ping are the same. This means that the number of packets counted by the HW is 30 (all sent towards the Inband CPU Driver), and the SW counts the same number of packets before it sends them to the CPU. This indicates that no packets are dropped by CoPP, because the rate is within the configured limit of 100 pps.

When you look at the copp-s-glean class, which matches the packets that are destined to an IP address for which the Address Resolution Protocol (ARP) cache entry is not present, the number of packets seen by the HW is 103088, while the SW matches only 51544. This indicates that CoPP dropped 51544 (103088 - 51544) packets, because the rate of these packets exceeds 500 pps.

The SW counters are obtained from the CPU Inband Driver, and the HW counters come from the Access Control List (ACL) that is programmed in the HW. If you encounter a situation where the HW Matched Packets equal zero, and a non-zero value is present for the SW Matched Packets, then no ACL is present in the HW for that specific class-map, which can be normal. It is also important to note that these two counters might not be polled at the same time, and you should only use the counter values in order to troubleshoot if the difference is significant.
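
The HW-minus-SW comparison described above can be sketched in a few lines of Python, assuming the policy-map output has been captured as text. The function name copp_differences is hypothetical, and the sample is an abbreviated copy of the output in this section. Because the two counters might not be polled at the same time, a small difference should not be read as a drop count.

```python
# Abbreviated copy of the show policy-map interface control-plane output above.
SAMPLE = """\
    class-map copp-s-ping (match-any)
      match access-group name copp-system-acl-ping
      police pps 100 , bc 0 packets
        HW Matched Packets   30
        SW Matched Packets   30
    class-map copp-s-glean (match-any)
      police pps 500 , bc 0 packets
        HW Matched Packets   103088
        SW Matched Packets   51544
"""

def copp_differences(output):
    """Return {class-map name: HW count - SW count} for each class in the output."""
    differences = {}
    current, hw = None, None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("class-map "):
            # A new class starts; forget the HW counter of the previous class.
            current, hw = line.split()[1], None
        elif line.startswith("HW Matched Packets"):
            hw = int(line.split()[-1])
        elif line.startswith("SW Matched Packets") and current is not None and hw is not None:
            differences[current] = hw - int(line.split()[-1])
    return differences

for name, difference in copp_differences(SAMPLE).items():
    print(f"{name}: HW - SW = {difference}")
```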

The CoPP statistics might not be directly related to HW-switched packets, but they are still relevant if packets that should be sent through the switch are punted to the CPU. A packet punt is caused by various reasons, such as a glean adjacency.

Be aware that there are three types of CoPP policies: Default, Layer 2 (L2), and Layer 3 (L3). Choose the appropriate policy based on the deployment scenario, and modify the CoPP policy based on the observations. In order to fine-tune the CoPP, check regularly, and check after you add new services/applications or after a network redesign.

Note: In order to clear the counters, enter the clear copp statistics command.

Perform Bootflash File System Health Check

In order to perform a health check on the bootflash file system, enter the system health check bootflash command:

switch# system health check bootflash 
Unmount successful...
Checking any file system errors...Please be patient...
Result: bootflash filesystem has no errors
done.
Remounting bootflash ...done.
switch#

Caution: The file system is unmounted when you run the test, and it is remounted once the test is complete. Ensure that the file system is not accessed while you run the test.

Collect System Cores and Process Logs

Caution: Ensure that the system does not experience any process resets or crashes, and does not generate any core files or process logs when you attempt to use the commands that are mentioned in this section.

Enter these commands in order to collect the system cores and process logs:

switch# show cores

Module  Instance  Process-name     PID       Date(Year-Month-Day Time)
------  --------  ---------------  --------  -------------------------
switch# 

switch# show process log
Process          PID     Normal-exit  Stack  Core   Log-create-time
---------------  ------  -----------  -----  -----  ---------------
ethpc            4217              N      N      N  Tue Jun  4 01:57:54 2013

Note: Refer to the Retrieving Core Files from Cisco Nexus Switching Platforms Cisco article for more details about this process.

Related Information

Contributed by Cisco Engineers

Yogesh Ramdoss
Cisco TAC Engineer
Matt Blanshard
Cisco TAC Engineer
