Collaborating to solve Latency Issues on our Network

I have noticed some minor latency issues on our network recently. By latency I mean significant variation in ping times, plus intermittent packet loss. No one else has noticed or complained, so it is not really an “Official Problem” yet, but I would like to resolve it before it becomes one.
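To make "latency" concrete, here is a minimal sketch of how those two symptoms can be quantified from a batch of ping results. The sample RTT values are hypothetical, not measurements from this network; a lost probe is marked with `None`.

```python
# Sketch: summarise a set of ping round-trip times (milliseconds) as
# packet loss and jitter. None marks a probe that got no reply.
def summarise_pings(rtts_ms):
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    # Crude jitter: spread between best and worst observed RTT.
    jitter_ms = max(received) - min(received) if received else 0.0
    return {"loss_pct": loss_pct, "jitter_ms": jitter_ms}

# Hypothetical samples showing the symptom: mostly ~1 ms, occasional
# spikes, and two lost probes out of eight (25% loss).
samples = [1.2, 1.4, 48.0, None, 1.3, 52.7, 1.2, None]
print(summarise_pings(samples))
```

A healthy LAN segment should show near-zero loss and single-digit-millisecond jitter; numbers like the above are what prompted the investigation.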

In Dec 2010 I upgraded and simplified our network by purchasing about 20 Dell 6248P switches. I need PoE for the IP phones, wireless access points and, in the future, for video surveillance and school bells. Dell, incidentally, did a great job with my original network configuration.

So here we are, six months after the upgrade, wondering what is causing this latency and how we are going to solve it. I invited Brad Joyce from Suncoast Christian College to give me a hand, and together we tackled the problem.

As it turns out, some of our Cat 5 cable runs are simply too long. The permanent buildings are connected to the core by fibre optic cable, while the prefabricated temporary buildings rely on Cat 5 back to the core. E Block is 107 m and C Block is 149 m from the core, both beyond the 100 m maximum the Ethernet spec allows for a copper run.
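A quick sketch of that length check, using the figures from the post against the 100 m channel limit that 100BASE-TX/1000BASE-T specify for twisted-pair copper:

```python
# Sketch: flag copper runs beyond the 100 m Ethernet channel limit.
# Building names and distances are the ones given in the post.
CAT5_LIMIT_M = 100

runs_m = {"E Block": 107, "C Block": 149}
over_limit = {name: dist for name, dist in runs_m.items() if dist > CAT5_LIMIT_M}
print(over_limit)  # both runs exceed the spec
```

Runs past the limit do not simply stop working; they degrade, which fits the intermittent symptoms rather than a hard outage.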

We discovered issues on three of the Cat 5 cables leading to the primary buildings; on one of them we recorded 1,744,493 spanning tree recalculations. On the core switch, the IPMapForwardingTask was running at about 0.3% CPU at the start of the day and climbing to about 30% by midday. Ideally it should stay below 5%.
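The CPU pattern is easy to watch for programmatically. Below is a small sketch that flags samples breaching the ~5% ceiling mentioned above; the sample data is illustrative, shaped like the morning-to-midday climb we saw, not the actual switch readings.

```python
# Sketch: flag samples where a switch task's CPU share exceeds a ceiling.
# The 5% threshold is the "ideal" figure from the post; the samples are
# hypothetical (time-of-day, percent CPU) pairs.
THRESHOLD_PCT = 5.0

samples = [("08:00", 0.3), ("10:00", 12.0), ("12:00", 30.0)]
alerts = [(t, pct) for t, pct in samples if pct > THRESHOLD_PCT]
for t, pct in alerts:
    print(f"{t}: IPMapForwardingTask at {pct}% (above {THRESHOLD_PCT}%)")
```

A steady climb like this, rather than a flat high reading, is what pointed towards an accumulating problem such as repeated spanning tree recalculations.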

To work around this we turned spanning tree off on the edge switches, which leaves them running as dumb switches. That has its drawbacks: without spanning tree running we have no loop protection.

Brad’s Comments

As regards a summary of what happened:
At first I had to familiarise myself with your network design/layout.
After this I had a look at your switches' running configs.
Then I looked at the switch logs and counters to see if anything stood out.
I found some problems with the spanning tree counters, and the IPMapForwardingTask figure was high at times, indicating spanning tree recalculations were happening.
On your core switch there were several connections with moderate levels (thousands) of received BPDU packets, and one (port channel 9) with millions.
This was the aggregated link going to the admin building.
On the admin building switch there were three connections with high levels of received BPDU packets: C and E (the long cable runs) and an unknown on port 2/15 going to patch panel port E13, with millions of BPDU packets. Unfortunately we were unable to find the destination. The cable length was 67 metres, so that is not the cause; I suspect we will find a faulty or wrongly configured switch at the end of it when it is discovered. This was the main source of the BPDU packets, and we left it unplugged.

Regards Brad
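Brad's counter-reading approach, singling out ports whose received-BPDU counts dwarf the rest (thousands on most links, millions on the suspects), can be sketched like this. The port names and counts below are illustrative, not the actual counters from the switches.

```python
# Sketch: find ports whose received-BPDU counters are outliers relative
# to the median across all ports. Names and counts are hypothetical.
import statistics

def bpdu_outliers(counters, factor=100):
    """Return ports whose BPDU count exceeds factor x the median count."""
    median = statistics.median(counters.values())
    return {port: n for port, n in counters.items() if n > factor * median}

counters = {
    "Po1": 2_100,       # normal: thousands of BPDUs
    "Po2": 3_400,
    "Po3": 1_800,
    "Po9": 4_700_000,   # suspect: millions, like port channel 9
    "2/15": 3_900_000,  # suspect: millions, like the unknown port
}
print(bpdu_outliers(counters))
```

Comparing against the median rather than the mean keeps a couple of extreme ports from masking themselves by dragging the baseline up.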

Lessons Learned

  • I can’t overemphasize the importance of good network documentation, which has certainly made things easier for us to diagnose and manage.
  • Collaboration is an important part of good IT management. Find someone you trust and work together to resolve issues.
  • Resolve issues before they become significant.
  • Be proactive and future-proof your environment by providing robust network infrastructure.

Summary:

Our efforts today have not resolved our network problems, but they have eliminated a few possibilities, which will make the issue easier to address in the future.


About Roland

Family, God, People, Architecture, Pursuit of Truth, Wisdom, Education, Community, Truth, Patience and Prosperity.

2 Responses to Collaborating to solve Latency Issues on our Network

  1. Have you thought perhaps of adding fiber links between your switches? We have a redundant fiber connection at the College: as we moved from the Library to our new location, we took the opportunity to add a second fiber pathway around the College. We are not experiencing any latency issues.

  2. munyard says:

    I don’t pretend to be a network expert. I have about 80 IP phones throughout the school, so I have gone for a PoE switch everywhere, about 30+ switches. I only have two types of switch now, the Dell 5448 and the Dell 6248P, which makes things so much easier. As soon as I can, I will replace the Dell 5448s with 6248Ps; they are all working extremely well, except that they don’t have PoE.

    – They are all Dell 6248P Layer 3 switches, except the few mentioned above.
    – The permanent buildings are connected by optical fibre.
    – The temporary buildings are connected by Cat 5 cable.
    – Each building is on its own VLAN: 10.10.1.x, 10.10.2.x and so on, up to about 10.10.20.x. In three buildings I have stacked the switches three or five high with a 10 Gb backplane.
    – I provide a private VLAN for the wireless network and another for the printer network. I did this because I don’t want different IP ranges for my printers, so all printers are on 10.0.0.x.
    – The IP phones can go in any switch port except those reserved for printers and wireless.

    Restrictions: Not much really. I have loopback protection on everything, and that is about it.
    iSCSI: I used to run iSCSI on VLANs on the IP network, but I have since moved to a dedicated switch for iSCSI traffic between my VMware boxes.

    I run a dual-platform network supporting both Windows and Mac. Thus far the network has been 100% reliable and easy to manage.
