Cat 6500


New Year’s resolutions are made to be broken, and I haven’t been keeping up with my resolution to do more blog posts. Now that I am back at Cisco, I am focusing on programmability and automation, and I do have a lot to say. However, in honor of my return to Cisco, I thought I would post a new TAC Tales entry. There is a moral to this story.

One day my boss came to me and said my team would be supporting the MWAM module in the Cat 6K. I had done a lot of Cat 6K work at that point, but I had never even heard of an MWAM, and I failed to see why cases on it would be sent to the routing protocols team. My boss didn’t seem too concerned with my objections and said, “Just go watch the VoD.” (VoD = Video on Demand.) So I did. The VoD started out by telling me how many processors were on the card and of what kind, what types of buses were used to transmit data, what kind of memory it had, and how it interfaced with the Catalyst and its backplane. At no point did the video tell me what the card actually did. I had no idea why anyone would buy an MWAM or what they would do with it. I hoped a case wouldn’t come in on the card, and when one did, I immediately escalated to engineering because I had no idea what to do. Fortunately, I only ever had one case on the MWAM. (And the fun thing about coming back to Cisco after 10 years is that I can go look up all these cases I remember and read my notes. Very cool!)

What is the moral, you ask? Well, as a Technical Marketing Engineer, a big part of my role is communicating technical concepts clearly to others. How often have you bought a book or looked at a web page to learn some new protocol, only to find that the description begins with packet header formats or state machines? Fine, but tell me what it actually does before you tell me how it works. Imagine you went out into the jungle and encountered someone who had never seen a car. You wouldn’t start explaining it to him by saying, “It uses an internal combustion engine with a four-stroke cycle of intake, compression, power, and exhaust.” You’d say, “It has wheels and takes me places very fast.” Now, in defense of the MWAM VoD guy, he may have designed his video for people who already knew what the card was. But I have often found that people make this assumption, and when I back up and start at the beginning, people frequently tell me, “You know, I’ve always been afraid to ask about that, but thanks for explaining it.”

Meanwhile, my second try at Cisco is much more fun than the last. And thankfully, no MWAMs. TAC was a great experience and period of growth, but it’s not a fun job.

I don’t advertise this blog so I’m always amazed that people even find it. I figured the least-read articles on this blog were my “TAC Tales,” but someone recently commented that they wanted to see more… Well, I’m happy to oblige.

The recent events at United reminded me of a case where operations were down for one of the major airlines at Miami International Airport. It didn’t directly impact flight operations, but ticketing and baggage handling systems were down. Naturally, it was a P1 and so I dialed into the conference bridge.

This airline had four Cat 6500s acting as the core devices for their network. The four switches had vastly disparate configurations, both hardware and software. I seem to recall one of them was running a Sup 1 module, which was old even in 2007 when I took the case. Each of them was running a different software version.

EIGRP was acting funny. As a TAC engineer on the routing protocols team, I absolutely hated EIGRP. EIGRP Stuck-In-Active was my nightmare case. It was always such a pain to track down the source, and meanwhile you’d have peers resetting all over the place. OSPF doesn’t do that, and neither does IS-IS. I once got into a debate on an internal Cisco alias with some EIGRP guys. Granted, I had insulted their life’s work, but I stated that EIGRP was fast but unreliable and prone to meltdown. Their retort was that properly designed EIGRP networks do not melt down. Great, but when are networks ever properly designed? They are so often slapped together haphazardly and grow organically, and they need to be resilient even when unplanned. Of course, those of us in design and architecture positions do our best to build highly available networks, but you don’t want to be running a protocol that flips out when a route at some far end of the network disappears. Anyhow…

The adjacencies on all four boxes were resetting constantly. It was totally unstable. Every five minutes or so, some manager from the airline would hop on the bridge to tell us that they were using handwritten tickets and baggage tags, that lines at the ticket counters were going out the door, etc., etc. Because that really helps me concentrate. I tried to troubleshoot the way TAC engineers are trained to: collect logs, search for bugs in the relevant software, and look for configuration issues. With routing adjacency flaps on switches, always check for STP issues. I couldn’t figure it out.

Finally some high-level engineer for the airline got on the phone and took over like a five-star general. He had his ops team systematically shut down and reset the switches, one at a time. The instability stopped. Wish I’d thought of that.

The standards for a routing protocol like OSPF are written by slow-moving committees, and hence don’t change much. These committees often have members from multiple competing vendors who disagree on exactly what should be done, and even when they do agree, nothing happens fast in IETF committees. Conversely, Cisco owns EIGRP, and they can change it as much as they want. Even their internal committees are nowhere near as bureaucratic as IETF. This means that there can be significant changes in the EIGRP code between IOS releases, much more so than for OSPF, and it is thus vital to keep code revisions amongst participating routers fairly close.

In this case, the consulting engineers for the airline helped them to standardize the hardware and software revisions. They never re-opened the case.