
I don’t advertise this blog so I’m always amazed that people even find it. I figured the least-read articles on this blog were my “TAC Tales,” but someone recently commented that they wanted to see more… Well, I’m happy to oblige.

The recent events at United reminded me of a case where operations were down for one of the major airlines at Miami International Airport. It didn’t directly impact flight operations, but ticketing and baggage handling systems were down. Naturally, it was a P1 and so I dialed into the conference bridge.

This airline had four Cat 6500’s acting as the core devices for their network. The four switches had vastly disparate configurations, both hardware and software. I seem to recall one of them was running a Supe 1 module, which was old even in 2007 when I took the case. There was a different software version on each of them.

EIGRP was acting funny. As a TAC engineer in the routing protocols team, I absolutely hated EIGRP. EIGRP Stuck-In-Active was my nightmare case. It was always such a pain to track down the source, and meanwhile you’d have peers resetting all over the place. OSPF doesn’t do that, nor ISIS. I once got in a debate on an internal Cisco alias with some EIGRP guys. Granted, I had insulted their life’s work, but I stated that EIGRP was fast, but unreliable and prone to meltdown. Their retort was that properly designed EIGRP networks do not melt down. Great, but when are networks ever properly designed? They are so often slapped together haphazardly, grow organically, and overall need to be resilient even when unplanned. Of course, those of us in design and architecture positions do our best to build highly available networks, but you don’t want to be running a protocol that flips out when a route at some far end of the network disappears. Anyhow…

The adjacencies on all four boxes were resetting constantly. It was totally unstable. Every five minutes or so, some manager from the airline would hop on the bridge to tell us that they were using handwritten tickets and baggage tags, that lines at the ticket counters were going out the door, etc, etc. Because that really helps me to concentrate. I tried to troubleshoot the way TAC engineers are trained to troubleshoot: collect logs, search for bugs in the relevant software, look for configuration issues. With routing adjacency flaps on switches, always check for STP issues. I couldn’t figure it out.

Finally some high-level engineer for the airline got on the phone and took over like a five-star general. He had his ops team systematically shut down and reset the switches, one at a time. The instability stopped. Wish I’d thought of that.

The standards for a routing protocol like OSPF are written by slow-moving committees, and hence don’t change much. These committees often have members from multiple competing vendors who disagree on exactly what should be done, and even when they do agree, nothing happens fast in IETF committees. Conversely, Cisco owns EIGRP, and they can change it as much as they want. Even their internal committees are nowhere near as bureaucratic as IETF. This means that there can be significant changes in the EIGRP code between IOS releases, much more so than for OSPF, and it is thus vital to keep code revisions amongst participating routers fairly close.

In this case, the consulting engineers for the airline helped them to standardize the hardware and software revisions. They never re-opened the case.
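
A quick check for that kind of version drift might look like the sketch below. It is only an illustration: the hostnames and release labels are made up, and in practice the version strings would come from whatever inventory or `show version` output you have on hand.

```python
from collections import defaultdict

# Hypothetical inventory for illustration only: hostname -> software release.
# The names and release labels are placeholders, not the airline's real data.
core_switches = {
    "core-1": "release-A",
    "core-2": "release-B",
    "core-3": "release-C",
    "core-4": "release-D",
}


def group_by_version(inventory):
    """Return a dict mapping each software release to the sorted hosts running it."""
    groups = defaultdict(list)
    for host, version in inventory.items():
        groups[version].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


def has_version_drift(inventory):
    """True when the devices are not all running the same release."""
    return len(set(inventory.values())) > 1


if __name__ == "__main__":
    if has_version_drift(core_switches):
        print("Mixed software releases among routing peers:")
        for version, hosts in sorted(group_by_version(core_switches).items()):
            print(f"  {version}: {', '.join(hosts)}")
    else:
        print("All peers are running the same release.")
```

A check like this only tells you that the releases differ, not whether the difference matters, but with a protocol that changes between releases, that is usually reason enough to take a closer look.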

3 Comments

  1. Hello ccie14023,

    I have read several articles from you and they are quite interesting. I have one question which I did not find any related information sw. so I thought about you.

    I am not sure whether you come here often, I just hope you will see it and get me a hint.

    The question is about PE-CE ospf backdoor in MPLS L3VPN topic. I read from Juniper study guides, it is written,

    to “ensure that the PE router’s OSPF-to-IBGP export policy matches OSPF routes originated from the local site and rejects any OSPF routes heard through the legacy backbone”

    my question is, how to achieve this ? should I use prefix-list to match the routes generated from local site, or there is other elegant way ? for example advertising router id ?

    The reason why I ask is, in the following topology
    PE1——–PE2
    | |
    CE1——–CE2

    PE1 and CE1 belongs to site1, PE2 and CE2 belong to site2. PE1 and PE2 should generate type 5 lsa, and mpls backbone should be backup of Backdoor between CE1 and CE2.

    I have configured different domain-id, and ospf preference 180. but anyway, it happens very often that I see one of the PE have all ospf routes, and the other has bgp routes. I think the reason is I did not control which ospf routes should be advertised by vrf-export policy as written in Juniper study guide.

    but anyway, nothing is written in this study guide how to control them. I suppose we can only do prefix list to match those routes which are generate from local sites. nothing else we can do, is that correct ?

    Thank you very much in advance!
    BR.
    Sara

    • sorry, the topology is shown wrong, it is like this:

      PE1——–PE2
      | |
      CE1——–CE2

  2. anyway, it can not be shown in the correct way. the links are between PE1 and CE1, PE2 and CE2, and between CE1 and CE2 there is also backdoor. but every time after I posted, the 2 links will be beside each other.

