All posts tagged ospf

In the last article on technical interviewing, I told the story of how I got my first networking job.  The interview was chaotic and disorganized, yet it resulted in my getting the job and being quite successful.  In this post, I’d like to start with a very basic question:  Why is it that we interview job candidates in the first place?

This may seem like an obvious question, but if you think about it, face-to-face interviewing is not necessarily the best way to assess a candidate for a networking position.  To evaluate their technical credentials, why don’t we administer a test?  Or force network engineering candidates to configure a small network? (Some places do!)  What exactly is it that we hope to achieve by sitting down for an hour and talking to this person face-to-face?

Interviewing is fundamentally a subjective process.  Even when an interviewer attempts to bring objectivity to the interview by, say, asking right/wrong questions, interviews are just not structured as objective tests.  The interviewer feedback is usually derived from gut reactions and feelings as much as it is from any objective criteria.  The interviewer has a narrow window into the candidate’s personality and achievements, and frequently an interviewer will make an incorrect assessment in either direction:

  • By turning down a candidate who is qualified for the job.  When I worked at TAC, I remember declining a candidate who didn’t answer some questions about OSPF correctly.  Because he was a friend of a TAC engineer, he got a second chance and did better in his second interview.  He got hired and was quite successful.
  • By hiring a candidate who is unqualified for the job.  This happens all the time.  We pass people through interviews who end up being terrible at the job.  Sometimes we just assess their personality wrong and they end up being complete jerks.  Sometimes, they knew enough technical material to skate through the interview.

Having interviewed hundreds of people in my career, I think I’m a very good judge of people.  I was on the interview team for TAC, and everyone we hired was a successful engineer.  Every TME I’ve hired as a manager has been top notch.  That said, it’s tricky to assess someone in such a short amount of time. As the interviewee, you need to remember that you only have an hour or so to convince this person you are any good, and one misplaced comment could torpedo you unfairly.

I remember when I interviewed for the TME job here at Cisco.  I did really well, and had one final interview with the SVP at the time.  He was very personable, and I felt at ease with him.  He asked me for my proudest accomplishment in my career.  I mentioned how I had hated TAC when I started, but I managed to persevere and left TAC well respected and successful.  He looked at me quizzically.  I realized it was a stupid answer.  I was interviewing for a director-level position.  He wanted to hear some initiative and drive, not that I stuck it out at a crappy job.  I should have told him about how I started the Juniper on Juniper project, for example.  Luckily I got through, but that one answer gave him an impression that took me down a bit.

When you are interviewing, you really need to think about the impression you create.  You need empathy.  You need to feel how your interviewer feels, or at least be self-aware enough to know the impression you are creating.  That’s because this is a subjective process.

I remember a couple of years back I was interviewing a candidate for an open position.  I asked him why he was interested in the job.  The candidate proceeded to give me a depressing account of how bad things were in his current job.  “It’s miserable here,” he said.  “Nobody’s going anywhere in this job.  I don’t like the team; they’re not motivated.”  And so forth.  He claimed he had programming capabilities, so I asked him what his favorite programming language was.  “I hate them all,” he said.  I actually think that he was technically fairly competent, but in my opinion working with this guy would’ve been such a downer that I didn’t hire him.

In my next article I’ll take a look at different things hiring managers and interviewers are looking for in a candidate, and how they assess them in an interview.


My first full-time networking job was at the San Francisco Chronicle.  Now there isn’t much to the Chronicle anymore, but in the early 2000’s the newspaper was still going strong.  It was the beginning of the decline, but most people still took their local newspaper as their primary source of news.  Being a network engineer at a major metropolitan newspaper was fascinating.  It is a massive operation to print and distribute a newspaper every single day, and you can never, ever, miss.  There is no slippage of production deadlines.  It has to be out every day, and every day you start all over, with a blank page.

As the lead network engineer, I touched everything from editorial (the news and photography content of the paper) to advertising, pre-press, production systems, and circulation.  Every one of these was critical.  If editorial content didn’t make it through, there was nothing to go into the paper.  If advertising didn’t make it in, we didn’t earn revenue.  If pre-press or production had problems, the paper wasn’t printed.  If circulation wasn’t working, nobody could get their paper.

The Chronicle owned and operated three printing plants in the Bay Area.  One was on Army Street in San Francisco, while the other two were in Union City and Richmond in the East Bay.  The main office was on Fifth and Mission in downtown SF, so the paper was prepared in San Francisco and then sent to the plants via microwave.  That’s where I came in.

Our microwave system used a dish on the clock tower of our building.  From 5th and Mission we sent a signal up to Roundtop Mountain in the East Bay hills. At Roundtop we leased space in a little concrete bunker that was used for various kinds of radio communication including cellular.  From Roundtop we bounced the signal back to the three printing plants.

Chronicle building with the microwave visible on the clock tower

The microwave presented itself to us as T1 lines.  I had the T1 lines connected to dual routers at the main site and each of the plants.  In addition to the microwave, we had two additional backup T1’s to each plant which were landlines from different carriers with diverse paths into the buildings.  We kept the microwave and the first T1 plugged into the routers, with the third one on manual standby in case we needed it.  You don’t take chances with production in a newspaper, and we had triple redundancy on everything.  I used OSPF for redundancy between the microwave and #1 backup circuit on the routers, and HSRP for gateway redundancy.  With only four sites it was a simple enough topology and it never gave me much trouble.
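The failover behavior described above can be pictured with a small sketch. This is only an illustration of lowest-cost selection among circuits that are up, not a model of OSPF's actual shortest-path computation, and the circuit names and cost values are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Circuit:
    name: str
    cost: int        # hypothetical interface cost, lower is preferred
    up: bool = True


def best_circuit(circuits):
    """Return the lowest-cost circuit that is up, or None if all are down."""
    live = [c for c in circuits if c.up]
    return min(live, key=lambda c: c.cost, default=None)


# One plant with the microwave as primary and the first backup T1 behind it.
plant = [Circuit("microwave", 10), Circuit("backup-t1-1", 20)]
print(best_circuit(plant).name)   # microwave

# When the microwave path fails, traffic moves to the backup circuit.
plant[0].up = False
print(best_circuit(plant).name)   # backup-t1-1
```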

Until, that is, the day when I got a call from our operations center that the primary circuits were all down.  We were running on backups.  I immediately called up the production systems engineer who managed the microwave and told him his circuits were down.  “Impossible!” he said, “that microwave is five-nines reliable.  Check your router!”  I tried a few of the usual:  shut/no shut the interface, changing the line encoding, etc.  No go.  He wanted me to start swapping hardware, which was a big deal in a live newspaper environment, and seemed pointless.  If it was hardware, why would all of the circuits be down?

We bickered a bit before I moved to have the tertiary backup circuits swapped in so we had automatic failover while we worked on the microwave.  I got out our old T-berd tester to see if I could find any indication of the problem.  Then the systems engineer called:  “We need to meet at the clock tower, I’ve found the problem,” he said.  It’s always a relief to hear that when finger pointing is going around.

T-berd T1 Tester

I showed up at the entrance to the tower and followed the systems guy up a rusty ladder mounted to the wall.  Up in the tower there were bird droppings and as I climbed higher I fought the urge to look down.  I’ve never much liked heights and being out of shape and relying on my own strength to keep from falling several stories onto concrete was not promising.  Once I got to the top there was a large separation between the ladder and the floor, and I fought the urge to panic as I flung my leg way over to climb onto the concrete flooring.  From there we went outside and I saw the problem right away.

If you’ve ever been to a convention in San Francisco, chances are it took place in the Moscone Center.  In the early 2000’s, the city decided to expand Moscone by building a new Moscone Center West on 4th and Howard streets.  And from up on the clock tower it was plain as day:  they had built a cooling tower on the roof right in the path of our microwave beam.  I looked at the systems guy and said, “Well, I guess you could make popcorn in that cooling tower.  Anyways, there goes your five nines.”

We hastily called meetings together to decide what to do.  Sue the city?  Call the FCC?  Find another building to bounce the microwave off of?  Those were long term solutions but we had an immediate problem.  Two circuits might seem like enough, but they were telco circuits and not as reliable as the microwave was, at least when its path wasn’t blocked.

Getting the city to cut the cooling tower off Moscone West was a non-starter, especially when it was the newspaper asking, a newspaper that made its money being critical of city officials.  So, we decided to lease roof space from another building and add an additional repeater.  However, this was a long process.  We needed to negotiate with the landlord, replan the radio deployment, license it and obtain permits, add the new repeater, and re-point the old dish to the new building.  That last item was not as simple as it sounded, since this wasn’t a DirecTV dish.  It was welded to the tower, so we needed to hire ironworkers to cut it off and re-position it.

In the meantime, we ordered T1’s from downtown SF up to Roundtop to bypass the segment that wasn’t working.  We’d go hard wire to Roundtop, then microwave the rest of the way.  This was not, by any means, an ideal solution, nor was it an overnight solution, but we could at least get some redundancy faster than it would take to add the repeater.  I’m glad we did because shortly after the microwave went down we started having terrible problems with the landlines and needed the triple redundancy.

If you drive by Fifth and Mission now, the microwave dish is gone from the clock tower.  The Chronicle, a shadow of its former self, no longer operates its own printing plants, and has a circulation far smaller than it did in 2004, when I left.  As I said in my last post, it’s great to have a sense of purpose when you work in IT.  It wasn’t about fixing a microwave but about getting that paper in the hands of our readers.  I’m thankful I got to be a part of that for a few years, even if it cost me some vertigo and sleepless nights.

I’ve mentioned in previous TAC Tales that I started on a TAC team dedicated to enterprise, which made sense given my background.  Shortly after I came to Cisco the enterprise team was broken up and its staff distributed among the routing protocols team and LAN switch team.  The RP team at that time consisted of service provider experts with little understanding of LAN switching issues, but deep understanding of technologies like BGP and MPLS.  This was back before the Ethernet-everywhere era, and SP experts had never really spent a lot of time with LAN switches.

This created a big problem with case routing.  Anyone who has worked more than 5 minutes in TAC knows that when you have a routing protocol problem, usually it’s not the protocol itself but some underlying layer 2 issue.  This is particularly the case when adjacencies are resetting.  The call center would see “OSPF adjacencies resetting” and immediately send the case to the protocols team, when in fact the issue was with STP or perhaps a faulty link.  With all enterprise RP issues suddenly coming into the same queue as SP cases, our SP-centric staff were constantly getting into stuff they didn’t understand.

One such case came in to us, priority 1, from a service provider that ran “cell sites”, which are concrete bunkers with radio equipment for cellular transmissions.  “Now wait,” you’re saying, “I thought you just said enterprise RP cases were a problem, but this was a service provider!”  Well, it was a service provider but they ran LAN switches at the cell site, so naturally when OSPF started going haywire it came in to the RP team despite obviously being a switching problem!

A quick look at the logs confirmed this:

Jun 13 01:52:36 LSW38-0 3858130: Jun 13 01:52:32.347 CDT:
%C4K_EBM-4-HOSTFLAPPING: Host 00:AB:DA:EE:0A:FF in vlan 74 is flapping
between port Fa2/37 and port Po1

Here we could see a host MAC address moving between a front-panel port on the switch and a core-facing port channel.  Something’s not right there.  There were tons of messages like these in the logs.
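With tons of these messages in a log, it helps to summarize them rather than read them one by one. Here is a minimal Python sketch that counts flaps per host, assuming every message follows the same wording as the sample above; the function name `count_flaps` and the regular expression are my own for illustration, not part of any Cisco tool.

```python
import re
from collections import Counter

# Matches the host-flapping message format shown in the log excerpt above.
FLAP_RE = re.compile(
    r"%C4K_EBM-4-HOSTFLAPPING: Host (?P<mac>[0-9A-Fa-f:]{17}) "
    r"in vlan (?P<vlan>\d+) is flapping "
    r"between port (?P<a>\S+) and port (?P<b>\S+)"
)


def count_flaps(log_text):
    """Count flap messages per (MAC, VLAN, port pair)."""
    # A single message may be wrapped across lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter()
    for m in FLAP_RE.finditer(flat):
        ports = tuple(sorted((m["a"], m["b"])))
        counts[(m["mac"].lower(), int(m["vlan"]), ports)] += 1
    return counts


sample = """Jun 13 01:52:36 LSW38-0 3858130: Jun 13 01:52:32.347 CDT:
%C4K_EBM-4-HOSTFLAPPING: Host 00:AB:DA:EE:0A:FF in vlan 74 is flapping
between port Fa2/37 and port Po1"""

for key, n in count_flaps(sample).items():
    print(key, n)
```

A host that shows up over and over between an access port and an uplink is a strong hint of a layer 2 loop rather than a routing protocol fault.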

Digging a little further I determined that Spanning Tree was disabled.  Ugh.

Spanning Tree Protocol (STP) is not popular, and it’s definitely flawed.  With all due respect to the (truly) great Radia Perlman, the inventor of STP, choosing the lowest bridge identifier as the root is a bad idea, because when priorities are left at the default the election comes down to the lowest switch MAC address.  It means that if customers deploy STP with default values, the oldest switch in the network becomes root.  Bad idea, as I said.  However, STP also gets a bad reputation undeservedly.  I cannot tell you how many times there was a layer 2 loop in a customer network, where STP was disabled, and the customer referred to it as a “Spanning Tree loop”.  STP stops layer 2 loops; it does not create them.  And a layer 2 loop out of control is much worse than a 50-second spanning tree outage, which is what you got with the original protocol spec.  When there is no loop in the network, STP doesn’t do anything at all except send out BPDUs.
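To make the root election and the 50-second figure concrete, here is a rough Python sketch. The switch names and MAC addresses are made up for illustration, and the sketch assumes the older switch has the numerically lower MAC address; the default priority of 32768 and the default timers (max age 20 seconds, forward delay 15 seconds) are the standard values from the original 802.1D protocol.

```python
from dataclasses import dataclass

DEFAULT_PRIORITY = 32768  # default 802.1D bridge priority


@dataclass(frozen=True)
class Bridge:
    name: str
    mac: str
    priority: int = DEFAULT_PRIORITY

    def bridge_id(self):
        # Priority is compared first; the MAC address only breaks ties.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges):
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=Bridge.bridge_id)


# Hypothetical switches, both left at the default priority.
switches = [
    Bridge("new-core", "00:1b:54:aa:00:01"),
    Bridge("old-closet", "00:00:0c:12:34:56"),
]
print(elect_root(switches).name)   # old-closet

# Lowering the priority on the intended root fixes the election.
tuned = [Bridge("new-core", "00:1b:54:aa:00:01", priority=4096), switches[1]]
print(elect_root(tuned).name)      # new-core

# Worst-case outage with the original timers:
# max age, then one forward delay each for listening and learning.
MAX_AGE, FORWARD_DELAY = 20, 15
print(MAX_AGE + 2 * FORWARD_DELAY)  # 50
```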

As I suspected, the customer had disabled spanning tree due to concerns about the speed of failover.  They had also managed to patch a layer 2 loop into their network during a minor change, causing an unchecked loop to circulate frames out of control, bringing down their entire cell site.

I explained to them the value of STP, and why any outage caused by it would be better than the out of control loop they had.  I was told to mind my own business.  They didn’t want to enable spanning tree because it was slow.  Yes, I said, but only when there is a loop!  And in that case, a short outage is better than a meltdown.  Then I realized the customer and I were in a loop, which I could break by closing the case.

Newer technologies (such as SD-Access) obviate the need for STP, but if you’re doing classic Layer 2, please, use it.

Two years ago I published my Ten Years a CCIE series.  Actually, I had written the series a couple years before I published it, but as I say in my introduction to the series, I felt it was a bit self-indulgent and uninteresting, so I scrapped it for a while.  The original pieces were dictated, and I’ve been meaning to go back and clean up some of the grammatical errors or grating phrases, but haven’t had the time.  Not a lot of people have read it, nor did I expect many to read it, since I generally don’t advertise the blog in social media, or anywhere really.  But the feedback from the few who have read it has been positive, and I’m gratified for that.

Things have changed a lot since I got into networking in 1995, and since I passed my CCIE in 2004.  But it’s also amazing how much has stayed the same.  TCP/IP, and in fact IPv4, is still the heart of the network.  Knowledge of OSPF and BGP is still key.  For the most part, new controllers and programmable interfaces represent a different way of managing fundamentally the same thing.

The obvious reasons for this are that networks work and are hard to change.  The old protocols have been sufficient for passing data from point A to point B for a long time.  They’re not perfect but they are more than adequate.  They are hard to change because networks are heterogeneous.  There are so many types of different systems connecting to them, that if we wanted to fundamentally alter the building blocks of networks, we’d have to upgrade a lot of systems.  This is why IPv6 adoption is so slow.

Occasionally I poke around at TechExams.net to see what newer network engineers are thinking, and where they are struggling.  I’m probably the only director-level employee of Cisco who reads or comments on that message board.  I started reading it back when I was still at Juniper and studying for my JNCIE, but I’ve continued to read it because I like the insights I get from folks prepping for their certifications.  People are occasionally concerned that the new world of controllers and automation will make their jobs obsolete.

I built the first part of my career on CLI.  Now I’m building it on controllers and programmability.  In this industry, we have to adapt, but we don’t have to die.  Cars have changed drastically, with on-board computer systems and so forth, but we still need mechanics.  We still need good network engineers.

To be honest, I was getting tired of my career by the time I left Juniper and came to Cisco.  I was bored.  I thought of going back to school and getting a Ph.D. in classical languages, my other passion.  Getting married helped put an end to that idea (Ph.D.’s in ancient Greek make a lot less than network engineers) but when I came back to Cisco, I felt revitalized.  I started learning new things.  Networking was becoming fun again.

I wrote the “Ten Years a CCIE” series both for people who had passed the exam and wanted to have some fun remembering the experience, as well as for people struggling to pass it.  Some things change, as I said, but a lot remains the same.  I still think, closing in on 15 years since I took the exam, that it’s still worth it.  I still think it’s a fantastic way to launch a career.  The exam curriculum will adapt, as it always does, with new technologies, but it’s an amazing learning experience if you do it honestly, and you will be needed when you make it through.

In this post in the Ten Years a CCIE series, I go over my preparations for the CCIE Routing and Switching exam, and what I did to pass in one attempt.

The first months…

I passed my CCIE Routing and Switching Lab in one attempt, so I think my approach can be considered effective. At least, it was for the exam at the time. I decided to spend my first several months of study diving deep into each of the exam topics on the blueprint. I was determined to focus on core technologies such as BGP and OSPF and to minimize the amount of time spent on ancillary topics such as DLSw. Because you have access to the documentation CD in the lab, you don’t need to know absolutely everything. However, you do not want to spend a long time trying to figure out how to configure core tasks which you should be able to do automatically.

I didn’t work from a particular manual or outline these first few months. Instead I would pick a topic, say BGP. I would go through all of the examples I could find in the books that I had, Jeff Doyle’s books being the most helpful. I would set up the examples from the books in my lab to see if they work as described. Then I performed free-form experimentation. I tried different things; I indulged my curiosity; I came up with new ways to test the protocols and tried to break them. I introduced loops where there weren’t loops in the examples I had. I saw what happened if I ran the protocol over ISDN instead of Frame Relay. And I made very sure that everything I learned I recorded in my notes. For every subject I kept two note files. The first file contained general, conceptual notes. The second file was a list of commands that I thought were important and I needed to remember. These files grew over time, and I studied them thoroughly before attempting the lab.

I had also acquired practice labs from three different sources. I had IP Expert’s lab book; I also had Internetwork Expert’s lab book; and finally, I had the Cisco Press official lab book, which was written by a CCIE proctor. I found that this last book’s labs most closely resembled the real thing in terms of how the labs were written and how the diagrams were drawn. Still, as I studied I quickly came to favor the Internetwork Expert book for its thoroughness and accuracy. At that time, they were still relatively new, but the quality of their material was the best.

Closing in on test day…

In the last couple of months before the exam, I shifted my strategy. Instead of focusing on individual topics I spent my time working the practice labs in the IE book. At first I worked them slowly and methodically. I didn’t do them on a timer, and I didn’t rush through them. If it took me 24 hours to work through a lab then it took me 24 hours. My main interest was in covering the material, understanding it thoroughly, and in documenting what I learned. I knew so many people who started giving themselves timed exams when they weren’t ready for them. Yes, it’s important to have a strategy and to understand clock management, but it’s far more important to understand the material thoroughly. The best time management strategy is knowing the material so well you can configure most of it on auto-pilot.

Every time I completed a lab I graded myself using IE’s answer key. I used to say that I was my own worst enemy. I never gave myself a pass on the slightest discrepancy between my solution and IE’s solution. I listed every single mistake I made in a document, and in the last few weeks before the exam I reread that document several times every day. Constantly reviewing the mistakes I had made kept them fresh in my mind so that I would not repeat them.  I also found that in my note documents I was highlighting certain important points or gotchas with the capital words “BE SURE”. I created another document that I called my “BE SURE” list. I also reviewed this list several times a day in the last few weeks before the exam. Reviewing both my mistakes and my “BE SURE” list so frequently was quite effective in helping me remember them.

A snippet of my BE SURE list

When I was studying for my CCIE exam, Cisco Press had just released two handy books. These books covered all of the commands in IOS at that time for BGP and OSPF. Not only did they describe the commands but they had examples of their use as well. In the last few days before the exam I would review the table of contents of these books, which listed all the commands by name. I did this every night in bed. If I was able to accurately describe the command, I would cross it off.  Some commands that I couldn’t remember I saw night after night, until they were so familiar I had no problem using them.  Doing this every night helped me to commit fully to memory all of the different BGP and OSPF commands that make up the core of the CCIE lab exam.

I also took the CCIE Lab Boot Camp from Internetwork Expert just a few weeks before I took the actual lab exam. This was a wonderful experience. I was able to take the course from home, using IE’s Java-based virtual environment. Because most of the work in the class consisted of full, eight-hour timed labs, there was no need to travel to a classroom. And, because the eight-hour exams were administered on Internetwork Expert’s own racks of equipment, there was no problem with not having a full CCIE lab at home. We had a small amount of lecture each day, followed by the eight-hour lab, which was then graded each night. In the morning we were given our results. I was told that people scoring over 80% generally passed the CCIE lab exam, and I was scoring higher with no problem. The Brians gave me some great advice and particularly fixed some problems that I had in configuring multicast.

At the end of the boot camp Brian Dennis, the grumpier of the Brians, gave what I would charitably call a pep talk. He told us that a test is just a test, that we should get some of the classic books on networking and study them thoroughly, and that we should know our subject, not simply pass the test.  “You meet some CCIEs and wonder, how did this guy pass the test?” Brian said.

In November 2004 the time came to take the test. I had no idea if I was ready. A good friend of mine who passed shortly before spent four hours with me in a sushi restaurant grilling me on every possible subject that could be on the exam. They closed the restaurant on us.  For my final preparation, I studied all of the new features in IOS which they were now using in the CCIE lab. I also studied the documentation CD thoroughly so that I would have no trouble navigating it in the lab.

Passing the test

If you’re working on the CCIE exam, why should you care what someone did to prepare for it ten years ago?  Well, as I’ve said, it is a different test now.  My advice on learning ISDN dial maps isn’t going to help you.  However, there are some general principles here that you should pay attention to.

  1. Figure out the core topics and learn them well.  Cold.  On every expert exam, there are some core topics and some ancillary topics.  You cannot know everything.  Figure out the core topics and drill them over, and over, and over again.  You need to be able to configure them without thinking.
  2. Make things harder than they have to be.  As I said, break things intentionally.  Introduce problems.  Ask questions.  Don’t just run the scenarios you bought with your labs.
  3. Be your own worst enemy.  Remember, the CCIE exam is not just about doing what they tell you, but doing exactly what they tell you.  When you grade yourself, read and re-read the tasks.  Make absolutely sure that you have accurately and completely fulfilled the requirements.
  4. Document your mistakes.  Review things you have done wrong, and keep reviewing them.

In the next post in the series, Room of Horrors, I describe the CCIE lab experience.  I talk about what it was like to enter the infamous lab in Cisco Building C, and take the challenging exam.