The Next Big Thing, TLDR version

At the last Cisco Live in June, I was asked by marketing to do a “center stage” presentation.  My days of getting normal sessions at Cisco Live seem to be over.  Perhaps I’m too far into the management track (although that’s changing) to impress the Cisco Live Session Group Managers.  Eager to speak again, I accepted the proposal.

The abstract was provided for me.  I don’t remember the title, but it was something about AI and the campus.  So, I did my best to craft a set of slides that would be interesting.  When I ran them by marketing, I was told I couldn’t use my own slides.  I had to use theirs.  One of my secrets to success at Cisco Live is that I always build my own slides.  Rarely do I use a single slide from someone else.

Still, I did my best to build a story that would work.  Then I was told I’d be co-presenting with another PM, and we’d also have a customer on stage with us for an Oprah-style panel interview.  Even with these constraints, I spent a lot of time in my hotel room in Vegas practicing to be sure I nailed it.

The center stage is on the show floor (World of Solutions), and presenters there are broadcast onto a series of TVs scattered around the Mandalay Bay convention center.  They walk around the stage like they’re performing King Lear, but nobody watches the TVs or can even hear them.  It’s very performative, but a part of trade shows.

We had a rehearsal with marketing people, stage managers, cameramen, audio technicians, and an army of other people.  On the day of, there were marketing people, stage managers, cameramen, audio technicians, and an army of other people.  There was also a lady there who did intros for all the speakers, to get the audience pumped up.  I’m sure she showed up in Las Vegas decades ago to be a showgirl or something, but now in her 40s she was doing corporate gigs at Mandalay.  As I got mic’d up and ready to go, I looked out at my audience.  Of the 50 or so chairs, 5 were occupied.  Four of them were friends of the customer presenting.  I looked at the intro lady and said: “I hope you can handle the stage fright.”  She laughed.

I did my shtick, and it all went well enough.  At the very end, the one attendee who was not with the customer, who seemed to have shown up because it was a good spot for a nap, arose like Lazarus, raised his hand and asked:  “Could you guys please stop talking about AI at Cisco Live?”


If you’ve watched the Art of Network Engineering podcast, you probably are familiar with Lexie Cooper, one of the hosts.  I was on the podcast a while back and had a nice talk with her and Andy Lapteff.  The other day, Lexie showed up in my LinkedIn feed in clownish makeup and a bodysuit.  With the audio off, I looked at it and thought, “wow she’s really desperate for attention.”  Then I unmuted it.  I nearly died.

“Have you ever considered…using AI?”  she begins.

“Manage your network devices…with AI!”

“Manage your IOT stuff…with AI!”

“Design a PCB…with AI!”

“Automate your vegetable garden…with AI!”

“Ethernet cables?  Nope…AI!”

“Every vendor in the World of Solutions…with AI!”

…and so forth.

In a minute and thirteen seconds, Lexie captured the Zeitgeist of the current networking world perfectly and hilariously.  It seems that all of the protocols and technologies that make up the “Art of Network Engineering” have been single-handedly wiped away by AI.  Nobody talks about networking anymore; it’s all just AI.


Of course, those protocols and technologies are necessary for AI, for the Internet, and for the modern world to function.  Why do all vendors suddenly have a single-minded focus on AI, and seem to have stopped investing in actual networking technology?

It comes down to the culture of Silicon Valley, the corporate world dominated by Wall Street, and the quest for the “Next Big Thing”.  As network engineers we love acronyms, so I’ll coin a new one:  the NBT.  (With all due respect to NetBIOS over TCP.)

Technology executives are terrified of missing the NBT, and they spend their careers chasing the NBT.  It’s not entirely their fault.  If a technology company is not investing in the NBT, then the industry “analysts” will write somber reports criticizing the company and hurting the stock value.  Because the industry “analysts” have MBAs in topics like marketing and finance, they are experts at technology, and “analyzing” what networking companies should sell to network engineers.  In fact, because they are MBAs, they are experts in anything, really, and far more so than people who actually study and learn their specific fields.

There have indeed been some real NBTs.  Wireless is a good example.  When I started in networking, pretty much everything was hard wired.  Wireless was a major transformation in networking, and a new and different technology domain.  (I’m still not great at understanding it, admittedly.)  Mobile devices and smartphones radically changed the world, and nobody can argue that.

Cloud computing is an interesting one.  First of all, it was (and is) a marketing term.  It refers to several things, but in a broad definition we could say it refers to using someone else’s computing resources instead of your own.  In the case of SaaS, someone else is hosting the application and giving you access to it, whereas in the case of IaaS, they merely host the computing power and you manage the app.  Either way, it was not a new idea.  The idea of shared computing resources has been around since the advent of computing.  In the early days, all computing was done on shared systems.  At the dawn of the Internet, I got my email and other services through an ISP.  I telnetted into their system to check my email.  And in the mid-90’s, I worked at a company that offered a SaaS-based timecard service, before anyone even used the term “SaaS”.

Cloud computing in 1999

Still, we could say Cloud was an NBT.  I used to go to auctions during the dot-bomb of the early 2000’s, and even a small dotcom company had to purchase servers and network gear and host them in rented rack space in a colo.  AWS drastically changed that.

Of course, there have been many potential NBTs that turned out not to be.  The “Metaverse” was one of these.  After 2 years in COVID lockdown, nobody was interested in slapping on a VR headset and meeting their friends using a unicorn avatar floating around a fake version of Mars.


Watch out when an exec begins a presentation with this apocryphal Henry Ford quote:  “If I had asked people what they wanted, they’d have said faster horses.”

Aside from the fact Ford never said it, this quote is recited ad nauseam to inspire people to disruptive innovation.  Nobody ever seems to notice the obvious, however.  The automobile was popularized by Henry Ford over 110 years ago.  It hasn’t changed much since.  Sure, your Subaru is a lot different from a Model T, but the basic idea and design are the same.  The changes to automobiles–fuel injection systems, automatic transmissions–have been major, but nonetheless incremental improvements on the base design.  Once the NBT happened and spawned an industry, things reached a steady state.

From a corporate/investor perspective, this is problematic.  Stock prices are an indicator of future value, and investors demand “growth”.  (Hypothetical question:  is there an end-state to “growth?”  I.e., is a company ever done growing, and if so, when?  Related:  is there anything in nature which can grow indefinitely?)  Steady-state is not good for Wall Street.  So, execs need to go hunting for the NBT.

“Now wait,” many MBAs will correct me.  “The EV is a major disruptor in the automotive industry.”

Leaving aside the fact that EVs have existed in the past, and their questionable future, it just proves my point.  It took 100 years for the Tesla to exist.  But let’s circle back to that in a minute.


Recently I saw a LinkedIn post from a woman, Debbie Gomez, who is making a career change to become a network engineer.  She was joking about the contents of a woman’s purse, comparing it to the books she has in her car.  One of those books was Internet Routing Architectures by Sam Halabi.

When I was studying for my CCIE R/S in 2004, I used Halabi’s book.  It’s clearly visible in a picture of the stack of books I used to study for the infamous exam.  Debbie is studying the same content I was 20 years ago.

This is because, like the automobile, once networking was invented, change became incremental.  BGP hasn’t changed much because it can’t change much.  It’s run across multiple providers and multiple vendors, and it’s not easy to make changes.  Sure, it’s been extended since Halabi’s day, but it’s close enough to the original that his book is still totally relevant.

I’ve written in the past about how non-technical executives view the complexity of networking as a creation of engineers who “revel in complexity”.  In their view, the NBT in networking is to just have “simplicity”, where you don’t need all the fancy BGP, OSPF, EIGRP, ISIS, EVPN, VXLAN, STP stuff.  Just like the Tesla is so much simpler than a traditional car.


I recently started working on cars, because I always like to do things with my hands.  My 2011 BMW 328i is probably the wrong car to start working on.  It’s complex, and designed so that simple tasks require disassembling large parts of the engine.  I recently replaced the valve cover, successfully, but man was it a nightmare of carefully removing various parts.  To even get the thing out took about 30 minutes of me standing on the engine and my brother-in-law working it from the side.  If I learned one thing, it’s how complex a modern car is.

I have a Tesla as well.  There’s no question it’s simple.  There’s hardly an engine to speak of.  There is no gear shifting when you drive it.  You don’t even turn it on.  There is no maintenance required except for tires and brakes. The only fluid required is for the windshield washer.

Many technology executives feel this transformation needs to happen for networking as well.  The problem–they don’t seem to realize–is that the underlying complexity of networking, the protocols, cannot go away.  They exist for a reason.  Can they be improved?  Sure.  Can they be eliminated?  No.

That’s not to say much of the mess of networking cannot be improved.  Vendors have created a lot of that mess, and all are guilty to some degree.  We can distinguish unnecessary complexity from necessary complexity.  A lot of it is unnecessary, but even if you remove that, you’re left with the necessary complexity.

The only option for simplicity, when you cannot really simplify, is to abstract.  That is, you hide the complexity.  It’s still there, but it’s easier to deal with.  Take a modern airplane.  It’s just as complex a machine as a plane built in the 1970s, perhaps more so.  But the cockpit is thoroughly automated, and the systems thoroughly instrumented.  It’s much easier to manage than a 1970s plane.  And yet, someone still needs to know how it all works.


This brings us back to our starting point, AI.

Why is AI driving Lexie to the point of putting on garish makeup and screaming into the camera?  Of course, everyone thinks it’s the NBT.  But is it?

We can easily understate the importance of GenAI and the significance of the technological advancement.  It’s nothing short of astounding.  ChatGPT makes a great search engine, but apart from that, its ability to interpret and generate language and code in creative ways is incredible.

Even though I worked on programmability, my knowledge of Python is pretty poor.  If there’s one programming language I feel absolutely comfortable in, it’s Applesoft BASIC from the 1980s.  I’ve found I can have ChatGPT explain some of the more challenging Python concepts by translating them to BASIC.  It’s crazy.  Computers haven’t been able to do anything like that before.

I’ve asked it to generate NETCONF code blocks for configuring IOS XE, with less success.  It gave me an operational data model to configure an IP address on an interface.  These errors can and will be corrected, however.
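For readers who haven’t run into the distinction, here is a minimal sketch of what separates the two kinds of model.  It assumes the standard IETF interface models (ietf-interfaces and ietf-ip), not whatever model ChatGPT actually produced, and the interface name and address are made up.  The configurable tree lives under “interfaces”; the read-only operational tree in the older IETF model is “interfaces-state”, and a device will not accept it as configuration.  The sketch also assumes the device accepts a dotted netmask rather than a prefix length.

```python
import xml.etree.ElementTree as ET

# Namespaces of NETCONF itself and of the two IETF YANG modules used here.
NC = "urn:ietf:params:xml:ns:netconf:base:1.0"
IF = "urn:ietf:params:xml:ns:yang:ietf-interfaces"
IP = "urn:ietf:params:xml:ns:yang:ietf-ip"


def build_ip_config(if_name: str, address: str, netmask: str) -> str:
    """Build the <config> payload for a NETCONF <edit-config> request."""
    config = ET.Element(f"{{{NC}}}config")
    # Configuration container: "interfaces", not the read-only "interfaces-state".
    interfaces = ET.SubElement(config, f"{{{IF}}}interfaces")
    interface = ET.SubElement(interfaces, f"{{{IF}}}interface")
    ET.SubElement(interface, f"{{{IF}}}name").text = if_name
    ipv4 = ET.SubElement(interface, f"{{{IP}}}ipv4")
    addr = ET.SubElement(ipv4, f"{{{IP}}}address")
    ET.SubElement(addr, f"{{{IP}}}ip").text = address
    # Assumption: the device supports the netmask form of the subnet choice.
    ET.SubElement(addr, f"{{{IP}}}netmask").text = netmask
    return ET.tostring(config, encoding="unicode")


# Hypothetical interface name; the address is from a documentation range.
payload = build_ip_config("GigabitEthernet1", "192.0.2.1", "255.255.255.0")
print(payload)
```

The payload would still need to be wrapped in an edit-config RPC and sent over a NETCONF session; this only builds the body, which is where the config-versus-operational mix-up shows up.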

And yet, even if AI reaches the point of being able to configure and operate network devices, it will still be an abstraction layer.  I cannot fathom AI somehow doing away with networking.  At most, it would be like the automation systems on the plane, not like a Tesla.

I asked ChatGPT to design a networking system that does not use protocols.  It responded:  “Designing a data networking system that does not use protocols is a challenging idea because protocols are fundamental to networking—they define the rules for data exchange.”  It then dutifully attempted to frame out a protocol-free system, but the result was unimpressive, and the AI admitted that it would have a lot of problems.


I am among those working on AI projects at Cisco, both out of interest and out of necessity.  Working at a vendor, I’m caught up in the NBT just like we all are.  While I cannot talk about the specifics of any of the projects, I do see potential for its use beyond the current applications of AI.  (Mainly analyzing operational data.)

Is it really the NBT?  Is it really a “disruptor” on the level of wireless or smartphones?  Or are we tilting at windmills as with the Metaverse?

Time will tell.  But I’m sure Lexie will have plenty of content for more videos.

Meanwhile, keep reading Halabi.  We still need him.

Two Years of Ten Years a CCIE

Two years ago I published my Ten Years a CCIE series.  Actually, I had written the series a couple years before I published it, but as I say in my introduction to the series, I felt it was a bit self-indulgent and uninteresting, so I scrapped it for a while.  The original pieces were dictated, and I’ve been meaning to go back and clean up some of the grammatical errors or grating phrases, but haven’t had the time.  Not a lot of people have read it, nor did I expect many to read it, since I generally don’t advertise the blog in social media, or anywhere really.  But the feedback from the few who have read it has been positive, and I’m gratified for that.

Things have changed a lot since I got into networking in 1995, and since I passed my CCIE in 2004.  But it’s also amazing how much has stayed the same.  TCP/IP, and in fact IPv4, is still the heart of the network.  Knowledge of OSPF and BGP is still key.  For the most part, new controllers and programmable interfaces represent a different way of managing fundamentally the same thing.

The obvious reasons for this are that networks work and are hard to change.  The old protocols have been sufficient for passing data from point A to point B for a long time.  They’re not perfect but they are more than adequate.  They are hard to change because networks are heterogeneous.  There are so many types of different systems connecting to them, that if we wanted to fundamentally alter the building blocks of networks, we’d have to upgrade a lot of systems.  This is why IPv6 adoption is so slow.

Occasionally I poke around at TechExams.net to see what newer network engineers are thinking, and where they are struggling.  I’m probably the only director-level employee of Cisco who reads or comments on that message board.  I started reading it back when I was still at Juniper and studying for my JNCIE, but I’ve continued to read it because I like the insights I get from folks prepping for their certifications.  People are occasionally concerned that the new world of controllers and automation will make their jobs obsolete.

I built the first part of my career on CLI.  Now I’m building it on controllers and programmability.  In this industry, we have to adapt, but we don’t have to die.  Cars have changed drastically, with on-board computer systems and so forth, but we still need mechanics.  We still need good network engineers.

To be honest, I was getting tired of my career by the time I left Juniper and came to Cisco.  I was bored.  I thought of going back to school and getting a Ph.D. in classical languages, my other passion.  Getting married helped put an end to that idea (Ph.D.’s in ancient Greek make a lot less than network engineers) but when I came back to Cisco, I felt revitalized.  I started learning new things.  Networking was becoming fun again.

I wrote the “Ten Years a CCIE” series both for people who had passed the exam and wanted to have some fun remembering the experience, as well as for people struggling to pass it.  Some things change, as I said, but a lot remains the same.  I still think, closing in on 15 years since I took the exam, that it’s still worth it.  I still think it’s a fantastic way to launch a career.  The exam curriculum will adapt, as it always does, with new technologies, but it’s an amazing learning experience if you do it honestly, and you will be needed when you make it through.

TAC Tales #12: SACK of trouble

When I first started at Cisco TAC, I was assigned to a team that handled only enterprise customers.  One of the first things my boss said to me when I started there was “At Cisco, if you don’t like your boss or your cubicle, wait three months.”  Three months later, they broke the team up and I had a new boss and a new cubicle.  My new team handled routing protocols for both enterprise and service provider customers, and I had a steep learning curve having just barely settled down in the first job.

A P1 case came into my queue for a huge cable provider.  Often P1’s are easy, requiring just an RMA, but this one was a mess.  It was a coast-to-coast BGP meltdown for one of the largest service provider networks in the country.  Ugh.  I was on the queue at the wrong time and took the wrong case.

The cable company was seeing BGP adjacencies reset across their entire network.  The errors looked like this:

Jun 16 13:48:00.313 EST: %BGP-5-ADJCHANGE: neighbor 172.17.249.17 Down BGP Notification sent

Jun 16 13:48:00.313 EST: %BGP-3-NOTIFICATION: sent to neighbor 172.17.249.17 3/1 (update malformed) 8 bytes 41A41FFF FFFFFFFF

The cause seemed to be malformed BGP packets, but why?  The GSR routers they had were kind enough to give us a hex dump of the BGP packet when an adjacency reset.  I got out my trusty Doyle book and began decoding the packets on paper, when a colleague was kind enough to point me to an internal Cisco tool that would decode a BGP packet from hex.

We could see that, for some reason, the NLRI portion of the BGP message was getting cut off.  According to my calculations, it should have been 44 bytes, but we were only seeing 32 bytes of information.  NLRI is Network Layer Reachability Information, just a fancy BGP way of saying the prefixes that go into the routing update.  We also noticed a clue in the router logs:  TCP-6-TOOBIG messages showing up from time to time.
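For anyone who wants to try the decoding by hand, IPv4 NLRI is just a run of entries, each one a length byte (the prefix length in bits) followed by only as many prefix bytes as that length needs.  Here is a minimal sketch of a decoder.  It is not the internal Cisco tool, the sample bytes are made up and not taken from the customer’s hex dump, and it handles only the NLRI field, not the rest of the UPDATE message.

```python
def decode_nlri(data: bytes) -> list[str]:
    """Decode IPv4 NLRI: repeated (prefix length in bits, prefix bytes)."""
    prefixes = []
    i = 0
    while i < len(data):
        bits = data[i]
        if bits > 32:
            raise ValueError(f"invalid prefix length {bits} at offset {i}")
        nbytes = (bits + 7) // 8  # only the significant bytes are sent
        chunk = data[i + 1 : i + 1 + nbytes]
        if len(chunk) < nbytes:
            # The field ended in the middle of a prefix.
            raise ValueError(f"NLRI truncated at offset {i}")
        octets = list(chunk) + [0] * (4 - nbytes)  # pad back out to 4 octets
        prefixes.append(".".join(map(str, octets)) + f"/{bits}")
        i += 1 + nbytes
    return prefixes


# Made-up sample: 10.0.0.0/8 followed by 192.0.2.0/24.
sample = bytes.fromhex("080a18c00002")
print(decode_nlri(sample))  # ['10.0.0.0/8', '192.0.2.0/24']
```

A field that stops short of its declared contents fails in the truncation check, which is roughly the symptom a receiving router reports as a malformed update.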

Going over it with engineering, we realized something interesting.  The customer had enabled TCP selective acknowledgement on all their routers.  Also known as SACK, TCP selective acknowledgement is designed to circumvent an inefficiency in TCP.  If, say, the first of 3 TCP segments gets dropped, plain TCP can end up re-transmitting all 3 of the segments.  The receiver can only ACK the last in-order segment it received, so it keeps ACKing that one, and it takes time for the sender to realize something is wrong.  When the sender finally realizes something is wrong, it goes back to the last known good segment and re-transmits everything after it.  SACK lets the receiver acknowledge specific segments beyond that point, so the sender can re-transmit only what is missing.  If we are only missing segments 2, 3, and 5, then just those need to be re-transmitted.  SACK is carried as an option in the TCP header.
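A toy model makes the difference easier to see.  The sketch below numbers whole segments from 1, which is a simplification: real TCP acknowledges byte sequence numbers, and real SACK blocks carry left and right sequence edges.  The function names are mine, purely for illustration.

```python
def ack_state(received: set[int]) -> tuple[int, list[tuple[int, int]]]:
    """Return (cumulative_ack, sack_blocks) for a set of received segments.

    cumulative_ack is the next segment the receiver expects in order.
    Each SACK block is an inclusive (first, last) run received beyond it.
    """
    next_expected = 1
    while next_expected in received:
        next_expected += 1
    blocks: list[tuple[int, int]] = []
    for seg in sorted(s for s in received if s > next_expected):
        if blocks and seg == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], seg)  # extend the current run
        else:
            blocks.append((seg, seg))          # start a new run
    return next_expected, blocks


def missing_segments(ack: int, blocks: list[tuple[int, int]]) -> list[int]:
    """What the sender can infer it must re-transmit."""
    covered = {s for first, last in blocks for s in range(first, last + 1)}
    highest = max(covered, default=ack - 1)
    return [s for s in range(ack, highest + 1) if s not in covered]


# Segments 2, 3, and 5 of 6 were lost in transit.
ack, sacks = ack_state({1, 4, 6})
print(ack, sacks)                     # 2 [(4, 4), (6, 6)]
print(missing_segments(ack, sacks))   # [2, 3, 5]
```

Without the SACK blocks, all the sender learns is the cumulative ACK of 2, so it has no way to know that segments 4 and 6 already arrived.  Note that every hole in the stream adds another block, which is why a lossy path makes the SACK option grow.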

The problem is, there is a finite amount of space in the TCP header, and the SACK option can get rather long.  It just so happens that BGP also stores its MD5 authentication hash in a TCP option.  If SACK gets too long, it can crowd out the MD5 option and cause BGP errors.  Based on our analysis, this was exactly what had happened.  Thus, the malformed packets.  We had the customer remove the SACK option from all routers and the problem stopped.
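The arithmetic shows how tight the space is.  A TCP header tops out at 60 bytes, 20 of which are fixed, leaving 40 bytes for every option combined.  The sizes below come from the RFCs that define each option; the sketch ignores padding between options, and whether the customer’s routers also had timestamps enabled is not something I can say from the case notes, so treat that line as a what-if.

```python
MAX_TCP_HEADER = 60     # 4-bit data offset in 32-bit words: 15 * 4 bytes
FIXED_TCP_HEADER = 20
OPTION_SPACE = MAX_TCP_HEADER - FIXED_TCP_HEADER  # 40 bytes for all options

MD5_OPTION = 18         # RFC 2385: kind (1) + length (1) + digest (16)
TIMESTAMP_OPTION = 10   # RFC 7323: kind (1) + length (1) + two 4-byte values


def sack_option_len(blocks: int) -> int:
    """RFC 2018: kind (1) + length (1) + 8 bytes per SACK block."""
    return 2 + 8 * blocks


def max_sack_blocks(other_options: int) -> int:
    """How many SACK blocks fit alongside the other options (padding ignored)."""
    remaining = OPTION_SPACE - other_options
    return max(0, (remaining - 2) // 8)


print(max_sack_blocks(0))                              # 4
print(max_sack_blocks(MD5_OPTION))                     # 2
print(max_sack_blocks(MD5_OPTION + TIMESTAMP_OPTION))  # 1
print(sack_option_len(3) + MD5_OPTION)                 # 44, over the 40-byte limit
```

With MD5 authentication in place, a third SACK block simply does not fit.  A correct implementation caps the block count; the bug described here evidently did not.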

We were left with a couple of questions.  Why did SACK get so long, and why would it be allowed to overwrite other important values in the TCP header?  In answer to the first question, there was a bug which was causing some linecards to send out malformed packets on occasion, thus causing SACKs.  In answer to the second question, there was a bug in the TCP header options packing that allowed one field (SACK) to crowd out another field (MD5 authentication).  I knew the case wouldn’t close for a long time.  Multiple bugs needed to be filed, and new code qualified and installed.  Fortunately the customer had a workaround (disable SACK) and an HTE.  An HTE was a TAC engineer dedicated to their account.  He grabbed the case from me for babysitting and I moved on to my next case.

In my TAC tales I often make fun of the occasional mistakes of TAC engineers.  However, TAC is a tough job, and the organization is staffed by some top engineers.  Many cases, like this one, require hard-core engineering and knowledge that spans protocol details and ASIC-level hardware debugging.  It’s not a job for the faint of heart.  This case required digging into the TCP header, understanding how options are packed, and figuring out how to stop a major meltdown of a service provider network.  A high-stress situation, to be sure, but these cases often were the most rewarding.

 

How to pass the CCIE lab exam in one attempt

In this post in the Ten Years a CCIE series, I go over my preparations for the CCIE Routing and Switching exam, and what I did to pass in one attempt.

The first months…

I passed my CCIE Routing and Switching Lab in one attempt, so I think my approach can be considered effective. At least, it was for the exam at the time. I decided to spend my first several months of study diving deep into each of the exam topics on the blueprint. I was determined to focus on core technologies such as BGP and OSPF and to minimize the amount of time spent on ancillary topics such as DLSw. Because you have access to the documentation CD in the lab, you don’t need to know absolutely everything. However, you do not want to spend a long time trying to figure out how to configure core tasks which you should be able to do automatically.

I didn’t work from a particular manual or outline these first few months. Instead I would pick a topic, say BGP. I would go through all of the examples I could find in the books that I had, Jeff Doyle’s books being the most helpful. I would set up the examples from the books in my lab to see if they work as described. Then I performed free-form experimentation. I tried different things; I indulged my curiosity; I came up with new ways to test the protocols and tried to break them. I introduced loops where there weren’t loops in the examples I had. I saw what happened if I ran the protocol over ISDN instead of Frame Relay. And I made very sure that everything I learned I recorded in my notes. For every subject I kept two note files. The first file contained general, conceptual notes. The second file was a list of commands that I thought were important and I needed to remember. These files grew over time, and I studied them thoroughly before attempting the lab.

I had also acquired practice labs from three different sources. I had IP Expert’s lab book; I also had Internetwork Expert’s lab book; and finally, I had the Cisco Press official lab book, which was written by a CCIE proctor. I found that this last book’s labs most closely resembled the real thing in terms of how the labs were written and how the diagrams were drawn. Still, as I studied I quickly came to favor the Internetwork Expert book for its thoroughness and accuracy. At that time, they were still relatively new, but the quality of their material was the best.

Closing in on test day…

In the last couple of months before the exam, I shifted my strategy. Instead of focusing on individual topics I spent my time working the practice labs in the IE book. At first I worked them slowly and methodically. I didn’t do them on a timer, and I didn’t rush through them. If it took me 24 hours to work through a lab then it took me 24 hours. My main interest was in covering the material, understanding it thoroughly, and in documenting my learnings. I knew so many people who started giving themselves timed exams when they weren’t ready for them. Yes, it’s important to have a strategy and to understand clock management, but it’s far more important to understand the material thoroughly. The best time management strategy is knowing the material so well you can configure most of it on auto-pilot.

Every time I completed a lab I graded myself using IE’s answer key. I used to say that I was my own worst enemy. I never gave myself a pass on the slightest discrepancy between my solution and IE’s solution. Every single mistake that I made I listed out in a document, and in the last few weeks before the exam I reread that document several times every day. Constantly reviewing the mistakes I had made kept them fresh in my mind.  I also found that in my note documents I was highlighting certain important points or gotchas with the capitalized words “BE SURE”. I created another document that I called my “BE SURE” list. I also reviewed this list several times a day in the last few weeks before the exam. Reviewing both my mistakes as well as my “BE SURE” list so frequently was quite effective in helping me remember my mistakes and important notes.

A snippet of my BE SURE list

When I was studying for my CCIE exam, Cisco Press had just released two handy books. These books covered all of the commands in IOS at that time for BGP and OSPF. Not only did they describe the commands but they had examples of their use as well. In the last few days before the exam I would review the table of contents of these books, which listed all the commands by name. I did this every night in bed. If I was able to accurately describe the command, I would cross it off.  Some commands that I couldn’t remember I saw night after night, until they were so familiar I had no problem using them.  Doing this every night helped me to commit fully to memory all of the different BGP and OSPF commands that make up the core of the CCIE lab exam.

I also took the CCIE Lab Boot Camp from Internetwork Expert just a few weeks before I took the actual lab exam. This was a wonderful experience. I was able to take the course from home, using IE’s Java-based virtual environment. Because most of the work in the class consisted of full, eight-hour timed labs, there was no need to travel to a classroom. And, because the eight-hour exams were administered on Internetwork Expert’s own racks of equipment, there was no problem with not having a full CCIE lab at home. We had a small amount of lecture each day, followed by the eight-hour lab, which was then graded each night. In the morning we were given our results. I was told that people scoring over 80% generally passed the CCIE lab exam, and I was scoring higher with no problem. The Brians gave me some great advice and particularly fixed some problems that I had in configuring multicast.

At the end of the boot camp Brian Dennis, the grumpier of the Brians, gave what I would charitably call a pep talk. He told us that a test is just a test, that we should get some of the classic books on networking and study them thoroughly, and that we should know our subject, not simply pass the test.  “You meet some CCIEs and wonder, how did this guy pass the test?” Brian said.

In November 2004 the time came to take the test. I had no idea if I was ready. A good friend of mine who passed shortly before spent four hours with me in a sushi restaurant grilling me on every possible subject that could be on the exam. They closed the restaurant on us.  For my final preparation, I studied all of the new features in IOS which they were now using in the CCIE lab. I also studied the documentation CD thoroughly so that I would have no trouble navigating it in the lab.

Passing the test

If you’re working on the CCIE exam, why should you care what someone did to prepare for it ten years ago?  Well, as I’ve said, it is a different test now.  My advice on learning ISDN dial maps isn’t going to help you.  However, there are some general principles here that you should pay attention to.

  1. Figure out the core topics and learn them well.  Cold.  On every expert exam, there are some core topics and some ancillary topics.  You cannot know everything.  Figure out the core topics and drill them over, and over, and over again.  You need to be able to configure them without thinking.
  2. Make things harder than they have to be.  As I said, break things intentionally.  Introduce problems.  Ask questions.  Don’t just run the scenarios you bought with your labs.
  3. Be your own worst enemy.  Remember, the CCIE exam is not just about doing what they tell you, but doing exactly what they tell you.  When you grade yourself, read and re-read the tasks.  Make absolutely sure that you have accurately and completely fulfilled the requirements.
  4. Document your mistakes.  Review things you have done wrong, and keep reviewing them.

In the next post in the series,  Room of Horrors, I describe the CCIE lab experience.  I talk about what it was like to enter the infamous lab in Cisco Building C, and take the challenging exam.