
It’s funny that I remember a time when Cisco Live used to be a privilege, before it became a chore.  I’ve been to every Cisco Live US and every Cisco Live Europe since I started here in 2015.  I enjoy the show primarily because I enjoy meeting with fellow network engineers, and there is still a lot of energy every show.  But I’ve seen all the booths, the flashy keynotes, and the marketing sessions many times over.  I’ve eaten the bad food and stayed in a few bad hotels.

Speaking was one of my favorite activities.  The audiences are intimidating, and as someone who used to have a terrible fear of public speaking, there are always a few jitters until my session(s) are over.  But I enjoy communicating technical concepts in a clear and understandable way.  That’s why I got into speaking.  I just wish I were teaching a college course, where I grade the students instead of the students grading me.

After I transitioned from being an individual contributor to a manager, I held onto my speaking slots for a while, but the last Cisco Live where I spoke was Las Vegas 2022.  I attended Cisco Live Amsterdam 2023 and Cisco Live Las Vegas 2023, but didn’t speak.  It was nice not to have to prepare much, but I missed the “speaker” tab on my badge.  It just wasn’t the same.  Yet this is what happens when you move into leadership.  The sessions at Cisco Live are meant to be technical, and the SGMs who control the sessions want technical speakers, not manager types.  When you lead a team, you also don’t want to take sessions away from your staff.

In Vegas, I swore I would not return to Cisco Live without a speaker badge.  If I didn’t get a slot, I would skip CL for the first time since I came back here in 2015.  I applied for two sessions for Europe in February 2024, trying to cash in on my “Hall of Fame” status.  But Hall of Fame or not, the SGMs want relevant topics and good speakers.  I didn’t get any emails and assumed I wasn’t going to Amsterdam.  Had I finally jumped the shark?

Then I got a funny email.  “Do you want your Amsterdam session considered for Las Vegas?”  Huh?  Only speakers get that email.  I checked the Cisco Live Speaker Resource Center, and lo and behold, I saw one DevNet session there.

The session I submitted is called “Time Machine! A 1980s bulletin board system, revived”.  What??  I submitted this as a total long shot, nearly positive it would be denied.  As detailed in a couple of posts on this blog, I ran a bulletin board system in the 1980s.  I proposed to do a DevNet session on how BBSs worked and how I had revived my own in an emulator.  And they accepted it!  Ha!  Now I have to come up with the content.  I remember that where the abstract asks for the “business value of this session,” I said something like “There is absolutely no business value to this session.”  I guess the SGMs have a sense of humor.

I guess I’m going to Amsterdam.  I know for a fact one of my three regular readers goes to Cisco Live Europe.  So I look forward to seeing you there.  Meanwhile, I realized at this point in my career I need to submit my sessions in the “IT Leadership” track, so I’m writing some Las Vegas abstracts for that.  Can’t keep an old dog down!


The first company where I worked as a “systems administrator” had no Internet connectivity at all when I started.  By the time I left, I had installed an analog phone line which was shared amongst several users with modems for dial-up service.  The connectivity options in 1995 were limited, and very expensive.  Our company operated on a shoestring budget and could not afford the costly dedicated service offerings from our ISP.

When I moved to a “consulting” company, I finally had the opportunity to work with real dedicated Internet service.  For the customers I worked with, we had two main options:  ISDN and T1 lines.

ISDN stood for Integrated Services Digital Network.  It came in two major flavors, but we exclusively used the lower-end Basic Rate Interface (BRI).  ISDN was a digital phone line, and like its analog counterpart, required dialing a phone number.  The BRI had two data channels of 64 Kbps each, but we usually bonded them together for a combined 128 Kbps.  At the time, this was quite speedy, more than double the speed of the modems we had, and our smaller customers loved ISDN.  Because it was a dial-up technology, however, per-minute rates applied.  This meant the line would time out and disconnect periodically to save costs.  When the line was down and an outbound packet arrived at the router with the ISDN interface, the router would dial the ISP again.  The connection time was much faster than with analog modems, but it still added latency, which was annoying.
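
For readers who never touched legacy dial-on-demand routing (DDR), here is roughly what a BRI configuration of that era looked like on a Cisco router.  This is a sketch from memory; the switch type, addresses, peer name, and directory number are all invented for illustration:

isdn switch-type basic-ni
!
interface BRI0
 ip address 192.0.2.2 255.255.255.252
 encapsulation ppp
 ppp multilink
 ! bring up the second 64 Kbps B channel right away for a bonded 128 Kbps link
 dialer load-threshold 1 outbound
 ! hang up after five idle minutes to limit per-minute charges
 dialer idle-timeout 300
 dialer map ip 192.0.2.1 name isp-router broadcast 5551234
 dialer-group 1
!
! any outbound IP packet counts as "interesting" traffic and triggers a dial
dialer-list 1 protocol ip permit

Any packet matching the dialer-list brought the line up; when the idle timer expired, the call dropped until the next interesting packet arrived.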

I don’t know if other regions did it, but we had a local hack to get around this.  Understanding the hack requires a little background on the phone systems of the time.  Two kinds of business phone systems were typically in use:  PBXs and key systems.  With a key system, every phone had an extension number, but no direct dial into it.  If you wanted to speak to the person at extension 302, you dialed the main phone number for the business, and either asked the receptionist to connect you, or else an automated system did it for you.  For outbound dialing, the user would either lift the handset and select an unused line from the pool of lines available, or perhaps dial 9 for the key system to connect them to the next available line.  PBXs, on the other hand, were used by large companies, gave each user their own phone line and allowed inter-office extension-to-extension calling, as well as direct dial from the outside world.  If my extension was 3202, I would have a direct dial phone number of, say, 415-555-3202.

Some companies instead opted for the phone company to do their internal switching.  This was known as a Centrex service.  The phone company provided hard wired analog phone lines to the customer, but enabled extension-to-extension direct dial.  Thus, if I was at extension 3202 and I needed to dial extension 3203, I could pick up the phone and just dial the four digits.  The phone company took care of routing it.

What does this have to do with ISDN?  We used to order Centrex service for our customers in the same Centrex group as their ISP.  Thus, the customer’s ISDN line became an “extension” of the ISP’s Centrex group.  Not only could the customer then dial the ISP with four digits (not a big deal when the router is doing the dialing), but there were no toll charges on Centrex lines.  We used to nail the line up so it would never disconnect, and if it dropped for any reason, it would auto-redial.  And then we had dedicated Internet service on a dial-up line!
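
The “nailed up” part was nothing fancy; it mostly amounted to defeating the idle timer and dialing the ISP’s four-digit Centrex extension instead of a full number.  Again a sketch from memory, with an invented extension:

interface BRI0
 ! Centrex has no toll charges, so set the idle timer absurdly high
 ! so the call effectively never drops
 dialer idle-timeout 86400
 dialer map ip 192.0.2.1 name isp-router broadcast 3202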

T1s (E1s elsewhere, at 2.048 Mbps) ran at 1.544 Mbps, blazing fast at the time.  Unlike the single-pair ISDN line, T1s were delivered on four wires, two for TX and two for RX.  I won’t get into the details of line coding on T1s, which we all studied as junior network engineers.  T1 lines were truly dedicated, and provided a point-to-point connection from customer to ISP.  They were distance-priced, but I worked in San Francisco, which is a small city, so it wasn’t usually a factor.  Because 1.544 Mbps was expensive for some customers, we had the option of ordering fractional T1s, with fewer channels at a slower speed, but still faster than ISDN.  In the early days we had to terminate the T1 on an external CSU/DSU device and then run a serial cable to the router, but eventually the CSU/DSU came integrated on the router interface card.
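
Once the CSU/DSU moved onto the router card, a channelized T1 was configured directly on the controller.  A sketch (the slot numbers, timeslot count, and addresses are arbitrary): a full T1 is 24 DS0 timeslots at 64 Kbps, or 1.536 Mbps plus 8 Kbps of framing for the familiar 1.544 Mbps, while a fractional T1 simply uses fewer timeslots:

controller T1 1/0
 framing esf
 linecode b8zs
 ! 12 of the 24 timeslots = 768 Kbps fractional T1
 channel-group 0 timeslots 1-12 speed 64
!
interface Serial1/0:0
 ip address 198.51.100.2 255.255.255.252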

When I worked at the San Francisco Chronicle, we were providing Internet service via a T1 line terminating on a 2500-series router.  (The same one disabled by a paint roller in this story.)  1000 users on a single T1 was painfully slow, and we made the decision to upgrade to a DS3 (T3), which ran at 45 Mbps.  The interface for DS3 used two BNC coax connections.  I remember being amazed that the phone company could deliver service over coax, but it turned out that service into the building used fiber optics.  Inside the building we ran coax.  The run of coax from the basement to our 2nd-floor data center was expensive, but the result was phenomenal.  The DS3, which we terminated on a brand new 7200VXR, was vastly superior to the crawling T1, and the effort paid off with our users.

DSL was groundbreaking.  There was no consumer-grade high-speed option before it, and small companies often could not afford dedicated Internet connectivity at all.  I was one of the first home adopters of DSL.  The freshly-trained phone guy showed up at my apartment and installed a splitter box in the basement.  This was needed because residential service was ADSL, which multiplexed digital service onto an analog line.  Unlike ISDN, which converted the analog phone signal to digital, ADSL left the analog signal intact, adding the digital part of the signal onto the higher frequencies.  The splitter box took the incoming phone line from the street and peeled off the high frequencies, providing an analog signal for telephones.  It then passed the analog/digital mix through intact to the modem, which simply ignored the analog frequencies.  The phone guy then sat down with his toolbelt and tried to configure TCP/IP on my computer.  He gave up because he had no idea what he was doing.  I told him to leave me the IP addresses and I’d do it myself.  Eventually the telco would just send you a small filter to plug into each analog phone jack yourself, and they could turn on the service without sending a phone guy to rewire things.  Once DSL in its various forms came out, the Internet was available to the masses.  Of course, cable modems came shortly after.

We take for granted instant connectivity from every location on portable devices.  Once upon a time,  connectivity was only available at certain locations, often requiring dialing a service provider.  There was a real excitement as new technologies emerged for making connectivity faster and easier.  Now, of course, we just expect things to work and get angry when they don’t.

In a recent article (paywall), Elon Musk has once again turned his wrath on remote workers.  Elon has a lot of good ideas, but also many bad ones, such as naming his child “X AE A-XII”.  This is certainly proof that we don’t need to take everything he says seriously.

Elon has said that those who want to work remote are “detached from reality” and give off “Marie Antoinette vibes” (I will forgive his apparent misunderstanding of history).  His argument, to the extent he even articulates it, seems to have two angles:

  • First, it is “morally wrong” to work remotely because so many people in the world cannot do their jobs from home.
  • Second, the productivity of remote workers is not the same.  (I’m extrapolating a bit here.)

I don’t think the first argument is terribly serious.  Just because some people (food service workers, factory workers, etc.) cannot work from home does not mean I should not.  Long before remote work, some people worked in clean offices while others worked in filthy coal mines.  We can debate the injustices of life, but I’m not convinced this disparity should guide office policies.

As to the second, well, as a manager of many large teams, I can say that some of my most productive workers are fully remote, i.e., they work entirely from home.  I can think of two or three of my most respected and productive employees who have this arrangement.  Because they don’t live near a Cisco office, they have no choice.  I recently promoted one of them to principal, a major accomplishment that is not handed out lightly.  So, we cannot say that working from home automatically means a lack of productivity.

On the other hand, I’ve had some very poor performers who worked from home.  This was particularly the case during the lockdowns.  I remember one engineer who seemed to be doing nothing, and when I checked her calendar it was empty.  She took a job elsewhere, finally.  But I, as any good boss should, was well aware of her lack of contribution and would have done something had she not taken the initiative herself.

Does being in the office guarantee productivity?  Not at all!  I can sit around and watch YouTube videos at work just the same as I can from home.  I remember a guy who sat near me many years ago, and had a rearview mirror on his monitor.  He was always playing Solitaire and every time I, or anyone else, walked by his desk he would glance in the mirror and minimize the game.  He wasn’t fooling anyone.

For me, the noise and distractions of the office often make productive work difficult.  Thankfully, post-lockdown, several Cisco buildings are virtually empty, and I decamp to one of them if I need to get actual work done.  Pre-COVID I used to head out to a coffee shop, put in earphones, and get productive work done there.  Open offices are the worst for this.  They make serious work nearly impossible.

Then there’s this…  Let me be open about it.  I never agreed with the lockdowns.  When they were first implemented, I wrote every congressman, city councilman, county supervisor, the health department, the governor, the president, and pretty much anyone else I could think of with my opposition to what seemed to me a lunatic idea, and totally unneeded.  Now you can disagree with me vehemently, you can think I’m a jerk, and that’s fine.  But here’s the point:  Almost all the large corporations bought into it.  They could have fought these mandates, but they went along with them, shut their doors, and embraced remote work.  Many started marketing campaigns (and still have them) around hybrid work.  You cannot go 100% in for something and then make a 180-degree shift a few years later because you regret your decision.  The outcome of the lockdowns, a lot of people unwilling to return to the office, was entirely predictable.  I think corporations need to embrace the world they created, and live with the consequences of their choices.  If your workers want to be remote, let them be remote.  Sure, give them incentives to come into the office and be together.  Encourage them to do so.  But accept the reality of the new world.

Elon Musk reopened his factories mid-lockdown.  He may not know how to name a child, but I’ll at least give him points for consistency.

I asked our internal ChatGPT to create a dialog between two network engineers in the style of the absurdist playwright Eugène Ionesco, author of The Bald Soprano.  It’s not even close to the original, but I do think it captures our existence quite nicely.  “We are the clowns of the digital realm” is perhaps my favorite line:

Engineer 1: Good evening. Have you ever stopped to wonder why networks are like mazes?

Engineer 2: Ah, the eternal question! Networks are like mazes because, well, because mazes are like networks!

Engineer 1: But what purpose do they serve? These intricate paths leading us nowhere. It’s all so absurd!

Engineer 2: Absurd indeed! We tirelessly configure routers and switches, creating connections that resemble a tangled web. Yet, do we ever truly reach our destination?

Engineer 1: No, my friend, we simply pass through endless loops, like Sisyphus rolling the boulder up the hill, only to have it roll back down.

Engineer 2: And what about those cryptic error messages we encounter? They mock us, like whispers of an elusive truth we can never grasp.

Engineer 1: Indeed, they are the whispers of a madman, taunting us with their indecipherable codes. “Error 404: Page not found.” How fitting!

Engineer 2: Ah, the irony! We seek connectivity, yet we are trapped in a world of disconnections and network outages.

Engineer 1: Our lives revolve around troubleshooting, forever trapped in an endless cycle of pinging and tracert-ing.

Engineer 2: It’s a tragic comedy, my friend. We are the clowns of the digital realm, forever chasing packets that vanish into thin air.

Engineer 1: And yet, we persist, day in and day out, our minds slowly unraveling like a faulty Ethernet cable.

Engineer 2: We are but players in this grand theater of network engineering, destined to forever dance to the rhythm of broken connections and elusive solutions.

Engineer 1: So, my friend, let us raise our glasses to the absurdity of it all. To the networks that confound us and the mysteries we strive to unravel!

Engineer 2: Cheers to that, my comrade! May we forever find solace in the chaos and laughter in the face of the inexplicable!

(They clink their glasses together, lost in the absurdity of their network engineer existence.)

I’ve been reluctant to comment on the latest fad in our industry, generative AI, simply because everybody has weighed in on it.  I do also try to avoid commenting on subjects outside of my scope of authority.  Increasingly, though, people are coming to me at work and asking how we can incorporate this technology into our products, how our competitors are doing it, and what our AI strategy is.  So I guess I am an authority.

To be honest, I didn’t play with ChatGPT until this week.  When I first looked at it, it wanted my email address and phone number, and I wasn’t sure I wanted to provide those to our new AI overlords.  So I passed on it.  Then Cisco released an internal-only version, which is supposedly anonymous, so I decided to try it out.

My first impression was, as they say, “meh.”  Obviously its ability to interpret and generate natural language is amazing.  Having it recite details of its data set in the style of Faulkner was cool.  But overall, the responses seemed like warmed-over search-engine results.  I asked it if AI is environmentally irresponsible since it will require so much computing power.  The response was middle-of-the-road:  “no, AI is not environmentally irresponsible” but “we need to do more to protect the environment.”  Blah, blah.  Non-committal, playing both sides of the coin.  Like almost all of its answers.

Then I decided to dive a bit deeper into a subject I know well:  Ancient Greek.  How accurate would ChatGPT be on a relatively obscure subject (and yet one with thousands of years of data)?

Even if you have no interest, bear with me.  I asked ChatGPT if it knew the difference between the Ionic dialect of Herodotus and the more common dialect of classical Athens.  (Our version, at least, does not allow proper names so I had to refer to Herodotus somewhat elliptically.)  It assured me it did.  I asked it to write “the men are arriving at Athens” in the dialect of Herodotus.  It wrote, “Ἀφίκοντο οἱ ἄνδρες εἰς Ἀθήνας,” which is obviously wrong.  The first word there, “aphikonto“, would actually be “apikonto” in the dialect of Herodotus.  He was well known for dropping aspirations.  The version ChatGPT gave me would be the classical Attic version.

I let ChatGPT know it was wrong, and it dutifully apologized.  Then I asked it to summarize the differences in the dialects.  It said to me:

Herodotus and Ionic writers typically removed initial aspirations, while the Attic dialect retained them. For example, “Ἀφίκοντο” (Herodotus) vs. “ἔφικοντο” (Attic)

Uh, you don’t need to know the Greek alphabet to see it made exactly the same mistake, again.  It should have said that Herodotus would use “Ἀπίκοντο” (apikonto) whereas in Attic the word would be “Ἀφίκοντο” (aphikonto.)

Here’s a slightly more computer-person-friendly example.  I decided to ask about my other favorite subject, retro computing.  I asked, “What is the difference between the Apple IIGS ProDOS 8, ProDOS 16, and GS/OS operating systems?”  Among other things, ChatGPT responded that GS/OS “introduced features such as a desktop metaphor, drop-down menus, and a hierarchical file system.”  Which happens to be wrong.  ProDOS 16 actually introduced those features.  (In fact, ProDOS 8 introduced the hierarchical file system.)  I pointed this out.  “You are correct, and I apologize for the oversight in my previous response. I misspoke in stating that GS/OS introduced [those] features.”  Hmmm.

I’m testing it and I know the right answers.   The problem would arise if I were trusting ChatGPT to provide me with a correct answer.  There have been plenty of examples of mistakes made by ChatGPT, such as adding a “permit any” to the top of access-lists.
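
To see why that particular mistake matters (my own illustration, not ChatGPT’s actual output): IOS access-lists are evaluated top-down and stop at the first match, so a permit-any at the top turns every line below it into dead code.

ip access-list extended CHATGPT-STYLE
 remark the permit below matches everything, so the deny can never be hit
 permit ip any any
 deny ip any host 10.1.1.1
!
ip access-list extended CORRECTED
 remark specific denies first, general permit last
 deny ip any host 10.1.1.1
 permit ip any any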

The issue is, ChatGPT sounds authoritative when it responds.  Because it is a computer speaking in natural language, we have a tendency to trust it.  And yet it has consistently proven it can be quite wrong on even simple subjects.  In fact, our own version carries the caveat “Cisco Enterprise Chat AI may produce inaccurate information about people, places, or facts” at the bottom of the page, and I’m sure most implementations of ChatGPT carry a similar warning.

Search engines place the burden of determining truth or fiction upon the user.  I get hundreds or thousands of results, and I have to decide which is credible based on the authority of the source, how convincing it sounds, etc.  AI provides one answer.  It has done the work for you.  Sure, you can probe further, but in many cases you won’t even know the answer served back is not trustworthy.  For that reason, I see AI tools as potentially very misleading, and even harmful in some circumstances.

That aside, I do like the fact I can dialog with it in ancient Latin and Greek, even if it makes mistakes.  It’s a good way to kill time in boring meetings.

My readership is limited, so consider a post to be “viral” if I get more than 2 thumbs up at the bottom of a page.  (Incidentally, I’ve only ever gotten one thumbs down, for this post, but I don’t know why.)  My 2021 post For the Love of Wiring got 3 thumbs up (!) but actually did get a lot more hits than usual after Tom Hollingsworth linked to it from his own blog.  How about a little more layer 1?

After my initial foray into stringing Cat 3 cable around in various unwise ways, Category 5 quickly became the standard.  I hated Cat 5 cable.  Cat 3 had a small number of twists per foot (or meter, non-Americans, or metre, Brits!), so upon removing the jacketing of the cable it was quite easy to untwist the pairs before punching them down.  Cat 5 is much more tightly twisted.  Not only are the pairs hard to untwist, but they remain kinked after untwisting, and they take a lot of work to smooth out.  (If you correctly terminate Cat 5, you shouldn’t have to untwist and smooth the wires, but I didn’t know that at first.)  I remember once, on my 10 Mbps Ethernet, running a speed test on Cat 3 cable and then being very disappointed when I saw no improvement running the same test over Cat 5.  (It doesn’t quite work that way, and for 10 Mbps, Cat 3 was more than adequate.)

I did a lot of research to learn how to run cable the correct way.  Mainly this means preserving the tight twists.  Cat 5 cable cannot be kinked or bent sharply, and the twists must be maintained up to the point of termination.  Not only did I use this information to run my own cable, but once I took a job at a computer consulting company, I oversaw many cabling projects and needed to inspect the work done by our vendors.  Voice cable did not have the stringent requirements of data, so often phone cabling experts would run the Cat 5 with tight bends and would untwist the wires several inches before punching down.

The consulting company used one such phone installer to do many of their jobs, often as a sub-contractor.  This was in the days before wireless, when every computer connected to the network, even laptops, had to be plugged in.  I remember one client, a small architectural firm in Berkeley, where our installer ran a brand-new Cat 5 Ethernet network.  We showed up, installed a hub, Ethernet cards, etc., and got everyone online.

A week or so later we got called back.  Stations were dropping on and off the network.  I fought my way through Bay Area traffic back to the client’s office to figure out what was going on.

With any layer 1 issue, replacing cables is a good first step.  As I unplugged one station from the wall jack, the entire jack and face plate fell off the wall.  Whoops.

Normally when a network jack is installed in an office building with sheetrock (drywall) walls, the installer cuts a fairly large opening in the sheetrock and then installs a “low voltage ring”.  This ring secures to the drywall from behind, and provides a place for the faceplate to screw into.  Then the Cat 5 cable is punched down on a small “keystone” jack, over which a cover is placed, and which then snaps into the faceplate.

Low voltage ring

Our clueless installer had not done this.  Instead he cut a hole in the drywall just small enough for the jack to fit through.  He never installed the low voltage ring, instead screwing the faceplate directly into the drywall.  He also never installed the cover on the contacts on the jack, so the contacts were covered with drywall powder.  Because screws don’t hold well in drywall, when I pulled the cable from the jack, the whole thing fell out.  I also found out that when he had installed the small office patch panel in their supply closet, he put the screws straight into the drywall as well.  Normally you would use a backboard, screw it into a stud, or at least use drywall anchors.  The patch panel fell off the wall too.

Keystone jack with cover

Needless to say, I wasn’t too happy and neither was the customer.  I hate taking the fall for something that’s not my fault, but the customer considered it our mistake.  I made the cabling vendor come out and redo the entire installation.  After that, I told the owner of our firm to never use that vendor again.

A major concern, even with good cabling vendors, was having people in the office around the cables before they were fully installed.  I remember one client where we had a reputable vendor install the cabling before everyone moved in.  They ran one really large bundle of Cat 5 on the floor, because the client was going to install a raised floor afterwards.  Unfortunately, it took them months to get the raised floor in, and the bundle of cable ran right outside of a row of offices.  People stepped on them going in and out of their offices.  One time I remember a guy in cowboy boots standing right on top of the bundle.  I asked him to move.  By the time the floor covered the cables, they had gone from a clean, round bundle, to totally flattened.  Oddly enough, I never had any problems with the wiring in the time I worked there.

When I worked at the San Francisco Chronicle, our cabling vendor was installing some new fiber optic cabling to some data center racks.  The data center also housed our operations team (NOC, more or less.)  There was one lady who worked there who was very nice, rather large, and a tad immature.  The vendor had laid the fiber out on the floor before routing it under the floor tiles.  We looked up and there was the woman, jumping up and down on the fiber and laughing hysterically.  “Is this good for the cables, is this good for the cables?!” she was saying.  When we explained the interior was made out of glass, she looked horrified and stopped, but it was too late.  It cost us a bit, but fortunately for the NOC lady, she was in a union and well protected.

Working on software now, I don’t have to worry about cabling very much anymore.  I touch racks so infrequently I still call SFPs “GBICs”.  I do think it’s good for network engineers to stay informed on layer 1.  As much as you may know about protocols, software defined networking, or automation systems, none of it will work if the wires aren’t right.


As I mentioned in my last post, I like modeling networks using tools like Cisco Modeling Labs or GNS3.  I recalled how, back in TAC, I had access to a Cisco-internal (at the time) tool called IOS on Unix, or IOU.  This enabled me to recreate customer environments in minutes, with no need to hunt down hardware.  Obviously IOU didn’t work for every case.  Oftentimes the issue the customer raised was very hardware-specific, even when it was a “routing protocol” issue.  However, if I could avoid hardware, I would do the recreate virtually.

When I worked at Juniper (in IT), we did a huge project to refresh the WAN.  This was just before SD-WAN came about.  We sourced VPLS from two different service providers, and then ran our own layer 3 MPLS on top of it.  The VPLS just gave us layer 2 connectivity, like a giant switch.  We had two POPs in each region which acted as aggregation points for smaller sites.  For these sites we had CE routers deployed on prem, connecting to PE routers in the POPs.  This is a basic service provider configuration, with us as a service provider.  Larger sites had PE routers on site, with the campus core routers acting as CEs.

We got all the advantages of layer 3 MPLS (traffic engineering, segmentation via VRF) without the headaches (peering at layer 3 with your SP, yuck!)

As the “network architect” for IT, I needed a way to model and test changes to the network.  I used a tool called VMM, which was similar to IOU.  Using a text file I could define a topology of routers, and their interconnections.  Then I used a Python script to start it up.  I then had a fully functional network model running under a hypervisor, and I could test stuff out.

I never recreated the entire network–it wasn’t necessary.  I created a virtual version with two simulated POPs, a tier 1 site (PE on prem), and a tier 2 site (PE in POP).  I don’t fully remember the details; there may have been one or two other sites in my model.

For strictly testing routing issues assuming normally functioning devices, my VMM-based model was a dream.  Before we rolled out changes we could test them in my virtual lab.  We could apply the configuration exactly as entered into the real device, to see what effect there would be in the network.  I just didn’t have the cool marketing term “digital twin” as it didn’t exist yet.

I remember working on a project to roll out multicast on the WAN using Next Generation Multicast VPN (NGMVPN).  NGMVPN was (is?) a complex beast, and as I designed the network and sorted out things like RP placement, I used my virtual lab.  I even filed bugs against Juniper’s NGMVPN code, bugs I found while using my virtual devices.  I remember the night we did a pilot rollout to two sites.  Our Boston office dropped off the network entirely.  Luckily we had out-of-band access and rolled back the config.  I SSHd into my virtual lab, applied the config, and spent a short amount of time diagnosing the problem (a duplicate loopback address), and did so without the stress of troubleshooting a live network.

I’ve always been a bit skeptical of the network simulation/modeling approach.  This is where you have some software intelligence layer that tries to “think through” the consequences of applied changes.  The problem is the variability of networks.  So many things can happen in so many ways.  Actual devices running actual NOS code in a virtual environment will behave exactly the way real devices will, within their constraints (such as not emulating the hardware precisely, not emulating all the different interface types, etc.).  I may be entirely wrong on this one, as I’ve spent virtually no time with these products.

The problems I was modeling were protocol issues amongst a friendly group of routers.  When you add in campus networking, the complexity increases quite dramatically.  Aside from wireless being in the mix, you also have hundreds or thousands of non-network devices like laptops, printers, and phones, which often cause networks to behave unpredictably.  I don’t think our AI models are yet at the point where they can predict what comes with that complexity.

Of course, the problem you have is always the one you don’t predict.  In TAC, most of the cases I took were bugs.  Hardware and software behave unexpectedly.  As in the NGMVPN case, if there is a bug in software that is strictly protocol-related, you might catch it in an emulation.  But many bugs exist only on certain hardware platforms, or in versions of software that don’t run virtually, etc.

As for digital twins, I do think learning to use CML (of course I’m Cisco-centric) or similar tools is very worthwhile.  Preparing for major changes offline in a virtual environment is a fantastic way to prep for the real thing.  Don’t forget, though, that things never go as planned, and thank goodness for that, as it gives us all job security.

We all have to make a decision, at some point in our career, about whether or not we get onto the management track.  At Cisco, there is a very strong path for individual contributors (ICs).  You can become a principal (director-level), a distinguished (senior director-level), or a fellow (VP-level) as an IC, never having to manage a soul.  When I married my wife, I told her:  “Never expect me to get into management.  I’m a technical guy, I love being a technical guy, and I have zero interest in managing people.”

Thus, I surprised myself back in 2016 when my boss asked me, out of the blue, to step into management and I said yes.  Partly it was my love of the Technical Marketing Engineer role, partly my desire to have some authority behind my ideas.  At one point my team grew to fifty TMEs.

All technical people know that, when you go that route, your technical skills will atrophy as you have less and less hands-on experience.  This is very true.  In the first couple of years, I kept up my formidable lab; then over time it sat in Cisco building 23, unused and consuming OpEx.  I almost surrendered it numerous times.

Through attrition and corporate shenanigans, my team is considerably smaller (25 or so) and run by a very strong management team.  Last week, I decided to bring the lab back up.  I’ve been spending a lot of time sorting through servers and devices, figuring out which to scrap and which to keep.  (Many of my old servers require Flash to access the CIMC, which is not feasible going forward.)  I haven’t used ESXi in years, and finding out I can now access vSphere in a browser–from my Mac!!–was a pleasant surprise.  Getting the CIMCs upgraded, ESXi installed, and a functional Ubuntu server running was a bit of a pain, but this is oddly the sort of pain I miss.

I have several Cat 9k switches in my lab, but I installed Cisco Modeling Labs on one of my servers.  (The nice thing about working for Cisco is the license is free.)  I used VIRL many years ago, which I absolutely hated.  CML is quite slick.  It was simple to install, and within a short time I had a lab up and running with a CSR1kv, a Cat 8k, and a virtual Cat 9k.

When I was in TAC I discovered IOS on Unix, or IOU.  Back then, TAC agents were each given a Sun SPARCstation, and I used mine almost exclusively to run IOU.  (I thought it was so cool back then to have a Sun box on my desk.  And those of you who remember them will know what I mean when I say I miss the keyboard.)  IOU allowed me to define a topology in a text file, and then spin up several virtual IOS devices on the SPARCstation in that topology.  It only supported simulated Ethernet links, but for pure routing-protocol cases, IOU was more than adequate to recreate a customer environment.  In 15 minutes I could have my recreate up and running.  Other engineers would open a case to have a recreate built by our lab team, which could take days.  I never figured out why they wouldn’t use IOU.

When I left Cisco I had to resort to GNS3, which was a pretty helpful piece of software.  Then, when I went to Juniper I used Junosphere, or actually an internal version of it called VMM, to spin up topologies.  VMM was awesome.  Juniper produced a virtual version of its MX router that was so faithful to the real thing that I could pass the JNCIE Service Provider exam without ever having logged into a real one, at least until exam day.

It’ll be interesting to see what I can do on virtual 9ks in CML–I hear there are some limitations.  But I do plan to spend as much time as possible using the virtual version over the real thing.

One thing I think I lost sight of as I (slightly) climbed the corporate ladder was the necessity of technical leadership.  We have plenty of people managers and MBAs.  What we badly need are leaders who understand the technology.  And while I have a lot of legacy knowledge in my mental database, it’s in need of a refresh.  It’s hard to stay sharp technically when reading about new technologies in PowerPoint.

The other side of this is that, as engineers, we love the technology.  I love making stuff work.  My wife is not technical at all, and cannot understand why I get a thrill from five little exclamation points when a ping goes through.  I don’t love budgets and handling HR cases, although I’ve come to learn why I need to do those things.  I need to do them so my people can function optimally.  And I’m happy to do them for my team.

On the other hand, I’m glad to be in the frigid, loud, harsh lighting of a massive Cisco lab again. It’s very cool to have all this stuff.   Ain’t life grand!

I haven’t written anything for a while, because of the simple fact that I had nothing to say.  The problem with being a writer is that sometimes you have nothing to write.  I also have a day job, and sometimes it can keep me quite busy.  Finally, an afternoon drive provided some inspiration.

There’s a funny thing about the buildings I work in–they all tend to be purchased by Google.  When I started at Juniper, I worked on their Ariba campus in Mountain View, several buildings they rented from that software company.  We were moved to Juniper’s (old) main campus on Mathilda Avenue, and the old Ariba buildings were bought and re-purposed by Google.  Then the Mathilda campus was bought by Google too.

When I worked in TAC, from 2005-2007, I worked in building K on Tasman Drive in San Jose.  Back then, the meteoric growth of Cisco was measured by the size of its campus, which stretched all along Tasman, down Cisco Way, and even extended into Milpitas.

Cisco’s campus has been going the opposite direction for a while now.  The letter buildings (on Tasman, west of Zanker Street) started closing before the COVID lockdowns changed everything.  Now a lot of buildings sit empty and will certainly be sold, including quite possibly the ones in Milpitas, where I work.

Building K closed, if not sometime during the lockdowns, then shortly after.  I hadn’t driven by it in months, and when I did yesterday, lo and behold, it was now a Google building!

What used to be building K

It’s funny how our memories can be so strongly evoked by places.  Building K was, for a long time, the home of Cisco TAC.  I vividly remember parking on Champion Drive, reviewing all of my technical notes before going in to be panel-interviewed by four tough TAC engineers.  I remember getting badged in the day I started, after passing the interview, and being told by my mentor that he wouldn’t be able to put me “on the queue” for three months, because I had so much to learn.

Two weeks later I was taking cases.  Not because I was a quick study, but because they needed a body.

I worked in High Touch Technical Support, dealing with Cisco’s largest customers.  The first team I was on was called ESO.  Nobody knew what it stood for.  The team specialized in taking all route/switch cases for Cisco’s large financial customers like Goldman Sachs and JPMC.  Most of the cases involved the Cat 6k, although we supported a handful of other enterprise platforms.

When a priority 1 case came in, the Advanced Services Hotline (ASH) call center agents would call a special number that would cause all of the phones on the ESO team to play a special ring tone.  I grew to develop a visceral hatred of that ring tone.  Hearing it today would probably trigger PTSD.  I’d wait and wait for another TAC engineer (we were called CSEs) to answer it.  If nobody did, I’d swallow hard and grab the phone.

The first time I did it was a massive multicast meltdown disrupting operations on the NYSE trading floor.  I had just gotten my CCIE, but I had only worked previously as a network engineer in a small environment.  Now I was dealing with a major outage, and it was the first time I had to handle real-world multicast.  Luckily, my mentor showed up after 20 minutes or so and helped me work the case.

My first boss in HTTS told me on the day I started, “at Cisco, if you don’t like your boss or your cubicle, wait three months.”  Three months later I had a new boss and a new cubicle.  The ESO team was broken up, and its engineers dispersed to other teams.  I was given a choice:  LAN Switch or Routing Protocols.  I chose the latter.

I joined the RP-LSA team as a still-new TAC engineer.  The LSA stood for “Large Scale Architectures.”  The team was focused on service provider routing and platform issues.  Routing protocol cases were actually a minority of our workload.  We spent a lot of time dealing with platform issues on the GSR 12000-series router, the broadband aggregation 10000-series, and the 7500.  Many of the cases were crashes; others were ASIC issues.  I’d never even heard of the 12k and 10k, yet now I was expected to take cases and speak with authority.  I leaned on my team a lot in the early days.

Fortunately for me, these were service provider guys, and they knew little about enterprise networking or LAN switching.  With the breakup of the ESO team, the large financials were now coming into the RP-LSA queue.  And anyone who has worked in TAC can tell you, a routing protocols case is often not an RP case at all.  When the customer opens a case for flapping OSPF adjacencies, it’s usually just a symptom of a layer 2 issue.  The SP guys had no clue how to deal with these, but I did, so we ended up mutually educating each other.

In those days, most of the protocol cases were on Layer 3 MPLS.  I had never even heard of MPLS before I started there, but I did a one-week online course (with lab), and started taking cases like a champ.  MPLS cases were often easy because the technology was new, but usually when a large service provider like AT&T, Orange, or Verizon opens a case on something like BGP, it’s not because they misconfigured a route map.  They’ve looked at everything before opening the case, and so the CSE becomes essentially a middleman coordinating between the customer and the developers.  In many cases the CSE is like a paramedic, stabilizing the patient before the doctor takes over to figure out what is wrong.  We often knew we were facing a bug, but our job was to find workarounds to bring the network back up so developers could find a fix.

I had my share of angry customers in those days, some even lividly angry.  But most customers were nice.  These were professional network engineers who understood that the machines we build don’t always act as we expect them to.  Nevertheless, TAC is a high-stress job.  It’s also relentless.  Close one case, and two more are waiting in the queue.  There is no break, no downtime.  The best thing about it was that when you went home, you were done.  If a call came in on an open case in your backlog, it would be routed to another engineer.  (Though sometimes they routed it back to you in the morning.)  In HTTS, we had the distinct disadvantage of having to work cases to resolution.  If the case came in at 5:55pm on Friday night, and your shift ended at 6pm, you might spend the next five hours in the office.  Backbone TAC engineers “followed the sun” and re-assigned cases as soon as their shift ended.

I make no secret of the fact that I hated the job.  My dream was to work at Cisco, but shortly after I started, I wanted out.  And yet the two years I spent in TAC are two of the most memorable of my career.  TAC was a crucible, a brutal environment dealing with nasty technical problems.  The fluff produced by marketeers has no place there.  There was no “actualize your business intent by optimizing and observing your network”-type nonsense.  Our emails were indecipherable jumbles of acronyms and code names.  “The CEF adjacency table is not being programmed because the SNOOPY ASIC has a fault.”  (OK, I made that up…  but you get the point.)  This was not a place for the weak-minded.

When things got too sticky for me, I could call in escalation engineers.  I remember one case where four backbone TAC escalation engineers and one from HTTS took over my cube, peering at my screen and trying to figure out what was going on during a customer meltdown.

Building K was constructed in the brutalist style of architecture so common in Silicon Valley.  One look at the concrete and glass and the nondescript offices and conference rooms is enough to drain one’s soul.  These buildings are pure function over form.  They are cheap to put up and operate, but emotionally crushing to work in.  There is no warmth, even on a winter day with the heat on.

Still, when I look at building K, or what’s become of it, I think of all the people I knew there.  I think of the battles fought and won, the cases taken and closed, the confrontational customers and the worthless responses from engineering, leaving us unable to close a case.  I think of the days I would approach that building in dread, not knowing what hell I would go through that day.  I also think of the incredible rush of closing a complex case, of finding a workaround, and of getting an all-5’s “bingo” (score) from a customer.  TAC is still here, but for those of us who worked in building K, its closure represents the end of an era.

There’s a lot of talk about networking simplicity these days.  There’s been a lot of talk about networking simplicity, in fact, for as long as I can remember.  The drive to simplify networking has certainly been the catalyst for many new products, most (but not all) unsuccessful.  Sometimes we forget that, while networking has some inherent complexities (a large distributed system with multiple operating systems, protocols, and media types), much of the complexity can be attributed to humans and their choices.  IPv4 is a good example of this.

When I got into network engineering, I had assumed that network protocols were handed down from God and were immaculate in their perfection.  Reading Radia Perlman’s classic book Interconnections changed my understanding.  Aside from her ability to explain complex topics with utter clarity, Perlman also exposed the human side of protocol development.  Protocols are the result of committees, power politics, and the limitations of human personality.  Some protocols are obviously flawed.  Some flaws get fixed, but widely deployed protocols, like IPv4, are hard to fix.  Of course, v6 does remedy many of the problems of v4, but it’s still IP.

My vote for simplest protocol goes to AppleTalk.  When I was a young network guy, I mostly worked on Mac networks.  This was in the beige-box era, before Jobs made Apple “cool” again.  The computers may have been lame, but Apple really had the best networking available in the 1990s.  I’ve written in the past about my love for LocalTalk and its eminently flexible alternative, PhoneNet.  But the AppleTalk protocol suite was phenomenal as well.

N.B.  My description of AppleTalk protocol mechanics is largely from memory.  Even the Wikipedia article is a bit sparse on details.  So please don’t shoot me if I misremember something.

In the first place, you didn’t need to do anything to set up an AppleTalk network.  You just connected the computers together and switched either the printer or modem port into a network port.  Auto-configuration was flawless.  Without any DHCP server, AppleTalk devices figured out what network they were on and acquired an address.  This was done by first probing for a router on the network, and then randomly grabbing an address.  The host then broadcast its address, and if another host was already using it, it would back off and try another one.  AppleTalk addresses consisted of a two-byte network number (equivalent to the “network” portion of an IP subnet) and a one-byte host address (equivalent to the “host” portion of an IP subnet).  If the host portion of the address is only one byte, aren’t you limited to 255 (or so) addresses?  No!  AppleTalk (Phase 2) allowed aggregation of contiguous networks into “cable ranges”.  So I could have a cable range of 60001-60011, multiple networks on the same media, and now I could have 2530 end stations, at least in theory.

Routers did need some minimal configuration, and support for dynamic routing protocols was a bit light.  Once the router was up and running, “zones” would appear on the end user’s computer in an application called the Chooser.  They might see “1st floor”, “2nd floor”, “3rd floor”, for example, or “finance”, “HR”, “accounting”.  However you chose to divide things.  If they clicked on a zone, they would see all of the AppleTalk file shares and printers in it.  You didn’t need to point end stations at their “default gateway”.  They simply discovered their router by broadcasting for it at startup.
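
The router-side configuration really was minimal.  On a Cisco router of that era it amounted to enabling AppleTalk routing, a cable range with a seed address, and a zone name or two.  A sketch from memory, reusing the cable range from the example above; the zone names are invented:

appletalk routing
!
interface Ethernet0
 ! cable range plus a seed network.node address within it
 appletalk cable-range 60001-60011 60001.1
 ! zone names are what users saw when they opened the Chooser
 appletalk zone Finance
 appletalk zone HR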

AppleTalk networks were a breeze to set up and simple to administer.  Were there downsides?  The biggest one was the chattiness of the protocols.  Auto-configuration was accomplished by using a lot of broadcast traffic, and in those days bandwidth was at a premium.  (I believe PhoneNet was around 200 Kbps or so.)  Still, I administered several large AppleTalk networks and was never able to quantify any performance hit from the broadcasts.  Like any network, it required at least some thinking to contain network (cable range) sizes.

AppleTalk was done away with as the Internet arose and IP became the dominant protocol.  For hosts on LocalTalk/PhoneNet networks, which did not support IP natively, we initially tunneled IP over AppleTalk.  Ethernet-connected Macs had a native IP stack.  The worst thing about AppleTalk was the flaky protocol stack (called Open Transport) in System 7.5, but this was a flaw in implementation, not protocol design.

I’ll end with my favorite Radia Perlman quote:  “We need more people in this industry who hate computers.”  If we did, more protocols might look like AppleTalk, and industry MBAs would need something else to talk about.