
I started this blog in February 2013.  Amazingly, I’m closing in on ten years.  It’s certainly turned out differently than I expected, and I haven’t written as much as I’d have liked.

In 2016, I hired a company to update the blog and design a new theme.  Maintaining your own WordPress blog is a lot of work, and it was worthwhile to have an expert fix things up.  I was fairly happy with the results, although there are a few glitches in the theme that annoy me.  The tile layout was never my favorite, and it gets really uneven depending on whether I choose a featured photo or not.

Anyways, with the WP updates over the years, my theme is breaking down.  On the editing page, many buttons don’t work at all.  I contacted the original company, which still exists, but I haven’t been able to get a response.

Over the next few weeks, I’m going to experiment with the built-in WP themes to see if there is one I like.  Expect the look and feel to change from time to time as I explore options.  Hopefully it won’t throw off the spambots that seem to be my primary readership!

I thought I’d take a break from Cisco Live to relive some memories in another Netstalgia.

Working in product management at a Cisco business unit, we are constantly talking about the latest and greatest, the cutting edge of technology.  It’s easy to forget how many customers out there are nowhere near the cutting edge.  They’re not near any edge at all.  When I worked at a Gold partner, I got to see all sorts of customer networks in a variety of states.

I remember one customer who ended up spending a lot of money with my company.  They were (and are) a major grocery chain in the United States.  They had reached a point where their network was grinding to a halt, and they needed help.  They had two overworked network engineers running things, and I remember being amused that their company policy required them to wear ties in the office.  This was not a financial company in San Francisco, but a discount grocery chain in a very relaxed part of the East Bay.  Anyways, they had hosts dropping off the network, performance problems, and their mainframe kept losing contact with its default gateway.

Walking into these kinds of situations, you’re never sure what you might find.  Oftentimes the problems aren’t clear and the customer gets frustrated with their hired gun.  In this case, the very affable in-house engineers were thrilled to have experienced help.  They explained to me that the entire network, a large corporate office and countless stores, was on a single /8 network.  Only the on-site data center had a separate subnet.  Even the remote sites were in the /8!!

It got worse.  The stores were connected to HQ with IPSec VPN, but the hardware VPN devices they used were made by a company that no longer existed.  The devices kept failing, and one of the network engineers had a stock of them he had purchased on eBay.  Amazingly, he was using his electronics skills to perform component-level repairs on the devices, cannibalizing parts from the eBay stash, which let him stretch the stash far longer than if he had simply swapped out failed units.

My favorite was the data center.  The mainframe was sending constant pings to its default gateway, which would occasionally drop packets, in which case the mainframe would declare the gateway dead.  I found out that the default gateway was none other than a 2500-series router.

An old 2503 router

Even in 2009, this router was ancient history.  It had an old AUI connector on it which was nearly falling out.  In their 100 Mbps environment, they were limited to 10 Mbps.  I seem to recall it was doing router-on-a-stick on that interface, hairpinning traffic, but I don’t think the 2500 could do subinterfaces, so I may be wrong.  Anyways, the poor little 2500 was being slammed by traffic and dropping packets from time to time.

The ubiquitous CentreCOM AUI 10Base-T transceiver

I spent months at the client.  We designed a subnet scheme, renumbered their network, installed ASAs for IPSec, cut over all the stores, and put some high-end switches in the data center.  They were a grateful client, never complained, and I was able to make a genuine improvement in the lives of their users.  Unlike that other client I wrote about before.

I have a lot of bad memories of working for that partner, but one of the most interesting things was walking into so many different customers’ worlds, seeing what they dealt with every day, the mistakes they had made in building their networks, and helping them out by fixing those mistakes.

Getting a session at Cisco Live is not a given, even for a Principal TME.  I started at Cisco in October 2015, and I certainly didn’t expect to present at, or even go to, Cisco Live Berlin in January 2016.  Normally, there are three ways to secure a session at CL:

  1. Submit an idea during the “Call for papers” process about six months before a given event.  The Session Group Managers (SGMs) who manage individual tracks (e.g., security, data center) then must approve it.  It can be hard to get that approval.  SGMs need to have faith in your ability as a presenter, as well as believe your topic is relevant and unique.
  2. Carry over a session from the previous CL.  If you’ve presented a session, you can check a box in the Cisco Live management tool to ask for it to be carried over to the next CL.  Again, this is dependent on the SGMs approving it.
  3. Be handed a session by somebody who had it approved, but is not going to present it.  Perhaps they are leaving, taking a new job, or just don’t want to do it any more.  Usually the SGMs need to approve any re-assignments as well.  (You can see the SGMs are rather powerful for CL.  It helps to know them well, and it hurts when they turn over!)

When I joined Cisco, I was assigned to a wonderful and humble team working on programmability.  We met and discussed the various assignments and approvals we had for Cisco Live, and they kindly offered me two:  booth duty for an Ansible with Nexus demo, and a session at DevNet called “Automation with NXOS:  Let’s get started!”

A Principal TME is a director-level position, and normally a PTME would not be expected to spend all day at a booth.  However, since I was new to the company, position, and team, I decided it would be a good idea to do some of the grunt work a normal TME does, and experience booth duty.

My 2016 Berlin CL Booth before opening

As for the DevNet session:  DevNet, Cisco’s developer enablement team, runs a large section of the Cisco Live show floor.  DevNet has theaters which are open-seating and divided from the rest of the show floor.  A typical CL breakout takes place in a room, whereas DevNet sessions are out in the open.  In 2016, it was pretty easy to get DevNet sessions, and nobody cared when the team re-assigned the session to me.  I had a free trip to Europe and plenty to do when I got there.

What you see at Cisco Live is the fruit of months of preparation.  I had to develop the entire booth demo from scratch–I was supposed to have help from another TME from a different team, but he was totally worthless.  I set up the lab and wrote the demo script myself.  For the DevNet session, I pulled together slides from my colleagues and did my best to master them.  Keep in mind, I came to Cisco after six years at Juniper.  I didn’t know a thing about Nexus, and programmability was new to me.

Every new speaker at CL is required to undergo speaker training, so I signed up for mine.  In 1 hour the non-technical trainer gave me a few pointers.  I’ve been through enough speaker training in the past that it wasn’t terribly helpful, but the box was checked.

Arriving in Berlin, I registered at CL and, as a speaker and staffer, was given an “all-access” pass.  I could go anywhere at the show.  Personally, I’ve always loved having backstage access to anything, so I headed to the World of Solutions (WoS, the show floor) and spent a long time trying to find my booth.  WoS before it opens is a genuine mess–people running cherry-pickers and forklifts, laying down carpet, and well-dressed booth staff all contending for space.  There are usually challenges getting the demo computers up and running, connected to demos back home, etc.

WoS is a mess when we arrive

Working a booth can be frenetic or boring.  The positioning of my booth and the content of the demo (Ansible automation of NXOS) did not generate a lot of traffic.  I spent hours standing at the booth with nothing to do.  For the occasional customer who would show an interest, I’d run the demo, and possibly do a little white boarding.  Then, reset the demo and wait for the next guy. It wasn’t a lot of fun.

Eventually, the time came for the DevNet session.  I was really nervous for my first time in front of a CL audience. Would I mess up?  Would I choke up due to nerves?  Would my audience ask questions I couldn’t answer?

Seeing your own name on the board is exciting and nerve-wracking

As I said, the DevNet sessions are presented out on the show floor, and it’s a terrible speaking environment.  It’s noisy, you cannot hear yourself, and the participants were given headphones so they could hear.  It was like speaking into a void.  I remember one gentleman bouncing between sleep and wakefulness, his head nodding down and then coming alive again.  The presentation was not one of my best, but I got the job done acceptably and the participants filled out their paper score sheets at the end.  I mostly had 5’s, and a few 4’s.  At that point DevNet sessions did not receive an official score, so my numbers didn’t “count”, but I could show them to my boss and get some credit, at least.

My wife had traveled with me and we took a few sightseeing trips.  We saw the amazing museums on Berlin’s “museum island” and also hired a driver to give us a tour.  We had several team events around the city–Cisco Live is famous for parties–and ate some very good German food.  One of my colleagues was well known for arranging parties that went until four or five AM, and many TMEs would show up to their 8am session with only a couple hours of sleep.  In fact, one of the other Hall of Fame Distinguished Speakers claims this was his secret to success!  I myself avoid parties like that and spend hours in my room practicing my presentation before giving it.  To each his own, I suppose.

Ah, the perks of Cisco Live!

Network engineers are a breed unto ourselves, and I think we have a distinct feeling of community.  Our field is highly specialized, and because we often have to defend our domain from those who do not understand it (“it’s not the network, ok?!”), we have a camaraderie that’s hard to match.  I left Berlin on a real high, feeling more a part of that community than ever, having been there in a Cisco uniform and having gotten up in front of an audience.  I didn’t know what my future held at Cisco, but it was the first of many such experiences to come.

The last Cisco Live I attended was in Barcelona in January 2020.  As I was in the airport heading home, I was reading news of a new virus emerging from China.  I looked with bemusement at a troop of high-school-age girls who all had surgical masks on.  Various authorities told us not to wear masks, saying they don’t do much to prevent viral spread at a large scale.  The girls kept pulling the masks on and off.  I thought back on my performance at Cisco Live, and looked forward to Cisco Live in Las Vegas in the summer.  Who knew that, a few months hence, everyone would be wearing masks and the physical Cisco Live would be indefinitely postponed.

For Technical Marketing Engineers (TMEs), Cisco Live (technically Cisco Live!) measures the seasons of our year like the crop cycle measures the seasons of a farmer’s year.  Four times annually a large portion of our team would hop on an airplane and depart for Europe, Cancun, Melbourne, or a US city.  Cancun and Melbourne were constant, but the European and US cities would change every couple of years.  In my time with Cisco, I have traveled to Cancun and Melbourne, Berlin, Barcelona, Las Vegas, Orlando, and San Diego to present and staff Cisco Live.

A trade show may just be a corporate event, but for those of us who devoted our career to that corporation’s products, it’s far more than a chance for a company to hawk its products.  The breakout sessions and labs are critical for staying up-to-date on a fast-moving industry, the keynotes are always too high-level but with entertaining productions, and the parties are a great chance to connect with other network engineers.  CL was fun for participants, exhausting for those of us staffing it, but still my favorite part of the job.

Cisco Live was originally called Networkers, and started in 1990.  For many years I badly wanted to go to this temporary Mecca of networking technology, but I worked for companies that would not pay the cost of a badge and the travel fees, a total of thousands of dollars.  Even when I first worked at Cisco, from 2005-2007, as a lowly TAC engineer I never had the opportunity to attend.  My first trip to CL came in 2007, when I was working for a Gold partner.  They sent several of us to the Anaheim show, and I remember well the thrill of walking into a CL for the first time.  I walked the show floor, talked to the booth staffers, and attended a lot of breakout sessions of varying quality.  I was quite excited to go to the CCIE party, but I’m not sure why I thought a party full of CCIEs would really be all that exciting.  I remember hanging out by myself for an hour or so before I gave up because I didn’t know anyone there.

The same partner sent me to Orlando in 2008 as well, just barely.  The recession was starting and we were short on cash.  My boss wanted me to share a badge with a colleague, and I didn’t like the idea of having to juggle time slots or of trying to explain to security how my name could be “Nguyen”.  Thankfully, they ponied up the cash for a second badge.  I’m not a fan of loud music, so I generally don’t go to the CL party, but for Orlando they opened up Universal Studios for us, and the aforementioned Nguyen and I, along with a couple others, had a great time on the rides and attending the Blue Man Group.  (OK, some loud music there, but it is an entertaining show.)

I attended CL once more before I came back to Cisco–in 2014, ironically, as an employee of Juniper.  Somehow I convinced my boss to give me a pass on the grounds of researching what Cisco IT was doing.  (They do present at Cisco Live.)  I remember sitting just a few rows back at the keynote as John Chambers presented, amused that I’d be bringing a report back to Juniper about what I’d heard.

 

My view of Chambers at CL 2014

It was actually at Cisco Live when I first got the idea to be a technical marketing engineer.  It’s a bit embarrassing, but I sat in a presentation given by a TME and thought, “I could do better than this guy.”  It took a few years, but I finally managed to get into tech marketing.

I became a Principal TME at Cisco in late 2015 and was told I’d be presenting at Cisco Live in Berlin in January, 2016!  Needless to say, I was thrilled to be given the opportunity, humbled, and more than a little nervous about standing up in front of an audience at the fabled event.

It’s been a sad year in so many ways.  After I came home from Barcelona in January 2020, I received another Distinguished Speaker award and knew I would be inducted into the Hall of Fame.  This was a dream of mine for years, but instead of standing up in front of my peers at Cisco Live Vegas to receive the award, it was mailed to me.  There would be no show floor, no breakouts, no CCIE parties.  The event would go virtual.  I must say, I am impressed with the CL team’s ability to pivot to a virtual format in so short a time.  Still, it was a sad year for those of us who organize the event, and those who were hoping to attend.

In the next couple posts, I thought I would offer a little behind-the-scenes look at how we put on CL, and look at a few events from the past.

My first IT job was at a small company in Novato, California, that designed and built museum exhibits.  At the time most companies either designed the exhibits or built them, but ours was the only one that did both.  You could separate the services, and just do one or the other, but our end-to-end model was the best offering because the fabricators and designers were in the same building and could collaborate easily.  The odd thing about separating the functions was that we could lose a bid to design a project, but win the bid to build it, and hence end up having to work closely with a competitor to deliver their vision.

A museum exhibit we designed and built

The company was small–only 60 employees.  Half of them were fabricators who did not have computers, whereas the other half were designers and office staff who did.  My original job was to be a “gopher” (or go-fer), who goes for stuff.  If someone needed paint, screws, a nail gun, fumigation of a stuffed tiger, whatever, I’d get in the truck and take care of it.  However, they quickly realized I was skilled with computers and they asked me to take over as their IT guy.  (Note to newbies:  When this happens, especially at a small company, people often don’t forget you had the old job.  One day I might be fixing a computer, then the next day I’d be hauling the stuffed tiger.)

This was in the mid-1990s, so let me give you an idea of how Internet connectivity worked:  it didn’t.  We had none when I started.  We had a company-internal network using LocalTalk (which I described in a previous post), so users could share files, but they had no way to access the Internet at all.  We had an internal-only email system called SnapMail, but it had no ability to do SMTP or connect beyond our little company.

The users started complaining about this, and I had to brainstorm what to do when we had virtually no operating budget at all.  I pulled out the yellow pages and looked under “I”, and found a local ISP.  I called them, and they told me I could use Frame Relay, a T1, or ISDN.  I had no idea what they were talking about.  The salesperson faxed me a technical description of these technologies, and I still had no idea what they were talking about.  At this point I didn’t know the phone company could deliver anything other than, well, a phone line.  I wasn’t at the point where I needed to hear about framing formats and B8ZS line encoding.

We decided we could afford neither the ongoing expense nor the hardware, so we came up with a really bad solution.  We ordered modems for three of the computers in the office:  the receptionist’s, the CEO’s, and the science researcher’s.  For those of you too young to remember, modems allow computers to communicate over an ordinary phone line.  We ordered a single phone line (all we could afford).  When one of them wanted to use the Internet, they would run around the office to check with the other two whether the line was free.

A circa-1990’s Global Village modem

The reason we gave the receptionist a modem is amusing.  Our dial-up ISP allowed us to create public email addresses for all of our employees.  However, they all dumped into one mailbox.  The receptionist would dial in in the morning, download all the emails, and copy and paste them into the internal email system.  If somebody wanted to reply, they would send it to the receptionist via SnapMail and she would dial up, paste it into the administrator account, and send it.  Brilliant.

Needless to say, customer satisfaction was not high, even in those days.  Sick of trying to run IT with no money, I bailed for a computer consulting company in San Francisco and started installing the aforementioned T1s and ISDN lines for customers, with actual routers.

If ever you’re annoyed with slow Wi-Fi, be glad you aren’t living in the 1990’s.

After I left TAC I worked for two years at a Gold Partner in San Francisco.  One of my first customers there was one of my most difficult, and it all came down to timing.

I was dispatched to perform a network assessment of a small real-estate SaaS company in the SF East Bay.  Having just spent two years in TAC, I had no idea how to perform a network assessment, and unfortunately nobody at the partner was helping me.  I had been told they had a dedicated laptop loaded with tools just for this purpose, but nobody could locate it.  I started downloading tools on my own, but I couldn’t find a good, free network analysis tool.  Another engineer recommended a product called “The Dude” from MikroTik, and since it was easy to install I decided to use it.  I needed to leave it collecting data for a few days, and since nobody had provided me an assessment laptop I had to leave my own computer there.  I distinctly remember the client asking me what tool I was using to collect data, and sheepishly answering “Uh, it’s called The Dude.”  He looked at me skeptically.  (Despite the name, the tool was actually quite decent.)

Without any format or examples for an assessment, I looked at bandwidth utilization, device security, IOS versions, and a host of other configuration items.  The network was very simple.  It was a small company with a handful of switches in their main office, and a T1 line connecting them to a satellite office in LA.  They used a Cisco VoIP system for phones, and the phones at the satellite office connected over the T1 back to the main campus.  I wrote up a document with recommendations and presented it to the customer.  Almost everything was minor, and they agreed to have me come back in and make a few upgrades.

One item I noted in the assessment was that the clocks on the routers and switches were set incorrectly.  The clock has absolutely nothing to do with the operation of the device, but having just come from TAC I knew how important device clocks are.  If there is a network-wide incident, one of the first things we look at is the logging messages across the network, and without an accurate device clock we cannot properly compare log messages across multiple network devices.  We need to know whether the log message on this router happened at the same time as that other log message on that switch, and if the clocks are set to some random time, that is difficult or impossible.

I proceeded to make my changes, including synchronizing the clocks to NTP and doing a few IOS upgrades.  Then I closed out the work order and moved on to other clients.  I thought I was done, but I sure wasn’t.
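For anyone curious, the fix amounts to only a few lines of IOS configuration.  This is a generic sketch rather than the client’s actual config, and the server address and timezone are placeholders:

ntp server 192.0.2.10
clock timezone PST -8
clock summer-time PDT recurring
service timestamps log datetime msec localtime show-timezone
service timestamps debug datetime msec localtime show-timezone

With NTP keeping the clocks in sync and timestamps enabled on logging, messages from different devices can actually be lined up against each other during an incident.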

We started getting calls from the customer complaining that the T1 between the two offices wasn’t working right.  I came over and found that it had come back up, but soon they were calling again.  And again.  I had used up all the hours on our contract, and my VP was not keen on me providing services for free.  But our client insisted the problem began as a result of my work, and I had to fix it.  Nothing I had done was major, but I went back and reverted to the old IOS (in case of a bug) and reverted to saved configs (which I had kept.)

With the changes rolled back, the problem kept happening.  The LA office was not only losing its Internet connectivity, but was dealing with repeated voice outages.  The temperature of the client was hot.  I opened a TAC case and had an RMA sent out for the WIC, the card that the T1 connected to.  I replaced it and the problem persisted.  At this point I was insisting it could not have been my fault, since I rolled back the changes, but the customer didn’t see it that way and I don’t blame them.

The customer called up their SBC (now AT&T) rep, basically a sales person, to complain as well.  He told her a consultant had been working on his network, and she asked what had been changed.  He said “the clock on the router” and she immediately flagged that as the problem.  Sadly, the rep mistook the router clock, which has no effect on operations, for the T1 clocking, which does.  I never touched the T1 clocking.  I knew the sales rep, as she had been my sales rep years before at the San Francisco Chronicle, and I knew she was a non-technical sales person who had no idea what she was talking about.  Alas, she had planted the seeds in the customer’s mind that I had messed everything up by touching the clock.  I pleaded my two CCIEs and two years of TAC experience to try to persuade this customer that the router clock has zero, nada, zilch to do with the T1, to no avail.

The customer then, being sick of us, hired another consultant who got on the phone with SBC.  It turns out there was an issue with the line encoding on the T1, which SBC fixed, and the problem went away.  The new consultant looked like a hero, and the next thing we heard from the client was a letter from a lawyer.  They were demanding their money back.

It’s funny, I’ve never really had another charge of technical incompetence leveled at me.  In this case I hadn’t done anything wrong at all, but the telco messed up the T1 line around the same time as I made my changes.  So I guess in more than one way, you could say it was a matter of bad timing.

I have written more than once (here and here, for example) about my belief that technological progression cannot always be considered a good thing.  We are surrounded in the media by a form of technological optimism which I find disconcerting.  “Tech” will solve everything from world hunger to cancer, and the Peter Thiels of the world would have us believe that we can even solve the problem of death.  I don’t see a lot of movies these days, but there used to be a healthy skepticism of technological progress, which was seen as a potential threat to the human race.  Some movies that come to mind:

  • Demon Seed (1977), a movie I found profoundly disturbing when I was taken to see it as a child.  I have no idea why anyone took a child to this movie, by the way.  In it, a scientist invents a powerful supercomputer and uses it to automate his house.  Eventually, the computer forms a prosthesis and uses it to inseminate the scientist’s wife to produce a hybrid human-computer being.
  • The Terminator (1984) presented a world in which humans were at war with computers and robots.
  • The Matrix (1999), a movie I actually don’t like, nevertheless presents us with a world in which, again, computers rule humans, this time to the point where we have become fuel cells.

Most readers have certainly heard of the latter two, but I’m guessing almost none have heard of the first.  I could go on.  From Westworld (1973) to RoboCop (1987), there has been movie after movie presenting “tech” not as the key to human progress, but as a potential destroyer of the human race.  I suspect the advent of nuclear weapons had much to do with this view, and with the receding of the Cold War and the ever-present nuclear threat, maybe we are less concerned about the destruction our inventions are capable of.

The other day I was thinking about my own resistance to Apple AirPods.  It then occurred to me that they are only one step away from the “Cerebral Communicator” in another movie you probably haven’t heard of, The President’s Analyst.

Produced in 1967, (spoilers follow!) the film features James Coburn as a psychoanalyst who is recruited to provide therapy to the President of the United States.  At first excited by the prospect, Coburn himself quickly becomes overwhelmed by the stress of his assignment, and decides to flee.  He is pursued by agents of the CIA and FBI (referred to as CEA and FBR in the movie), the KGB, and the mysterious organization called “TPC”.  Filmed in the psychedelic style of the time, with numerous double-crossings and double-double-crossings, the movie is hard to follow.  But, in the end, we learn that this mysterious agency called “TPC” is actually The Phone Company.

1967 was long before de-regulation, and there was a single phone company in the United States controlling all telephone communications.  It was quasi-governmental in size and scope, and thus a suitable villain for a movie like The President’s Analyst.  The ultimate goal of TPC is to implant mini-telephones into people’s brains.  From Wikipedia:

TPC has developed a “modern electronic miracle”, the Cerebrum Communicator (CC), a microelectronic device that can communicate wirelessly with any other CC in the world. Once implanted in the brain, the user need only think of the phone number of the person they wish to reach, and they are instantly connected, thus eliminating the need for The Phone Company’s massive and expensive-to-maintain wired infrastructure.

I’ve only seen the movie a couple times, but I wonder if AirPods stuck in people’s ears remind me too much of the CC.  Already we have seen the remnants of the phone company pushing us to ever-more connectivity, to the point where our phones are with us constantly and we stick ear buds in our heads.  Tech companies love to tell us that being constantly connected to one another is the great path forward for humanity.  Meanwhile, we live in a time as divided as any in history.  If connecting humanity were the solution to the world’s problems, why do we seem to be in a state of bitter conflict?  I wonder if we’ve forgotten the lesson of the Babel fish in Douglas Adams’ science fiction book, The Hitchhiker’s Guide to the Galaxy.  The Babel fish is a convenient device for Adams to explain a dilemma in all science fiction:  how do people from different planets somehow understand each other?  In most science fiction, the aliens just speak English (or whatever) and we never come to know how they could have learned it.  But Adams uses his fictional device to make an amusing point:

The Babel fish is small, yellow, leech-like, and probably the oddest thing in the Universe…if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish…[T]he poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.

 

In 1998 I left my job as a computer “consultant” to pursue a master’s degree in Telecommunications Management.  I was stuck in my job, tired of troubleshooting people’s email clients and installing Word on their desktops, and was looking for a way to make a leap into bigger and better things.  That did happen–although not how I expected–but meanwhile for two years I needed to support myself while achieving my degree.  I took the easy path and stole a client from my previous employer.  This was at the height of the dotcom boom, and he was frankly too busy to even notice.  For a couple years I worked part-time at my advertising agency client, setting up computers, managing the servers, running the fairly simple network they had implemented.  It was a good deal for both of us, as the office manager had responsibility for IT and little inclination to work on technology.

Anyone who was around back then will remember the “Y2K” scare.  As we approached the year 2000, someone realized that many computers had been storing the year part of dates with only the last two digits.  For example, a program would store “98” instead of “1998.”  This meant that, as we moved into the new millennium, the year 2000 would be interpreted by these systems as “1900”.  A legitimate problem, to be sure.  Some software, such as banking programs running on mainframes, could act unexpectedly and potentially lead to serious consequences.
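A toy example of my own (not from any of the affected systems) shows why two-digit years go wrong the moment the century rolls over:

# Hypothetical illustration of the two-digit-year bug.
# Years are stored as two digits, so 1998 becomes 98 and 2000 becomes 0.
def years_elapsed(start_yy, end_yy):
    # Naive arithmetic assumes both years fall in the same century.
    return end_yy - start_yy

print(years_elapsed(98, 99))  # 1998 -> 1999: prints 1, as expected
print(years_elapsed(98, 0))   # 1998 -> 2000: should be 2, but prints -98

A loan schedule or interest calculation built on arithmetic like that suddenly thinks time is running backwards, which is exactly the kind of failure the mainframe folks legitimately had to chase down.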

However, the panic that followed was completely out of proportion to the threat.  Instead of focusing on the limited systems that would likely experience problems, a paranoia built up, fueled by newspeople who had no idea what they were talking about.  People became concerned that, on January 1, 2000, our entire society would melt down and we would descend into chaos as the financial system collapsed, stoplights stopped working, water and power systems crashed, and millions prepared to regress to living like cavemen.  A brilliant ad from Nike from just before the millennium captured the overall belief of what would happen come New Year’s Day.  (Sadly, the ad looks more realistic for 2020 than it did for 2000.)

The ad agency I was working for was told by their parent company that every single piece of electronic equipment they had needed to be Y2K-certified.  The downside of capitalism is that armies of opportunists arise (sound familiar?) to take advantage of social panics like these.  In fact, for such opportunists, exacerbating the panic is in their best interests.  Thus were spawned legions of “Y2K consultants”, non-technical business-types telling us that even our smoke alarms needed to be Y2K-certified and receive a literal sticker of approval.

When my office manager boss told me I needed to certify every piece of equipment I controlled, my response was as follows:  “It doesn’t matter whether a network switch has the Y2K problem or not.  Most of the network gear was built recently and doesn’t even have this issue.  But if the vendor did make the mistake of storing the date with two digits instead of four, the worst that will happen is the time stamp on the log will be off.  Everything will still work.  So, rather than bill out hours and hours of me doing the certification, I’m going to do nothing.  And if I’m wrong, you can personally sue me for negligence.”

My boss didn’t care much about corporate HQ and took my advice.  New Year’s Eve 1999 came and went.  A lot of people, expecting the worst, stayed home.  I went out and partied, well…like it was 1999.  And nothing happened.  Of course, the Y2K consultants patted themselves on the back for a job well done.  If the army of MBAs hadn’t saved the day with their stickers, life would have ground to a halt.  I, on the other hand, knew the reality.  A panic was whipped up by the media, a bunch of opportunists swooped in to make a buck off the crisis and helped to whip it up further, and life would have gone on either way.

I’ve been revising my Cisco Live session on IOS XE programmability, and it’s made me think about programming in general, and a particular idea I’ve been embarrassed to admit I loathe: Object Oriented Programming.

Some context:  I started programming on the Apple II+ in BASIC, which shows my age.  Back then programs were input with line numbers and program control was quite simple, consisting of GOTO and GOSUB statements that jumped around the lines of code.  So, you might have something that looked like this:

10 INPUT "Would you like to [C]opy a File or [D]elete a file?"; A$
20 IF A$ = "C" THEN GOTO 100
30 IF A$ = "D" THEN GOTO 200

This was not really an elegant way to build programs, but it was fairly clear.  Given that code was entered directly into the DOS CLI with only line-by-line editing, it could certainly get a bit confusing what was happening where once you had a lot of branches in your code.

In college I took one programming course in Pascal.  Pascal was similar in structure to C, just far more verbose.  Using it required a shift to procedural-style thinking, and while I was able to get a lot of code to work in Pascal, my professor was always dinging me for style mistakes.  I tended to revert to AppleSoft BASIC style coding, using global variables and not breaking things down into procedures/functions enough.  (In Pascal, a procedure is simply a function that returns no value.)  Over time I got used to the new way of thinking, and grew to appreciate it.  BASIC just couldn’t scale, but Pascal could.  I picked up C after college for fun, and then attempted C++, the object-oriented version of C.  I found I had an intense dislike for shifting my programming paradigm yet again, and I felt that the code produced by Object Oriented Programming (OOP) was confusing and hard to read.  At that point I was a full-time network engineer and left programming behind.

When I returned to Cisco in 2015, I was assigned to programmability and had to learn the basics of coding again.  What I teach is really basic coding, scripting really, and I would have no idea how to build or contribute to a large project like many of those being done here at Cisco.  I picked up Python, and I generally like it for scripting.  However, Python has OO functionality, and I was once again annoyed by confusing OO code.

In case you’re not familiar with OOP, here’s how it works.  Instead of writing down (quite intuitively) your program according to the operations it needs to perform, you create objects that have operations associated with them according to the type of object.  An example in C-like pseudocode can help clarify:

Procedural:

rectangle_type: my_rect(20, 10)
print get_area(my_rect)

OOP:

rectangle_type: my_rect(20, 10)
print my_rect.get_area()

Note that in OOP, we create an object of type “rectangle_type” and it then has certain attributes associated with it, including functions.  In the procedural example, we just create a variable and pass it to a function which is not in any way associated with the variable we created.

The problem is, this is counter-intuitive.  Procedural programming follows a logical and clear flow.  It’s easy to read.  It doesn’t have complex class inheritance issues.  It’s just far easier to work with.

I was always ashamed to say this, as I’m more of a scripter than a coder, and a dabbler in the world of programming.  But recently I came across a collection of quotes from people who really do know what they are talking about, and I see many people agree with me.

How should a network engineer and programming neophyte approach this?  My advice is this:  Learn functional/procedural-style programming first.  Avoid any course that moves quickly into OOP.

That said, you’ll unfortunately need to learn the fundamentals of OOP.  That’s because many, if not most, Python libraries you’ll be using are object-oriented.  Even built-in Python data types like strings have a number of OO methods associated with them.  (Want to make a string called “name” lower-case?  Call “name.lower()”.)  You’ll at least need to understand how to invoke a method associated with a particular object class.
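To make the contrast concrete in Python terms, here is the rectangle example again, written both ways.  This is just my own toy illustration:

# Procedural style: the data and the function that acts on it are separate.
def get_area(width, height):
    return width * height

print(get_area(20, 10))

# OO style: the function ("method") is bundled with the object itself.
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def get_area(self):
        return self.width * self.height

my_rect = Rectangle(20, 10)
print(my_rect.get_area())

Both print 200; the difference is purely in how the code is organized, and for a small script the procedural version is the one you can read at a glance.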

Meanwhile I’ve been programming in AppleSoft quite a bit in my Apple II emulator, and the GOTO’s are so refreshing!

I’ve mentioned before that, despite being on the Routing Protocols team, I spent a lot of time handling crash cases in TAC.  At the time, my queue was just a dumping ground for cases that didn’t fit into any other bucket in the High Touch structure.  Backbone TAC had a much more granular division of teams, including a team entirely dedicated to crash.  But in HTTS, we did it all.

Some crashes are minor, like a (back then) 2600-series router reloading due to a bus error.  Some were catastrophic, particularly crashes on large chassis-type routing systems in service provider networks.  These could have hundreds of interfaces, and with sub-interfaces, potentially thousands of customers affected by a single outage.  Chassis platforms vary in their architecture, but many of the platforms we ran at the time used a distributed architecture in which the individual line cards ran a subset of IOS.  Thus, unlike a 2600 which had “dumb” WIC cards for interface connections, on chassis systems line cards themselves could crash in addition to the route processors.  Oftentimes, when a line card crashed, the effect would cascade through the box, with multiple line cards crashing, which would result in a massive meltdown.

The 7500 was particularly prone to these.  A workhorse of Cisco’s early product line, the 7500 line cards ran IOS but forwarded packets between each other by placing them into special queues on the route processor.  This was quite unlike later products, such as the Gigabit Switch Router (GSR), which had a fabric architecture enabling line cards to communicate directly.  On the 7500, oftentimes a line card having a problem would write bad data into the shared queues, which the subsequent line cards would read and then crash, causing a cascading failure.

One of our big customers, a Latin American telecommunications company I’ll call LatCom, was a heavy user of 7500’s.  They were a constant source of painful cases, and for some reason had a habit of opening P1 cases on Fridays at 5:55pm.  Back then HTTS day-shift engineers’ shifts ended at 6pm, at which point the night shift took over, but once we accepted a P1 or P2 case, unlike backbone TAC, we had to work it until resolution.  LatCom drove us nuts.  Five minutes was the difference between going home for the weekend and potentially being stuck on the phone until 10pm on a Friday night.  The fact that LatCom’s engineers barely spoke English also proved a challenge and drew out the cases–occasionally we had to work through non-technical translators, and getting them to render “there was a CEF bug causing bad data to be placed into the queue on the RP” into Spanish was problematic.

After years of nightmare 7500 crashes, LatCom finally did what we asked:  they dropped a lot of money to upgrade their routers to GSRs with PRPs, at that time our most modern box.  All the HTTS RP engineers breathed a sigh of relief knowing that the days of nightmare cascading line card failures on 7500’s were coming to an end.  We had never seen a single case of such a failure on a GSR.

That said, we knew that if anything bad was going to happen, it would happen to these guys.  And sure enough, one day I got a case with…you guessed it, a massive cascading line card failure on a GSR!  The first one I had seen.  In the case notes I described the failure as follows:

  1. Six POS (Packet over Sonet) interfaces went down at once
  2. Fifteen seconds later, slots 1 and 15 started showing CPUHOG messages followed by tracebacks
  3. Everything stabilized until a few hours later, when the POS interfaces went down again
  4. Then, line cards in slots 0, 9, 10, 11, and 13 crashed
  5. Fifteen seconds later, line cards in slots 6 and 2 crashed
  6. And so forth

My notes said: “basically we had a meltdown of the box.”  To make matters worse, 4 days later they had an identical crash on another GSR!

When faced with this sort of mess, TAC agents usually would send the details to an internal mailer, which is exactly what I did.  The usual attempt by some on the mailer to throw hardware at the problem didn’t go far, as we saw the exact same crash on another router.  This seemed to be a CEF bug.

Re-reading the rather extensive case notes brings up a lot of pain.  Because the customer had just spent millions of dollars to replace their routers with a new platform that, we assured them, would not be susceptible to the same problem, this went all the way to their top execs and ours.  We were under tremendous pressure to find a solution, and frankly, we all felt bad because we were sure the new platform would be an end to their problems.

There are several ways for a TAC engineer to get rid of a case:  resolve the problem, tell the customer it is not reproducible, or wait for it to get re-queued to another engineer.  But after two long years at TAC, two years of constant pressure, a relentless stream of cases, angry customers, and problem after problem, my “dream job” at Cisco was taking a toll.  When my old friend Mike, who had hired me at the San Francisco Chronicle, my first network engineering job, called me and asked me to join him at a gold partner, the call wasn’t hard to make.  And so I took the easiest route to getting rid of cases, a lot of them all at once, and quit.  LatCom would be someone else’s problem.  My newest boss, the fifth in two years, looked at me with disappointment when I gave him my two weeks’ notice.

I can see the case notes now that I work at Cisco again, and they solved the case, as TAC does.  A bug was filed and the problem fixed.  Still, I can tell you how much of a relief it was to turn in my badge and walk out of Cisco for what I wrongly thought would be the last time.  I felt, in many ways, like a failure in TAC, but at my going away party, our top routing protocols engineer scoffed at my choice to leave.  “Cisco needs good engineers,” he said.  “I could have gotten you any job you wanted here!”  True or not, it was a nice comment to hear.

I started writing these TAC tales back in 2013, when I still worked at Juniper.  I didn’t expect they’d attract much interest, but they’ve been one of the most consistently popular features of this blog. I’ve cranked out 20 of these covering a number of subjects, but I’m afraid my reservoir of stories is running dry.  I’ve decided that number 20 will be the last TAC Tale on my blog.  There are plenty of other stories to tell, of course, but I’m finished with TAC, as I was back in 2007.  My two years in TAC were some of the hardest in my career, but also incredibly rewarding.  I have so much respect for my fellow TAC engineers, past, present, and future, who take on these complex problems without fear, and find answers for our customers.