Perspectives

I have to give AWS credit for posting a fairly detailed technical description of the cause of their recent outage.  Many companies rely on crisis PR people to phrase vague and uninformative announcements that do little to inform customers and put their minds at ease.  I must admit, having read the AWS post-mortem a couple times, I don’t fully understand what happened, but it seems my previous article on automation running wild was not far off.  Of course, the point of the article was not to criticize automation.  An operation the size of AWS would be simply impossible without it.  The point was to illustrate the unintended consequences of automation systems.  As a pilot and aviation buff, I can think of several examples of airplanes crashing due to out-of-control automation as well.

AWS tells us that “an automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network.”  What’s interesting here is that the automation event was not itself a provisioning of network devices.  Rather, the capacity increase caused “a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network…”  This is just the old problem of overwhelming link capacity.  I remember one time, when I was at Juniper, a lab device started sending a flood of traffic to the Internet, crushing the Internet-facing firewalls.  It’s nice to know that an operation like Amazon faces the same challenges.  At the end of the day, bandwidth is finite, and enough traffic will ruin any network engineer’s day.

“This congestion immediately impacted the availability of real-time monitoring data for our internal operations teams, which impaired their ability to find the source of congestion and resolve it.”  This is the age-old problem, isn’t it?  Monitoring our networks requires network connectivity.  How else do we get logs, telemetry, traps, and other information from our devices?  And yet, when our network is down, we can’t get this data.  Most large-scale customers do maintain a separate out-of-band network just for monitoring.  I would assume Amazon does the same, but perhaps somehow this got crushed too?  Or perhaps what they refer to as their “internal network” was the OOB network?  I can’t tell from the post.

“Operators continued working on a set of remediation actions to reduce congestion on the internal network including identifying the top sources of traffic to isolate to dedicated network devices, disabling some heavy network traffic services, and bringing additional networking capacity online. This progressed slowly…”  I don’t want to take pleasure in others’ pain, but this makes me smile.  I’ve spent years telling networking engineers that no matter how good their tooling, they are still needed, and they need to keep their skills sharp.  Here is Amazon, with presumably the best automation and monitoring capabilities of any network operator, and they were trying to figure out top talkers and shut them down.  This reminds me of the first broadcast storm I faced, in the mid-1990’s.  I had to walk around the office unplugging things until I found the source.  Hopefully it wasn’t that bad for AWS!

Outages happen, and Amazon has maintained a high level of service with AWS since the beginning.  The resiliency of such a complex environment should be astounding to anyone who has built and managed complex systems.  Still, at the end of the day, no matter how much you automate (and you should), no matter how much you assure (and you should), sometimes you have to dust off the packet sniffer and figure out what’s actually going down the wire.  For network engineers, that should be a reminder that you’re still relevant in a software-defined world.

As I write this, a number of sites out on the Internet are down because of an outage at Amazon Web Services.  Delta Airlines is suffering a major outage.  On a personal note, my wife’s favorite radio app and my Lutron lighting system are not operating correctly.  Of course, this outage is a reminder of the simple principle of not putting one’s eggs in a single basket.  AWS became the dominant web provider early on, but there are multiple viable alternatives now.  Long before the modern cloud emerged, I regularly ran disaster recovery exercises to ensure business continuity when a data center or service provider failed.  Everyone who uses a cloud provider better have a backup, and you better figure out a way to periodically test that backup.  A few startups have emerged to make this easier.

While the cause of the outage is yet unknown, there was an interesting comment in a Newsweek article on the outage.  Doug Madory, director of internet analysis at Kentik Inc, said:  “More and more these outages end up being the product of automation and centralization of administration…”  I’ve been involved in automation in some form or another for my entire six years at Cisco, and one aspect of automation is not talked about enough:  automation gone wild.  Let me give a non-computer example.

Back when I worked at the San Francisco Chronicle, the production department installed a new machine in our Union City printing plant.  The Sunday paper, back then, had a large number of inserts with advertisements and circulars that needed to be stuffed into the paper.  They were doing this manually, if you can believe it.

The new machine had several components.  One part of the process involved grabbing the inserts and carrying them in a conveyor system high above the plant floor, before dropping them down into the inserter.  It’s hard to visualize, so I’ve included a picture of a similar machine.

You can see the inserts coming in via the conveyor, hanging vertically.  This conveyor extended quite far.  One day I was in the plant, working on some networking thing or other, and the insert machine was running.  I looked back and saw the conveyor glitch somehow, and then a giant ball of paper started to form in the corner of the room, before finally exploding and raining paper down on the floor of the plant.  There was a commotion and one of the workers had to shut the machine down.

The point is, automation is great until it doesn’t work.  When it fails, it fails big.  You don’t just get a single problem, but a compounding problem.  It wasn’t just a single insert that got hit by the glitch, but dozens of them, if not more.  When you use manual processes, failures are contained.

Let’s tie this back to networking.  Say you need to configure hundreds of devices with some new code, perhaps adding a new routing protocol.  If you do it by hand in one device, and suddenly routes start dropping out of the routing table, chances are you won’t proceed with the other devices.  You’ll check your config to see what happened and why.  But if you set up, say, a Python script to run around and do this via NETCONF to 100 devices, suddenly you might have a massive outage on your hands.  The same could happen using a tool like Ansible, or even a vendor network management platform.

There are ways to combat this problem, of course.  Automated checks and validation after changes are an important one, but the problem with this approach is you cannot predict every failure.  If you program 10 checks, it’s going to fail in way #11, and you’re out of luck.
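To make the idea of post-change checks concrete, here is a minimal sketch of a staged rollout in Python.  Everything in it is hypothetical: `staged_rollout`, `RolloutHalted`, and the simulated fleet are names I made up for illustration, and `apply_change` and `is_healthy` stand in for whatever NETCONF push and validation you would actually write.  It only catches the failures you thought to check for, so way #11 still gets through.

```python
from typing import Callable, List, Sequence


class RolloutHalted(Exception):
    """Raised when a post-change check fails; records what was touched so far."""

    def __init__(self, failed: List[str], changed: List[str]) -> None:
        super().__init__(f"halted after {len(changed)} device(s); failed: {failed}")
        self.failed = failed
        self.changed = changed


def staged_rollout(
    devices: Sequence[str],
    apply_change: Callable[[str], object],
    is_healthy: Callable[[str], bool],
    batch_size: int = 1,
) -> List[str]:
    """Apply a change a few devices at a time, validating before moving on."""
    changed: List[str] = []
    for start in range(0, len(devices), batch_size):
        batch = list(devices[start:start + batch_size])
        for device in batch:
            apply_change(device)  # hypothetical: push config to one device
            changed.append(device)
        # Check the batch we just touched before touching anything else.
        failed = [d for d in batch if not is_healthy(d)]
        if failed:
            raise RolloutHalted(failed, changed)
    return changed


if __name__ == "__main__":
    # Simulated fleet in which the "change" breaks every device it touches.
    fleet = [f"rtr{n:03d}" for n in range(100)]
    broken = set()
    try:
        staged_rollout(fleet, broken.add, lambda d: d not in broken)
    except RolloutHalted as halt:
        print(halt)
```

With a batch size of one, this behaves like the engineer configuring by hand: the first device goes bad, the script stops, and the other 99 are never touched.  A larger batch size trades safety for speed.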

As I said, I’ve spent years promoting automation.  You simply couldn’t build a network like Amazon’s without it.  And it’s critical for network engineers to continue developing skills in this area.  We, as vendors and promoters of automation tools, need to be careful how we build and sell these tools to limit customer risk.

Eventually they got the inserter running again.  Whatever the cause of Amazon’s outage, let’s hope it’s not automation gone wild.

I have written more than once (here and here, for example) about my belief that technological progression cannot always be considered a good thing.  We are surrounded in the media by a form of technological optimism which I find disconcerting.  “Tech” will solve everything from world hunger to cancer, and the Peter Thiels of the world would have us believe that we can even solve the problem of death.  I don’t see a lot of movies these days, but there used to be a healthy skepticism of technological progress, which was seen as a potential threat to the human race.  Some movies that come to mind:

  • Demon Seed (1977), a movie I found profoundly disturbing when I was taken to see it as a child. I have no idea why anyone took a child to this movie, by the way.  In it, a scientist invents a powerful supercomputer and uses it to automate his house.  Eventually, the computer forms a prosthesis and uses it to inseminate the doctor’s wife to produce a hybrid human-computer being.
  • The Terminator (1984) presented a world in which humans were at war with computers and robots.
  • The Matrix (1999), a movie I actually don’t like, nevertheless presents us with a world in which, again, computers rule humans, this time to the point where we have become fuel cells.

Most readers have certainly heard of the latter two, but I’m guessing almost none have heard of the first.  I could go on.  From West World (1973) to RoboCop (1987), there has been movie after movie presenting “tech” not as the key to human progress, but as a potential destroyer of the human race.  I suspect the advent of nuclear weapons had much to do with this view, and with the receding of the Cold War and the ever-present nuclear threat, maybe we are less concerned about the destruction our inventions are capable of.

The other day I was thinking about my own resistance to Apple AirPods.  It then occurred to me that they are only one step away from the “Cerebrum Communicator” in another movie you probably haven’t heard of, The President’s Analyst.

Produced in 1967, (spoilers follow!) the film features James Coburn as a psychoanalyst who is recruited to provide therapy to the President of the United States.  At first excited by the prospect, Coburn himself quickly becomes overwhelmed by the stress of his assignment, and decides to flee.  He is pursued by agents of the CIA and FBI (referred to as CEA and FBR in the movie), the KGB, and the mysterious organization called “TPC”.  Filmed in the psychedelic style of the time, with numerous double-crossings and double-double-crossings, the movie is hard to follow.  But, in the end, we learn that this mysterious agency called “TPC” is actually The Phone Company.

1967 was long before de-regulation, and there was a single phone company in the United States controlling all telephone communications.  It was quasi-governmental in size and scope, and thus a suitable villain for a movie like The President’s Analyst.  The ultimate goal of TPC is to implant mini-telephones into people’s brains.  From Wikipedia:

TPC has developed a “modern electronic miracle”, the Cerebrum Communicator (CC), a microelectronic device that can communicate wirelessly with any other CC in the world. Once implanted in the brain, the user need only think of the phone number of the person they wish to reach, and they are instantly connected, thus eliminating the need for The Phone Company’s massive and expensive-to-maintain wired infrastructure.

I’ve only seen the movie a couple times, but I wonder if the AirPods implanted in people’s ears remind me too much of the CC.  Already we have seen the remnants of the phone company pushing us to ever-more connectivity, to the point where our phones are with us constantly and we stick ear buds in our heads.  Tech companies love to tell us that being constantly connected to one another is the great path forward for humanity.  Meanwhile, we live in a time as divided as any in history.  If connecting humanity were the solution to the world’s problems, why do we seem to be in a state of bitter conflict?  I wonder if we’ve forgotten the lesson of the Babel Fish in Douglas Adams’ science fiction book, Hitchhiker’s Guide to the Galaxy.  The Babel fish is a convenient device for Adams to explain a dilemma in all science fiction:  how do people from different planets somehow understand each other?  In most science fiction, the aliens just speak English (or whatever) and we never come to know how they could have learned it.  But Adams uses his fictional device to make an amusing point:

The Babel fish is small, yellow, leech-like, and probably the oddest thing in the Universe…if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish…[T]he poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.

 

I’ve been revising my Cisco Live session on IOS XE programmability, and it’s made me think about programming in general, and a particular idea I’ve been embarrassed to admit I loathe: Object Oriented Programming.

Some context:  I started programming on the Apple II+ in BASIC, which shows my age.  Back then programs were input with line numbers and program control was quite simple, consisting of GOTO and GOSUB statements that jumped around the lines of code.  So, you might have something that looked like this:

10 INPUT "Would you like to [C]opy a File or [D]elete a file?"; A$
20 IF A$ = "C" THEN GOTO 100
30 IF A$ = "D" THEN GOTO 200

This was not really an elegant way to build programs, but it was fairly clear.  Given that code was entered directly into the DOS CLI with only line-by-line editing functionality, it could certainly get a bit confusing what happened and where when you had a lot of branches in your code.

In college I took one programming course in Pascal.  Pascal was similar in structure to C, just far more verbose.  Using it required a shift to procedural-style thinking, and while I was able to get a lot of code to work in Pascal, my professor was always dinging me for style mistakes.  I tended to revert to AppleSoft BASIC style coding, using global variables and not breaking things down into procedures/functions enough.  (In Pascal, a procedure is simply a function that returns no value.)  Over time I got used to the new way of thinking, and grew to appreciate it.  BASIC just couldn’t scale, but Pascal could.  I picked up C after college for fun, and then attempted C++, the object-oriented version of C.  I found I had an intense dislike for shifting my programming paradigm yet again, and I felt that the code produced by Object Oriented Programming (OOP) was confusing and hard to read.  At that point I was a full-time network engineer and left programming behind.

When I returned to Cisco in 2015, I was assigned to programmability and had to learn the basics of coding again.  What I teach is really basic coding, scripting really, and I would have no idea how to build or contribute to a large project like many of those being done here at Cisco.  I picked up Python, and I generally like it for scripting.  However, Python has OO functionality, and I was once again annoyed by confusing OO code.

In case you’re not familiar with OOP, here’s how it works.  Instead of writing down (quite intuitively) your program according to the operations it needs to perform, you create objects that have operations associated with them according to the type of object.  An example in C-like pseudocode can help clarify:

Procedural:

rectangle_type: my_rect(20, 10)
print get_area(my_rect)

OOP:

rectangle_type: my_rect(20, 10)
print my_rect.get_area()

Note that in OOP, we create an object of type “rectangle_type” and then it has certain attributes associated with it, including functions.  In the procedural example, we just create a variable and pass it to a function which is not in any way associated with the variable we created.
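For anyone who would rather see that contrast in runnable form, here is a rough Python rendering of the same rectangle example.  The names (`get_area`, `Rectangle`, `my_rect`) are just my translation of the pseudocode, and using a plain tuple for the procedural version is my own simplification.

```python
from dataclasses import dataclass


# Procedural style: plain data, plus a free-standing function that takes it.
def get_area(rect):
    width, height = rect
    return width * height


# OOP style: the data and the operation live together in a class.
@dataclass
class Rectangle:
    width: float
    height: float

    def get_area(self) -> float:
        return self.width * self.height


my_rect = (20, 10)
print(get_area(my_rect))        # the function is handed the data

my_rect_obj = Rectangle(20, 10)
print(my_rect_obj.get_area())   # the function is looked up on the object
```

Both calls compute the same area.  The only difference is where the function lives and how you reach it.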

The problem is, this is counter-intuitive.  Procedural programming follows a logical and clear flow.  It’s easy to read.  It doesn’t have complex class inheritance issues.  It’s just far easier to work with.

I was always ashamed to say this, as I’m more of a scripter than a coder, and a dabbler in the world of programming.  But recently I came across a collection of quotes from people who really do know what they are talking about, and I see many people agree with me.

How should a network engineer, programming neophyte approach this?  My advice is this:  Learn functional/procedural style programming first.  Avoid any course that moves quickly into OOP.

That said, you’ll unfortunately need to learn the fundamentals of OOP.  That’s because many, if not most, Python libraries you’ll be using are object oriented.  Even built-in Python data types like strings have a number of OO functions associated with them.  (Want to make a string called “name” lower-case?  Call “name.lower()”) You’ll at least need to understand how to invoke a function associated with a particular object class.
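A short sketch of what that looks like in practice, using only built-in string methods.  The value of `name` is made up for the example.

```python
name = "Catalyst-9300"  # made-up value for illustration

# Methods are functions attached to the object's class (here, str).
print(name.lower())
print(name.upper())
print(name.split("-"))

# The same function can be called procedurally through the class,
# which shows the method is just a function that receives the object.
print(str.lower(name))
print(type(name).__name__)
```

Seeing that `name.lower()` and `str.lower(name)` do the same thing helped me: the dot notation is mostly a different way of passing the first argument.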

Meanwhile I’ve been programming in AppleSoft quite a bit in my Apple II emulator, and the GOTO’s are so refreshing!

“Progress might have been alright once, but it has gone on too long.”
–  Ogden Nash

The book The Innovator’s Dilemma appears on the desk of a lot of Silicon Valley executives.  Its author, Clayton Christensen, is famous for having coined the term “disruptive innovation.”  The term has always bothered me, and I keep waiting for the word “disruption” to die a quiet death.  I have the disadvantage of having studied Latin quite a bit.  The word “disrupt” comes from the Latin verb rumpere, which means to “break up”, “tear”, “rend”, “break into pieces.”  The word, as does our English derivative, connotes something quite bad.  If you think “disruption” is good, what would you think if I disrupted a presentation you were giving?  What if I disrupted the electrical system of your heart?

Side note:  I’m fascinated with the tendency of modern English to use “bad” words to connote something good.  In the 1980’s the word “bad” actually came to mean its opposite.  “Wow, that dude is really bad!” meant he was good.  Cool people use the word “sick” in this way.  “That’s a sick chopper” does not mean the motorcycle is broken.

The point, then, of disruption is to break up something that already exists, and this is what lies beneath the b-school usage of it.  If you innovate, in a disruptive way, then you are destroying something that came before you–an industry, a way of working, a technology.  We instantly assume this is a good thing, but what if it’s not?  Beneath any industry, way of working, or technology are people, and disruption is disruption of them, personally.

The word “innovate” also has a Latin root.  It comes from the word novus, which means “new”.  In industry in general, but particularly the tech industry, we positively worship the “new”.  We are constantly told we have to always be innovating.  The second one technology is invented and gets established, we need to replace it.  Frame Relay gave way to MPLS, MPLS is giving way to SD-WAN, and now we’re told SD-WAN has to give way…  The life of a technology professional, trying to understand all of this, is like a man trying to walk on quicksand.  How do you progress when you cannot get a firm footing?

We seem to have forgotten that a journey is worthless unless you set out on it with an end in mind.  One cannot simply worship the “new” because it is new–this is self-referential pointlessness.  There has to be a goal, or an end–a purpose, beyond simply just cooking up new things every couple years.

Most tech people and b-school people have little philosophical education outside of, perhaps (and unfortunately) Atlas Shrugged.  Thus, some of them, realizing the pointlessness of endless innovation cycles, have cooked up ludicrous ideas about the purpose of it all.  Now we have transhumanists telling us we’ll merge our brains with computers and evolve into some sort of new God-species, without apparently realizing how ridiculous they sound.  COVID-19 should disabuse us of any notion that we’re not actually human beings, constrained by human limitations.

On a practical level, the furious pace of innovation, or at least what is passed off as such, has made the careers of technology people challenging.  Lawyers and accountants can master their profession and then worry only about incremental changes.  New laws are passed every year, but fundamentally the practice of their profession remains the same.  For us, however, we seem to face radical disruption every couple of years.  Suddenly, our knowledge is out-of-date.  Technologies and techniques we understood well are yesterday’s news, and we have to re-invent ourselves yet again.

The innovation imperative is driven by several factors:  Wall Street constantly pushes public companies to “grow”, thus disparaging companies that simply figure out how to do something and do it well.  Companies are pressured into expanding to new industries, or into expanding their share of existing industries, and hence need to come up with ways to differentiate themselves.  On an individual level, many technologists are enamored of innovation, and constantly seek to invent things for personal satisfaction or for professional gain.  Wall Street seems to have forgotten the natural law of growth.  Name one thing in nature that can grow forever.  Trees, animals, stars…nothing can keep growing indefinitely.  Why should a company be any different?  Will Amazon simply take over every industry and then take over governing the planet?  Then what?

This may seem a strange article coming from a leader of a team in a tech company that is handling bleeding edge technologies.  And indeed it would seem to be a heresy for someone like me to say these things.  But I’m not calling for an end to inventing new products or technologies.  Having banged out CLI for thousands of hours, I can tell you that automating our networks is a good thing.  Overlays do make sense in that they can abstract complexity out of networks.  TrustSec/Scalable Group Tags are quite helpful, and something like this should have been in IP from the beginning.

What I am saying is that innovation needs a purpose other than just…innovation.  Executives need to stop waxing eloquent about “disrupting” this or that, or our future of fusing our brains with an AI Borg.  Wall Street needs to stop promoting growth at all costs.  And engineers need time to absorb and learn new things, so that they can be true professionals and not spend their time chasing ephemera.

Am I optimistic?  Well, it’s not in my nature, I’m afraid.  As I write this we are in the midst of the Coronavirus crisis.  I don’t know what the world will look like a year from now.  Business as usual, with COVID a forgotten memory?  Perhaps.  Great Depression due to economic shutdown?  Perhaps.  Total societal, governmental, and economic collapse, with rioting in the streets?  I hope not, but perhaps.  Whatever happens, I do hope we remember that the word “novel”, as in “novel Coronavirus”, comes from the same Latin root as the word “innovation”.  New isn’t always the best.

Two things can almost go without saying:

  1. If you start a blog, you need to commit time to writing it.
  2. When you move up in the corporate world, time becomes a precious commodity.

When I started this blog several years ago, I was a network architect at Juniper with a fair amount of time on my hands.  Then I came to Cisco as a Principal TME, with a lot less time on my hands.  Then I took over a team of TMEs.  And now I have nearly 40 people reporting to me, and responsibility for technical marketing for Cisco’s entire enterprise software portfolio.  That includes ISE, Cisco DNA Center, SD-Access, SD-WAN (Viptela), and more.  With that kind of responsibility and that many people depending on me, writing TAC Tales becomes a lower priority.

In addition, when you advance in the corporate hierarchy, expressing your opinions freely becomes more dangerous.  What if I say something I shouldn’t?  Or, do I really want to bare my soul on a blog when an employee is reading it?  Might they be offended, or afraid I would post something about them?  Such concerns don’t exist when you’re an individual contributor, even at the director level, which I was.

I can take some comfort in the fact that this blog is not widely read.  The handful of people who stumble across it probably will not cause me problems at work.  And, as for baring my soul, well, my team knows I am transparent.  But time is not something I have much of these days, and I cannot sacrifice work obligations for personal fulfillment.  And that’s definitely what the blog is.  I do miss writing it.

Is this a goodbye piece?  By no means.  The blog will stay, and if I can eke out 10 minutes here or there to write or polish an old piece, I will.  Meanwhile, be warned about corporate ladder climbing–it has a way of chewing up your time.

I think it’s fair to say that all technical marketing engineers are excited for Cisco Live, and happy when it’s over.  Cisco Live is always a lot of fun–I heard one person say “it’s like a family reunion except I like everyone!”  It’s a great chance to see a lot of folks you don’t get to see very often, to discuss technology that you’re passionate about with other like-minded people, to see and learn new things, and, for us TMEs, an opportunity to get up in front of a room full of hundreds of people and teach them something.  We all now wait anxiously for our scores, which are used to judge how well we did, and even whether we get invited back.

It always amazes me that it comes together at all.  In my last post, I mentioned all the work we do to pull together our sessions.  A lot of my TMEs did not do sessions, instead spending their Cisco Live on their feet at demo booths.  I’m also always amazed that World of Solutions comes together at all.  Here is a shot of what it looked like at 5:30 PM the night before it opened (at 10 AM.)  How the staff managed to clear out the garbage and get the booths together in that time I can’t imagine, but they did.

The WoS mess…

My boss, Carl Solder, got to do a demo in the main keynote.  There were something like 20,000 people in the room and the CEO was sitting there.  I think I would have been nervous, but Carl is ever-smooth and managed it without looking the least bit uncomfortable.

My boss (left) on the big stage!

The CCIE party was at the air and space museum, a great location for aviation lovers such as myself.  A highlight was seeing an actual Apollo capsule.  It seemed a lot smaller than I would have imagined.  I don’t think I would ever have gotten in that thing to go to the moon.  The party was also a great chance to see some of the legends of the CCIE program, such as Bruce Caslow, who wrote the first major book on passing the CCIE exam, and Terry Slattery, the first person to actually pass it.

CCIE Party

I delivered two breakouts this year:  The CCIE in an SDN World, and Scripting the Catalyst.  The first one was a lot of fun because it was on Monday and the crowd was rowdy, but also because the changes to the program were just announced and folks were interested in knowing what was going on.  The second session was a bit more focused and deeper, but the audience was attentive and seemed to like it.  If you want to know what it feels like to be a Cisco Live presenter, see the photo below.

My view from the stage

I closed out my week with another interview with David Bombal, as well as the famous Network Chuck.  This was my first time meeting Chuck, who is a bit of a celebrity around Cisco Live and stands out because of his beard.  David and I had already done a two-part interview (part 1, part 2) when he was in San Jose visiting Cisco a couple months back.  We had a good chat about what is going on with the CCIE, and it should be out soon.

As I said, we love CL but we’re happy when it’s over.  This will be the first weekend in a long time I haven’t worked on CL slides.  I can relax, and then…Cisco Live Barcelona!

 

While I’m thinking about another TAC Tale, I’m quite busy working on slides for Cisco Live.  I figured this makes for another interesting “inside Cisco” post, since most people who have been to the show don’t know much about how it comes together.  A couple years back I asked a customer if I could schedule a meeting with him after Cisco Live, since I was working on slides.  “I thought the Cisco Live people made the slides and you just showed up and presented them!” he said.  Wow, I wish that were the case.  With hundreds of sessions I’m not sure how the CL team could accomplish that, but it would sure be nice for me.  Unfortunately, we build our own.

If you haven’t been, Cisco Live is a large trade show for network engineers which happens four times globally: in Europe, Australia, the US, and Mexico.  The US event is the largest, but Europe is rather large as well.  Australia and Mexico are smaller but still draw a good crowd.  The Europe and US shows move around.  The last two years Europe was in Barcelona, as it will be next year, but it was in Berlin two years before that.  The US show is in San Diego this year, was in Orlando last year, and was in Las Vegas for two years before that.  Australia is always in Melbourne, and Mexico is always in Cancun.  I went to Cisco Live US twice when I worked for a partner, and I’ve been to every event at least once since I’ve worked at Cisco as a TME.

The show has a number of attractions.  There is a large show floor with booths from Cisco and partners.  There are executive and celebrity keynotes.  The deepest content is delivered in sessions–labs, techtorials, and breakout sessions which can have between 20 and several hundred attendees.  The sessions are divided into different tracks:  collaboration, security, certification, routing and switching, etc., so attendees can focus on one or more areas.

Most CL sessions are delivered by technical marketing engineers like myself, who work in a business unit, day in and day out, with their given product.  As far as I know anyone in Cisco can submit a session, so some are delivered by people in sales, IT, CX (TAC or AS), and other organizations.  Some are even delivered by partners and customers.

Six months before a given event, a “call for papers” goes out.  I’m always amused that they pulled this term from academia, as the “papers” are mostly PowerPoints and not exactly academic.  If you want to do a session, you need to figure out what you want to present and then write up an abstract, which contains not only the description, but also explains why the session is relevant to attendees, what they can hope to get out of it, and what the prerequisites are.  Each track has a group of technical experts who manage it, called “Session Group Managers”, or SGMs.  They come from anywhere in the business, but have the technical expertise to review the abstracts and sessions to ensure they are relevant and well-delivered.  For about a year, the SGM for the track I usually presented in actually reported to me.  They have a tough job, because they receive a large number of applications for sessions, far more than the slots they have.  They look at the topic, quality of the abstract, quality of the speaker, available slots, and other factors in figuring out which sessions get the green light.

Once you have an approved session, you can start making slides.  Other than a standard template, there is not much guidance on how to build a deck for Cisco Live.  My old SGM liked to review each new presentation live, although some SGMs don’t.  Most of us end up making our slides quite close to the event, partly because we are busy, but also because we want to have the latest and most current info in our decks.  It’s actually hard to write up a session abstract six months before the event.  Things change rapidly in our industry, and often your original plan for a session gets derailed by changes in the product or organization.  More than once I’ve had a TME on my team presenting on a topic he is no longer working on!  One of my TMEs was presenting on Nexus switches several months after our team switched to Catalyst only.

At Cisco Live you may run into the “speaker ready room.”  It’s a space for speakers to work on slides, stocked with coffee and food, but there is also a small army of graphic design experts in there who will review the speakers’ slides one last time before they are presented.  They won’t comment on your design choices, but simply check that the slides are consistent with the template formatting.  We’re required to submit our final deck 24 hours before our session, which gives the CL staff time to post the slides for the attendees.

Standing up in front of a room full of engineers is never easy, especially when they are grading you.  If you rate in the top 10% of speakers, you win a “Distinguished Speaker” award.  If you score below 4.2 you need to take remedial speaker training.  If your score is low more than a couple times, the SGMs might ask you not to come back.  Customers pay a lot of money to come to CL and we don’t want them disappointed.  For a presenter, being scored, and the high stakes associated with the number you receive, make a CL presentation even more stressful.  One thing I’ve had to accept is that some people just won’t like me.  I’ve won distinguished speaker before, but I’ve had some sessions with less-than-stellar comments too.

The stress aside, CL is one of the most rewarding things we do.  Most of the audience is friendly and wants to learn.  It’s a fun event, and we make great contacts with others who are passionate about their field.  For my readers who are not Cisco TMEs (most, I suspect), I hope you have a chance to experience Cisco Live at least once in your career.  Now you know the amount of work that goes into it.

I recently replied to a comment that I think warrants a full blog post.

I’ve been here at Cisco working on programmability for a few years.  Brian Turner wrote in to say, essentially:  Hang on!  I became a network engineer precisely because I don’t want to be a coder!  I tried programming and hated it!  Now you’re telling me to become a programmer!

As I said in my reply, I have a lot of sympathy for him.  It reminds me of a story.

Back when I was at Juniper, I met with the IT department’s head of automation to discuss using some of his tools for network automation.  Jeremy was an expert in all things Puppet and Ansible, and a rather enthusiastic promoter of these tools on the server/app side of the house.  He had also managed to get Puppet running on a Junos device.  I was meeting with him because, frankly, the wind seemed to be blowing in his direction.  That said, I did not share his enthusiasm.  He told me about a server guy he had worked with, Stephane.  When Jeremy proposed to Stephane that he should use automation tools to make his life easier, Stephane vehemently rejected the idea, and the meeting ended with Stephane banging his fists on the table and shouting “I am not a coder!”

Flash forward a couple years and Stephane ended up the head of automation for a major company.  Apparently he finally bought into the idea.

Frankly I had no desire to become a coder either.  When I interviewed at Cisco, most of my discussions were around the controllers I was working with at the time, data center fabrics, etc.  When I arrived, my new boss assigned me as his Principal TME for programmability.  I never claimed to be an expert in this area.  Two months later I was presenting to Tech Field Day, and to experienced automation guys like Jason Edelman and Matt Oswalt, on how to run Puppet on a Nexus switch.  Three years later and I’m known as a NETCONF/YANG guy.  I’d barely heard of them when I started.

As I replied to Brian, Cisco doesn’t want him or anyone to learn Python or YANG or whatever.  Think about it from my perspective in product management.  Implementing YANG models for all of IOS XE is a massive undertaking.  Engineering devoted a huge amount of effort to pull this off.  Huge.  Mandating YANG models for their ongoing development burns cycles.  Product marketing and engineering would never prioritize this unless we thought there was a high probability someone would use it.  In other words, we don’t want people to use it so much as customers want us to develop it.  We have demand for programmable interfaces for network devices, and hence we’ve delivered on it.   My job as a TME is not to push NETCONF/YANG on anyone, but to provide the enablement to make it easier for someone to use this technology if they themselves want to.

As I often say in my presentations, the why is important.  Why do some customers demand these interfaces?  Well, because they know Notepad is a horrible automation tool, and it’s what 90% of network engineers use.  If you want to configure 50 switches, you’re going to configure one, paste the config into Notepad, tweak a few values, and then paste it into the next switch.  Do this 48 more times and tell me if this is the best use of your time as a highly skilled network engineer.  You can write a script to do this and save yourself a lot of trouble.  Or use Ansible to do it.  Or Cisco DNAC.  Whatever you want.  But if you want any of these tools to work efficiently, you need a machine interface, which CLI is not.  If you don’t believe me, try writing a script to do regular expression-based parsing of CLI outputs.  It’s a lot easier with YANG.
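To make the regular-expression point concrete, here is a minimal sketch that answers the same question two ways: once by scraping text, once by reading structured data.  The CLI text is made up for illustration, and the JSON is only shaped like the ietf-interfaces YANG model; neither is captured from a real device, and the function names are my own.

```python
import json
import re

# Hypothetical CLI text, loosely styled after a "show interfaces" command.
CLI_OUTPUT = """\
GigabitEthernet1/0/1 is up, line protocol is up
  Description: uplink to core
GigabitEthernet1/0/2 is administratively down, line protocol is down
  Description: spare
"""

# Hypothetical JSON payload shaped like the ietf-interfaces YANG model.
STRUCTURED_OUTPUT = """
{"ietf-interfaces:interfaces": {"interface": [
  {"name": "GigabitEthernet1/0/1", "description": "uplink to core", "enabled": true},
  {"name": "GigabitEthernet1/0/2", "description": "spare", "enabled": false}
]}}
"""


def enabled_from_cli(text):
    """Scrape admin state from text; breaks as soon as the wording changes."""
    pattern = re.compile(
        r"^(\S+) is (administratively down|up|down),", re.MULTILINE
    )
    return {
        name: state != "administratively down"
        for name, state in pattern.findall(text)
    }


def enabled_from_structured(payload):
    """Read admin state from modeled data; no pattern matching needed."""
    data = json.loads(payload)
    interfaces = data["ietf-interfaces:interfaces"]["interface"]
    return {intf["name"]: intf["enabled"] for intf in interfaces}


if __name__ == "__main__":
    print(enabled_from_cli(CLI_OUTPUT))
    print(enabled_from_structured(STRUCTURED_OUTPUT))
```

Both functions return the same dictionary, but only the first depends on the exact wording and spacing of a line meant for human eyes.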

The point is not for network engineers to become programmers.  The point is to add some tools to your toolbox to help you focus on what you do well.  One weekend spent with a Python course and one more weekend with a DevNet course on YANG will give you a tool you can use to make your life easier.  That’s it.  Some customers may take it a lot further, of course, and go way into CI/CD workflows and that’s fine.  If you want to do 95% of your work in CLI and write a few scripts to do the other 5%, that’s fine.  If you want to use Cisco DNAC to do almost everything, knock yourself out.  It’s about what works best for you, as a network engineer.

I often point out how lousy my code quality is.  I’m sometimes ashamed to show the code for some of the scripts I’ve written.  I’m not a coder!  That’s a point I often make.  I don’t want to be a full-time software developer.  I’m a network engineer.  So for Brian and all the other CCIEs out there, keep doing what you do best, but don’t close yourself off to some additional tools that will make your life easier.

I was doing well on the blog for a few months but lately fell behind.  With (now) 12 people reporting to me, and three major areas of responsibility (SD-Access, Assurance, and Programmability), it’s not easy to find time to write up a blog post.  I have about five drafts needing work but I cannot seem to find the will to finish them.  Sometimes, however, it just takes a spark to get me going.  That spark came in my inbox from Ivan Pepelnjak.  I like Ivan’s blog posts, which, while often not favorable to Cisco, are nonetheless fair and balanced and raise some very important points.

“Why Is Every SDN Vendor Bashing Networking Engineers?” asks Ivan in the form email I received.  “[T]he vendors know they wouldn’t be able to sell their latest concoctions to people who actually understand how networking works and why some architectures have no chance of ever working in real life,” answers Ivan.  “The only way to sell the warez is to try to convince everyone else how to get rid of the pesky ossified CLI jockeys.”

Now I work for a vendor, and since I deal with the aforementioned products, I guess I am an SDN vendor.  That would seem to qualify me to speak on this subject.  (With, of course, the usual disclaimer that the opinions here are my own and do not represent Cisco officially.)

Selling Concoctions

I must admit, I do want to sell our products.  Everyone at Cisco should want our products to sell.  Just about all of us have a personal, financial stake in the matter, whether we have stock grants or ESPP.  We would be insane not to want people to buy our products.  I, and most of my co-workers, are driven by far more than finance, however.  We all want to know that our work means something, and that we are coming up with innovative solutions to problems.  Otherwise, why show up in the office every day?

We operate in a highly competitive environment, which means if we are not constantly innovating and coming up with better ways to do things, we will all suffer.  You can complain about the macroeconomic system, and believe me, I’m not a Randian, objectivist believer in unbridled capitalism.  But, at the end of the day, a public company needs to create the perception of future value in the eyes of the stock market, and that’s a motivating factor for all of us.

These things being said, I’ve been in product management for a few years now and I have never heard anyone, ever, talk about trying to put one over on our customers.  I’m not saying that’s what Ivan means here, but it’s an accusation I’ve heard before.  In the first place, our customers are network engineers who are quite smart.  Whenever I’ve presented to a customer and was not crystal clear on what I was talking about and what advantage it would bring them, they have let me know it.  We’re constantly trying to find ways to do things better and make our customers’ lives easier.  As somebody who worked in IT for more years than product management, I’m very interested in this subject.  There were a lot of things that were frustrating and I want to fix things that used to annoy me.  You can argue about whether we’ve come up with the right ideas, but I hope nobody questions our motivations.

CLI Jockeys

Do I bash CLI jockeys in order to sell my products?  I should hope not, given that most of my customers are CLI jockeys, as I am myself!  I have two CCIEs and a JNCIE.  I spent a couple years in routing protocols TAC and many years in IT.  I spent a long time learning my trade and I have a lot of respect for those who have put the time and effort into learning it as well.  It’s not easy.

However, I don’t operate under the delusion that network engineers do a good job of configuring and managing CLI.  When I was at Juniper, I had designed a new NGMVPN system for our WAN.  I handed it off to the implementation team with some sample configs and asked them to come back to me with their plan.  I think we were touching about 20 devices the first go around.  The engineer came back with 20 Word documents.  He took my sample config and copied and pasted it into Word, and then modified the config in a separate Word doc for each CE/PE he was touching.  CLI itself isn’t the problem; how we manage it is.  This is where programmability and automation tools come in.  At the very least Ansible templating would have made this easier.  Software-Defined Networking (a very loose term, for what it’s worth) is not about replacing ossified CLI jockeys but getting them to focus on what they should be doing (network engineering) and avoiding what they should not (pasting stuff in Word docs).
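As a rough illustration of what templating buys you, here is a minimal sketch using nothing but Python’s standard library.  Ansible’s own templating is built on Jinja2; `string.Template` stands in for it here so the example runs anywhere.  The config syntax, hostnames, interface names, and addresses are all invented for the example and are not the configs from the story above.

```python
from string import Template

# One template replaces the "sample config" pasted into each Word doc.
# The config syntax is illustrative only, not a real device's CLI.
CONFIG_TEMPLATE = Template("""\
hostname $hostname
interface $uplink
 description uplink to $peer
 ip address $address $mask
""")

# One row of values per device replaces the hand edits.
# All values are hypothetical (addresses are from a documentation range).
DEVICES = [
    {"hostname": "pe1", "uplink": "Ethernet1", "peer": "ce1",
     "address": "192.0.2.1", "mask": "255.255.255.252"},
    {"hostname": "pe2", "uplink": "Ethernet1", "peer": "ce2",
     "address": "192.0.2.5", "mask": "255.255.255.252"},
]


def render_configs(template, devices):
    """Return {hostname: config text}; raises KeyError on a missing value."""
    return {dev["hostname"]: template.substitute(dev) for dev in devices}


if __name__ == "__main__":
    for hostname, config in render_configs(CONFIG_TEMPLATE, DEVICES).items():
        print(f"--- {hostname} ---")
        print(config)
```

Going from 2 devices to 20 means adding rows to the table, not opening 18 more documents, and a missing value fails loudly instead of leaving a stale address from the previous paste.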

SD-Access takes this quite a bit further than Ansible, NETCONF, and other device-level tools.  Rather than saying “I want this device to be a LISP MS/MR” and so forth, you just say “I want this device to be a control plane node” and the system figures out what you need.  Theoretically we could change from LISP to some other protocol and the end-user shouldn’t even notice.  The idea here is somewhat like a fly-by-wire system.  The controls a pilot operated used to be directly coupled to the airplane’s control surfaces via hydraulics.  Now, the pilot is operating what is essentially a joystick, providing control inputs to a computer, which then computes the best way to move the control surfaces given the conditions.  This is then relayed to servo motors in the wings, tail, etc.  The complexity of a fly-by-wire system is much higher than an old hydraulic system, but the complexity is hidden from the pilot in order to provide a better experience.  Likewise, with SD-Access, we’ve made the details more complex in order to deliver a better experience (TrustSec, layer 3 routed backbone, etc.) while hiding the complexity from the user.  It’s a different approach, for sure, but the idea is to allow engineers to focus on the right problems, like how to design their network, and not worry so much about configuration.
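The role-versus-protocol idea can be sketched in a few lines.  This is a toy model of the concept only, not how SD-Access is implemented; the table, the function, the device name, and the “other-protocol” entry are all hypothetical.

```python
# Hypothetical lookup from an abstract role to protocol-level functions.
# Only the LISP map-server/map-resolver pairing comes from the text above;
# the "other-protocol" entry is a placeholder for a future swap.
ROLE_BACKENDS = {
    "lisp": {
        "control-plane-node": ["lisp map-server", "lisp map-resolver"],
    },
    "other-protocol": {
        "control-plane-node": ["other-protocol endpoint registry"],
    },
}

# What the engineer states: a device and a role, with no protocol named.
INTENT = {"device": "fabric-node-1", "role": "control-plane-node"}


def expand_intent(intent, backend="lisp"):
    """Translate a role-level intent into the functions a backend needs."""
    return ROLE_BACKENDS[backend][intent["role"]]


if __name__ == "__main__":
    print(expand_intent(INTENT))
    print(expand_intent(INTENT, backend="other-protocol"))
```

The thing to notice is that `INTENT` never changes when the backend does, which is the fly-by-wire point: the input stays the same while the machinery underneath is free to change.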

A New Era?

I’ve written extensively (see, for example, here, and here) about the role for CLI-jockey network engineers in the future.  When airplanes switched from the old dials and gauges to sleek, modern computerized (glass) cockpits, I’m sure some old timers threw up their hands, retired, and got their old Piper Super Cubs out of the hangar to do some “real” flying.  But most adapted, and in the end, saw how the new automation systems helped them do their jobs better.  That’s an era I’m looking forward to.  And as I always, always say, the pilots who fly the new cockpits still need to understand weather systems, engines, navigation, etc.  We still need network engineers who know how networks operate.

Meanwhile, I won’t bash any CLI jockeys and I hope nobody else here does either.