CCIE Enterprise Infrastructure

There were quite a few big announcements at Cisco Live this year.  One of the big ones was the overhaul of the certification program.  A number of new certifications were introduced (such as the DevNet CCNA/CCNP), and the existing ones were overhauled.  I wanted to do a post about this because I was involved with the certification program for quite a while on launching these.  I’m posting this on my personal blog, so my thoughts here are, of course, personal and not official.

First, the history.  Back when I was at Juniper, I had the opportunity to write questions for the service provider written exams.  It was a great experience, and I got thorough training from the cert program on how to properly write exam questions.  I don’t really remember how I got invited to do it, but it was a good opportunity, as a certified (certifiable?) individual, to give back to the program.  When I came to Cisco, I quickly connected with the cert program here, offering my services as a question writer. I had the training from Juniper, and was an active CCIE working on programmability.  It was a perfect fit, and a nice chance to recertify without taking the test, as writing/reviewing questions gets your CCIE renewed.

As I was managing a team within the business unit that was working on Software-Defined Access and programmability, it seemed logical for me to talk to the program about including those topics on the test.  I can assure you there was a lot of internal debate about this, as the CCIE exam is notoriously complex, and the point of our Intent-Based Networking products is simplicity.  One product manager even suggested a separate CCIE track for SD-Access, an idea I rejected immediately for that very reason.

Still, as I often point out here and elsewhere, SDN technologies do not mitigate the need for network engineers.  SDN products, all SDN products, are complex precisely because they are automated.  Automation enables us to build more complex things, in general.  You wouldn’t want to configure all the components of SD-Access by hand.  Still, we need engineers who understand what the automation tools are doing, and how to work with all the components which comprise a complex solution like SD-Access.  Network engineers aren’t going to disappear.

For this reason, we wanted SD-Access, SDWAN, and also device programmability (NETCONF/YANG, for example) to be on the lab.  We want to have engineers who know and understand these technologies, and the certification program is a fantastic way to help people to learn them.  I, and some members of my team, spent several months working with the CCIE program to build a new blueprint, which became the CCIE Enterprise Infrastructure.  The storied CCIE Routing and Switching will be no more.

At the end of the day, the CCIE exam has always adapted to changed in networking.  The R/S exam no longer has ISDN or IPX on it, nor should it.  Customers are looking for more automated solutions, and the exam is keeping pace.  If you’re studying for this exam, the new blueprint may be intimidating.  That said, CCIE exams have always been intimidating.  But think about this:  if you pass this exam, your resume will have skills on it that will make you incredibly marketable.

The new CCIE-EI (we always abbreviate stuff, right?) breaks down like this:

  • 60% is classic networking, the core routing protocols we all know and love.
  • 25% is SDx:  SD-Access and SD-WAN, primarily.
  • 15% is programmability.  NETCONF/YANG, controller APIs, Ansible, etc.

How do you study for this?  Like you study for anything.  Read about it and lab it.  There is quite a bit of material out there on all these subjects, but let me make some suggestions:

Programmability

You are not expected to be a programming expert for this section of the exam.  It’s not about seeing if you can write complex programs, but whether you know the basics well enough to execute some tasks via script/Ansible/etc instead of CLI.  DevNet is replete with examples of how to send NETCONF messages, or read data off a router or switch with programmable interfaces.  Download them, play with them, spend some time learning the fundamentals of Python, and relax about it.

  • Learn:  DevNet is a phenomenal resource.  Hank Preston, an evangelist for DevNet, has put out a wealth of material on programmability.  In addition, there is the book on IOS XE programmability I wrote with some colleagues.
  • Lab:  You can lab programmability stuff easily on your laptop.  Python and ncclient are free, as is Ansible.  If you have any sort of lab setup already, all you need to do is set up a Linux VM or install some tools on to your laptop.

Software-Defined

This is, as I said before, a tough one to test on.  After all, to add a device to an SD-Access fabric, you select it and click “Add to Fabric.”  What’s there to test?  Well, since these are new products you of course need to understand the components of SD-Access/SDWAN and how they interoperate.  How does policy work?  How do fabric domains talk to non-fabric domains?  There is plenty to study here.

  • Learn:  Again, we’ve written books on SD-Access and SD-WAN.  Also, we are moving a lot of documentation into Cisco Communities.
  • Lab:  Well, this is harder.  We’re working on getting SD-Access into the hands of learning partners, so you’ll have a place to get your hands on it.  We’re also working on virtualizing SD-Access as much as possible, to make it easier for people to run in labs.  I don’t have a timeframe on the latter, but hopefully the former we can do soon.

These are huge but exciting changes. I’ve been very lucky to have landed at a job where I am at the forefront of the changes in the industry, but this new exam will give others the opportunity to move themselves in that direction as well.  Happy labbing!

Cisco Live is over! Long Live Cisco Live!

I think it’s fair to say that all technical marketing engineers are excited for Cisco Live, and happy when it’s over.  Cisco Live is always a lot of fun–I heard one person say “it’s like a family reunion except I like everyone!”  It’s a great chance to see a lot of folks you don’t get to see very often, to discuss technology that you’re passionate about with other like minded people, to see and learn new things, and, for us TMEs, an opportunity to get up in front of a room full of hundreds of people and teach them something.  We all now wait anxiously for our scores, which are used to judge how well we did, and even whether we get invited back.

It always amazes me that it comes together at all.  In my last post, I mentioned all the work we do to pull together our sessions.  A lot of my TMEs did not do sessions, instead spending their Cisco Live on their feet at demo booths.  I’m also always amazed that World of Solutions comes together at all.  Here is a shot of what it looked like at 5:30 PM the night before it opened (at 10 AM.)  How the staff managed to clear out the garbage and get the booths together in that time I can’t imagine, but they did.

The WoS mess…

My boss, Carl Solder, got to do a demo in the main keynote.  There were something like 20,000 people in the room and the CEO was sitting there.  I think I would have been nervous, but Carl is ever-smooth and managed it without looking the least bit uncomfortable.

My boss (left) on the big stage!

The CCIE party was at the air and space museum, a great location for aviation lovers such as myself.  A highlight was seeing an actual Apollo capsule.  It seemed a lot smaller than I would have imagined.  I don’t think I would ever have gotten in that thing to go to the moon.  The party was also a great chance to see some of the legends of the CCIE program, such as Bruce Caslow, who wrote the fist major book on passing the CCIE exam, and Terry Slattery, the first person to actually pass it.

CCIE Party

I delivered two breakouts this year:  The CCIE in an SDN World, and Scripting the Catalyst.  The first one was a lot of fun because it was on Monday and the crowd was rowdy, but also because the changes to the program were just announced and folks were interested in knowing what was going on.  The second session was a bit more focused and deeper, but the audience was attentive and seemed to like it.  If you want to know what it feels like to be a Cisco Live presenter, see the photo below.

My view from the stage

I closed out my week with another interview with David Bombal, as well as the famous Network Chuck.  This was my first time meeting Chuck, who is a bit of a celebrity around Cisco Live and stands out because of his beard.  David and I had already done a two-part interview (part 1, part 2) when he was in San Jose visiting Cisco a couple months back.  We had a good chat about what is going on with the CCIE, and it should be out soon.

As I said, we love CL but we’re happy when it’s over.  This will be the first weekend in a long time I haven’t worked on CL slides.  I can relax, and then…Cisco Live Barcelona!

Inside Cisco Live

 

While I’m thinking about another TAC Tale, I’m quite busy working on slides for Cisco Live.  I figured this makes for another interesting “inside Cisco” post, since most people who have been to the show don’t know much about how it comes together.  A couple years back I asked a customer if I could schedule a meeting with him after Cisco Live, since I was working on slides.  “I thought the Cisco Live people made the slides and you just showed up and presented them!” he said.  Wow, I wish that was the case.  With hundreds of sessions I’m not sure how the CL team could accomplish that, but it would sure be nice for me.  Unfortunately, that’s not the case.

If you haven’t been, Cisco Live is a large trade show for network engineers which happens four times globally: in Europe, Australia, the US, and Mexico.  The US event is the largest, but Europe is rather large as well.  Australia and Mexico are smaller but still draw a good crowd.  The Europe and US shows move around.  The last two years Europe was in Barcelona, as it will be next year, but it was in Berlin two years before that.  The US show is in San Diego this year, was in Orlando last year, and was in Las Vegas for two years before that.  Australia is always in Melbourne, and Mexico is always in Cancun.  I went to Cisco Live US twice when I worked for a partner, and I’ve been to every event at least once since I’ve worked at Cisco as a TME.

The show has an number of attractions.  There is a large show floor with booths from Cisco and partners.  There are executive and celebrity keynotes.  The deepest content is delivered in sessions–labs, techtorials, and breakout sessions which can have between 20 and several hundred attendees.  The sessions are divided into different tracks:  collaboration, security, certification, routing and switching, etc., so attendees can focus on one or more areas.

Most CL sessions are delivered by technical marketing engineers like myself, who work in a business unit, day in and day out, with their given product.  As far as I know anyone in Cisco can submit a session, so some are delivered by people in sales, IT, CX (TAC or AS), and other organizations.  Some are even delivered by partners and customers.

Six months before a given event, a “call for papers” goes out.  I’m always amused that they pulled this term from academia, as the “papers” are mostly powerpoints and not exactly academic.  If you want to do a session, you need to figure out what you want to present and then write up an abstract, which contains not only the description, but also explains why the session is relevant to attendees, what they can hope to get out of it, and what the prerequisites are.  Each track has a group of technical experts who manage it, called “Session Group Managers”, or SGMs.  They come from anywhere in the business, but have the technical expertise to review the abstracts and sessions to ensure they are relevant and well-delivered.  For about a year, the SGM for the track I usually presented actually reported to me.  They have a tough job, because they receive a large number of applications for sessions, far more than the slots they have.  They look at the topic, quality of the abstract, quality of the speaker, available slots, and other factors in figuring out which sessions get the green light.

Once you have an approved session, you can start making slides.  Other than a standard template, there is not much guidance on how to build a deck for Cisco Live.  My old SGM liked to review each new presentation live, although some SGMs don’t.  Most of us end up making our slides quite close to the event, partly because we are busy, but also because we want to have the latest and most current info in our decks.  It’s actually hard to write up a session abstract six months before the event.  Things change rapidly in our industry, and often your original plan for a session gets derailed by changes in the product or organization.  More than once I’ve had a TME on my team presenting on a topic he is no longer working on!  One of my TMEs was presenting on Nexus switches several months after our team switched to Catalyst only.

At Cisco Live you may run into the “speaker ready room.”  It’s a space for speakers to work on slides, supplied by coffee and food, but there is also a small army of graphic design experts in there who will review the speakers’ slides one last time before they are presented. They won’t comment on your design choices, but simply review them to ensure they are consistent with the template formatting.  We’re required to submit our final deck 24 hours before our session, which gives the CL staff time to post the slides for the attendees.

Standing up in front  of a room full of engineers is never easy, especially when they are grading you.  If you rate in the top 10% of speakers, you win a “Distinguished Speaker” award.  If you score below 4.2 you need to take remedial speaker training.  If your score is low more than a couple times, the SGMs might ask you not to come back.  Customers pay a lot of money to come to CL and we don’t want them disappointed.  For a presenter, being scored, and the high stakes associated with the number you receive, makes a CL presentation even more stressful.  One thing I’ve had to accept is that some people just won’t like me.  I’ve won distinguished speaker before, but I’ve had some sessions with less-than-stellar comments too.

The stress aside, CL is one of the most rewarding things we do.  Most of the audience is friendly and wants to learn.It’s a fun event, and we make great contacts with others who are passionate about their field.  For my readers who are not Cisco TMEs (most I suspect), I hope you have a chance to experience Cisco Live at least once in your career.  Now you know the amount of work that goes into it.

What are we getting ourselves into?

It seems to be rank heresy for someone working in the valley to say it, but let me say it anyways.  I don’t agree with the axiom of the technology industry which states that all technological progress is always good.  Many in our society instinctively realize this, which is why they oppose genetic engineering and plastics.  Still, the technology industry is so persistently in love with itself, and so optimistic about its potential to solve every human problem, that when anyone points out the consequences of technological progress, we quickly respond with AI’s potential to solve the problems it’s bound to create.  (Sorry for the long sentence, but I’m going to quote Plato in this essay, and by Platonic standards that last sentence is short.)  AI is the solution to everything.  AI will unlock the mysteries of human existence.  AI will allow human beings to live forever.  AI will cure cancer.  AI will solve the dangers of, well, genetic engineering and plastics.

An example of this is the extraordinarily concerning essay in the Wall Street Journal a few weeks ago by computer scientist and 60’s icon Jerry Kaplan.  Dr. Kaplan reviews the recent accomplishments of functional brain imaging technologies, which are starting to become more precise in identifying how people are feeling, and even which words they are thinking.  “With improved imaging technology, it may become possible to ‘eavesdrop’ on a person’s internal dialogue, to the extent that they are thinking in words,” says Kaplan.  With a predictable dose of technological optimism, Kaplan sees nothing concerning in the possibility of machines being able to read people’s minds.  Instead, he thinks it opens up a world of possibilities.  For example, in civil lawsuits it’s difficult to ascertain how much pain and suffering an individual has undergone, and hence to assign damages.  Why, we could use brain imaging and AI to calculate precisely how much somebody was harmed!

I may not hold a doctorate, but I spend a lot of time working with computers in the real world, not the world of researchers.  I’m skeptical that functional brain imaging will be able to read people’s minds, but the possibility is alarming.  In today’s era of instant social media “viral” lynching, we all have to be quite careful what we say.  Even with innocent intentions, a slip of the tongue can set off Twitter mobs that will destroy your life and career.  Now, even guarding your speech won’t help you.  You may walk by an AI mind-reading machine and have your life ruined for thought-crime.  And we’re celebrating this?  Even Dr. Kaplan’s scenario of determining pain and suffering in lawsuits is ludicrous.  How quickly will people learn to game the machine, produce artificial emotional trauma, and reap the rewards?

And now to Plato.  In his Phaedrus, Plato tells the story of an inventor named Theuth who came to the Egyptian king and was showing off some of his creations.  This was in the time before writing existed.  After showing the king a number of inventions, Theuth showed him letters and writing:

And when he was talking about writing, Theuth said:  “King, this learning will make the Egyptians wiser and give them better memories.  For, I have found a medicine of both memory and wisdom.

The King said:  “Oh most artful Theuth!  While one person has the ability to create things skillfully, it takes another person to judge those things, and whether their use brings harm or help.  Now you, being the father of letters, through your love of them, have stated the opposite of their capability.  For, in the minds of those who learn this art, it will produce forgetfulness, by neglect of the memory, inasmuch as they have faith in writing, which consists of inscriptions outside of themselves, rather than remembering for themselves.”

Phaedrus 274E-275A.  (My own admittedly rough translation)

In other words, the inventor thinks writing will help memory, whereas the king points out it will hinder it!  I love this quote because it shows how the arrogance of inventors clouds their perception of their own inventions.  This is particularly true in Silicon Valley, where the pressure to always innovate removes any clear thinking about the consequences of the inventions. When confronted with the possibility of huge swaths of jobs being eliminated by their inventions, the lame response of the Silicon Valley innovators is to propose a universal basic income, hence making the movie Wall-E appear all too real.

This is a blog about network engineering, so how is this related?  Aren’t I involved in the automation of network systems?  Isn’t Cisco bringing AI to the world of networking?

Indeed we are, but I like to think that we’re a bit more realistic about it.  As a network engineer, and former TAC guy, I’ve spent countless hours doing nasty troubleshooting that, frankly, was hard and not particularly enjoyable.  Having executives looking over your shoulder, with the possibility of getting fired, with countless users freaking out, while trying to hunt down why the Internet just doesn’t work…  Well, if ML and AI will help me to locate the problem faster and restore operation to my network, I’m all for that.  If AI starts reading minds, I’m breaking out the tinfoil hat.

Blog Updates

A lot of the blog posts I write begin with “I’m just too busy to blog these days!”  Luckily, I have dozens of drafts so often blogging is just a question of cleaning up something I wrote a long time ago.  However, I’d like to keep things up here even as life becomes more hectic here at Cisco.  (I don’t know how things can get more hectic but they seem to each day!)

I don’t have many comments on this blog.  I think this is largely due to the fact that most of my readers are spambots.  However, I know there are a few out there who actually read and enjoy some of the posts.  For years I’ve required users to enter a name and email address to post a comment, and while many users just fill out fake information there, I’ve always thought it kept spam down.  This policy probably keeps genuine comments low too.  So, I’ve flipped the setting to allow anonymous comments.  I’ll test it for a few days, and if the spam is out of control I’ll flip it back.  My spam filtering software gets the vast majority of spam comments, so I hope it will continue to do its job with anonymous commenting.

The performance on this blog is also slow.  I’m looking at moving to a more fully managed offering from my hosting provider since I don’t have time to muck around with WordPress trying to get it faster.  I also need to get a certificate installed because people aren’t as happy with un-secure web sites these days.

So, a few things going on here!

Interviewing #1: How I got my first networking job

I’ve wanted to kick off a series for a while now on technical interviewing. Let me begin with a story.

My first job interview for a full network engineering role was at the San Francisco Chronicle in 2000. I had been working for five years in IT, mostly doing desktop and end-user support. I then decided to get a master’s degree in telecommunications management, which didn’t help at all, followed by a CCNA certification, which got me the interview.

My first interview was with the man who would be my boss. Henry was a manager who had almost no technical knowledge about networking, but I didn’t know that at the time. “Do you know Foundry switches at all?” Henry asked.

“No.” I was already worried.

“I doubted you would. That’s ok because we want to replace them all with Cisco and you know Cisco.” He pulled out a network diagram and handed it to me. “If you look at this, do you see a problem?” he asked.

I had never worked on a network larger than a couple switches, and now I was staring at a convoluted diagram depicting the network of the largest newspaper in Northern California. I was looking at subnet masks, link speeds, and hostnames, trying to find something wrong.

“I’m not sure,” I had to reply meekly.

He pointed at the main core switch for the network. There was only one, with no redundancy.  “There’s a huge single point of failure,” he said. I felt stupid missing the forest for the trees.

Henry brought me upstairs to interview with Tom, who was an on-site project management contractor from Lucent. I was extremely nervous–Lucent (later Avaya) was a big name in the industry and this guy worked for them! Henry left me with Tom. Tom pulled out a copy of the same diagram Henry was showing me earlier.

“Do you notice anything wrong with this?” he asked.

“Wow, that’s a huge single point of failure,” I replied.

He nodded his head in approval. “That’s right–very good.” He asked me a technical question about supernetting. I answered nervously, although it quickly became clear I knew more than he did.

The door flew open and another guy named Vincent walked in. He was the desktop support contractor, but again I didn’t know that. “Ask Jeff a few technical questions,” Tom said.

“Question number one,” said Vincent. “If you were running a network this size, would you subnet it?”

Now the answer seemed obviously to be “yes”, but I was trying to figure out if this is a trick. “Yes,” I answered, deciding to play it safe.

“Good! Next question: Can you route NetBios?” My desktop years were almost exclusively dedicated to Macs and I didn’t even know what NetBios was. I figured it was a 50/50 shot, and the way he asked it seemed to suggest the answer.

“No,” I said, trying to sound confident.

“He’s good,” said Vincent.

Next, the door flew open again, and in walked Bing. Bing was carrying some sort of network device with her. She handed it to me. “Is this a switch or a hub?” she asked. There was no obvious labeling on it, and as I turned the device over and over again in my hands, I had a sinking feeling.

“I don’t know,” I replied.

“Look at this,” Bing said. She pointed to a collision light. “Since there is one on each port, you can tell this is a switch.”

We don’t have collision lights on switches any more, but at the time we did and she had a valid point. Realizing this, I explained to her that a since a hub has a single collision domain, it would only have one collision light. I explained to her the concept of a collision domain, and how a switch worked versus a hub. It turns out she was a project manager for desktop support and she didn’t know any of that. Someone had just shown her the collision light thing and she thought it would be a good question.

“He’s good,” said Bing.

Tom had told me my next interview would be with a CCIE from Lucent. Now that was definitely intimidating. I knew of the reputation of CCIEs, and I didn’t expect to do well. The CCIE guy never showed up. As Tom was walking me to the elevator, however, we ran into him in the hallway. It turns out that Mike, who is still a friend of mine, and who later got three CCIE’s, had not passed the exam yet. We ended up talking about his home lab for a few minutes.

“He’s good,” said Mike. And a good thing too, as I’ve been in a couple of interviews with Mike and I’ve seen him grill people mercilessly.

I got a call with a job offer a few days later, and ended up working there five years.

For a while now I’ve had several posts in my drafts folder on the subject of technical interviewing. As you can see from the above story, interviews are often chaotic, disorganized, and conducted by unqualified people who have no plan. In the case of the San Francisco Chronicle, they made the right decision on me, and I don’t think anybody there would dispute that. I was thankful to begin my career in network engineering.

That said, I’ve had other interviews that didn’t go so well. Over the next few articles, I’d like to cover technical interviewing. Why do we interview people? How can we select good people from bad people? How worthwhile are the typical technical questions? Are gotchas worth throwing out “just to see how the candidate reacts”? Are interviews purely subjective or can we make them data-driven and objective?

I’ll throw out a few more anecdotes from my own experience to illustrate my points–feel free to comment with some of your own!

Moving carpets for $2000

I worked for two years at a Cisco Gold Partner.  The first year was great.  We were trying to start up a Cisco practice in San Francisco (they were primarily a Citrix partner before), so my buddy and I wined and dined Cisco channel account managers trying to impress them with our CCIE’s and get them to steer business our way.  Eventually, the 2009 financial crisis hit and business started to dry up.  The jobs became fewer and less interesting.  I had two CCIE’s and at one point, I drove out to Mare Island near San Francisco to install a single switch for a customer whose entire network consisted of–a single switch.  I always recommend people not to stay in jobs like this too long, as it hurts your prospects for future employment.

Potential Employer:  “So what kind of jobs have you done lately?”

You:  “Uh, I installed one switch at a customer.”

Anyhow, we had one other customer that managed to keep me surprisingly busy, considering their network was quite small as well.  They were a local builder, and with three small offices connected together with ASAs and VPN tunnels.  The owner was filthy rich and also paranoid about security, which meant I was out there a lot changing passwords, tightening up ACLs, and cleaning up the mess the last network engineer had left.

The owner had a ranch near Wilits, CA which was reputed to be the size of the city of Concord, CA.  He also had two jets to take him to his private landing strip at his ranch.  Being a pilot myself, the prospect of a trip in a small jet to his ranch made me wish for some sort of network problems up there.  However, there wasn’t much up there for me to work on.  He had a single ASA 5505 connected to satellite uplink which he primarily used to connect to the cameras (which he had everywhere) at the ranch.

One day, my contact at the builder told me the cameras weren’t reachable.  Yes!  Finally a trip in the jet.  We set a date and I spent my time wondering whether I’d get the Lear or the Citation.

Unfortunately, when the day rolled around, the weather was hideous.  A Lear jet can handle most any weather, but the little airstrip had no instrument approaches.  Instead, my contact gave me an alternative:  I was to drive up there with her in-house cabling contractor (I’ll call him “Tim”) to do the job.  (I never understood why a business this small had an in-house cabling contractor.  As far as I knew he didn’t work on the actual construction projects associated with the company.)  Now from San Francisco, the drive to Willits is about 2.5 hours.  However, the ranch was near Willits.  After driving 2.5 hours to Willits, we had another hour drive over dirt roads to the middle of nowhere.

The cabling contractor was exactly the sort of person with whom I have nothing in common, and spending 3.5 hours in a car with him, in the era before smartphones are a handy distraction, was painful.  Tim loved fishtailing his truck as we drove on dirt roads on the side of a mountain.  I think he also liked just scaring the white collar guy.  It worked.

We arrived at the ranch and Tim opened up the back of his pickup.  “Can you give me a hand here?” he asked.  In the bed of his truck were several large carpet rolls and piles of dry cleaning.  I grabbed one end of a carpet roll and began the backbreaking work.  My company was billing me out at $250/hour to haul some lady’s dry-cleaning into her ranch.

The ASA itself was located in a pole in the middle of the property, which had a satellite dish on top.  I was amazed the ASA 5505 even functioned out there, given that the external temperature could reach over 100 degrees Fahrenheit.  The metal box housing the ASA was like an oven.  I consoled into it and immediately saw a problem.  Latency on the link was over one second round-trip.  There was no way he was going to get real-time video streaming with this slow satellite uplink.  I reported my findings to Tim and, after eating lunch with the ranch hands, we hopped back in the truck.  Tim put on a song called “You piss me off, f*cking jerk” while we drove.  I guess he didn’t like me.

When I mentor people, I often tell them you have to know the right time to quit a job.  There were several signs in this story that it was time for a change.  With two CCIEs, installing a single switch or working on a single ASA 5505 was not really a good use of my skills.  Neither was moving in carpet rolls and dresses for $250/hour.  Luckily I had enough big jobs at the partner that I managed to get through my interviews at Juniper without trouble.

Meanwhile, a few years later I read about the FBI raiding the builder who was my customer.  I guess he had good reasons for cameras.

 

I am not a coder!

I recently replied to a comment that I think warrants a full blog post.

I’ve been here at Cisco working on programmability for a few years.  Brian Turner wrote in to say, essentially:  Hang on!  I became a network engineer precisely because I don’t want to be a coder!  I tried programming and hated it!  Now you’re telling me to become a programmer!

As I said in my reply, I have a lot of sympathy for him.  It reminds me of a story.

Back when I was at Juniper, I met with the IT department’s head of automation to discuss using some of his tools for network automation.  Jeremy was an expert in all things Puppet and Ansible, and a rather enthusiastic promoter of these tools on the server/app side of the house.  He had also managed to get Puppet running on a Junos device.  I was meeting with him because, frankly, the wind seemed to be blowing in his direction.  That said, I did not share his enthusiasm.  He told me about a server guy he had worked with, Stephane.  When Jeremy proposed to Stephane that he should use automation tools to make his life easier, Stephane vehemently rejected the idea, and the meeting ended with Stephane banging his fists on the table and shouting “I am not a coder!”

Flash forward a couple years and Stephane ended up the head of automation for a major company.  Apparently he finally bought into the idea.

Frankly I had no desire to become a coder either.  When I interviewed at Cisco, most of my discussions were around the controllers I was working with at the time, data center fabrics, etc.  When I arrived, my new boss assigned me as his Principal TME for programmability.  I never claimed to be an expert in this area.  Two months later I was presenting to Tech Field Day, and experienced automation guys like Jason Edelman and Matt Oswalt on how to run Puppet on a Nexus switch.  Three years later and I’m known as a NETCONF/YANG guy.  I’d barely heard of them when I started.

As I replied to Brian, Cisco doesn’t want him or anyone to learn Python or YANG or whatever.  Think about it from my perspective in product management.  Implementing YANG models for all of IOS XE is a massive undertaking.  Engineering devoted a huge amount of effort to pull this off.  Huge.  Mandating YANG models for their ongoing development burns cycles.  Product marketing and engineering would never prioritize this unless we thought there was a high probability someone would use it.  In other words, we don’t want people to use it so much as customers want us to develop it.  We have demand for programmable interfaces for network devices, and hence we’ve delivered on it.   My job as a TME is not to push NETCONF/YANG on anyone, but to provide the enablement to make it easier for someone to use this technology if they themselves want to.

As I often say in my presentations, the why is important.  Why do some customers demand these interfaces?  Well, because they know Notepad is a horrible automation tool, and it’s what 90% of network engineers use.  If you want to configure 50 switches, you’re going to configure one, paste the config into Notepad, tweak a few values, and then paste it into the next switch.  Do this 48 more times and tell me if this is the best use of your time as a highly skilled network engineer.  You can write a script to do this and save yourself a lot of trouble.  Or use Ansible to do it.  Or Cisco DNAC.  Whatever you want.  But if you want any of these tools to work efficiently, you need a machine interface, which CLI is not.  If you don’t believe me, try writing a script to do regular expression-based parsing of CLI outputs.  It’s a lot easier with YANG.

The point is not for network engineers to become programmers.  The point is to add some tools to your toolbox to help you focus on what you do well.  One weekend spent with a Python course and one more weekend with a DevNet course on YANG will give you a tool you can use to make your life easier.  That’s it.  Some customers may take it a lot further, of course, and go way into CI/CD workflows and that’s fine.  If you want to do 95% of your work in CLI and write a few scripts to do the other 5%, that’s fine.  If you want to use Cisco DNAC to do almost everything, knock yourself out.  It’s about what works best for you, as a network engineer.

I often point out how lousy my code quality is.  I’m sometimes ashamed to show the code for some of the scripts I’ve written.  I’m not a coder!  That’s a point I often make.  I don’t want to be a full-time software developer.  I’m a network engineer.  So for Brian and all the other CCIE’s out there, keep doing what you do best, but don’t close yourself off to some additional tools that will make your life easier.

TAC Tales #16: To microburst or not to microburst

I’ve mentioned before that EIGRP SIA was my nightmare case at TAC, but there was one other type of case that I hated–QoS problems.  Routing protocol problems tend to be binary.  Either the route is there or it isn’t;  either the pings go through or they don’t.  Even when a route is flapping, that’s just an extreme version of the binary problem.  QoS is different.  QoS cases often involved traffic that was passing sometimes or in certain amounts, but would start having problems when different sizes of traffic went through, or possibly traffic was dropping at a certain rate.  Thus, the routes could be perfectly fine, pings could pass, and yet QoS was behaving incorrectly.

In TAC, we would regularly get cases where the customer claimed traffic was dropping on a QoS policy below the configured rate.  For example, if they configured a policing profile of 1000 Mbps, sometimes the customer would claim the policer was dropping traffic at, say, 800 Mbps.  The standard response for a TAC agent struggling to figure out a QoS policy issue like this was to say that the link was experiencing “microbursting.”  If a link is showing a 800 Mbps traffic rate, this is actually an average rate, meaning the link could be experiencing short bursts above this rate that exceed the policing rate, but are averaged out in the interface counters.  “Microbursting” was a standard response to this problem for two reasons:  first, it was most often the problem;  second, it was an easy way to close the case without an extensive investigation.  The second reason is not as lazy as it may sound, as microbursts are common and are usually the cause of these symptoms.

Thus, when one of our large service provider customers opened a case stating that their LLQ policy was dropping packets before the configured threshold, I was quick to suspect microbursts.  However, working in high-touch TAC, you learn that your customers aren’t pushovers and don’t always accept the easy answer.  In this case, the customer started pushing back, claiming that the call center which was connected to this circuit generated a constant stream of traffic and that he was not experiencing microbursts.  So much for that.

This being the 2000’s, the customer had four T1’s connected in a single multi-link PPP (MLPPP) bundle.  The LLQ policy was dropping traffic at one quarter of the threshold it was configured for.  Knowing I wouldn’t get much out of a live production network, I reluctantly opened a lab case for the recreate, asking for a back-to-back router with the same line cards, a four-link T1 interconnection, and a traffic generator.  As always, I made sure my lab had exactly the same IOS release as the customer.

Once the lab was set up I started the traffic flowing, and much to my surprise, I saw traffic dropping at one quarter of the configured LLQ policy.  Eureka!  Anyone who has worked in TAC will tell you that more often than not, lab recreates fail to recreate the customer problem.  I removed and re-applied the service policy, and the problem went away.  Uh oh.  The only thing worse than not recreating a problem is recreating it and then losing it again before developers get a chance to look at it.

I spent some time playing with the setup, trying to get the problem back.  Finally, I reloaded the router to start over and, sure enough, I got the traffic loss again.  So, the problem occurred at start-up, but when the policy was removed and re-applied, it corrected itself.  I filed a bug and sent it to engineering.

Because it was so easy to recreate, it didn’t take long to find the answer.  The customer was configuring their QoS policy using bandwidth percentages instead of absolute bandwidth numbers.  This means that the policy bandwidth would be determined dynamically by the router based on the links it was applied to.  It turns out that IOS was calculating the bandwidth numbers before the MLPPP bundle was fully up, and hence was using only a single T1 as the reference for the calculation instead of all four.  The fix was to change the priority of operations in IOS, so that the MLPPP bundle came up before the QoS policy was applied.

So much for microbursts.  The moral(s) of the story?  First, the most obvious cause is often not the cause at all.  Second, determined customers are often right.  And third:  even intimidating QoS cases can have an easy fix.

Do we hate network engineers?

I was doing well on the blog for a few months but lately fell behind.  With (now) 12 people reporting to me, and three major areas of responsibility (SD-Access, Assurance, and Programmability), it’s not easy to find time to write up a blog post.   I have about five drafts needing work but I cannot seem to find the will to finish them.  Sometimes, however, it just takes a spark to get me going. That spark came in my inbox from Ivan Peplnjak.  I like Ivan’s blog posts, which, while often not favorable to Cisco, are nonetheless fair and balanced and raise some very important points.

“Why Is Every SDN Vendor Bashing Networking Engineers?” asks Ivan in the form email I received.  “[T]he vendors know they wouldn’t be able to sell their latest concoctions to people who actually understand how networking works and why some architectures have no chance of ever working in real life,” answers Ivan.  “The only way to sell the warez is to try to convince everyone else how to get rid of the pesky ossified CLI jockeys.”

Now I work for a vendor, and since I deal with the aforementioned products, I guess I am an SDN vendor.  That would seem to qualify me to speak on this subject.  (With, of course, the usual disclaimer that the opinions here are my own and do not represent Cisco officially.)

Selling Concoctions

I must admit, I do want to sell our products.  Everyone at Cisco should want our products to sell.  Just about all of us have a personal, financial stake in the matter, whether we have stock grants or ESPP.  We would be insane not to want people to buy our products.  I, and most of my co-workers, are driven by far more than finance, however.  We all want to know that our work means something, and that we are coming up with innovative solutions to problems.  Otherwise, why show up in the office every day?

We operate in a highly competitive environment, which means if we are not constantly innovating and coming up with better ways to do things, we will all suffer.  You can complain about the macroeconomic system, and believe me, I’m not a Randian, objectivist believer in unbridled capitalism.  But, at the end of the day, a public company needs to create the perception of future value in the eyes of the stock market, and that’s a motivating factor for all of us.

These things being said, I’ve been in product management for a few years now and I have never heard anyone, ever, talk about trying to put one over on our customers.  I’m not saying that’s what Ivan means here, but it’s an accusation I’ve heard before.  In the first place, our customers are network engineers who are quite smart.  If ever I’ve presented to my customer and was not crystal clear on what I was talking about and what advantage it would bring the customer, they will let me know it.  We’re constantly trying to find ways to do things better and make our customers’ lives easier.  As somebody who worked in IT for more years than product management, I’m very interested in this subject.  There were a lot of things that were frustrating and I want to fix things that used to annoy me.  You can argue about whether we’ve come up with the right ideas, but I hope nobody questions our motivations.

CLI Jockeys

Do I bash CLI jockeys in order to sell my products?  I should hope not, given that most of my customers are CLI jockeys, as I am myself!  I have two CCIEs and a JNCIE.  I spent a couple years in routing protocols TAC and many years in IT.  I spent a long time learning my trade and I have a lot of respect for those who have put the time and effort into learning it as well.  It’s not easy.

However, I don’t operate under the delusion that network engineers do a good job of configuring and managing CLI.  When I was at Juniper, I had designed a new NGMVPN system for our WAN.  I handed it off to the implementation team with some sample configs and asked them to come back to me with their plan.  I think we were touching about 20 devices the first go around.  The engineer came back with 20 Word documents.  He took my sample config and copied and pasted it into Word, and then modified the config in a separate Word doc for each CE/PE he was touching.  CLI itself isn’t a problem, but how we manage it.  This is where programmability and automation tools come in.  At the very least Ansible templating would have made this easier.  Software-Defined Networking (a very loose term, for what it’s worth), is not about replacing ossified CLI jockeys but getting them to focus on what they should be doing (network engineering) and avoiding what they should not (pasting stuff in Word docs.)

SD-Access takes this quite a bit further than Ansible, NETCONF, and other device-level tools.  Rather than saying “I want this device to be a LISP MS/MR” and so forth, you just say “I want this device to be a control plane node” and the system figures out what you need.  Theoretically we could change from LISP to some other protocol and the end-user shouldn’t even notice.  The idea here is somewhat like a fly-by-wire system.  When a pilot operated the controls of an airplane, they used to be directly coupled to the control surfaces via hydraulics.  Now, the pilot is operating what is essentially a joystick, providing control inputs to a computer, which then computes the best way to move the control surfaces given the conditions.  This is then relayed to servo motors in the wings, tail, etc.  The complexity of a fly-by-wire system is much higher than an old hydraulic system, but the complexity is hidden from the pilot in order to provide a better experience.  Likewise, with SD-Access, we’ve made the details more complex in order to deliver a better experience (TrustSec, layer 3 routed backbone, etc.) while hiding the complexity from the user.  It’s a different approach, for sure, but the idea is to allow engineers to focus on the right problems, like how to design their network, and not worry so much about configuration.

A New Era?

I’ve written extensively (see, for example, here, and here) about the role for CLI-jockey network engineers in the future.  When airplanes switched from the old dials and gauges to sleek, modern computerized (glass) cockpits, I’m sure some old timers threw up their hands, retired, and got their old Piper Super Cubs out of the hanger to do some “real” flying.  But most adapted, and in the end, saw how the new automation systems helped them do their jobs better.  That’s an era I’m looking forward to.  And as I always, always say, the pilots who fly the new cockpits still need to understand weather systems, engines, navigation, etc.  We still need network engineers who know how networks operate.

Meanwhile, I won’t bash any CLI jockeys and I hope nobody else here does either.