netconf

All posts tagged netconf

As I write this, a number of sites out on the Internet are down because of an outage at Amazon Web Services.  Delta Airlines is suffering a major outage.  On a personal note, my wife’s favorite radio app and my Lutron lighting system are not operating correctly.  Of course, this outage is a reminder of the simple principle of not putting one’s eggs in a single basket.  AWS became the dominant web provider early on, but there are multiple viable alternatives now.  Long before the modern cloud emerged, I regularly ran disaster recovery exercises to ensure business continuity when a data center or service provider failed.  Everyone who uses a cloud provider better have a backup, and you better figure out a way to periodically test that backup.  A few startups have emerged to make this easier.

While the cause of the outage is yet unknown, there was an interesting comment in an Newsweek article on the outage.  Doug Madory, director of internet analysis an Kentik Inc, said:  “More and more these outages end up being the product of automation and centralization of administration…”  I’ve been involved in automation in some form or another for my entire six years at Cisco, and one aspect of automation is not talked about enough:  automation gone wild.  Let me give a non-computer example.

Back when I worked at the San Francisco Chronicle, the production department installed a new machine in our Union City printing plant.  The Sunday paper, back then, had a large number of inserts with advertisements and circulars that needed to be stuffed into the paper.  They were doing this manually, if you can believe it.

The new machine had several components.  One part of the process involved grabbing the inserts and carrying them in a conveyor system high above the plant floor, before dropping them down into the inserter.  It’s hard to visualize, so I’ve included a picture of a similar machine.

You can see the inserts coming in via the conveyor, hanging vertically.  This conveyor extended quite far.  One day I was in the plant, working on some networking thing or other, and the insert machine was running.  I looked back and saw the conveyor glitch somehow, and then a giant ball of paper started to form in the corner of the room, before finally exploding and raining paper down on the floor of the plant.  There was a commotion and one of the workers had to shut the machine down.

The point is, automation is great until it doesn’t work.  When it fails, it fails big.  You don’t just get a single problem, but a compounding problem.  It wasn’t just a single insert that got hit by the glitch, but dozens of them, if not more.  When you use manual processes, failures are contained.

Let’s tie this back to networking.  Say you need to configure hundreds of devices with some new code, perhaps adding a new routing protocol.  If you do it by hand in one device, and suddenly routes start dropping out of the routing table, chances are you won’t proceed with the other devices.  You’ll check your config to see what happened and why.  But if you set up, say, a Python script to run around and do this via NETCONF to 100 devices, suddenly you might have a massive outage on your hands.  The same could happen using a tool like Ansible, or even a vendor network management platform.

There are ways to combat this problem, of course.  Automated checks and validation after changes is an important one, but the problem with this approach is you cannot predict every failure.  If you program 10 checks, it’s going to fail in way #11, and you’re out of luck.

As I said, I’ve spent years promoting automation.  You simply couldn’t build a network like Amazon’s without it.  And it’s critical for network engineers to continue developing skills in this area.  We, as vendors and promoters of automation tools, need to be careful how we build and sell these tools to limit customer risk.

Eventually they got the inserter running again.  Whatever the cause of Amazon’s outage, let’s hope it’s not automation gone wild.

There were quite a few big announcements at Cisco Live this year.  One of the big ones was the overhaul of the certification program.  A number of new certifications were introduced (such as the DevNet CCNA/CCNP), and the existing ones were overhauled.  I wanted to do a post about this because I was involved with the certification program for quite a while on launching these.  I’m posting this on my personal blog, so my thoughts here are, of course, personal and not official.

First, the history.  Back when I was at Juniper, I had the opportunity to write questions for the service provider written exams.  It was a great experience, and I got thorough training from the cert program on how to properly write exam questions.  I don’t really remember how I got invited to do it, but it was a good opportunity, as a certified (certifiable?) individual, to give back to the program.  When I came to Cisco, I quickly connected with the cert program here, offering my services as a question writer. I had the training from Juniper, and was an active CCIE working on programmability.  It was a perfect fit, and a nice chance to recertify without taking the test, as writing/reviewing questions gets your CCIE renewed.

As I was managing a team within the business unit that was working on Software-Defined Access and programmability, it seemed logical for me to talk to the program about including those topics on the test.  I can assure you there was a lot of internal debate about this, as the CCIE exam is notoriously complex, and the point of our Intent-Based Networking products is simplicity.  One product manager even suggested a separate CCIE track for SD-Access, an idea I rejected immediately for that very reason.

Still, as I often point out here and elsewhere, SDN technologies do not mitigate the need for network engineers.  SDN products, all SDN products, are complex precisely because they are automated.  Automation enables us to build more complex things, in general.  You wouldn’t want to configure all the components of SD-Access by hand.  Still, we need engineers who understand what the automation tools are doing, and how to work with all the components which comprise a complex solution like SD-Access.  Network engineers aren’t going to disappear.

For this reason, we wanted SD-Access, SDWAN, and also device programmability (NETCONF/YANG, for example) to be on the lab.  We want to have engineers who know and understand these technologies, and the certification program is a fantastic way to help people to learn them.  I, and some members of my team, spent several months working with the CCIE program to build a new blueprint, which became the CCIE Enterprise Infrastructure.  The storied CCIE Routing and Switching will be no more.

At the end of the day, the CCIE exam has always adapted to changed in networking.  The R/S exam no longer has ISDN or IPX on it, nor should it.  Customers are looking for more automated solutions, and the exam is keeping pace.  If you’re studying for this exam, the new blueprint may be intimidating.  That said, CCIE exams have always been intimidating.  But think about this:  if you pass this exam, your resume will have skills on it that will make you incredibly marketable.

The new CCIE-EI (we always abbreviate stuff, right?) breaks down like this:

  • 60% is classic networking, the core routing protocols we all know and love.
  • 25% is SDx:  SD-Access and SD-WAN, primarily.
  • 15% is programmability.  NETCONF/YANG, controller APIs, Ansible, etc.

How do you study for this?  Like you study for anything.  Read about it and lab it.  There is quite a bit of material out there on all these subjects, but let me make some suggestions:

Programmability

You are not expected to be a programming expert for this section of the exam.  It’s not about seeing if you can write complex programs, but whether you know the basics well enough to execute some tasks via script/Ansible/etc instead of CLI.  DevNet is replete with examples of how to send NETCONF messages, or read data off a router or switch with programmable interfaces.  Download them, play with them, spend some time learning the fundamentals of Python, and relax about it.

  • Learn:  DevNet is a phenomenal resource.  Hank Preston, an evangelist for DevNet, has put out a wealth of material on programmability.  In addition, there is the book on IOS XE programmability I wrote with some colleagues.
  • Lab:  You can lab programmability stuff easily on your laptop.  Python and ncclient are free, as is Ansible.  If you have any sort of lab setup already, all you need to do is set up a Linux VM or install some tools on to your laptop.

Software-Defined

This is, as I said before, a tough one to test on.  After all, to add a device to an SD-Access fabric, you select it and click “Add to Fabric.”  What’s there to test?  Well, since these are new products you of course need to understand the components of SD-Access/SDWAN and how they interoperate.  How does policy work?  How do fabric domains talk to non-fabric domains?  There is plenty to study here.

  • Learn:  Again, we’ve written books on SD-Access and SD-WAN.  Also, we are moving a lot of documentation into Cisco Communities.
  • Lab:  Well, this is harder.  We’re working on getting SD-Access into the hands of learning partners, so you’ll have a place to get your hands on it.  We’re also working on virtualizing SD-Access as much as possible, to make it easier for people to run in labs.  I don’t have a timeframe on the latter, but hopefully the former we can do soon.

These are huge but exciting changes. I’ve been very lucky to have landed at a job where I am at the forefront of the changes in the industry, but this new exam will give others the opportunity to move themselves in that direction as well.  Happy labbing!

I recently replied to a comment that I think warrants a full blog post.

I’ve been here at Cisco working on programmability for a few years.  Brian Turner wrote in to say, essentially:  Hang on!  I became a network engineer precisely because I don’t want to be a coder!  I tried programming and hated it!  Now you’re telling me to become a programmer!

As I said in my reply, I have a lot of sympathy for him.  It reminds me of a story.

Back when I was at Juniper, I met with the IT department’s head of automation to discuss using some of his tools for network automation.  Jeremy was an expert in all things Puppet and Ansible, and a rather enthusiastic promoter of these tools on the server/app side of the house.  He had also managed to get Puppet running on a Junos device.  I was meeting with him because, frankly, the wind seemed to be blowing in his direction.  That said, I did not share his enthusiasm.  He told me about a server guy he had worked with, Stephane.  When Jeremy proposed to Stephane that he should use automation tools to make his life easier, Stephane vehemently rejected the idea, and the meeting ended with Stephane banging his fists on the table and shouting “I am not a coder!”

Flash forward a couple years and Stephane ended up the head of automation for a major company.  Apparently he finally bought into the idea.

Frankly I had no desire to become a coder either.  When I interviewed at Cisco, most of my discussions were around the controllers I was working with at the time, data center fabrics, etc.  When I arrived, my new boss assigned me as his Principal TME for programmability.  I never claimed to be an expert in this area.  Two months later I was presenting to Tech Field Day, and experienced automation guys like Jason Edelman and Matt Oswalt on how to run Puppet on a Nexus switch.  Three years later and I’m known as a NETCONF/YANG guy.  I’d barely heard of them when I started.

As I replied to Brian, Cisco doesn’t want him or anyone to learn Python or YANG or whatever.  Think about it from my perspective in product management.  Implementing YANG models for all of IOS XE is a massive undertaking.  Engineering devoted a huge amount of effort to pull this off.  Huge.  Mandating YANG models for their ongoing development burns cycles.  Product marketing and engineering would never prioritize this unless we thought there was a high probability someone would use it.  In other words, we don’t want people to use it so much as customers want us to develop it.  We have demand for programmable interfaces for network devices, and hence we’ve delivered on it.   My job as a TME is not to push NETCONF/YANG on anyone, but to provide the enablement to make it easier for someone to use this technology if they themselves want to.

As I often say in my presentations, the why is important.  Why do some customers demand these interfaces?  Well, because they know Notepad is a horrible automation tool, and it’s what 90% of network engineers use.  If you want to configure 50 switches, you’re going to configure one, paste the config into Notepad, tweak a few values, and then paste it into the next switch.  Do this 48 more times and tell me if this is the best use of your time as a highly skilled network engineer.  You can write a script to do this and save yourself a lot of trouble.  Or use Ansible to do it.  Or Cisco DNAC.  Whatever you want.  But if you want any of these tools to work efficiently, you need a machine interface, which CLI is not.  If you don’t believe me, try writing a script to do regular expression-based parsing of CLI outputs.  It’s a lot easier with YANG.

The point is not for network engineers to become programmers.  The point is to add some tools to your toolbox to help you focus on what you do well.  One weekend spent with a Python course and one more weekend with a DevNet course on YANG will give you a tool you can use to make your life easier.  That’s it.  Some customers may take it a lot further, of course, and go way into CI/CD workflows and that’s fine.  If you want to do 95% of your work in CLI and write a few scripts to do the other 5%, that’s fine.  If you want to use Cisco DNAC to do almost everything, knock yourself out.  It’s about what works best for you, as a network engineer.

I often point out how lousy my code quality is.  I’m sometimes ashamed to show the code for some of the scripts I’ve written.  I’m not a coder!  That’s a point I often make.  I don’t want to be a full-time software developer.  I’m a network engineer.  So for Brian and all the other CCIE’s out there, keep doing what you do best, but don’t close yourself off to some additional tools that will make your life easier.

I was doing well on the blog for a few months but lately fell behind.  With (now) 12 people reporting to me, and three major areas of responsibility (SD-Access, Assurance, and Programmability), it’s not easy to find time to write up a blog post.   I have about five drafts needing work but I cannot seem to find the will to finish them.  Sometimes, however, it just takes a spark to get me going. That spark came in my inbox from Ivan Peplnjak.  I like Ivan’s blog posts, which, while often not favorable to Cisco, are nonetheless fair and balanced and raise some very important points.

“Why Is Every SDN Vendor Bashing Networking Engineers?” asks Ivan in the form email I received.  “[T]he vendors know they wouldn’t be able to sell their latest concoctions to people who actually understand how networking works and why some architectures have no chance of ever working in real life,” answers Ivan.  “The only way to sell the warez is to try to convince everyone else how to get rid of the pesky ossified CLI jockeys.”

Now I work for a vendor, and since I deal with the aforementioned products, I guess I am an SDN vendor.  That would seem to qualify me to speak on this subject.  (With, of course, the usual disclaimer that the opinions here are my own and do not represent Cisco officially.)

Selling Concoctions

I must admit, I do want to sell our products.  Everyone at Cisco should want our products to sell.  Just about all of us have a personal, financial stake in the matter, whether we have stock grants or ESPP.  We would be insane not to want people to buy our products.  I, and most of my co-workers, are driven by far more than finance, however.  We all want to know that our work means something, and that we are coming up with innovative solutions to problems.  Otherwise, why show up in the office every day?

We operate in a highly competitive environment, which means if we are not constantly innovating and coming up with better ways to do things, we will all suffer.  You can complain about the macroeconomic system, and believe me, I’m not a Randian, objectivist believer in unbridled capitalism.  But, at the end of the day, a public company needs to create the perception of future value in the eyes of the stock market, and that’s a motivating factor for all of us.

These things being said, I’ve been in product management for a few years now and I have never heard anyone, ever, talk about trying to put one over on our customers.  I’m not saying that’s what Ivan means here, but it’s an accusation I’ve heard before.  In the first place, our customers are network engineers who are quite smart.  If ever I’ve presented to my customer and was not crystal clear on what I was talking about and what advantage it would bring the customer, they will let me know it.  We’re constantly trying to find ways to do things better and make our customers’ lives easier.  As somebody who worked in IT for more years than product management, I’m very interested in this subject.  There were a lot of things that were frustrating and I want to fix things that used to annoy me.  You can argue about whether we’ve come up with the right ideas, but I hope nobody questions our motivations.

CLI Jockeys

Do I bash CLI jockeys in order to sell my products?  I should hope not, given that most of my customers are CLI jockeys, as I am myself!  I have two CCIEs and a JNCIE.  I spent a couple years in routing protocols TAC and many years in IT.  I spent a long time learning my trade and I have a lot of respect for those who have put the time and effort into learning it as well.  It’s not easy.

However, I don’t operate under the delusion that network engineers do a good job of configuring and managing CLI.  When I was at Juniper, I had designed a new NGMVPN system for our WAN.  I handed it off to the implementation team with some sample configs and asked them to come back to me with their plan.  I think we were touching about 20 devices the first go around.  The engineer came back with 20 Word documents.  He took my sample config and copied and pasted it into Word, and then modified the config in a separate Word doc for each CE/PE he was touching.  CLI itself isn’t a problem, but how we manage it.  This is where programmability and automation tools come in.  At the very least Ansible templating would have made this easier.  Software-Defined Networking (a very loose term, for what it’s worth), is not about replacing ossified CLI jockeys but getting them to focus on what they should be doing (network engineering) and avoiding what they should not (pasting stuff in Word docs.)

SD-Access takes this quite a bit further than Ansible, NETCONF, and other device-level tools.  Rather than saying “I want this device to be a LISP MS/MR” and so forth, you just say “I want this device to be a control plane node” and the system figures out what you need.  Theoretically we could change from LISP to some other protocol and the end-user shouldn’t even notice.  The idea here is somewhat like a fly-by-wire system.  When a pilot operated the controls of an airplane, they used to be directly coupled to the control surfaces via hydraulics.  Now, the pilot is operating what is essentially a joystick, providing control inputs to a computer, which then computes the best way to move the control surfaces given the conditions.  This is then relayed to servo motors in the wings, tail, etc.  The complexity of a fly-by-wire system is much higher than an old hydraulic system, but the complexity is hidden from the pilot in order to provide a better experience.  Likewise, with SD-Access, we’ve made the details more complex in order to deliver a better experience (TrustSec, layer 3 routed backbone, etc.) while hiding the complexity from the user.  It’s a different approach, for sure, but the idea is to allow engineers to focus on the right problems, like how to design their network, and not worry so much about configuration.

A New Era?

I’ve written extensively (see, for example, here, and here) about the role for CLI-jockey network engineers in the future.  When airplanes switched from the old dials and gauges to sleek, modern computerized (glass) cockpits, I’m sure some old timers threw up their hands, retired, and got their old Piper Super Cubs out of the hanger to do some “real” flying.  But most adapted, and in the end, saw how the new automation systems helped them do their jobs better.  That’s an era I’m looking forward to.  And as I always, always say, the pilots who fly the new cockpits still need to understand weather systems, engines, navigation, etc.  We still need network engineers who know how networks operate.

Meanwhile, I won’t bash any CLI jockeys and I hope nobody else here does either.

Since I finished my series of articles on the CCIE, I thought I would kick off a new series on my current area of focus:  network programmability.  The past year at Cisco, programmability and automation have been my focus, first on Nexus and now on Catalyst switches.  I did do a two-part post on DCNM, a product which I am no longer covering, but it’s well worth a read if you are interested in learning the value of automation.

One thing I’ve noticed about this topic is that many of the people working on and explaining programmability have a background in software engineering.  I, on the other hand, approach the subject from the perspective of a network engineer.  I did do some programming when I was younger, in Pascal (showing my age here) and C.  I also did a tiny bit of C++ but not enough to really get comfortable with object-oriented programming.  Regardless, I left programming (now known as “coding”) behind for a long time, and the field has advanced in the meantime.  Because of this, when I explain these concepts I don’t bring the assumptions of a professional software engineer, but assume you, the reader, know nothing either.

Thus, it seems logical that in starting out this series, I need to explain what exactly programmability means in the context of network engineering, and what it means to do something programmatically.

Programmability simply means the capacity for a network device to be configured and managed by a computer program, as opposed to being configured and managed directly by humans.  This is a broad definition, but technically using an interface like Expect (really just CLI) or SNMP qualifies as a type of programmability.  Thus, we can qualify this by saying that programmability in today’s world includes the design of interfaces that are optimized for machine-to-machine control.

To manage a network device programmatically really just means using a computer program to control that network device.  However, when we consider a computer program, it has certain characteristics over and above simply controlling a device.  Essential to programming is the design of control structures that make decisions based on certain pieces of information.

Thus, we could use NETCONF to simply push configuration to a router or switch, but this isn’t the most optimal reason to use it.  It would be a far more effective use of NETCONF if we read some piece of data from the device (say interface errors) and took an action based on that data (say, shutting the interface down when the counters got too high.)  The other advantage of programmability is the ability to tie together multiple systems.  For example, we could read a device inventory out of APIC-EM, and then push config to devices based on the device type.  In other words, the decision-making capability of programmability is most important.

Network programmability encompasses a number of technologies:

  • Day 0 technologies to bring network devices up with an appropriate software version and configuration, with little to no human intervention.  Examples:  ZTP, PoAP, PnP.
  • Technologies to push and read configuration and operational data from devices.  Examples:  SNMP, NETCONF.
  • Automation systems such as Puppet, Chef, and Ansible, which are not strictly programming languages, but allow for configuration of numerous devices based on the role of the device.
  • The use of external programming languages, such as Ruby and Python, to interact with network devices.
  • The use of on-box programming technologies, such as on-box Python and EEM, to control network devices.

In this series of articles we will cover all of these topics as well as the mysteries of data models, YANG, YAML, JSON, XML, etc., all within the context of network engineering.  I know when I first encountered YANG and data models, I was quite confused and I hope I clear up some of this confusion.

1
1