Skip navigation

Tag Archives: nexus

Starting a new job is always a challenge.  It’s one thing to do an internal transfer within a company, like I did when moving from Technical Marketing Engineer to Product Management.  I even stayed in the same team.  But moving to a new company and new role is overwhelming and risky.

I experienced this when I went to Juniper.  I’d defined my entire existence in terms of Cisco.  When I worked at a Gold partner, I refused to touch anything–servers, desktops, etc.–that was not Cisco-branded.  I’d barely even touched a Juniper device.  Now I was expected to coherently articulate a network architecture when I didn’t even understand the products that were involved.  (One could argue network architecture should not be product-specific, but we all know better.)

Ironically enough, after six years at Juniper, going back to Cisco proved challenging.  I didn’t touch a single Cisco device in six years.  At the time I went to Juniper, Nexus products didn’t even exist.  Now I was expected to be doing technical marketing–at the principal level–for products I hadn’t even heard of.

My boss, the legend Carl Solder, assigned me to programmability.  I knew nothing about it.  During my interviews with Carl, I had mostly talked about controller-based architectures.  While I was at Juniper, Jeremy Pruitt, an automation expert, had managed to get Puppet working for managing Juniper devices.  He pitched it to me and I had no interest at all.  This was a server technology, not a networking technology.  I was a protocols guy.

Now I was assigned to YANG, NETCONF, Python scripting, Ansible, and (ha, ha) Puppet.  I was fortunate to end up on a team of low-ego engineers who helped me to learn this stuff and who were willing to share opportunities with me.  They are still friends to this day.

One of the opportunities they shared was a chance to present at Tech Field Day/Networking Field Day.  In fact, TFD is a high-stress presentation, and nobody really wanted to do it.  Thus, three months into my new job, knowing nothing about Puppet or Nexus, I was called upon to present Puppet management of Nexus to one of the most challenging audiences we have.  I think TFD delegates can be a bit nicer these days, but back then a few of them loved to eviscerate vendors, especially Cisco.  Oh, and Jason Edelman and Matt Oswalt, two automation experts, would be in the audience.  Also Terry Slattery, who is the first person ever to pass the CCIE lab.  And my boss Carl, one of the best public speakers at Cisco.  And it would be livestreamed and everyone (it seemed) in Cisco would be watching it.  Great.

A tough room to stand in front of

The only way out was through.  My team was very helpful in preparing me, and I was able to spend a lot of time with engineering.  Truth be told, it was just technology, and I’m a technology guy.  It was learnable.  Cisco’s marketing team was also very helpful, explaining possible pitfalls and traps in presenting to the TFD audience.  For example, if you put “Cisco Confidential” on your slides, they’ll make fun of you.  (It’s a default on our PowerPoint templates.)  They told us not to cite Gartner.  I made a fake slide with ridiculously large “Cisco Confidential” and “DO NOT SHARE”  and a magic quadrant and presented it at rehearsal.  The look of horror from the marketing team told me they didn’t get the obvious joke.

It’s a joke, really!

Puppet was particularly challenging because, unlike Ansible, it was agent-based.  (I’m not sure if this is the case anymore, I rarely hear about Puppet in networking contexts.)  This meant you had to bring up a guestshell and install the agent on any device that it managed.  To be honest, this is less than ideal.  It means that before you could use Puppet to manage the device, there was a fair amount of setup required.  This could be done through PoAP (Power-on Auto-Provisioning, Cisco’s version of ZTP for Nexus), but it required a lot more work than simply enabling a few CLIs, all that was needed for Ansible.

There was a non-technical challenge that faced me as well:  I had been working on overcoming a terrible fear of public speaking.  I had melted down in one presentation in my early years at Juniper, and had spent the last several years doing training, Toastmasters, and anything I could to get over this problem.  I had successfully delivered some major presentations at Juniper before I left, but this was the highest pressure I had ever been under.  Would I panic?  Would I crack?  Would I be shown the door only a few months into the new job?  Public speaking is risky, because when you fail, you fail in front of a lot of people.

I’m still here, of course, and aside from a couple minor stumbles, it went off quite well.  Watching the video eight years later, I do cringe a bit at some of my slides.  I’m not a genius when it comes to PowerPoint, but I’ve done a lot of work to improve the quality of my decks.  Given the time constraints, I had to borrow some slides and hack together others.  I do hear some of my “signature” lines taking shape.  In nearly all my presentations on automation, I joke about Notepad being the most common automation tool used by network engineers, and this must be the first place I used that line.  And while Jason and Terry did ask questions, they didn’t break me and I was able to answer them quite assuredly.

My demo was a bit simplistic.  I provisioned a few VLANs and interfaces, and pushed a patch to a Nexus 9k.  As I started to take over lead on programmability, I changed my approach to it.  For example, when I showed network engineers how to provision a VLAN with NETCONF, their eyes would glaze over upon seeing the massive block of Python and XML needed to make it work.  Why not just type one line of CLI?  I became convinced we needed to show the value of programmability by showing how we can use it beyond just simple tasks.  Configure a VLAN?  Meh.  Use Webex Teams (or Slack) to interact with a router, better.  We did some silly demos like using Alexa to configure switches, but the point was actually to show that programmability opens up possibilities that were not there before.  Nonetheless, when I did this TFD, it was still early days.

My TFD session also ended up generating a popular meme, although it wasn’t on my account.  Jason Edelman and Matt Oswalt were sitting next to each other.  Jason was dressed in a sportcoat, clean cut, with Windows PC, and no stickers on it.  Matt was sitting next to him with unkempt hair, using a Mac covered with stickers.  Someone screenshotted it and posted it to Reddit with the caption, “the two types of network engineers”.  It’s funny because Matt and Jason wrote a book on network automation together.  (In the photo, my boss Carl looks on in the background on the right.  Dead center in red is Lauren Friedman, with whom I worked many times as our TFD coordinator in marketing.  Stephen Foskett, one of the TFD organizers, is between Matt and Jason.)

The two types of network engineers

 

In our careers as network engineers, we’re often thrown into things we know nothing about and are expected to quickly turn around and become experts. It happened every week in TAC. Working in product marketing, we’re often on the cutting edge and constantly fighting to stay above water. I had only a few weeks to learn Puppet, get a demo up and running, and present it to a challenging audience. The most important thing is to not get cocky, to understand what you don’t know, and to spend the hands-on time to plug those gaps in your knowledge.

Every public speaker knows the feeling of relief when you finally finish and the stress unloads. As I closed my talk and walked to the back of the room, Carl, my new boss, looked at me and gave me a thumbs-up. At that point if you told me I’d win Hall of Fame at Cisco Live for talking about programmability, I’d have thought you were crazy.

As for Puppet, I never touched it again.

Since I finished my series of articles on the CCIE, I thought I would kick off a new series on my current area of focus:  network programmability.  The past year at Cisco, programmability and automation have been my focus, first on Nexus and now on Catalyst switches.  I did do a two-part post on DCNM, a product which I am no longer covering, but it’s well worth a read if you are interested in learning the value of automation.

One thing I’ve noticed about this topic is that many of the people working on and explaining programmability have a background in software engineering.  I, on the other hand, approach the subject from the perspective of a network engineer.  I did do some programming when I was younger, in Pascal (showing my age here) and C.  I also did a tiny bit of C++ but not enough to really get comfortable with object-oriented programming.  Regardless, I left programming (now known as “coding”) behind for a long time, and the field has advanced in the meantime.  Because of this, when I explain these concepts I don’t bring the assumptions of a professional software engineer, but assume you, the reader, know nothing either.

Thus, it seems logical that in starting out this series, I need to explain what exactly programmability means in the context of network engineering, and what it means to do something programmatically.

Programmability simply means the capacity for a network device to be configured and managed by a computer program, as opposed to being configured and managed directly by humans.  This is a broad definition, but technically using an interface like Expect (really just CLI) or SNMP qualifies as a type of programmability.  Thus, we can qualify this by saying that programmability in today’s world includes the design of interfaces that are optimized for machine-to-machine control.

To manage a network device programmatically really just means using a computer program to control that network device.  However, when we consider a computer program, it has certain characteristics over and above simply controlling a device.  Essential to programming is the design of control structures that make decisions based on certain pieces of information.

Thus, we could use NETCONF to simply push configuration to a router or switch, but this isn’t the most optimal reason to use it.  It would be a far more effective use of NETCONF if we read some piece of data from the device (say interface errors) and took an action based on that data (say, shutting the interface down when the counters got too high.)  The other advantage of programmability is the ability to tie together multiple systems.  For example, we could read a device inventory out of APIC-EM, and then push config to devices based on the device type.  In other words, the decision-making capability of programmability is most important.

Network programmability encompasses a number of technologies:

  • Day 0 technologies to bring network devices up with an appropriate software version and configuration, with little to no human intervention.  Examples:  ZTP, PoAP, PnP.
  • Technologies to push and read configuration and operational data from devices.  Examples:  SNMP, NETCONF.
  • Automation systems such as Puppet, Chef, and Ansible, which are not strictly programming languages, but allow for configuration of numerous devices based on the role of the device.
  • The use of external programming languages, such as Ruby and Python, to interact with network devices.
  • The use of on-box programming technologies, such as on-box Python and EEM, to control network devices.

In this series of articles we will cover all of these topics as well as the mysteries of data models, YANG, YAML, JSON, XML, etc., all within the context of network engineering.  I know when I first encountered YANG and data models, I was quite confused and I hope I clear up some of this confusion.

1
1

Introduction

My role at Cisco is transitioning to enterprise so I won’t be working on Nexus switches much any more.  I figured it would be a good time to finish this article on DCNM.  In my previous article, I talked about DCNM’s overlay provisioning capabilities, and explained the basic structure DCNM uses to describe multi-tenancy data centers.  In this article, we will look at the details of 802.1q-triggered auto-configuration, as well as VMtracker-based triggered auto-configuration.  Please be aware that the types of triggers and their behaviors depends on the platform you are using.  For example, you cannot do dot1q-based triggers on Nexus 9k, and on Nexus 5k, while I can use VMTracker, it will not prune unneeded VLANs.  If you have not read my previous article, please review it so the terminology is clear.

Have a look at the topology we will use:

autoconfig

The spine switches are not particularly relevant, since they are just passing traffic and not actively involved in the auto-configuration.  The Nexus 5K leaves are, of course, and attached to each is an ESXi server.  The one on the left has two VMs in two different VLANs, 501 and 502.  The 5k will learn about the active hosts via 802.1q triggering.  The rightmost host has only one VM, and in this case the switch will learn about the host via VMtracker.  In both cases the switches will provision the required configuration in response to the workloads, without manual intervention, pulling their configs from DCNM as described in part 1.

Underlay

Because we are focused on overlay provisioning, I won’t go through the underlay piece in detail.  However, when you set up the underlay, you need to configure some parameters that will be used by the overlay.  Since you are using DCNM, I’m assuming you’ll be using the Power-on Auto-Provision feature, which allows a switch to get its configuration on bootup without human intervention.

config-fabric

Recall that a fabric is the highest level construct we have in DCNM.  The fabric is a collection of switches running an encapsulation like VXLAN or FabricPath together.  Before we create any PoAP definitions, we need to set up a fabric.  During the definition of the fabric, we choose the type of provisioning we want.  Since we are doing auto-config, we choose this option as our Fabric Provision Mode.  The previous article describes the Top Down option.

Next, we need to build our PoAP definitions.  Each switch that is configured via PoAP needs a definition, which tells DCNM what software image and what configuration to push.  This is done from the Configure->PoAP->PoAP Definitions section of DCNM.  Because generating a lot of PoAP defs for a large fabric is tedious, DCNM10 also allows you to build a fabric plan, where you specify the overall parameters for your fabric and then DCNM generates the PoAP definitions automatically, incrementing variables such as management IP address for you.  We won’t cover fabric plans here, but if you go that route the auto-config piece is basically the same.config-poap-defs

Once we are in the PoAP definition for the individual switch, we can enable auto-configuration and select the type we want.

poap-def

In this case I have only enabled the 802.1q trigger.  If I want to enable VMTracker, I just check the box for it and enter my vCenter server IP address and credentials in the box below.  I won’t show the interface configuration, but please note that it is very important that you choose the correct access interfaces in the PoAP defs.  As we will see, DCNM will add some commands under the interfaces to make the auto-config work.

Once the switch has been powered on and has pulled down its configuration, you will see the relevant config under the interfaces:

n5672-1# sh run int e1/33
interface Ethernet1/33
switchport mode trunk
encapsulation dynamic dot1q
spanning-tree port type edge trunk

If the encapsulation command is not there, auto-config will not work.

Overlay Definition

Remember from the previous article that, after we define the Fabric, we need to define the Organization (Tenant), the Partition (VRF), and then the Network.  Defining the organization is quite easy: just navigate to the organizations screen, click the plus button, and give it a name.  You may only have one tenant in your data center, but if you have more than one you can define them here.  (I am using extremely creative and non-trademark-violating names here.)  Be sure to pick the correct Fabric name in the drop-down at the top of the screen;  often when you don’t see what you are expecting in DCNM, it is because you are not on the correct fabric.

config-organization

Next, we need to add the partition, which is DCNM’s name for a VRF.  Remember, we are talking about mutlitenancy here.  Not only do we have the option to create multiple tenants, but each tenant can have multiple VRFs.  Adding a VRF is just about as easy as adding an organization.  DCNM does have a number of profiles that can be used to build the VRFs, but for most VXLAN fabrics, the default EVPN profile is fine.  You only need to enter the VRF name.  The partition ID is already populated for you, and there is no need to change it.

partition

There is something important to note in the above screen shot.  The name given to the VRF is prepended with the name of the organization.  This is because the switches themselves have no concept of organization.  By prepending the org name to the VRF, you can easily reuse VRF names in different organizations without risk of conflict on the switch.

Finally, let’s provision the network.  This is where most of the configuration happens.  Under the same LAN Fabric Automation menu we saw above, navigate to Networks.  As before, we need to pick a profile, but the default is fine for most layer 3 cases.

network

Once we specify the organization and partition that we already created, we tell DCNM the gateway address.  This is the Anycast gateway address that will be configured on any switch that has a host in this VLAN.  Remember that in VXLAN/EVPN, each leaf switch acts as a default gateway for the VLANs it serves.  We also specify the VLAN ID, of course.

Once this is saved, the profile is in DCNM and ready to go.  Unlike with the underlay config, nothing is actually deployed on the switch at this point.  The config is just sitting in DCNM, waiting for a workload to become active that requires it.  If no workload requires the configuration we specified, it will never make it to a switch.  And, if switch-1 requires the config while switch-2 does not, well, switch-2 will never get it.  This is the power of auto-configuration.  It’s entirely likely that when you are configuring your data center switches by hand, you don’t configure VLANs on switches that don’t require them, but you have to figure that out yourself.  With auto-config, we just deploy as needed.

Let’s take a step back and review what we have done:

  1. We have told DCNM to enable 802.1q triggering for switches that are configured with auto-provisioning.
  2. We have created an organization and partition for our new network.
  3. We have told DCNM what configuration that network requires to support it.

Auto-Config in Action

Now that we’ve set DCNM up, let’s look at the switches.  First of all, I verify that there is no VRF or SVI configuration for this partition and network:


jemclaug-hh14-n5672-1# sh vrf all
VRF-Name VRF-ID State Reason
default 1 Up --
management 2 Up --

jemclaug-hh14-n5672-1# sh ip int brief vrf all | i 192.168
jemclaug-hh14-n5672-1#

We can see here that there is no VRF other than the default and management VRFs, and there are no SVI’s with the 192.168.x.x prefix. Now I start a ping from my VM1, which you will recall is connected to this switch:

jeffmc@ABC-VM1:~$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=9 ttl=255 time=0.794 ms
64 bytes from 192.168.1.1: icmp_seq=10 ttl=255 time=0.741 ms
64 bytes from 192.168.1.1: icmp_seq=11 ttl=255 time=0.683 ms

Notice from the output that the first ping I get back is sequence #9. Back on the switch:

jemclaug-hh14-n5672-1# sh vrf all
VRF-Name VRF-ID State Reason
ABCCorp:VRF1 4 Up --
default 1 Up --
management 2 Up --
jemclaug-hh14-n5672-1# sh ip int brief vrf all | i 192.168
Vlan501 192.168.1.1 protocol-up/link-up/admin-up
jemclaug-hh14-n5672-1#

Now we have a VRF and an SVI! As I stated before, the switch itself has no concept of organization, which is really just a tag DCNM applies to the front of the VRF. If I had created a VRF1 in the XYZCorp organization, the switch would not see it as a conflict because it would be XYZCorp:VRF1 instead of ABCCorp:VRF1.

If we want to look at the SVI configuration, we need to use the expand-port-profile option. The profile pulled down from DCNM is not shown in the running config:


jemclaug-hh14-n5672-1# sh run int vlan 501 expand-port-profile

interface Vlan501
no shutdown
vrf member ABCCorp:VRF1
ip address 192.168.1.1/24 tag 12345
fabric forwarding mode anycast-gateway

VMTracker

Let’s have a quick look at VMTracker. As I mentioned in this blog and previous one, dot1q triggering requires the host to actually send data before it auto-configures the switch. The nice thing about VMTracker is that it will configure the switch when a VM becomes active, regardless of whether it is actually sending data. The switch itself is configured with the address of and credentials for your vCenter server, so it becomes aware when a workload is active.

Note: Earlier I said you have to configure the vCenter address and credentials in DCNM. Don’t be confused! DCNM is not talking to vCenter, the Nexus switch actually is. You only put it in DCNM if you are using Power-on Auto-Provisioning. In other words, DCNM will not establish a connection to vCenter, but will push the address and credentials down to the switch, and the switch establishes the connection.

We can see the VMTracker configuration on the second Nexus 5K:


jemclaug-hh14-n5672-2(config-vmt-conn)# sh run | sec vmtracker
feature vmtracker
encapsulation dynamic vmtracker
vmtracker fabric auto-config
vmtracker connection vc
remote ip address 172.26.244.120
username administrator@vsphere.local password 5 Qxz!12345
connect

The feature is enabled, and the “encapsulation dynamic vmtracker” command is applied to the relevant interfaces. (You can see the command here, but because I used the “| sec” option to view the config, you cannot see what interface it is applied under. We can see that I also supplied the vCenter IP address and login credentials. (The password is sort-of encrypted.) Notice also the connect statement. The Nexus will not connect to the vCenter server until this is applied. Now we can look at the vCenter connection:

jemclaug-hh14-n5672-2(config-vmt-conn)# sh vmtracker status
Connection Host/IP status
-----------------------------------------------------------------------------
vc 172.26.244.120 Connected

We have connected successfully!

As with dot1q triggering, there is no VRF or SVI configured yet for our host:

jemclaug-hh14-n5672-2# sh vrf all
VRF-Name VRF-ID State Reason
default 1 Up --
management 2 Up --
jemclaug-hh14-n5672-2# sh ip int brief vrf all | i 192.168

We now go to vSphere (or vCenter) and power up the VM connected to this switch:

 

vcenter-power-on

Once we bring up the VM, we can see the switch has been notified, and the VRF has been automatically provisioned, along with the SVI.

jemclaug-hh14-n5672-2# sh vmtracker event-history | i VM2
761412 Dec 21 2016 13:43:02:572793 ABC-VM2 on 172.26.244.177 in DC4 is powered on
jemclaug-hh14-n5672-2# sh vrf all | i ABC
ABCCorp:VRF1 4 Up --
jemclaug-hh14-n5672-2# sh ip int brief vrf all | i 192
Vlan501 192.168.1.1 protocol-up/link-up/admin-up
jemclaug-hh14-n5672-2#

Thus, we had the same effect as with dot1q triggering, but we didn’t need to wait for traffic!

I hope these articles have been helpful. Much of the documentation on DCNM right now is not in the form of a walk-through, and while I don’t offer as much detail, hopefully these articles should get you started. Remember, with DCNM, you get advanced features free for 30 days, so go ahead and download and play with it.