Lab Notes: Proxmox

Note: I’m playing with themes right now. I don’t particularly like the one I’m on today, except that code blocks look better and it reads more nicely on phones. We’ll see if I can tweak the CSS to get it where I want it, or whether I’ll need to switch themes again.

The nice thing about being an individual contributor again, and not being allowed to travel to Cisco Live in Amsterdam, is that I can spend more time in the lab. I acquired a UCS (actually DN appliance) server for some new projects. It had a bunch of 2TB drives, and being a lab server, I just threw them all into a RAID 0 for 12 TB. Simple. Next: getting a hypervisor up and running.

I’ve used ESXi for years, but it’s gotten a bit tiresome. The licensing is expensive and complex, and even internal Cisco users have to jump through a lot of hoops to get a license. So I decided to try Proxmox instead. It’s free, I don’t need to muck around with licenses, and so far it seems to behave basically like ESXi. I had a little trouble finding a thing or two, but ChatGPT was quite helpful.

I’m setting up a GNS3 instance, on which I hope to install SONiC. The docs for running SONiC on GNS3 say that you should install it on Windows, but I’m not convinced this is the case. I’m probably wrong and will find out. The SONiC image is 24 GB (!) so I had to make sure to account for that.

Anyways, the Proxmox install was a breeze. I just mounted the ISO via CIMC and away I went. My only complaint is that every time I log in to the web UI, it asks me to buy a subscription. But a subscription is not needed for day-to-day operation, just for updates.

So far, I’d recommend Proxmox enthusiastically. We’ll see if it holds up.

VRRP, IPv6, and the “Why” Question

Ivan Pepelnjak has an interesting post (H/T to Ethan Banks) critiquing Cisco’s VRRP v3 CLI implementation for IPv6.  It got me thinking about a question I was told never to ask:  “Why?”

Never ask why?

Back when I really wanted to get into networking, I took a five-day bootcamp with Global Knowledge taught by a gentleman named Tony Marshall.  It was four days of study, and then on day 5 you took the CCNA exam.  Tony was an awesome teacher, and I passed with flying colors.  However, if ever you asked Tony “Why?”, he would interrupt you before you finished your sentence and say, “never ask why!”  The way Cisco implemented it was the gospel truth handed down from above.  This is probably the correct approach for a four-day CCNA bootcamp, and producing automatons who never question Cisco’s decisions was also great for business.  In product management we call that “competitive advantage.”

Since I’ve been doing management stuff for eight years, my CLI skills are a little rusty.  I’m going to walk through the VRRP problem in a lot more detail, largely for my own benefit, and then circle back to the “Why?” question in the broader sense.

VRRP and IPv6

Virtual Router Redundancy Protocol (VRRP) is a protocol used to provide default gateway redundancy, and is an IETF cousin of Cisco’s earlier Hot Standby Routing Protocol (HSRP).  VRRP allows two (or more) routers to share a single IP address, which one router actively responds to.  Should the router go down, the secondary router begins responding to that address.  If you’re reading this blog, you probably already know that.

VRRP is simple to configure.  I set up a VRRP group in my lab, using VRRP v3, which supports both IPv4 and IPv6 addresses:

s1(config-if)#vrrp 1 address-family ipv4
s1(config-if-vrrp)#address 10.1.1.100 primary
s1(config-if-vrrp)#priority 200

Here you can see I specify we’re using a v4 address, and I made 10.1.1.100 the primary address.  (You can have more.)  I also configured the priority value for this router (actually a switch).  I did the same on the other router with a lower priority, and voilà, we have a VIP responding to traffic.  From another router on this segment:

r1#p 10.1.1.100
.!!!!

How does ARP work in this case?  The MAC address used comes from a reserved block designated for VRRP.  The last octet is the VRRP group number.

r1#sh arp
Internet 10.1.1.100 0 0000.5e00.0101 ARPA Ethernet0/0

If this were VRRP group 2, then the MAC address would be 0000.5e00.0102.
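The derivation is mechanical, so here is a quick illustrative sketch in Python (not anything Cisco ships): RFC 5798 reserves 0000.5e00.01xx for IPv4 groups and 0000.5e00.02xx for IPv6 groups, where xx is the group number (VRID) in hex.

```python
def vrrp_virtual_mac(group: int, ipv6: bool = False) -> str:
    """Build the VRRP virtual MAC for a group number (VRID), per RFC 5798.

    IPv4 groups live in 0000.5e00.01xx; IPv6 groups in 0000.5e00.02xx.
    """
    if not 1 <= group <= 255:
        raise ValueError("VRID must be between 1 and 255")
    proto = 0x02 if ipv6 else 0x01
    return f"0000.5e00.{proto:02x}{group:02x}"

print(vrrp_virtual_mac(1))  # 0000.5e00.0101 -- matches the ARP entry above
print(vrrp_virtual_mac(2))  # 0000.5e00.0102
```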

So far so good, but how does it work for IPv6?

If I just attempt to configure a global IPv6 address with VRRP, a couple funky things happen.  First, I cannot add the “primary” keyword:

s1(config-if)#vrrp 1 address-family ipv6
s1(config-if-vrrp)#address 2001:db8:cafe:100::100/64
s1(config-if-vrrp)#address 2001:db8:cafe:100::100/64 pri?
% Unrecognized command

It will take the command without the “primary” keyword, but VRRP gets stuck in INIT:

Vlan111 - Group 1 - Address-Family IPv6
State is INIT (No Primary virtual IP address configured)

As Ivan points out, this is per the RFC.  The RFC requires that the source IPv6 address be a link-local address.  If you haven’t done a lot of v6, recall that each interface will have a globally routable IPv6 address as well as a link-local address, which, as the name implies, can only be used on the local link and is not routable beyond it.  Without getting into all the details, the link-local address is assigned from the FE80::/10 block, and on Ethernet interfaces it is made unique by using the interface MAC address.
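That MAC-based derivation (modified EUI-64) is simple enough to sketch in a few lines of Python. This is an illustrative helper, not vendor code; the example MAC is one I picked because running the algorithm on it produces an address in the same style as the one in Cisco’s docs.

```python
def mac_to_link_local(mac: str) -> str:
    """Derive the modified-EUI-64 link-local address for a MAC (RFC 4291)."""
    # Accept 44:b6:be:50:db:b0, 44-b6-be-50-db-b0, or Cisco-style 44b6.be50.dbb0
    raw = bytearray(bytes.fromhex(mac.lower().translate(str.maketrans("", "", ":-."))))
    raw[0] ^= 0x02                            # flip the universal/local bit
    eui64 = raw[:3] + b"\xff\xfe" + raw[3:]   # wedge FF:FE into the middle
    groups = ((eui64[i] << 8) | eui64[i + 1] for i in range(0, 8, 2))
    return "fe80::" + ":".join(f"{g:x}" for g in groups)

# An interface MAC of 44b6.be50.dbb0 yields:
print(mac_to_link_local("44b6.be50.dbb0"))  # fe80::46b6:beff:fe50:dbb0
```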

OK, the RFC says:

“…each router has its own link-local IPv6 address on the LAN interface and a link-local IPv6 address per VRID that is shared with the other routers that serve the same VRID.” (Emphasis added.)

So the router has its own address (assigned automatically when IPv6 comes up), but each virtual router ID requires a link-local address too.  Elsewhere, the RFC states the primary address must be the link-local address.  But why?

The RFC is explicit that this link-local address is the source-address for VRRP packets, so presumably this is to keep the VRRP packets local, and to prevent them from being routed outside the segment.

Fair enough.  Let’s add a link-local primary address:

s1(config-if)#vrrp 1 address-family ipv6
s1(config-if-vrrp)#address FE80::1 primary

And…it works!

Vlan111 - Group 1 - Address-Family IPv6
  State is MASTER

Easy, right?  One more step, perhaps, but it’s not too hard.  Except, as Ivan rightly points out, some vendors (like Arista) assign this link-local address automatically for you based on the group MAC address.  Aside from the fact that this makes it easier (no need to keep track of another VRRP address), it makes interoperability challenging.  If I want my IOS XE device to be in the same VRRP group as an Arista device, I need them both to agree on the LLA.  (Although, to be honest, I’m not totally convinced on this use case.  Usually devices peering in VRRP groups will be identical hardware, even in mixed-vendor environments.  But I might be wrong.)

Let’s go with Ivan’s thought that this is poorly designed.  I have access to internal Cisco tools that Ivan doesn’t.  What are people saying?  Are we getting TAC cases on this?

It turns out, not many.  I found one case in Spanish, which I can read (even without ChatGPT).  The customer complained he configured VRRP “pero no hay ping” (“but there’s no ping”).  No hay ping because no hay link-local address.  The TAC engineer pointed the customer to a bug, so why isn’t it fixed?  Well, the bug (login may be required) is a documentation bug, requesting the link-local address be added to the VRRP docs.  They added this:

Device(config-if-vrrp)# address FE80::46B6:BEFF:FE50:DBB0 primary

…but nowhere do they tell you how they derived that address.  Meanwhile, on one of our internal mailers, an engineer asked about just using FE80::1 like I did.  The development engineer working on IPv6 VRRP replied:

Is the FE80::1 recommended?    <<< usage is fine AFAIK

Usage is fine, ok, but is this the best way to go about assigning link-local addresses for VRRP?

The “Why?” Question

Back to the larger question of why.  Cisco is not the only company to have built confusing CLI.  In that very post, Ivan notes that Dell allows you to skip the link-local address altogether, in flagrant violation of the RFC.  And years ago I wrote pieces on utterly baffling Juniper CLI, such as the concept of, and syntax for, RIB Groups.

Who designs the CLI?  A committee of experts who used to work at Apple, after thorough user-experience research, you think?  More than likely, it was designed by the engineer charged with implementing the protocol.  That engineer is almost certainly not a network engineer and does not have to operate a network.  Sure, they can configure routers a bit, but mainly for lab testing of their own features.  What makes sense to an engineer burning the midnight oil at 3am in Bangalore might not make sense for the rest of us, but as long as the feature works, some way, somehow, they can close the assignment and move on to the next thing.

But surely the executives at the vendor care, right?  Short answer:  they will care, and care a lot, if they meet with a giant customer and the giant customer screams at them that feature XYZ is impossible to configure.  Otherwise, it’s highly unlikely a senior vice president cares how VRRP is configured, and also unlikely he knows what it is.  Thus, the development of CLI can be a bit Lord of the Flies.  No adults in the room, so to speak.

What about filing an actual bug to get it fixed?  I could do so, for Ivan’s sake, but two problems.  First, the bug will be the lowest severity possible and engineering will be more focused on other things.  Second, if they do get to it, they won’t want to fix it for fear of breaking existing configs.  Years ago I brought up the problem with IOS XE’s model-driven management syntax.  To enable NETCONF you type “netconf-yang” at the global level. To enable RESTCONF you type “restconf”.  gNMI is “gnmi-yang”.  Couldn’t we have a code block called “model-based-management” (or something) and then you enable each of the features under that?  No, engineering told me, people are already using it the old way.  (Honestly, like, nobody was in 2016, but anyways…)  Great idea, Jeff, but we can’t do it.

The TL;DR summary for you:  Tony Marshall was right.  Never ask: “Why?”

Postscript for those who think AI will replace network engineers:

I asked ChatGPT to provide me a configuration for VRRP v6, and gave it the global virtual address.  ChatGPT gave me the configuration without a primary address.  Then, when I showed ChatGPT the “show vrrp” output, which explicitly says “No primary virtual IP address configured”, it responded by telling me to:

  1. Make sure IPv6 is enabled globally. (!)
  2. Make sure the interface is up (!!)
  3. Reapply exactly the same config [without an LLA] (!!!)

So apparently we built CLI so confusing that even genius-level AI cannot figure it out.  Or, we’ve got a ways to go before we get replaced by ChatGPT.  Just sayin’, “analysts.”

Programmability for Network Engineers

Since I finished my series of articles on the CCIE, I thought I would kick off a new series on my current area of focus:  network programmability.  The past year at Cisco, programmability and automation have been my focus, first on Nexus and now on Catalyst switches.  I did do a two-part post on DCNM, a product which I am no longer covering, but it’s well worth a read if you are interested in learning the value of automation.

One thing I’ve noticed about this topic is that many of the people working on and explaining programmability have a background in software engineering.  I, on the other hand, approach the subject from the perspective of a network engineer.  I did do some programming when I was younger, in Pascal (showing my age here) and C.  I also did a tiny bit of C++ but not enough to really get comfortable with object-oriented programming.  Regardless, I left programming (now known as “coding”) behind for a long time, and the field has advanced in the meantime.  Because of this, when I explain these concepts I don’t bring the assumptions of a professional software engineer, but assume you, the reader, know nothing either.

Thus, it seems logical that in starting out this series, I need to explain what exactly programmability means in the context of network engineering, and what it means to do something programmatically.

Programmability simply means the capacity for a network device to be configured and managed by a computer program, as opposed to being configured and managed directly by humans.  This is a broad definition, but technically using an interface like Expect (really just CLI) or SNMP qualifies as a type of programmability.  Thus, we can qualify this by saying that programmability in today’s world includes the design of interfaces that are optimized for machine-to-machine control.

To manage a network device programmatically really just means using a computer program to control that network device.  However, when we consider a computer program, it has certain characteristics over and above simply controlling a device.  Essential to programming is the design of control structures that make decisions based on certain pieces of information.

Thus, we could use NETCONF simply to push configuration to a router or switch, but that isn’t the best use of it.  It would be far more effective to read some piece of data from the device (say, interface errors) and take an action based on that data (say, shutting the interface down when the counters get too high).  The other advantage of programmability is the ability to tie together multiple systems.  For example, we could read a device inventory out of APIC-EM, and then push config to devices based on the device type.  In other words, the decision-making capability of programmability is what matters most.
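That read-then-decide loop might look like the sketch below. Everything here is made up for illustration: the threshold, the interface names, and the counters. The device I/O (fetching counters over NETCONF or SNMP, pushing the shutdown) is deliberately left out so the decision logic stands on its own.

```python
# Illustrative cutoff only -- not a recommended operational value.
ERROR_THRESHOLD = 1000

def interfaces_to_shut(counters: dict[str, int], threshold: int = ERROR_THRESHOLD) -> list[str]:
    """Given {interface: input-error count}, return the interfaces worth shutting down."""
    return sorted(ifname for ifname, errors in counters.items() if errors > threshold)

# Pretend we polled these counters from the device:
polled = {"Gi1/0/1": 12, "Gi1/0/2": 0, "Gi1/0/3": 40231}
print(interfaces_to_shut(polled))  # ['Gi1/0/3']
```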

Network programmability encompasses a number of technologies:

  • Day 0 technologies to bring network devices up with an appropriate software version and configuration, with little to no human intervention.  Examples:  ZTP, PoAP, PnP.
  • Technologies to push and read configuration and operational data from devices.  Examples:  SNMP, NETCONF.
  • Automation systems such as Puppet, Chef, and Ansible, which are not strictly programming languages, but allow for configuration of numerous devices based on the role of the device.
  • The use of external programming languages, such as Ruby and Python, to interact with network devices.
  • The use of on-box programming technologies, such as on-box Python and EEM, to control network devices.

In this series of articles we will cover all of these topics as well as the mysteries of data models, YANG, YAML, JSON, XML, etc., all within the context of network engineering.  I know when I first encountered YANG and data models I was quite confused, and I hope to clear up some of that confusion.


Cisco DCNM 10 Overlay Provisioning Part 2

Introduction

My role at Cisco is transitioning to enterprise so I won’t be working on Nexus switches much any more.  I figured it would be a good time to finish this article on DCNM.  In my previous article, I talked about DCNM’s overlay provisioning capabilities, and explained the basic structure DCNM uses to describe multi-tenancy data centers.  In this article, we will look at the details of 802.1q-triggered auto-configuration, as well as VMtracker-based triggered auto-configuration.  Please be aware that the types of triggers and their behaviors depend on the platform you are using.  For example, you cannot do dot1q-based triggers on Nexus 9K, and on Nexus 5K, while I can use VMTracker, it will not prune unneeded VLANs.  If you have not read my previous article, please review it so the terminology is clear.

Have a look at the topology we will use:

autoconfig

The spine switches are not particularly relevant, since they are just passing traffic and not actively involved in the auto-configuration.  The Nexus 5K leaves, of course, are, and attached to each is an ESXi server.  The one on the left has two VMs in two different VLANs, 501 and 502.  The 5K will learn about the active hosts via 802.1q triggering.  The rightmost host has only one VM, and in this case the switch will learn about the host via VMtracker.  In both cases the switches will provision the required configuration in response to the workloads, without manual intervention, pulling their configs from DCNM as described in part 1.

Underlay

Because we are focused on overlay provisioning, I won’t go through the underlay piece in detail.  However, when you set up the underlay, you need to configure some parameters that will be used by the overlay.  Since you are using DCNM, I’m assuming you’ll be using the Power-on Auto-Provision feature, which allows a switch to get its configuration on bootup without human intervention.

config-fabric

Recall that a fabric is the highest level construct we have in DCNM.  The fabric is a collection of switches running an encapsulation like VXLAN or FabricPath together.  Before we create any PoAP definitions, we need to set up a fabric.  During the definition of the fabric, we choose the type of provisioning we want.  Since we are doing auto-config, we choose this option as our Fabric Provision Mode.  The previous article describes the Top Down option.

Next, we need to build our PoAP definitions.  Each switch that is configured via PoAP needs a definition, which tells DCNM what software image and what configuration to push.  This is done from the Configure->PoAP->PoAP Definitions section of DCNM.  Because generating a lot of PoAP defs for a large fabric is tedious, DCNM 10 also allows you to build a fabric plan, where you specify the overall parameters for your fabric and then DCNM generates the PoAP definitions automatically, incrementing variables such as the management IP address for you.  We won’t cover fabric plans here, but if you go that route the auto-config piece is basically the same.

config-poap-defs

Once we are in the PoAP definition for the individual switch, we can enable auto-configuration and select the type we want.

poap-def

In this case I have only enabled the 802.1q trigger.  If I want to enable VMTracker, I just check the box for it and enter my vCenter server IP address and credentials in the box below.  I won’t show the interface configuration, but please note that it is very important that you choose the correct access interfaces in the PoAP defs.  As we will see, DCNM will add some commands under the interfaces to make the auto-config work.

Once the switch has been powered on and has pulled down its configuration, you will see the relevant config under the interfaces:

n5672-1# sh run int e1/33
interface Ethernet1/33
switchport mode trunk
encapsulation dynamic dot1q
spanning-tree port type edge trunk

If the encapsulation command is not there, auto-config will not work.

Overlay Definition

Remember from the previous article that, after we define the Fabric, we need to define the Organization (Tenant), the Partition (VRF), and then the Network.  Defining the organization is quite easy: just navigate to the organizations screen, click the plus button, and give it a name.  You may only have one tenant in your data center, but if you have more than one you can define them here.  (I am using extremely creative and non-trademark-violating names here.)  Be sure to pick the correct Fabric name in the drop-down at the top of the screen;  often when you don’t see what you are expecting in DCNM, it is because you are not on the correct fabric.

config-organization

Next, we need to add the partition, which is DCNM’s name for a VRF.  Remember, we are talking about multitenancy here.  Not only do we have the option to create multiple tenants, but each tenant can have multiple VRFs.  Adding a VRF is just about as easy as adding an organization.  DCNM does have a number of profiles that can be used to build the VRFs, but for most VXLAN fabrics, the default EVPN profile is fine.  You only need to enter the VRF name.  The partition ID is already populated for you, and there is no need to change it.

partition

There is something important to note in the above screen shot.  The name given to the VRF is prepended with the name of the organization.  This is because the switches themselves have no concept of organization.  By prepending the org name to the VRF, you can easily reuse VRF names in different organizations without risk of conflict on the switch.

Finally, let’s provision the network.  This is where most of the configuration happens.  Under the same LAN Fabric Automation menu we saw above, navigate to Networks.  As before, we need to pick a profile, but the default is fine for most layer 3 cases.

network

Once we specify the organization and partition that we already created, we tell DCNM the gateway address.  This is the Anycast gateway address that will be configured on any switch that has a host in this VLAN.  Remember that in VXLAN/EVPN, each leaf switch acts as a default gateway for the VLANs it serves.  We also specify the VLAN ID, of course.

Once this is saved, the profile is in DCNM and ready to go.  Unlike with the underlay config, nothing is actually deployed on the switch at this point.  The config is just sitting in DCNM, waiting for a workload to become active that requires it.  If no workload requires the configuration we specified, it will never make it to a switch.  And, if switch-1 requires the config while switch-2 does not, well, switch-2 will never get it.  This is the power of auto-configuration.  It’s entirely likely that when you are configuring your data center switches by hand, you don’t configure VLANs on switches that don’t require them, but you have to figure that out yourself.  With auto-config, we just deploy as needed.

Let’s take a step back and review what we have done:

  1. We have told DCNM to enable 802.1q triggering for switches that are configured with auto-provisioning.
  2. We have created an organization and partition for our new network.
  3. We have told DCNM what configuration that network requires to support it.

Auto-Config in Action

Now that we’ve set DCNM up, let’s look at the switches.  First of all, I verify that there is no VRF or SVI configuration for this partition and network:


jemclaug-hh14-n5672-1# sh vrf all
VRF-Name VRF-ID State Reason
default 1 Up --
management 2 Up --

jemclaug-hh14-n5672-1# sh ip int brief vrf all | i 192.168
jemclaug-hh14-n5672-1#

We can see here that there is no VRF other than the default and management VRFs, and there are no SVIs with a 192.168.x.x prefix. Now I start a ping from VM1, which you will recall is connected to this switch:

jeffmc@ABC-VM1:~$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=9 ttl=255 time=0.794 ms
64 bytes from 192.168.1.1: icmp_seq=10 ttl=255 time=0.741 ms
64 bytes from 192.168.1.1: icmp_seq=11 ttl=255 time=0.683 ms

Notice from the output that the first ping I get back is sequence #9. Back on the switch:

jemclaug-hh14-n5672-1# sh vrf all
VRF-Name VRF-ID State Reason
ABCCorp:VRF1 4 Up --
default 1 Up --
management 2 Up --
jemclaug-hh14-n5672-1# sh ip int brief vrf all | i 192.168
Vlan501 192.168.1.1 protocol-up/link-up/admin-up
jemclaug-hh14-n5672-1#

Now we have a VRF and an SVI! As I stated before, the switch itself has no concept of organization, which is really just a tag DCNM applies to the front of the VRF. If I had created a VRF1 in the XYZCorp organization, the switch would not see it as a conflict because it would be XYZCorp:VRF1 instead of ABCCorp:VRF1.
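The trick DCNM is using here is plain namespacing, which a toy sketch makes obvious (this is illustrative only, not DCNM’s actual code):

```python
def switch_vrf_name(org: str, vrf: str) -> str:
    """DCNM-style namespacing: prefix the organization so VRF names can repeat across tenants."""
    return f"{org}:{vrf}"

# Two tenants can both call their VRF "VRF1" without colliding on the switch:
print(switch_vrf_name("ABCCorp", "VRF1"))  # ABCCorp:VRF1
print(switch_vrf_name("XYZCorp", "VRF1"))  # XYZCorp:VRF1
```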

If we want to look at the SVI configuration, we need to use the expand-port-profile option. The profile pulled down from DCNM is not shown in the running config:


jemclaug-hh14-n5672-1# sh run int vlan 501 expand-port-profile

interface Vlan501
no shutdown
vrf member ABCCorp:VRF1
ip address 192.168.1.1/24 tag 12345
fabric forwarding mode anycast-gateway

VMTracker

Let’s have a quick look at VMTracker. As I mentioned in this post and the previous one, dot1q triggering requires the host to actually send data before the switch auto-configures itself. The nice thing about VMTracker is that it will configure the switch when a VM becomes active, regardless of whether that VM is actually sending data. The switch itself is configured with the address of and credentials for your vCenter server, so it becomes aware when a workload is active.

Note: Earlier I said you have to configure the vCenter address and credentials in DCNM. Don’t be confused! DCNM is not talking to vCenter, the Nexus switch actually is. You only put it in DCNM if you are using Power-on Auto-Provisioning. In other words, DCNM will not establish a connection to vCenter, but will push the address and credentials down to the switch, and the switch establishes the connection.

We can see the VMTracker configuration on the second Nexus 5K:


jemclaug-hh14-n5672-2(config-vmt-conn)# sh run | sec vmtracker
feature vmtracker
encapsulation dynamic vmtracker
vmtracker fabric auto-config
vmtracker connection vc
remote ip address 172.26.244.120
username administrator@vsphere.local password 5 Qxz!12345
connect

The feature is enabled, and the “encapsulation dynamic vmtracker” command is applied to the relevant interfaces. (You can see the command here, but because I used the “| sec” option to view the config, you cannot see which interface it is applied under.) We can see that I also supplied the vCenter IP address and login credentials. (The password is sort-of encrypted.) Notice also the connect statement. The Nexus will not connect to the vCenter server until this is applied. Now we can look at the vCenter connection:

jemclaug-hh14-n5672-2(config-vmt-conn)# sh vmtracker status
Connection Host/IP status
-----------------------------------------------------------------------------
vc 172.26.244.120 Connected

We have connected successfully!

As with dot1q triggering, there is no VRF or SVI configured yet for our host:

jemclaug-hh14-n5672-2# sh vrf all
VRF-Name VRF-ID State Reason
default 1 Up --
management 2 Up --
jemclaug-hh14-n5672-2# sh ip int brief vrf all | i 192.168

We now go to vSphere (or vCenter) and power up the VM connected to this switch:

vcenter-power-on

Once we bring up the VM, we can see the switch has been notified, and the VRF has been automatically provisioned, along with the SVI.

jemclaug-hh14-n5672-2# sh vmtracker event-history | i VM2
761412 Dec 21 2016 13:43:02:572793 ABC-VM2 on 172.26.244.177 in DC4 is powered on
jemclaug-hh14-n5672-2# sh vrf all | i ABC
ABCCorp:VRF1 4 Up --
jemclaug-hh14-n5672-2# sh ip int brief vrf all | i 192
Vlan501 192.168.1.1 protocol-up/link-up/admin-up
jemclaug-hh14-n5672-2#

Thus, we had the same effect as with dot1q triggering, but we didn’t need to wait for traffic!

I hope these articles have been helpful. Much of the documentation on DCNM right now is not in the form of a walk-through, and while I don’t offer as much detail, hopefully these articles will get you started. Remember, with DCNM you get the advanced features free for 30 days, so go ahead and download it and play with it.

Cisco DCNM 10 Overlay Provisioning

Introduction

I’ve been side-tracked for a while doing personal articles, so I thought it would be a good time to get back to some technical explanations.  And seeing that I work for Cisco now, I figured I’d cover some Cisco technology.  My focus here has been on programmability and automation.  Some of this work has involved using tools like Puppet and Ansible to configure switches, as well as Python and NETCONF.  I also recently had a chance to present BRKNMS-2002 at Cisco Live in Las Vegas, on LAN management with Data Center Network Manager 10.  It was my first Cisco Live breakout, and of course I had a few problems, from projector issues to live demo failures.  Ah well.  But for those of you who don’t have access to the CL library, or the time to watch the breakout, I thought I’d cover an important DCNM concept for you here on my blog.
