Ivan Pepelnjak has an interesting post (H/T to Ethan Banks) critiquing Cisco’s VRRP v3 CLI implementation for IPv6. It got me thinking about a question I was told never to ask: “Why?”
Never ask why?
Back when I really wanted to get into networking, I took a five-day bootcamp with Global Knowledge taught by a gentleman named Tony Marshall. It was four days of study, and then on day 5 you took the CCNA exam. Tony was an awesome teacher, and I passed with flying colors. However, if you ever asked Tony “Why?”, he would interrupt you before you finished your sentence and say, “Never ask why!” The way Cisco implemented it was the gospel truth handed down from above. This is probably the correct approach for a four-day CCNA bootcamp, and producing automatons who never question Cisco’s decisions was also great for business. In product management we call that “competitive advantage.”
Since I’ve been doing management stuff for eight years, my CLI skills are a little rusty. I’m going to walk through the VRRP problem in a lot more detail, largely for my own benefit, and then circle back to the “Why?” question in the broader sense.
VRRP and IPv6
Virtual Router Redundancy Protocol (VRRP) is a protocol used to provide default gateway redundancy, and is an IETF cousin of Cisco’s earlier Hot Standby Router Protocol (HSRP). VRRP allows two (or more) routers to share a single virtual IP address, with one router (the master) actively responding to it. Should that router go down, a backup router begins responding to the address. If you’re reading this blog, you probably already know that.
VRRP is simple to configure. I set up a VRRP group in my lab, using VRRP v3, which supports both IPv4 and IPv6 addresses:
s1(config-if)#vrrp 1 address-family ipv4
s1(config-if-vrrp)#address 10.1.1.100 primary
s1(config-if-vrrp)#priority 200
Here you can see I specify we’re using a v4 address, and I made 10.1.1.100 the primary address. (You can have more.) I also configured the priority value for this router (actually a switch). I did the same on the other router with a lower priority, and voilà, we have a VIP responding to traffic. From another router on this segment:
r1#p 10.1.1.100
.!!!!
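Which router answers for the VIP comes down to the priority values configured above. As a rough sketch (not Cisco’s code), the steady-state election can be modeled in a few lines of Python. The class and function names are hypothetical, the interface addresses are made up because the post doesn’t show them, and the model assumes preemption is enabled and ignores advertisement timers and the state machine entirely.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass
class VrrpRouter:
    name: str
    priority: int        # 1-254 is configurable; 255 is reserved for the address owner
    interface_addr: str  # the router's own (real) address on the segment, not the VIP


def elect_master(routers):
    """Return the router that should end up as master (hypothetical helper).

    Models the steady-state outcome of the RFC 5798 rules with preemption
    enabled: highest priority wins, and a priority tie goes to the router
    with the numerically higher interface address. All addresses must be
    the same family so they can be compared.
    """
    return max(routers, key=lambda r: (r.priority, ip_address(r.interface_addr)))


# Hypothetical interface addresses; the post only shows the 10.1.1.100 VIP.
s1 = VrrpRouter("s1", 200, "10.1.1.2")
s2 = VrrpRouter("s2", 100, "10.1.1.3")
print(elect_master([s1, s2]).name)  # s1
```

With priority 200 versus a lower value on the peer, s1 wins regardless of addresses; the address tie-break only matters when priorities are equal.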
How does ARP work in this case? The MAC address used comes from a reserved block designated for VRRP. The last octet is the VRRP group number.
r1#sh arp
Internet  10.1.1.100              0   0000.5e00.0101  ARPA   Ethernet0/0
If this were VRRP group 2, then the MAC address would be 0000.5e00.0102.
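The mapping from group number to MAC address is simple enough to write down. Below is a minimal sketch; the helper name is hypothetical. Two details are worth noting: the group number is encoded in hexadecimal, so group 10 ends in 0a rather than 10, and RFC 5798 reserves a separate block (00-00-5E-00-02-xx) for IPv6 groups.

```python
def vrrp_virtual_mac(vrid: int, family: str = "ipv4") -> str:
    """Return the VRRP virtual MAC in Cisco dotted notation (hypothetical helper).

    RFC 5798 reserves 00-00-5E-00-01-{VRID} for IPv4 and
    00-00-5E-00-02-{VRID} for IPv6; the VRID is 1-255.
    """
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    family_octet = {"ipv4": 0x01, "ipv6": 0x02}[family]
    octets = [0x00, 0x00, 0x5E, 0x00, family_octet, vrid]
    hex_str = "".join(f"{o:02x}" for o in octets)
    # Group into three 4-hex-digit chunks, e.g. 0000.5e00.0101
    return ".".join(hex_str[i:i + 4] for i in range(0, 12, 4))


print(vrrp_virtual_mac(1))          # 0000.5e00.0101
print(vrrp_virtual_mac(10))         # 0000.5e00.010a (hex, not 0110)
print(vrrp_virtual_mac(1, "ipv6"))  # 0000.5e00.0201
```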
So far so good, but how does it work for IPv6?
If I just attempt to configure a global IPv6 address with VRRP, a couple of funky things happen. First, I cannot add the “primary” keyword:
s1(config-if)#vrrp 1 address-family ipv6
s1(config-if-vrrp)#address 2001:db8:cafe:100::100/64
s1(config-if-vrrp)#address 2001:db8:cafe:100::100/64 pri?
% Unrecognized command
It will take the command without the “primary” keyword, but VRRP gets stuck in INIT:
Vlan111 - Group 1 - Address-Family IPv6 State is INIT (No Primary virtual IP address configured)
As Ivan points out, this is per the RFC. The RFC requires that the source IPv6 address be a link-local address. If you haven’t done a lot of v6, recall that each interface will have a globally routable IPv6 address as well as a link-local address, which, as the name implies, can only be used on the local link and is not routed beyond it. Without getting into all the details, the link-local address is assigned from the FE80::/10 block, and on Ethernet interfaces IOS typically makes it unique by deriving the interface ID from the interface MAC address (modified EUI-64).
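To make the MAC-to-link-local step concrete, here is a sketch of the modified EUI-64 derivation: flip the universal/local bit of the first MAC octet, insert FF:FE in the middle, and prepend FE80::/64. The helper name and the sample MAC are hypothetical, and this only covers the EUI-64 case; some operating systems use randomized or stable-privacy interface IDs instead.

```python
from ipaddress import IPv6Address


def eui64_link_local(mac: str) -> IPv6Address:
    """Build an FE80::/64 link-local address from a MAC via modified EUI-64."""
    # Accept any separator style (aabb.cc00.0100, aa:bb:cc:00:01:00, ...)
    hex_digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC: {mac!r}")
    octets = bytearray.fromhex(hex_digits)
    octets[0] ^= 0x02                              # flip the universal/local bit
    iid = octets[:3] + b"\xff\xfe" + octets[3:]    # insert FF:FE in the middle
    return IPv6Address(b"\xfe\x80" + bytes(6) + bytes(iid))


print(eui64_link_local("aabb.cc00.0100"))  # fe80::a8bb:ccff:fe00:100
```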
OK, the RFC says:
“…each router has its own link-local IPv6 address on the LAN interface and a link-local IPv6 address per VRID that is shared with the other routers that serve the same VRID.” (Emphasis added.)
So the router has its own address (assigned automatically when IPv6 comes up), but each virtual router ID requires a link-local address too. Elsewhere, the RFC states the primary address must be the link-local address. But why?
The RFC is explicit that this link-local address is the source-address for VRRP packets, so presumably this is to keep the VRRP packets local, and to prevent them from being routed outside the segment.
Fair enough. Let’s add a link-local primary address:
s1(config-if)#vrrp 1 address-family ipv6
s1(config-if-vrrp)#address FE80::1 primary
And…it works!
Vlan111 - Group 1 - Address-Family IPv6 State is MASTER
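The difference between the attempt stuck in INIT and the working one boils down to a single check. Here is a minimal Python model of the RFC rule with a hypothetical function name. It is not how IOS XE implements the check, only the condition the two attempts above ran into.

```python
from ipaddress import IPv6Address
from typing import Optional


def primary_address_problem(primary: Optional[str]) -> Optional[str]:
    """Explain why a VRRPv3 IPv6 group can't use this primary, or return None.

    Models the RFC 5798 rule that the first address of an IPv6 virtual
    router must be its link-local address.
    """
    if primary is None:
        return "no primary virtual address configured"
    addr = IPv6Address(primary.split("/")[0])  # tolerate a /prefix-length suffix
    if not addr.is_link_local:                 # is_link_local means fe80::/10
        return f"{addr} is not a link-local address"
    return None


print(primary_address_problem(None))                         # no primary virtual address configured
print(primary_address_problem("2001:db8:cafe:100::100/64"))  # 2001:db8:cafe:100::100 is not a link-local address
print(primary_address_problem("FE80::1"))                    # None
```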
Easy, right? One more step, perhaps, but it’s not too hard. Except, as Ivan rightly points out, some vendors (like Arista) assign this link-local address automatically for you based on the group MAC address. Beyond the convenience (no need to keep track of another VRRP address), this difference in approach makes interoperability challenging. If I want my IOS XE device to be in the same VRRP group as an Arista device, I need them both to agree on the LLA. (Although, to be honest, I’m not totally convinced of this use case. Usually devices peering in VRRP groups will be identical hardware, even in mixed-vendor environments. But I might be wrong.)
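To see why the LLA matters for interoperability, here is a sketch of what an auto-derived address could look like. I have not verified the exact algorithm Arista uses; purely for illustration, the sketch assumes the RFC 5798 IPv6 virtual MAC (00-00-5E-00-02-{VRID}) is run through modified EUI-64. The function name is hypothetical.

```python
from ipaddress import IPv6Address


def derived_vrrp_lla(vrid: int) -> IPv6Address:
    """ASSUMPTION: an LLA derived from the IPv6 VRRP virtual MAC via modified
    EUI-64. Illustrative only; not a documented vendor algorithm."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    mac = bytearray([0x00, 0x00, 0x5E, 0x00, 0x02, vrid])  # RFC 5798 IPv6 virtual MAC
    mac[0] ^= 0x02                                          # flip the universal/local bit
    iid = bytes(mac[:3]) + b"\xff\xfe" + bytes(mac[3:])
    return IPv6Address(b"\xfe\x80" + bytes(6) + iid)


manual = IPv6Address("FE80::1")  # what I typed on the IOS XE box
auto = derived_vrrp_lla(1)       # what an auto-deriving peer might use (assumed scheme)
print(auto)                      # fe80::200:5eff:fe00:201
print(manual == auto)            # False: the two boxes would disagree
```

Under that assumption, a box that derives its LLA and a box hand-configured with FE80::1 would advertise different link-local addresses for the same group, which is the mismatch Ivan is worried about.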
Let’s go with Ivan’s thought that this is poorly designed. I have access to internal Cisco tools that Ivan doesn’t. What are people saying? Are we getting TAC cases on this?
It turns out, not many. I found one case in Spanish, which I can read (even without ChatGPT). The customer complained that he had configured VRRP “pero no hay ping” (“but there’s no ping”). No “hay ping” because no hay link-local address. The TAC engineer pointed the customer to a bug, so why isn’t it fixed? Well, the bug (login may be required) is a documentation bug, requesting that the link-local address be added to the VRRP docs. They added this:
Device(config-if-vrrp)# address FE80::46B6:BEFF:FE50:DBB0 primary
…but nowhere do they tell you how they derived that address. Meanwhile, on one of our internal mailers, an engineer asked about just using FE80::1 like I did. The development engineer working on IPv6 VRRP replied:
Is the FE80::1 recommended? <<< usage is fine AFAIK
Usage is fine, ok, but is this the best way to go about assigning link-local addresses for VRRP?
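As for the address in Cisco’s documentation example, the FF:FE in the middle of its interface ID is the fingerprint of modified EUI-64. That suggests (but does not prove) it was built from some MAC address, perhaps an interface on the box used to write the docs. Here is a sketch that reverses the derivation; the function name is hypothetical.

```python
from ipaddress import IPv6Address
from typing import Optional


def mac_from_eui64(addr: str) -> Optional[str]:
    """If addr's interface ID looks like modified EUI-64 (FF:FE in the middle),
    return the MAC it was probably built from; otherwise return None."""
    iid = IPv6Address(addr).packed[8:]  # low 64 bits = interface ID
    if iid[3:5] != b"\xff\xfe":
        return None
    mac = bytearray(iid[:3] + iid[5:])
    mac[0] ^= 0x02                      # undo the universal/local bit flip
    h = mac.hex()
    return ".".join(h[i:i + 4] for i in range(0, 12, 4))


print(mac_from_eui64("FE80::46B6:BEFF:FE50:DBB0"))  # 44b6.be50.dbb0
print(mac_from_eui64("FE80::1"))                     # None
```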
The “Why?” Question
Back to the larger question of why. Cisco is not the only company to have built confusing CLI. In that same post, Ivan notes that Dell allows you to skip the link-local address altogether, in flagrant violation of the RFC. And years ago I wrote pieces on utterly baffling Juniper CLI, such as the concept of, and syntax for configuring, RIB Groups.
Who designs the CLI? A committee of experts who used to work at Apple, after thorough user experience research, you think? More than likely, it was designed by the engineer charged with implementing the protocol. This engineer is certainly not a network engineer and does not need to deal with operating a network. Sure, they can configure routers a bit, but mainly for lab testing of their own features. What makes sense to an engineer burning the midnight oil at 3am in Bangalore might not make sense for the rest of us, but as long as the feature works somehow, they can close the assignment and move on to the next thing.
But surely the executives at the vendor care, right? Short answer: they will care, and care a lot, if they meet with a giant customer and the giant customer screams at them that feature XYZ is impossible to configure. Otherwise, it’s highly unlikely a senior vice president cares how VRRP is configured, and also unlikely he knows what it is. Thus, the development of CLI can be a bit Lord of the Flies. No adults in the room, so to speak.
What about filing an actual bug to get it fixed? I could do so, for Ivan’s sake, but there are two problems. First, the bug will get the lowest severity possible, and engineering will be focused on other things. Second, if they do get to it, they won’t want to fix it for fear of breaking existing configs. Years ago I raised a similar problem with IOS XE’s model-driven management syntax. To enable NETCONF you type “netconf-yang” at the global level. To enable RESTCONF you type “restconf”. gNMI is “gnmi-yang”. Couldn’t we have a single configuration block called “model-based-management” (or something) and then enable each of the features under that? No, engineering told me, people are already using it the old way. (Honestly, like, nobody was in 2016, but anyways…) Great idea, Jeff, but we can’t do it.
The TL;DR summary for you: Tony Marshall was right. Never ask: “Why?”
Postscript for those who think AI will replace network engineers:
I asked ChatGPT for a VRRPv3 configuration for IPv6 and gave it the global virtual address. ChatGPT gave me a configuration without a primary address. Then, when I showed ChatGPT the “show vrrp” output, which explicitly says “No primary virtual IP address configured”, it responded by telling me to:
- Make sure IPv6 is enabled globally. (!)
- Make sure the interface is up (!!)
- Reapply exactly the same config [without an LLA] (!!!)
So apparently we built CLI so confusing that even genius-level AI cannot figure it out. Or, we got a ways to go before we get replaced by ChatGPT. Just sayin’, “analysts.”