VRRP, IPv6, and the “Why” Question

Ivan Pepelnjak has an interesting post (H/T to Ethan Banks) critiquing Cisco’s VRRP v3 CLI implementation for IPv6.  It got me thinking about a question I was told never to ask:  “Why?”

Never ask why?

Back when I really wanted to get into networking, I took a five-day bootcamp with Global Knowledge taught by a gentleman named Tony Marshall.  It was four days of study, and then on day 5 you took the CCNA exam.  Tony was an awesome teacher, and I passed with flying colors.  However, if ever you asked Tony “Why?”, he would interrupt you before you finished your sentence and say, “never ask why!”  The way Cisco implemented it was the gospel truth handed down from above.  This is probably the correct approach for a four-day CCNA bootcamp, and producing automatons who never question Cisco’s decisions was also great for business.  In product management we call that “competitive advantage.”

Since I’ve been doing management stuff for eight years my CLI skills are a little rusty.  I’m going to walk through the VRRP problem in a lot more detail, largely for my own benefit, and then circle back to the “Why?” question in the broader sense.

VRRP and IPv6

Virtual Router Redundancy Protocol (VRRP) is a protocol used to provide default gateway redundancy, and is an IETF cousin of Cisco’s earlier Hot Standby Router Protocol (HSRP).  VRRP allows two (or more) routers to share a single IP address, which one router actively responds to.  Should that router go down, the secondary router begins responding to that address.  If you’re reading this blog, you probably already know that.

VRRP is simple to configure.  I set up a VRRP group in my lab, using VRRP v3, which supports both IPv4 and IPv6 addresses:

s1(config-if)#vrrp 1 address-family ipv4
s1(config-if-vrrp)#address 10.1.1.100 primary
s1(config-if-vrrp)#priority 200

Here you can see I specify we’re using a v4 address, and I made 10.1.1.100 the primary address.  (You can have more.)  I also configured the priority value for this router (actually switch).  I did the same on the other router with a lower priority, and voilà, we have a VIP responding to traffic.  From another router on this segment:

r1#p 10.1.1.100
.!!!!

How does ARP work in this case?  The MAC address used comes from a reserved block designated for VRRP.  The last octet is the VRRP group number.

r1#sh arp
Internet 10.1.1.100 0 0000.5e00.0101 ARPA Ethernet0/0

If this were VRRP group 2, then the MAC address would be 0000.5e00.0102.
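The group-to-MAC mapping can be sketched in a few lines of Python.  This is only an illustration: the function name vrrp_virtual_mac is made up for this post, and the IPv6 variant (fifth octet 02 rather than 01) comes from RFC 5798 rather than from the lab output above.

```python
def vrrp_virtual_mac(vrid: int, ipv6: bool = False) -> str:
    """Return the VRRP virtual MAC for a group, in Cisco dotted notation."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    # RFC 5798 reserves 00-00-5E-00-01-{VRID} for IPv4
    # and 00-00-5E-00-02-{VRID} for IPv6.
    family_octet = 0x02 if ipv6 else 0x01
    hex_str = bytes([0x00, 0x00, 0x5E, 0x00, family_octet, vrid]).hex()
    # Split the 12 hex digits into three groups of four, Cisco style.
    return ".".join(hex_str[i:i + 4] for i in range(0, 12, 4))


print(vrrp_virtual_mac(1))  # 0000.5e00.0101
print(vrrp_virtual_mac(2))  # 0000.5e00.0102
```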

So far so good, but how does it work for IPv6?

If I just attempt to configure a global IPv6 address with VRRP, a couple funky things happen.  First, I cannot add the “primary” keyword:

s1(config-if)#vrrp 1 address-family ipv6
s1(config-if-vrrp)#address 2001:db8:cafe:100::100/64
s1(config-if-vrrp)#address 2001:db8:cafe:100::100/64 pri?
% Unrecognized command

It will take the command without the “primary” keyword, but VRRP gets stuck in INIT:

Vlan111 - Group 1 - Address-Family IPv6
  State is INIT (No Primary virtual IP address configured)

As Ivan points out, this is per the RFC.  The RFC requires that the source IPv6 address be a link-local address.  If you haven’t done a lot of v6, recall that each interface will have a globally routable IPv6 address as well as a link-local address, which, as the name implies, can only be used on the local link and is not routable beyond that.  Without getting into all the details, the link-local address is assigned from the FE80::/10 block, and on Ethernet interfaces it is made unique by using the interface MAC address.
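For the curious, here is a rough Python sketch of how a MAC address becomes a link-local address using the modified EUI-64 scheme.  The function name and the MAC aabb.cc00.0100 are invented for illustration, and not every platform builds its link-local address this way, so treat it as a picture of the idea rather than a statement of what any given box does.

```python
import ipaddress


def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Build an FE80::/64 address from a MAC using modified EUI-64."""
    digits = mac.replace(".", "").replace(":", "").replace("-", "")
    raw = bytearray(bytes.fromhex(digits))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    raw[0] ^= 0x02  # flip the universal/local bit
    # Insert FF:FE between the two halves of the MAC to get 64 bits.
    iid = bytes(raw[:3]) + b"\xff\xfe" + bytes(raw[3:])
    # FE80 followed by 48 zero bits, then the 64-bit interface ID.
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid)


lla = eui64_link_local("aabb.cc00.0100")  # hypothetical interface MAC
print(lla, lla.is_link_local)
print(ipaddress.IPv6Address("2001:db8:cafe:100::100").is_link_local)
```

The last line shows why the global address from the lab is not a link-local address: it sits outside FE80::/10.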

OK, the RFC says:

“…each router has its own link-local IPv6 address on the LAN interface and a link-local IPv6 address per VRID that is shared with the other routers that serve the same VRID.” (Emphasis added.)

So the router has its own address (assigned automatically when IPv6 comes up), but each virtual router ID requires a link-local address too.  Elsewhere, the RFC states the primary address must be the link-local address.  But why?

The RFC is explicit that this link-local address is the source-address for VRRP packets, so presumably this is to keep the VRRP packets local, and to prevent them from being routed outside the segment.

Fair enough.  Let’s add a link-local primary address:

s1(config-if)#vrrp 1 address-family ipv6
s1(config-if-vrrp)#address FE80::1 primary

And…it works!

Vlan111 - Group 1 - Address-Family IPv6
  State is MASTER

Easy, right?  One more step, perhaps, but it’s not too hard.  Except, as Ivan rightly points out, some vendors (like Arista) assign this link-local address automatically for you based on the group MAC address.  Aside from the fact that this makes it easier (no need to keep track of another VRRP address), it makes interoperability challenging.  If I want my IOS XE device to be in the same VRRP group as an Arista device, I need them both agreeing on the LLA.  (Although, to be honest, I’m not totally convinced on this use case.  Usually devices peering in VRRP groups will be identical hardware, even in mixed-vendor environments.  But I might be wrong.)
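What might “based on the group MAC address” look like?  One plausible derivation, and this is my assumption rather than anything I have confirmed about Arista’s implementation, is to run the IPv6 VRRP virtual MAC (00-00-5E-00-02-{VRID} per RFC 5798) through modified EUI-64.  The helper name below is hypothetical.

```python
import ipaddress


def lla_from_virtual_mac(vrid: int) -> ipaddress.IPv6Address:
    """Hypothetical: modified EUI-64 link-local from the IPv6 VRRP group MAC."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    mac = bytes([0x00, 0x00, 0x5E, 0x00, 0x02, vrid])  # IPv6 virtual MAC
    # Flip the universal/local bit and insert FF:FE in the middle.
    iid = bytes([mac[0] ^ 0x02]) + mac[1:3] + b"\xff\xfe" + mac[3:]
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid)


lla = lla_from_virtual_mac(1)
# Print it in the form the IOS XE "address ... primary" command expects.
print(f"address {str(lla).upper()} primary")
```

If the other vendor really does derive the address this way, the printed line is what you would have to type by hand on the IOS XE side to make the two agree.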

Let’s go with Ivan’s thought that this is poorly designed.  I have access to internal Cisco tools that Ivan doesn’t.  What are people saying?  Are we getting TAC cases on this?

It turns out, not many.  I found one case in Spanish, which I can read (even without ChatGPT).  The customer complained he configured VRRP “pero no hay ping”.  No “hay ping” because no hay link-local address.  The TAC engineer pointed the customer to a bug, so why isn’t it fixed?  Well, the bug (login may be required) is a documentation bug, requesting the link-local address be added to the VRRP docs.  They added this:

Device(config-if-vrrp)# address FE80::46B6:BEFF:FE50:DBB0 primary

…but nowhere do they tell you how they derived that address.  Meanwhile, on one of our internal mailers, an engineer asked about just using FE80::1 like I did.  The development engineer working on IPv6 VRRP replied:

Is the FE80::1 recommended?    <<< usage is fine AFAIK

Usage is fine, ok, but is this the best way to go about assigning link-local addresses for VRRP?
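As for where the documentation’s FE80::46B6:BEFF:FE50:DBB0 might have come from:  the FF:FE in the middle looks like modified EUI-64, which would mean somebody pasted in an address derived from a real MAC.  That is my inference, not something the docs say.  A small Python sketch (function name is mine) reverses the process:

```python
import ipaddress


def mac_from_eui64(address: str):
    """Return the MAC implied by a modified EUI-64 interface ID, or None."""
    iid = ipaddress.IPv6Address(address).packed[8:]  # low 64 bits
    if iid[3:5] != b"\xff\xfe":
        return None  # not EUI-64 shaped (FE80::1, for instance)
    raw = bytearray(iid[:3] + iid[5:])
    raw[0] ^= 0x02  # undo the universal/local bit flip
    hex_str = raw.hex()
    return ".".join(hex_str[i:i + 4] for i in range(0, 12, 4))


print(mac_from_eui64("FE80::46B6:BEFF:FE50:DBB0"))  # 44b6.be50.dbb0
print(mac_from_eui64("FE80::1"))                    # None
```

So the example address appears to be tied to one particular MAC, while FE80::1 is simply a hand-picked value.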

The “Why?” Question

Back to the larger question of why.  Cisco is not the only company to have built confusing CLI.  In Ivan’s very post, he even notes that Dell allows you to skip the link-local address altogether, in flagrant violation of the RFC.  And years ago I wrote pieces on utterly baffling Juniper CLI, such as the concept of and syntax for configuring RIB Groups.

Who designs the CLI?  A committee of experts who used to work at Apple, after thorough user experience research, you think?  More than likely, it was designed by the engineer charged with implementing the protocol.  This engineer is certainly not a network engineer and does not need to deal with operating a network.  Sure, he/she can configure routers a bit, but mainly for lab testing of his/her own features.  What makes sense to an engineer burning the midnight oil at 3am in Bangalore might not make sense for the rest of us, but as long as the feature works, somehow, some way, he/she can close the assignment and move on to the next thing.

But surely the executives at the vendor care, right?  Short answer:  they will care, and care a lot, if they meet with a giant customer and the giant customer screams at them that feature XYZ is impossible to configure.  Otherwise, it’s highly unlikely a senior vice president cares how VRRP is configured, and also unlikely he knows what it is.  Thus, the development of CLI can be a bit Lord of the Flies.  No adults in the room, so to speak.

What about filing an actual bug to get it fixed?  I could do so, for Ivan’s sake, but there are two problems.  First, the bug will be the lowest severity possible and engineering will be more focused on other things.  Second, if they do get to it, they won’t want to fix it for fear of breaking existing configs.  Years ago I brought up the problem with IOS XE’s model-driven management syntax.  To enable NETCONF you type “netconf-yang” at the global level.  To enable RESTCONF you type “restconf”.  gNMI is “gnmi-yang”.  Couldn’t we have a code block called “model-based-management” (or something) and then you enable each of the features under that?  No, engineering told me, people are already using it the old way.  (Honestly, like, nobody was in 2016, but anyways…)  Great idea, Jeff, but we can’t do it.

The TL;DR summary for you:  Tony Marshall was right.  Never ask: “Why?”

Postscript for those who think AI will replace network engineers:

I asked ChatGPT to provide me a configuration for VRRP v6, and provided it the global standby address.  ChatGPT gave me the configuration without a primary address.  Then, when I showed ChatGPT the “show vrrp” output which explicitly says “No primary virtual IP address configured”, it responded by telling me to:

  1. Make sure IPv6 is enabled globally. (!)
  2. Make sure the interface is up (!!)
  3. Reapply exactly the same config [without an LLA] (!!!)

So apparently we built CLI so confusing that even genius-level AI cannot figure it out.  Or, we got a ways to go before we get replaced by ChatGPT.  Just sayin’, “analysts.”

HPE Buys Juniper

In 2007, I left Cisco after two brutal years in high-touch TAC.  I honestly hated the job, but it was an amazing learning experience.  I draw on my TAC experience every single day.  A buddy of mine got a job at a Gold Partner, offered to bring me in, and I jumped on the opportunity.  Things didn’t go so well, and in 2009, I was laid off and looking for a job again.  That’s when another buddy (buddies help!) called me and told me of an opportunity at Juniper.

I knew little about Juniper.  We had a Juniper SSL box in the network I used to manage, but the routers were mostly for service provider networks.  When I was at TAC, I had one case where a major outage was caused by misconfiguration of a Juniper BGP peer.  But otherwise, I didn’t know a thing.

The opportunity was to be the “network architect” for Juniper’s corporate network.  In other words, to work in internal IT at a network vendor.  It seemed like a good career move, but little did I know I would be thrust into corporate politics at the director level instead of technical challenges.  I ended up spending six tumultuous years there, with several highlights:

  • My boss disappeared on medical leave on my very first day.
  • I was re-assigned to a Sr. Director who was an applications person and not knowledgeable in networking.  He viewed the network a bit like Lt. Kendrick, the Marine, viewed the Navy in the movie A Few Good Men:  “Every time we gotta go some place to fight, you fellas always give us a ride.”
  • I proposed and got buy-off for a program to ensure we actually ran our own gear internally and to ensure we built solid network architectures.
  • I subsequently had the program taken away from me.
  • I found out a job posting with the identical title and JD to mine was listed on Juniper’s public site without my knowledge.
  • My manager was changed to a person two pay grades below me in another country without even informing me.  (Someone noticed it in the directory and told me.)
  • I quit in disgust, without any other job.
  • I was talked into staying.
  • After another year of misery, I was demoted two pay grades myself.
  • I focused on doing the best job I could, ended up getting re-promoted to director, and left on good terms.

Some of the above was my own fault, much of it was dysfunctional management, and some of it was the stupidity we all know lurks in every good-sized company.  I actually bear Juniper no resentment at all.

I worked at Juniper in the pre-Mist days, and in the midst of the fiscal crisis that began in 2008.  We went from CEO Kevin Johnson’s rah-rah “Mission10” pep rallies that we would be the “next $10B company” (uh, no), to draconian OpEx cuts when a pump-and-dump “activist investor” took over our board.

At the time I was there, Juniper made some mistakes.  NetScreen firewalls had done well for us, but then we made the decision to kill the NetScreen in favor of the JunOS-based SRX.  This is the classic mistake of product management–replace a successful, popular product with a made-from-scratch product with no feature parity.  There were some good arguments to do SRX, but it was done abruptly which signalled EOL to NetScreen customers, and SRX didn’t even have a WebUI.

We also did QFabric while I was there.  We installed one of these beasts in a data center on campus.  I have no idea if they improved it, but the initial versions took a full day to upgrade.  Imagine taking a day-long outage on your data center just to do an upgrade!

Another product that didn’t work out was Space.  JunOS Space came out at the time when the iPhone was still new.  Juniper borrowed the idea.  Instead of building an NMS product, we’d build a platform, and then software developers could build apps on top of it.  Cisco might be able to get away with that approach, but Juniper didn’t have enough of the networking market to attract developers.

In addition, a bunch of other acquisitions fizzled out, including Trapeze, our WAN accelerator, and our load balancer.

All that said, Juniper had some fine products when I worked there.  (And believe me, my current employer has had many failures too.)  I got my JNCIE-SP, working on MX routers, which were a really good platform.  I thought the EX switches were decent.  And the operating system was nicely done.  Funnily enough, I worked a solid year on the JNCIE and promptly went to Cisco.  I never renewed it and now it’s expired.

I left after meeting with a strategy VP and explaining our mission to use Juniper’s corporate network to demonstrate how to build an enterprise network to our customers.  She looked at me (and the CIO) and said, “Juniper is done with enterprise networking.  I’m not interested.”  I left after that.  In her defense, Mist was years off and she couldn’t have seen it coming.

She was right, in that Juniper certainly had a core SP market.  Juniper came about at the time when Cisco was still selling 7500’s and 12000’s to its service provider customers, dated platforms running a dated OS.  Juniper did such a nice job with their platform that Cisco had to turn around and build the CRS-1 and IOS-XR, both of which had, ehm, similarities to Juniper’s products.  Juniper really couldn’t crack the enterprise market while I was there.  The lack of a credible wireless solution was always a problem.  Obviously Mist changed the game for them.

Juniper always felt like a scrappy anti-Cisco when I was there, but it was fast becoming corporatized and taken over by the MBAs.  Many old-schoolers would tell me how different things were in the startup days.  It still always had the attitude of an anti-Cisco.  One of our engineers ALWAYS referred to Cisco devices as “Crisco boxes”, and when I announced I was returning to Cisco, a long-time IT guy called me an “asshole”.  A couple funny stories around this:

A customer came in to our office for training and looked in the window of one of the data centers nearby.  He saw it was packed with Cisco gear and subsequently published a video on social media captioned “Juniper uses Cisco.”  He didn’t realize that we leased the building from another company called Ariba, and the data center was theirs, not ours.  In fact, we worked very hard to not run Cisco in our internal network.  Juniper subsequently asked Ariba to block out the window.

One time we solicited a proposal from one of our largest service provider customers to host a data center for us.  The SP came back to us with an architecture which was 100% Cisco.  Cisco switches, Cisco routers, Cisco firewalls.  I told the SP I would never deploy our DC on Cisco gear.  What if a major bug hit Cisco devices causing outages and our data center went down too?  What if we got hacked due to a Cisco PSIRT and it became public?

The SP didn’t care.  We were their customer, but they were also ours.  They used Cisco in their data center, and had no desire to re-tool for another vendor.  I escalated all the way to the CEO, who agreed with me, and the deal was scuttled.  Ironically, I used this story in my Cisco interviews when asked for an example of a time when I had taken a strong stand on something.

I work at Cisco now, and even ran the competitive team for a while.  Competition is healthy and makes us all better.  I actually value our competition.  Obviously my job is to win deals against them, but I have friends who work at Juniper and I have friends who work at HPE.  We’re all engineers doing our jobs, and I wish them no ill will.  I always respected Juniper, their engineering, and their scrappy attitude.  While I know some of this will be retained as they get absorbed into a large corporation, it’s definitely the end of an era, for the industry and for me.


Juniper’s mysterious inet.3 table

When I first started configuring MPLS on Juniper routers, I came across the strange and mysterious inet.3 table.  What could it possibly be?  When I worked in Cisco TAC I handled hundreds of MPLS VPN cases, but I never had encountered anything quite like inet.3 in IOS land.  As I researched inet.3 I found the documentation was sparse and confusing, so when I finally came to understand its purpose I decided to create a clear explanation for those who are searching in vain.  I will focus on the basics of how inet.3 works, leaving details of its use for later posts.