Netstalgia: Y2K

Gather around while I tell the story of the great Y2K bug, how close we came to societal collapse until we were saved by Deloitte consultants who selflessly worked night and day to stave off this peril!

In fact, the Y2K episode should serve as a reminder of the peril of “viral” thinking, panic and paranoia, and lack of accountability, as the aforementioned “experts” patted themselves on the back for saving the world when in reality they did nothing.  Sounds vaguely like some other recent events in our history, but I guess I shouldn’t go there.

You all know the story.  To save on memory, computers used to store years in a two-digit format: 1987 was simply stored as “87”.  After all, if you were writing software in 1987, you didn’t expect anyone to be running it in 2000.  But when the year 2000 arrived, the date would be stored as “00” and the computer would assume you meant “1900”.
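To make the failure mode concrete, here’s a toy sketch in Python (not anyone’s actual banking code, obviously) of what happens when you do arithmetic on two-digit years:

```python
# Toy illustration of the Y2K bug: years stored as two digits,
# with "19" implicitly assumed in front.
def account_age(current_yy: int, opened_yy: int) -> int:
    # Pre-Y2K logic: subtract the two-digit years directly.
    return current_yy - opened_yy

print(account_age(87, 60))  # 27: correct in 1987
print(account_age(0, 87))   # -87: the year 2000 read as 1900
```

An account opened in 1987 suddenly has an age of negative 87 years, and any interest, billing, or expiration logic built on that number goes sideways along with it.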

It was a legitimate problem, particularly for banks which tended to run antiquated software.  The algorithms would get confused by the new dates and potentially make significant computational errors.  For those of you alive at the time, you’ll remember that there was a lot of concern about bank systems behaving unreliably in the new millennium, to the point where correcting it even featured as the job of the hapless Initech worker in the classic movie Office Space.

Whipped up by the media, the concern expanded from bank software to, basically, all computers.  We were told that stoplights, phones, ovens, cars, and pretty much anything with a computer in it would melt down come January 1, 2000.  Nike even made a hilarious ad parodying what people thought would happen.  Nonetheless, people were scared, with preppers stocking up food and ammunition for the coming chaos.

The big consulting companies were not ready to let a good crisis go to waste.  They sent armies of young MBAs in slick suits to solve the problem.  They showed up at companies and meticulously inventoried everything with a computer in it, documenting software versions and coordinating deployments of Y2K-related updates.

I took a more laissez-faire approach to the whole thing.  At the time I was operating as an independent consultant.  My largest customer was an ad agency in San Francisco.  The head of IT asked me to certify everything as Y2K compliant, including the network gear.

I couldn’t see, for the life of me, how a two-digit date code would impact traffic forwarding through a network switch.  We had 3Com switches, and since we had just moved into the office they weren’t even that old.  Even if they did have the date issue, at most I thought it might affect date/time stamps in the logs, a minor inconvenience since we used those only for troubleshooting, and if we had an issue, we’d know exactly what “January 1, 1900” meant.  That said, these things weren’t built in the 1980’s, so I didn’t think even that would happen.

I told my client not to worry about it.  Nothing was going to happen.  He was an easygoing advertising agency guy, so he was fine with it.  We didn’t do anything at all to prep for Y2K.

On New Year’s Eve, 1999, most people I knew stayed home, convinced going out would be dangerous.  I grabbed a few friends and partied like it was, well, 1999.  The next day nothing was wrong except the hangover.

Of course, the consultants all patted themselves on the back.  They had saved the world from disaster.  I’m sure some of them helped with a handful of older critical systems.  But as we saw at my client, most of their work mattered not at all.  The vast majority of systems would be working just fine.

In late 2000 I took a job at the San Francisco Chronicle.  All of the switches, phones, copier machines, computers, servers, etc., had little stickers on them saying “Y2K compliant.”  I was told the entire IT department stayed at the office on New Year’s Eve.  They had a war room.  Afterwards, they all got a trip to Hawaii.

Viral paranoia is not a new phenomenon.  Usually it latches onto something real, but then explodes into fear of the totally unreasonable.  Our media culture whips up the frenzy, and then corporations realize they can profit from it.

I saved my client a heck of a lot of billable hours.  And I never even got a trip to Hawaii for it.

Layer 1 Friday: Flattened Pins

How about a little layer 1 Friday combined with Netstalgia?

Back in the 1990’s I was doing desktop and light network support for a variety of very small clients in San Francisco, mostly advertising agencies.  Ad agencies were experiencing a boom because of the dotcom era.  VCs funneled money into dotcoms, who funneled the money to advertising agencies.  The agencies, in turn, funneled the money into IT companies as they expanded and needed new computers, and so forth.  This worked great until it all came crashing down.

One of my “largest” clients was an advertising agency with some major corporate clients.  They had their own building on California Street, just up the hill from where Emperor Norton died, for those of you who know your San Francisco trivia.  They had, perhaps, 200 users in that building, not big by my standards today, but big when you were used to companies with 15 or 20 people.  My consulting company had an in-house consultant stationed there permanently, who happened to be a good friend of mine since high school.  I was assigned there one day a week to help him out.

The network was designed to the highest of our standards.  Rack-mounted hubs in IDFs on each floor, connecting into switches in the MDF on the entry level.  (The guy who designed it also connected all servers to the switches instead of hubs, on the theory that switches “are faster”.  Well, ok.)  Because this was a heavy Mac shop, and it was still the days of AppleTalk, we had a Compatible Systems router providing WAN, Internet, and AppleTalk Zoning.

It’s nearly impossible to find a picture of a Compatible Systems router these days

Not all of the Macs were Ethernet-capable, so we also had Farallon StarControllers in each closet.  These were hubs for Farallon’s PhoneNet technology, an adaptation of Apple’s LocalTalk networking system that ran over a single pair of ordinary phone wires.  (LocalTalk normally took fat serial cables that had to be daisy-chained.)

The structured wiring was truly beautiful.  Our cabling vendor did a nice job of running everything neatly and carefully, all terminated on patch panels in the wiring closets.  I had never seen anything like this before, being used to small offices where they slung cables over cubicle walls.

The StarControllers had a 50-pin cable, which then broke out and terminated onto an RJ11 patch panel.  If a user was running LocalTalk, we’d simply patch them to the wall using a phone wire with RJ11 on both ends.  In the wiring closet we’d then patch their location to the StarController patch panel.  The wall jacks and the terminations in the wiring closet were RJ45, but you could still just plug an RJ11 into the port and it fit.  It only used the center pair of wires, but it worked just fine.

StarControllers normally broke out on 66 blocks, but this deployment used an RJ11 patch panel.

Or so we thought.  As time went on, more and more users got newer computers and converted over to Ethernet.  We’d remove their RJ11s and patch them in with RJ45s.  We started noticing odd behavior.  Stations that had been on LocalTalk before would drop off the network randomly.  We couldn’t figure out why.

I took a flashlight and looked into the patch panel jacks.  It turned out that where the RJ11s had been inserted, they had flattened out the outer RJ45 pins.  Normally those pins, even if not in use, fit into a sort of slot in the connector.  But the RJ11 only had four pins in it, not eight, so there was no slot on the outer edges of the connector.  Those pins in the patch panel were being compressed by the plastic and eventually flattened to the point they couldn’t make a solid connection.

RJ11 will fit in RJ45 jacks but the plastic housing covers the outer pins

The correct solution?  Call the cabling vendor and have them replace the patch panels.  The easiest solution?  Take a spudger and carefully pry those pins back up into position.  Which one did we go with?  Well, would you want to tell the customer you had ruined all their patch panels?  Spudger it was!  And it worked just fine.  From then on we used RJ45-RJ11 custom patch cables for LocalTalk connections.

Every real network engineer owns a spudger

I’ve often mentioned my fondness for the early suite of Apple networking protocols, including PhoneNet.  Many network engineers, even of my age group (which we won’t mention) never got to experience them because Apple was a very niche market back then.  For all the talk about AI making networking turnkey and simple, Apple networks back in the 1990’s just worked.  Unless you flattened the pins.

What’s with the cosmic rays?

I was reading a Reddit thread bashing Cisco today.  It’s a few months old.  Some of the commentary is fair enough.  Some of it might be a bit unfair.  But I do think all vendor execs should spend time reading customer Reddit threads to understand how they’re perceived and what they can do better.

That’s not the subject of this post.  Rather, I was amused to read an offhand comment about “‘Cosmic rays’ taking down VIP2s.”  The poster thought it was a “bullshit excuse” and I just had to laugh.  I worked in TAC from 2005-2007, and cosmic rays were very much a thing back then.

I worked on a team called “Routing Protocols and Large Scale Architectures”.  We were supposed to work on routing protocol cases, but this was not backbone TAC (WW-TAC), this was High-Touch Tech Support (HTTS).  We didn’t have all the specialized teams of WW-TAC, so the RP queue ended up becoming the dumping ground for anything that didn’t fit into a specific queue.  While I was on RP, maybe 20% of my cases were actual routing protocol issues.  One of the things that went to our queue, for no good reason, was crashes.

A customer’s router would reload and then come back up.  The “show version” output would tell you the reason why it reloaded.  Even if a router reloads and comes back up and is working fine, customers always want to know what happened.  Decoding crashes is a pain in the neck, as it is a forensic exercise and an attempt to decode the past.  TAC engineers prefer cases where the issue is still happening and they can troubleshoot it live.

Often the reloaded router (or line card) would helpfully provide a traceback, a list of the return addresses on the stack at the moment of the crash.  The top of the stack was the function that had crashed, and the rest of the stack showed the functions that had been called leading up to it.  The traceback was a series of hexadecimal numbers that were meaningless to customers.  TAC engineers have tools that will decode the hex and tell you what the function names are.  TAC engineers aren’t software developers, so that’s only slightly more helpful.  The next step was to plop the function names into Topic, our internal search tool, and look for bugs filed with the same stack pointers, or perhaps another case with a similar traceback.  If you were lucky, there was a bug filed and fixed, you recommended the new code, and closed the case.
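The decoding step itself is conceptually simple; here’s a rough sketch of the idea (the addresses and function names below are made up, and the real tools resolved against actual IOS symbol tables):

```python
import bisect

# Hypothetical symbol table: (start address, function name),
# sorted by address.
SYMBOLS = [
    (0x60008000, "ip_input"),
    (0x6000A400, "ip_route_lookup"),
    (0x6000C900, "bgp_update_process"),
]

def decode(addr: int) -> str:
    # The containing function is the last one whose start address
    # is at or below the given address.
    starts = [start for start, _ in SYMBOLS]
    i = bisect.bisect_right(starts, addr) - 1
    return SYMBOLS[i][1] if i >= 0 else "??"

traceback = [0x6000CA10, 0x6000A450, 0x60008010]
print([decode(a) for a in traceback])
# ['bgp_update_process', 'ip_route_lookup', 'ip_input']
```

The output reads top-down just like the stack: the crashing function first, its callers underneath.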

Most of the time, you were not lucky.  If you had good reason to believe this was a legitimate software bug, you could file one and try to get engineering to do the work.  Usually, the bug was either junked or marked “unreproducible.”

Sometimes we’d see crashes due to parity errors.  A parity error means, essentially, that something which was put into memory is not being read back correctly.  One explanation for this is bad memory.  Another?  Cosmic rays.
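The mechanics of parity are simple enough to sketch; here’s a toy even-parity check in Python (in real hardware this lives in the memory controller, not software, and the values here are arbitrary):

```python
# Even parity: store one extra bit so the total count of 1-bits
# is even. A single flipped bit, whatever flipped it, makes the
# count odd and the read fails the check.
def parity_bit(byte: int) -> int:
    return bin(byte).count("1") % 2

def write_word(byte: int):
    # Store the data byte along with its parity bit.
    return (byte, parity_bit(byte))

def read_ok(stored) -> bool:
    # Recompute parity on read and compare.
    byte, p = stored
    return parity_bit(byte) == p

word = write_word(0b10110010)
print(read_ok(word))                          # True: clean read

corrupted = (word[0] ^ 0b00000100, word[1])   # one bit flips in RAM
print(read_ok(corrupted))                     # False: parity error
```

Note that parity only detects the error; it can’t tell you what caused the flip: bad memory, a marginal trace, or, sure, a cosmic ray.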

I didn’t say it was a good explanation.  But it was widely used by TAC engineers.  Depending on how credulous your customer was, it might work.  You see, often in TAC you end up in a situation where you have no explanation at all, but you want to give the customer something.  Often the customer will not accept “I don’t know” as an answer.  So, cosmic rays (or the related “sun spots”) was a TAC engineer’s attempt to close the case while providing some sort of explanation.

I remember when I first heard this.  I was in my cubicle in building K, a newly minted customer support engineer, staring at a parity error case.  Alex, one of the senior CSEs, was looking at the case.  “Tell them it was sun spots,” Alex said.

“Sun spots?!” I remember asking in disbelief.

Alex smiled.  “That’s what engineering says causes these kinds of errors.”

I’m not sure if Alex even believed what he was saying.  I’m not even sure engineering had actually said that.  Maybe at some point, somewhere, a hardware engineer had speculated on it.  But the story of cosmic rays circulated and became a legend among TAC engineers.  I myself never used it after a couple of attempts.  I found it hard to believe, and our large, high-touch customers received the explanation coldly.  Apparently, some of them still remember it and are bothered enough to post about it on Reddit.

P.S.  I asked ChatGPT if cosmic rays can cause parity errors in RAM.  It said “Yes, cosmic radiation can cause parity errors in RAM.”  So there you go, we were right all along.  Take that, Redditor!

Topological Qubits

I’ve been very unconvinced on the reality and benefits of quantum computing.  Sure, a lot of people with fancy degrees from fancy places say it will work miracles.  Sure, they make really impressive machines with cooling units that look like they’ll beam you into the movie Tron.  Sure, Microsoft just released a new chip with a Hollywood-grade video, B-roll of high-end oscilloscopes included.  All of it is based on some very-theoretical theoretical physics, and I’m not sure that these machines will deliver what they promise.  Sure, I may be wrong.  I’m often wrong with technology predictions.

Satya Nadella, the CEO of Microsoft, announced their Majorana 1 chip, their leap forward in quantum computing, by saying “Imagine a chip that can fit in the palm of your hand yet is capable of solving problems that even all the computers on Earth today combined could not!”

Is it just me or does anyone else find that statement, I dunno…maybe…a little fucking concerning?

More computing power than ALL of the computers on Earth combined?  Do we ever stop to think if this sort of thing is really a good idea?

Oh sure, their video, with slick and well-spoken physicists, extols the ability of their topological qubits to invent medicines, develop new materials, run EV batteries forever, and all but solve world hunger.

On the other hand, nuclear physics gave us both nuclear power and nuclear weapons.  Science fiction movies have been warning about world domination by machines for decades.  If we unleash ChatGPT powered by more horsepower than all computers combined, what the hell is it going to invent?  Why would this be confined to materials scientists in a lab?  Wouldn’t the machine start doing whatever it wants?  Why wouldn’t it invent a fatal virus, unleash it, and rule the world itself?  Or at least, wouldn’t nefarious human beings try to use it to cook up a weapon that could hold the planet hostage, like in a James Bond movie?  Could it enable mass-scale spying and privacy invasion by governments?  Will it be smart enough to warn us of the negative consequences of the materials it invents, or will we be inundated by worse than the microplastics in our brains which weigh as much as a spoon?

I’ve been sick of the tech industry’s worship of technological progress for a long time.  All the hype assumes that technological progress is always and everywhere good.  But that’s been proven false, time and again.

Meanwhile, Nadella makes the asinine statement:  “When productivity rises, economies grow faster, benefiting every sector and every corner of the globe.”  I assume a computer that is more powerful than all computers put together will eliminate a hell of a lot of jobs.  Perhaps it might render human beings redundant.  Technological innovations have always tended to replace human labor (except where it depends on third-world exploitation).  Wouldn’t a computer this ridiculously powerful destroy entire industries and career paths?

I’ve seen enough marketing to suspect Microsoft is exaggerating here.  They’re more than likely less interested in selling quantum computers, and more interested in selling quantum-ready products and services.  Anyways, I sure hope so.

Agentic AI for networking

As I’ve pointed out in several posts, and as you’ve certainly noticed, there is a teeny bit of hype surrounding AI these days. We’re told network engineers will be obsolete as our AI buddies take over our jobs. Want to roll out a network? No problem, your receptionist will do it for you while sipping a latte, pushing a button to call her AI agent. Some of us take a more realistic view. AI is a tool. An amazing one in many ways, but a tool nonetheless. And the big question is: how useful is it, really?

Most of the folks making the wild claims are marketers, “analysts”, and others who know little to nothing about networking. John Capobianco is certainly not one of these. John knows a lot about networking, and is a truly technical guy. (I’ve never met him in person, so he could be an AI agent, but I hope not.) John recently posted a detailed video about running an AI agent per device, and aggregating these up to a sort of agent-of-agents. His POC demonstrates that his AoA can go out and perform operations on multiple network devices. Cool.

That said, it brings me back to my early days in programmability, 2015 or so. One of the demos we used to run used Alexa. You could give voice commands to Alexa to go configure your routers, or collect data from them. Very cool! How many network engineers to date use Alexa to configure their networks? Approximately zero.

Of course, we weren’t really expecting to kick off a craze of Alexa-network-engineers. The demo was, as my peer Jason Frazier liked to say, about the “art of the possible.” Our model-driven interfaces made it that much easier to connect things in ways that weren’t previously possible.

As I mentioned in a recent post, programmability didn’t always click with network engineers. We would do a demo where we configured VLAN 100 using NETCONF. We just wrapped a giant block of XML in a giant block of Python, and–voilà–VLAN 100! The only problem was, every network engineer who saw the demo rolled his eyes and said, “I could do that with one line of CLI.”

Here’s the question for Capobianco: Is typing “Please configure an NTP server of 192.168.100.100 on all four devices” easier than, say, configuring an NTP server on all four devices? Or even using an Ansible script to do so? Is typing “What is the IP address of Ethernet 0/1 on R1 and R2” better than just going to R1 and R2 and looking at the “sh ip int brief” output?

We must also bear in mind that AI, still, has trouble with more complex tasks. In an earlier post I found it could not configure VRRP for IPv6 correctly. Even after providing it the operational output of the failed config, it still couldn’t provide the correct config. So, NTP servers are fine, but when we get into really complex stuff, will AI shine or fail?

I’ve been finding ChatGPT incredibly helpful working in my lab lately. I needed to do a few things on Proxmox, and it saved me a lot of searching through StackExchange. But if we want to go from useful tool to fully agentic network management, well, we have a long way to go. Right now the agentic demos feel a bit like the Alexa ones–the art of the possible, but not necessarily the probable.

Lab Notes: Proxmox

Note: I’m playing with themes right now. I don’t particularly like the one I’m on today, except code blocks look better and it reads nicer on phones. We’ll see if I can tweak the CSS to get it where I want it, or will need to switch themes again.

The nice thing about being an individual contributor again, and not being allowed to travel to Cisco Live in Amsterdam, is that I can spend more time in the lab. I acquired a UCS (actually DN appliance) server, for some new projects. It had a bunch of 2TB drives, and being a lab server I just threw them all in a RAID 0 for 12 TB. Simple. Next, getting a hypervisor up and running.

I’ve used ESXi for years, but it’s gotten a bit tiresome. The licensing is expensive and complex. For internal Cisco users, we need to go through a lot of hoops to get a license. So I decided to try Proxmox instead. It’s free, I don’t need to muck around with licenses, and so far, it seems to behave basically like ESXi. I had a little trouble finding a thing or two, but ChatGPT was quite helpful.

I’m setting up a GNS3 instance, on which I hope to install SONiC. The docs for doing SONiC on GNS3 say that you should install it on Windows, but I’m not convinced this is the case. I’m probably wrong and will find out. The SONiC image is 24 GB (!) so I had to make sure to account for that.

Anyways, the Proxmox install was a breeze. Just mounted the ISO via CIMC and away I went. My only complaint is that every time I log in through the browser, it asks me to buy a subscription. But this is not needed for day-to-day operation, just for updates.

So far, I’d recommend Proxmox enthusiastically. We’ll see if it holds up.

Legacy language

The spambots who regularly read my blog know that I have a background in linguistics and a long-standing fascination with language and how we use it. This post is a bit pointless, but amusing (perhaps only to me).

In my last post I mentioned I might file a bug on VRRP. Apparently, I still have access to do this, although I cannot remember how to assign it.

Anyways, most Cisco customers know that a bug at Cisco is called a “DDTS”. We’ve called bugs DDTS’s as long as I’ve worked on Cisco products. But what does it actually stand for?

DDTS stands for Distributed Defect Tracking System. DDTS is actually the name of the bug-tracking software that Cisco used, a long time ago, to file and track bugs. DDTS was replaced long ago by CDETS, Cisco Defect and Enhancement Tracking System. Even when I was in TAC in 2005, we used CDETS, and not DDTS, to file and track bugs.

Not only is the term DDTS obsolete, it doesn’t even make sense if you expand the acronym. “There is a DDTS in that version of IOS XE” actually means “There is a distributed defect tracking system in that version of IOS XE”. Not likely!

Language evolves and words often lose meaning as they are used. I remember the Greek word “thumos” which meant something like “warrior spirit” in the days of Homer. A positive thing, something you’d want to have, a certain virility or militancy. By the time of New Testament Greek, 700 or so years later, it just means “anger”.

So, at Cisco my exec will have an “off-site”. This used to mean you’d go to Hawaii. Then, a Hilton in another city. Then, the local Hilton. Then, another building on the same campus. Now, at Cisco, we have “off-sites” on the same floor of the same building we work in every day. Nothing off-site about that!

Maybe we should have an off-site about DDTS’s…

Themes, again and again

For my regular readers, as I like to say, all 2 of you, I’ve noticed that the theme I switched to a few months back does not display well on mobile. In particular, my last post had code blocks which get cut off when viewing on a mobile browser. Now that I’m not a manager-type anymore, I’d like to do more posts with code blocks, so this isn’t great. But even text-based articles don’t display well with this theme.

I’m working on finding a mobile-responsive theme, you may notice some changes to the look and feel over the next few weeks. I like dark themes, so hopefully I can find one that works.

VRRP, IPv6, and the “Why” Question

Ivan Pepelnjak has an interesting post (H/T to Ethan Banks) critiquing Cisco’s VRRP v3 CLI implementation for IPv6.  It got me thinking about a question I was told never to ask:  “Why?”

Never ask why?

Back when I really wanted to get into networking, I took a five-day bootcamp with Global Knowledge taught by a gentleman named Tony Marshall.  It was four days of study, and then on day 5 you took the CCNA exam.  Tony was an awesome teacher, and I passed with flying colors.  However, if ever you asked Tony “Why?”, he would interrupt you before you finished your sentence and say, “never ask why!”  The way Cisco implemented it was the gospel truth handed down from above.  This is probably the correct approach for a four-day CCNA bootcamp, and producing automatons who never question Cisco’s decisions was also great for business.  In product management we call that “competitive advantage.”

Since I’ve been doing management stuff for eight years my CLI skills are a little rusty.  I’m going to walk through the VRRP problem in a lot more detail, largely for my own benefit, and then circle back to the “Why?” question in the broader sense.

VRRP and IPv6

Virtual Router Redundancy Protocol (VRRP) is a protocol used to provide default gateway redundancy, and is an IETF cousin of Cisco’s earlier Hot Standby Router Protocol (HSRP).  VRRP allows two (or more) routers to share a single IP address, which one router actively responds to.  Should that router go down, the secondary router begins responding to the address.  If you’re reading this blog, you probably already know that.

VRRP is simple to configure.  I set up a VRRP group in my lab, using VRRP v3, which supports both IPv4 and IPv6 addresses:

s1(config-if)#vrrp 1 address-family ipv4
s1(config-if-vrrp)#address 10.1.1.100 primary
s1(config-if-vrrp)#priority 200

Here you can see I specify we’re using a v4 address, and I made 10.1.1.100 the primary address.  (You can have more.)  I also configured the priority value for this router (actually switch).  I did the same on the other router with a lower priority, and voilà, we have a VIP responding to traffic.  From another router on this segment:

r1#p 10.1.1.100
.!!!!

How does ARP work in this case?  The MAC address used comes from a reserved block designated for VRRP.  The last octet is the VRRP group number.

r1#sh arp
Internet 10.1.1.100 0 0000.5e00.0101 ARPA Ethernet0/0

If this were VRRP group 2, then the MAC address would be 0000.5e00.0102.
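The mapping from group number to virtual MAC is mechanical, something like this (the dotted-triplet formatting is the Cisco display convention; the 00-00-5e-00-01 prefix comes from the VRRP spec):

```python
# VRRP IPv4 virtual MAC: 0000.5e00.01XX, where XX is the
# group number (VRID) in hex.
def vrrp_mac(group: int) -> str:
    assert 1 <= group <= 255
    octets = [0x00, 0x00, 0x5E, 0x00, 0x01, group]
    raw = "".join(f"{o:02x}" for o in octets)
    # Format as Cisco-style dotted triplets, e.g. 0000.5e00.0101
    return ".".join(raw[i:i + 4] for i in range(0, 12, 4))

print(vrrp_mac(1))  # 0000.5e00.0101
print(vrrp_mac(2))  # 0000.5e00.0102
```

(For the IPv6 address family, the spec reserves a different prefix, 00-00-5e-00-02, so group 1 would be 0000.5e00.0201.)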

So far so good, but how does it work for IPv6?

If I just attempt to configure a global IPv6 address with VRRP, a couple funky things happen.  First, I cannot add the “primary” keyword:

s1(config-if)#vrrp 1 address-family ipv6
s1(config-if-vrrp)#address 2001:db8:cafe:100::100/64
s1(config-if-vrrp)#address 2001:db8:cafe:100::100/64 pri?
% Unrecognized command

It will take the command without the “primary” keyword, but VRRP gets stuck in INIT:

Vlan111 - Group 1 - Address-Family IPv6
State is INIT (No Primary virtual IP address configured)

As Ivan points out, this is per the RFC.   The RFC requires that the source IPv6 address be a link-local address.  If you haven’t done a lot of v6, recall that each interface will have a globally routable IPv6 address as well as a link-local address, which, as the name implies, can only be used on the local link and is not routable beyond that.  Without getting into all the details, the link local address is assigned from the FE80::/10 block, and on Ethernet interfaces it is made unique by using the interface MAC address.
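That modified EUI-64 derivation is easy to sketch: insert ff:fe in the middle of the MAC and flip the universal/local bit of the first octet. The MAC below is just an example:

```python
import ipaddress

# Derive an IPv6 link-local address from a MAC via modified EUI-64.
def link_local_from_mac(mac: str) -> str:
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                                # flip the U/L bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]   # insert ff:fe
    host = "".join(f"{o:02x}" for o in eui64)
    groups = [host[i:i + 4] for i in range(0, 16, 4)]
    return str(ipaddress.IPv6Address("fe80::" + ":".join(groups)))

print(link_local_from_mac("44:b6:be:50:db:b0"))
# fe80::46b6:beff:fe50:dbb0
```

(Note the first octet: 44 becomes 46 because of the flipped bit.)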

OK, the RFC says:

“…each router has its own link-local IPv6 address on the LAN interface and a link-local IPv6 address per VRID that is shared with the other routers that serve the same VRID.” (Emphasis added.)

So the router has its own address (assigned automatically when IPv6 comes up), but each virtual router ID requires a link-local address too.  Elsewhere, the RFC states the primary address must be the link-local address.  But why?

The RFC is explicit that this link-local address is the source-address for VRRP packets, so presumably this is to keep the VRRP packets local, and to prevent them from being routed outside the segment.

Fair enough.  Let’s add a link-local primary address:

s1(config-if)#vrrp 1 address-family ipv6
s1(config-if-vrrp)#address FE80::1 primary

And…it works!

Vlan111 - Group 1 - Address-Family IPv6
  State is MASTER

Easy, right?  One more step, perhaps, but it’s not too hard.  Except, as Ivan rightly points out, some vendors (like Arista) assign this link-local address automatically for you based on the group MAC address.  Aside from the fact that this makes it easier (no need to keep track of another VRRP address), it makes interoperability challenging.  If I want my IOS XE device to be in the same VRRP group as an Arista device, I need them both agreeing on the LLA.  (Although, to be honest, I’m not totally convinced on this use case.  Usually devices peering in VRRP groups will be identical hardware, even in mixed-vendor environments.  But I might be wrong.)

Let’s go with Ivan’s thought that this is poorly designed.  I have access to internal Cisco tools that Ivan doesn’t.  What are people saying?  Are we getting TAC cases on this?

It turns out, not many.  I found one case in Spanish, which I can read (even without ChatGPT).  The customer complained he configured VRRP “pero no hay ping”.  No “hay ping” because no hay link-local address.  The TAC engineer pointed the customer to a bug, so why isn’t it fixed?  Well, the bug (login may be required) is a documentation bug, requesting the link-local address be added to the VRRP docs.  They added this:

Device(config-if-vrrp)# address FE80::46B6:BEFF:FE50:DBB0 primary

…but nowhere do they tell you how they derived that address.  Meanwhile, on one of our internal mailers, an engineer asked about just using FE80::1 like I did.  The development engineer working on IPv6 VRRP replied:

Is the FE80::1 recommended?    <<< usage is fine AFAIK

Usage is fine, ok, but is this the best way to go about assigning link-local addresses for VRRP?

The “Why?” Question

Back to the larger question of why.  Cisco is not the only company to have built confusing CLI.  In Ivan’s very post, he even notes that Dell allows you to skip the link-local address altogether, in flagrant violation of the RFC.  And years ago I wrote pieces on utterly baffling Juniper CLI, such as the concept of and syntax for configuring RIB Groups.

Who designs the CLI?  A committee of experts who used to work at Apple, after thorough user experience research, you think?  More than likely, it was designed by the engineer charged with implementing the protocol.  This engineer is almost certainly not a network engineer and doesn’t have to operate a network.  Sure, they can configure routers a bit, but mainly for lab testing of their own features.  What makes sense to an engineer burning the midnight oil in Bangalore might not make sense for the rest of us, but as long as the feature works, some way, somehow, they can close the assignment and move on to the next thing.

But surely the executives at the vendor care, right?  Short answer:  they will care, and care a lot, if they meet with a giant customer and the giant customer screams at them that feature XYZ is impossible to configure.  Otherwise, it’s highly unlikely a senior vice president cares how VRRP is configured, and also unlikely he knows what it is.  Thus, the development of CLI can be a bit Lord of the Flies.  No adults in the room, so to speak.

What about filing an actual bug to get it fixed?  I could do so, for Ivan’s sake, but there are two problems.  First, the bug will be the lowest severity possible and engineering will be focused on other things.  Second, if they do get to it, they won’t want to fix it for fear of breaking existing configs.  Years ago I brought up the problem with IOS XE’s model-driven management syntax.  To enable NETCONF you type “netconf-yang” at the global level.  To enable RESTCONF you type “restconf”.  gNMI is “gnmi-yang”.  Couldn’t we have a config block called “model-based-management” (or something) and then enable each of the features under that?  No, engineering told me, people are already using it the old way.  (Honestly, like, nobody was in 2016, but anyways…)  Great idea, Jeff, but we can’t do it.

The TL;DR summary for you:  Tony Marshall was right.  Never ask: “Why?”

Postscript for those who think AI will replace network engineers:

I asked ChatGPT to provide me a configuration for VRRP v6, giving it the global standby address.  ChatGPT gave me a configuration without a primary address.  Then, when I showed ChatGPT the “show vrrp” output, which explicitly says “No primary virtual IP address configured”, it responded by telling me to:

  1. Make sure IPv6 is enabled globally. (!)
  2. Make sure the interface is up (!!)
  3. Reapply exactly the same config [without an LLA] (!!!)

So apparently we built CLI so confusing that even genius-level AI cannot figure it out.  Or, we’ve got a ways to go before we get replaced by ChatGPT.  Just sayin’, “analysts.”
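For the record, here’s roughly the configuration ChatGPT could not arrive at: VRRPv3 for IPv6 on IOS XE wants a link-local address flagged as primary, with the global address alongside it.  Syntax from memory, addresses purely illustrative:

```
fhrp version vrrp v3
!
interface GigabitEthernet1
 ipv6 address 2001:DB8::2/64
 vrrp 10 address-family ipv6
  address FE80::1 primary
  address 2001:DB8::1/64
```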

My years in management: Looking forward


As of December, my role at Cisco has transitioned from leadership back to individual contributor.  Gone are the constant approval emails; gone is the stack ranking of employees; gone are the performance reviews and the one-on-ones.  It’s both a relief and eerily quiet.  As I make this transition back, after nearly eight years, I thought it would be a good time to reflect on leadership and what it means to me.

I never wanted to be a “people manager”.  When I first started in networking, way back in the early 2000s, I made this explicitly clear to my boss.  I loved hands-on work.  I wanted to be in the trenches.  If I were a cop, I wouldn’t want to be a detective–I’d want to be in the patrol car.  I also lacked the self-confidence to take on a team.  Routers I could handle; people, not so much.

I worked in Cisco TAC, then at a partner, and then at Juniper Networks.  In all those years, the idea of managing people never once came up.  I didn’t ask for it, and nobody asked me to do it.  I was happy being the hands-on guy.  When I got married in 2012, I told my wife to never expect me to be in a leadership role.

That all changed when I came to Cisco for the second time in 2015.  I had landed my dream job as a technical marketing engineer.  I loved having a lab, doing Cisco Live presentations, writing blogs and books, and working with customers.  I was quite content when my boss, Carl Solder, stunned me in our 1:1 one day by asking me to lead a small team.  I objected mildly–I wanted to do hands-on work, not manage people.  Don’t worry, he said, you still can.  To my surprise, as well as my wife’s, I said yes.

My first team had 12 people on it, covering Software-Defined Access, Assurance, and Programmability.  A bit of a hodge-podge.  The day the team was announced, my new team members pulled me into a room one by one to present their demands.  What had I gotten myself into?  My boss later told me that two experienced managers sitting outside the small conference room rolled their eyes and said in unison:  “welcome to management!”

The management philosophy I brought to my team didn’t come from books or coursework.  It came from my own observations.  Simply put, there are two management styles:  negative and positive.  Negative leaders are the most common.  They lack empathy and are thus unable to work with people.  They manage by assigning tasks and tracking metrics.  They pile on their people.  They are hard on their teams–taskmasters mainly interested in their own self-promotion.  They see their team as a tool to achieve their own personal goals.  I’d had many such leaders, and worked poorly under them.  I struggled to remain motivated, and would usually do just enough work to get by.

Positive leadership looks at employees as individuals.  Positive leaders try to understand their employees’ needs and help them grow in their careers.  They look for potential in employees and try to develop that potential.  They try to align assignments to employees’ skills rather than forcing work upon people.  They look for strengths and play to those strengths.  Positive leaders are “servant leaders”, as much as I hate the cliche.  They care more about their people than themselves.  They promote their people rather than themselves.  Under this style of leadership, employees work hard because they feel their leadership cares about them.  They usually want to make their boss look good, and feel personally disappointed when they let their boss down.

I learned a simple rule from my first boss, Henry Sandigo, who practiced this style of leadership.  When your team fails, you take the blame as the leader.  When your team succeeds, you give them the credit.  Negative leaders do the opposite.  When their team succeeds they take the credit, and when their team fails, they blame their team.

One time, my team held up a software release due to critical bugs.  This upset engineering, who pushed back.  One of the product management leaders was furious with me.  He came to me, stood an inch from my face, and with bloodshot eyes yelled:  “I want the name of everyone who was in that room making the decision.”  I told him he could have one name, mine, because it was my team and I was responsible.  It turned out we were right, but I had to endure the fury of that leader for a long time.

If negative leadership is so ineffective, why is it so common?  The answer is simple.  Negative leaders tend to be ladder-climbers and self-promoters, whereas positive leaders are humble by nature.  Negative leaders are always out for themselves, and in the corporate world, that tends to advance you to higher positions.  Additionally, if negative leaders are in power, they have no respect for positive leaders.  They tend to promote people with their own leadership style, and view positivity as soft and weak.

As a leader, you become involved in people’s lives, often quite intimately.  I had two employees go through bitter divorces while on my team.  My philosophy was to give them the room they needed to recover and get back to work.  I had interpersonal conflicts that went to HR.  I had one employee collapse from cardiac arrest; he has been in a coma for two and a half years.  I’ve been in the room with him and his crying family, and dealt with his long-term leave of absence.  Configuring routers is easy in comparison to the harsh realities of life.

I’ve also had to lay people off more times than I can count.  One of the main reasons I didn’t become a manager for so long was that I never wanted to lay someone off.  It’s an unfortunate reality in the corporate world today.  When I’ve made that call, some have been angry, some have cried, most were just quiet.  I can only say I hated having to do it, and that I fought hard to not do it.  In fact, I’ve been criticized for fighting too hard to save jobs.  I cannot really complain as it’s much harder to receive the call than to make it.  But I tried to see my employees as humans and to help them as much as I could.  The unfortunate reality of the corporate world is that people are just seen as OpEx, as numbers on a spreadsheet, and not as human beings whose lives are horribly affected by losing their jobs.  I don’t know if I will ever return to leadership, but I’d be happy if I never had to make those calls again.  (For what it’s worth, I once was on the receiving end of the call.)

I also got to make several happy calls.  Promoting people is one of the great joys of management.  I helped several people get director and principal promotions, which are very hard to get at Cisco.  Although they did the work, I did the back-end negotiation, and I’m proud of each one of them. 

I tried to go the extra mile for my employees.  During the COVID lockdowns, I drove around to each of my directs’ houses and surprised them with a Christmas gift.  They were grateful to see a co-worker after so much time in lockdown.  When one employee, a gin lover, had a really bad day with a difficult VP, I sent him a nice bottle of gin, at my own expense.  I found little touches like this go a long way in building loyalty and positivity in a team.

We all learn from the great leaders we’ve worked for.  I mentioned Carl and Henry.  Bask Iyer was another one.  Bask came in as CIO of Juniper at a time when working in IT was like working in a morgue.  We were all depressed, beat up by the business, and unmotivated.  Bask would defend us in company meetings when we got attacked.  He went to bat for us.  He was a great technologist, but what really impressed me was his ability to stand up for us.  Gary Clark, who reported to Bask, exuded the same positivity.  When you met with Gary for the first time, he had a set of Lego blocks representing different personality traits.  He would arrange them in order while he was talking to you, building a model of your personality.  In other words, Gary wanted to know who you were, how you thought, and meet you where you were.  I always appreciated that.

At my peak, I had 50 people reporting to me and multiple layers of management.  Then, through attrition, it dwindled to 30, 20, then 8, and now none.  A team of 50 seems like a distant memory to me now.  It’s hard to believe I did that.

Many technical people come to the same fork in the road that I did.  Do you stay technical, or do you take a management job to advance your career?  Notice I put these in opposition.  I can affirm that when you go into the management track, your technical skills atrophy.  As much as I tried to maintain and work in a lab, it became nearly impossible.

There was one day when I was invited, as a senior director, to a meeting on software-defined access architecture with a bunch of distinguished engineers.   They were discussing an idea around multicast.  As I listened, I decided to interject:  “If you do it that way, you’ll break PIM registration.”  One of the DEs rolled his eyes, but then another said “wait, Jeff’s right!”  It was nice to know I still had it.

Those moments, however, become rare.  I know that all of my employees respected that I had a technical background.  It’s important that leaders in technology companies know technology.  But if you go the management route, you will definitely find the technical side of things recedes as people become your main concern.

The hardest thing about being a team leader at a company like Cisco is pleasing all the factions that will ultimately provide feedback on you.  The second you step into leadership, there is a target on your back.  The corporate world is Machiavellian.  Nice guys finish last.  If you try to partner with and please one leader, another leader will get upset.  Pivot to him, and the first guy gets mad.  This was especially true for TME, which is seen as a service organization.

It’s important to remember that the corporate world is vicissitudinous.  Over the course of the years, you will see roles come and go.   I’ve seen executives who were flying high one day shown the door the next, and a whole new regime comes in.

As I said at the beginning, in some ways it’s a relief.  On the other hand, one of our product management VPs saw me with my team and said I was like a “proud papa.”  While it’s nice to do things myself again, I can say I was proud of the teams I led, and I miss taking pride in my team instead of myself.

Will I ever lead a team again?  Who knows.  If I’m asked I will gladly do it.  If not, I’ll do my job as an individual contributor.  I don’t think there’s much room in the corporate world for leaders with my style, anyways.

The upside is, I can now spend time in the lab.  My routers won’t ask for raises.
