
I must admit, I’m a huge fan of Ivan Pepelnjak.  This despite the fact that he is a major thorn in the side of product management at Cisco.  It is, of course, his job to be a thorn in our side and Ivan is too smart to ignore.  He has a long history with Cisco, with networking, and his opinions are well thought out and highly technical.  He is a true network engineer.  The fact that I like Ivan does not mean he gave me an easy time a few years back when I did a podcast with him on NETCONF at Cisco Live Berlin.

Ivan had an interesting post recently entitled “Keep Blogging, Some of Us Still Read“.  It reminds me of my own tongue-in-cheek FAQ for this blog, in which I said I wouldn’t use a lot of graphics because I intended my blog for “people who can read”.  As a blogger, I think I quite literally have about 3 regular readers, which occasionally makes me wonder why I do it at all.  I could probably build a bigger readership if I worked at it, but I really don’t work at it.  I think part of the reason I do it is simply that I find it therapeutic.

Anyhow, the main claim Ivan is responding to is that video seems to be dominant these days and blogging is becoming less rewarding.  There is no question video creation has risen dramatically, and in many ways it’s easier to get noticed on YouTube than on some random blog like mine.  Then again, with the popularity of SubStack I think people are actually still reading.

Ivan says “Smart people READ technical content.”  Well, perhaps.  I remember learning MPLS back in 2006 when I worked at TAC.  I took a week off to run through a video series someone had produced and it was one of the best courses I’ve taken.  Sometimes a technical person doesn’t want to learn by reading content.  Sometimes listening to new concepts explained well at a conversational pace and in a conversational style is more conducive to actually understanding the material.  This is why people go to trade shows like Cisco Live.  They want to hear it.

I’ve spent a lot of time on video lately, developing a series on technical public speaking as well as technical videos for Cisco.  In the process I’ve had to learn Final Cut Pro and DaVinci Resolve.  Both have, frankly, horrendous user interfaces that are hard to master.  Nine times out of ten I turn to a YouTube video when I’m stuck trying to do something.  Especially with GUI-based tools, video is a much faster way for me to learn something than screen shots.

On the other hand, it’s much harder to produce video.  I can make a blog post in 15 minutes.  YouTube videos take hours and hours to produce, even simple ones like my Coffee with TMEs series.

The bottom line is I’m somewhere down the middle here.  Ivan’s right, technical documentation in video format is much harder to search and to use for reference.  That said, I think video is often much better for learning, that is for being guided through an unfamiliar concept or technology.

If you are one of my 3 regular readers and you would prefer to have my blogs delivered to your inbox, please subscribe at https://subnetzero.substack.com/ where I am cross-posting content!

A post recently showed up in my LinkedIn feed.  It was a video showing a talk by Steve Jobs and claiming to be the “best marketing video ever”.  I disagree.  I think it is the worst ever.  I hate it.  I wish it would go away.  I have deep respect for Jobs, but on this one, he ruined everything and we’re still dealing with the damage.

A little context:  In the 1990’s, Apple was in its “beige box” era.  I was actively involved in desktop support for Macs at the time.  Most of my clients were advertising agencies, and one of them was TBWA Chiat Day, which had recently been hired by Apple.  Macs, once a brilliant product line, had languished, and had an out-of-date operating system.  The GUI was no longer unique to them as Microsoft had unleashed Windows 95.  Apple was dying, and there were even rumors Microsoft had acquired it.

In came Steve Jobs.  Jobs was what every technology company needs–a visionary.  Apple was afflicted with corporatism, and Jobs was going to have none of it.

One of his most famous moves was working with Chiat Day to create the “Think Different” ad campaign.  When it came out, I hated it immediately.  First, there was the cheap grammatical trick to get attention.  “Think” is a verb, so it’s modified by an adverb (“differently”).  By using poor grammar, Apple got press beyond their purchased ad runs.  Newspapers devoted whole articles to whether Apple was teaching children bad grammar.

The ads featured various geniuses like Albert Einstein and Gandhi and proclaimed various trite sentiments about “misfits” and “round pegs in square holes”.  But the ads said nothing about technology at all.

If you watch the video you can see Jobs’ logic here.  He said that ad campaigns should not be about product but about “values”.  The ads need to say something about “who we are”.

I certainly knew who Chiat Day was since I worked there.  I can tell you that the advertising copywriters who think up pabulum like “Think Different” couldn’t write technical ads because they could barely turn on their computers without me.  They had zero technological knowledge or capability.  They were creating “vision” and “values” about something they didn’t understand, so they did it cheaply with recycled images of dead celebrities.

Unfortunately, the tech industry seems to have forgotten something.  Jobs didn’t just create this “brilliant” ad campaign with Chiat Day.  He dramatically improved the product.  He got Mac off the dated OS it was running and introduced OS X.  He simplified the product line.  He killed the Apple clone market.  He developed new chips like the G3.  He made the computers look cool.  He turned Macs from a dying product into a really good computing platform.

Many tech companies think they can just do the vision thing without the product.  And so they release stupid ad campaigns with hired actors talking about “connecting all of humanity” or whatever their ad agency can come up with.  They push their inane “values” and “mission” down the throats of employees.  But they never fix their products.  They ship the same crappy products they always shipped but with fancy advertising on top.

The thing about Steve Jobs is that everybody admires his worst characteristics and forgets his best.  Some leaders and execs act like complete jerks because Steve Jobs was reputed to be a complete jerk.  They focus on “values” and slick ad campaigns, thinking Jobs succeeded because of these things.  Instead, he succeeded in spite of them.  At the end of the day, Apple was all about the product and they made brilliant products.

The problem with modern corporatism is the army of non-specialized business types who rule over everything.  They don’t understand the products, they don’t understand those who use them, they don’t understand technology, but…Steve Jobs!  So, they create strategy, mission, values, meaningless and inspiring but insipid ad campaigns, and they don’t build good products.  And then they send old Jobs videos around on LinkedIn to make the problem worse.

I started this blog in 2016, and I never advertise it.  Partly this is because I don’t really care if people are reading it.  Partly it’s because I’m concerned some of my views might be controversial in the industry and I don’t want to blast them far and wide.  When I started this blog, my goal was to provide clear technical explanations.  Ironically, given the title of the blog, my first big hits were articles about configuring Junos, such as the article on Juniper’s inet.3 routing table and the article on RIB groups.

Over time I started publishing memoir-type pieces, like my 10 years a CCIE series and TAC Tales.  I also started publishing reflections on the industry, business, and the corporate world in general.

The platform I’m using is WordPress on DreamHost, which requires a fair amount of maintenance.  Overall it’s been a good platform, and I like that I can make summary pages which collect my article series and make them easier to find.  I hate having to pick and customize themes and sometimes I have database issues.

SubStack is becoming a more popular platform and seems to be a lot easier to manage.  The subscription model works well also.  So, I imported the content of this blog into SubStack at https://subnetzero.substack.com/

For now I will cross-post between this blog and the substack to see how it goes.  I encourage any regular readers (if I have any!) to sign up there.  At this point I have no plans to charge for it.

If you have any thoughts, feel free to comment!

An old theory of personality holds that people fall into two types–A and B.  Put simply, Type A personalities are highly aggressive and competitive, whereas Type B are not.  We all have seen this broad difference in personalities.  Some people we encounter seem ready to walk over their own grandmothers to get ahead.  Like all stereotypes, this is a gross oversimplification, but there’s a lot of truth in it.

In the corporate world, type A personalities tend to rise to the top.  Why?  Because their very personality is aggressive and competitive.  They like to push, push, push for what they want and are willing to drive their agenda at any cost.  They frequently are talkers but rarely listeners.  They also judge people through their own lens.  If you’re an introvert, quiet, or deliberative, if you’re a listener instead of a talker, they think you are not “driven” and probably not worth listening to or promoting.

The question is:  does being type A make you right?  Does it make your opinions more valuable?  I cannot think of any reason why being aggressive and competitive makes you more likely to be correct about anything.  In fact, I think the opposite is true.  If you don’t listen well and are always pushing your own agenda, you’re less likely to consider the opinions of others, which means your decision-making is less well-rounded.

I’ve pointed out before that quiet, deliberative people, type B’s, are often the ones you really want to listen to.  However, in the corporate world, they are often kicked to the curb.  “He never says anything in meetings,” the type A’s say.  Well, if you hired him maybe he’s actually a smart person but has a hard time contributing in a meeting with 20 people all talking fast.  Maybe he needs time to digest what he heard before providing recommendations.

Type A’s tend to be in positions of power not necessarily because they are smarter but because they fight for position in hierarchies.  This is not to say they are without value.  Their decisiveness and drive are very important to a healthy organization.  They can break indecision and move companies forward in ways type B’s cannot.

The key for both personality types is the old Greek maxim:  know thyself.  If you’re type A, you need to be careful not to be too aggressive.  Listen to your quieter colleagues, accept that they may have a different personality, and meet them where they are at.  Call on them in meetings, give them time to deliberate and come back to you.

For type B’s, you need to learn to speak out more.  You’re probably more respected than you realize, and when you do speak, you’re probably listened to.  Try to find forums that are more comfortable for you, like expressing your opinion in writing or 1:1’s.

Sadly, because of the cutthroat nature of the business world, I see little self-awareness and frequent domination of businesses by type A’s.  At the end they may get people to follow them, but if they’re leading you off a cliff, their drive may not be such a good thing.

When I worked at the San Francisco Chronicle, I started a project to bring Internet connectivity to a number of sites that had only limited mainframe circuits.  To do this I decided to get DSL lines and run IPSec over them, a relatively new way of doing things for the time.  It was a lot cheaper than the Frame Relay we used at larger sites.

After setting up connectivity at one of our sites, the local office manager called me.  Web pages, he said, were only loading partially.  Some of the text and none of the images would show up.

Everyone blamed the network for everything, so I punted him to desktop support.  I could ping across the tunnel, I could send traffic just fine, the latency was minimal, and nothing was obviously wrong.  The network is usually up or down, but web pages don’t partially load when everything else is working.  Degraded service might cause the pages to load slowly, but not partially.

The desktop guys told me it was my problem.  We had a constant battle, as nine times out of ten they blamed the network, and nine times out of ten it was not the network.  The office manager was getting angry, so I decided I would do some investigation on site and prove to the desktop guys that they were wrong.

I went to the office and fired up my laptop.  Pages were partially loading for me too.  Hmmm.  I did what every network engineer does and fired up a packet sniffer.

I could see the TCP handshake succeeding, and the browser requests and data exchange.  It looked normal, but why wasn’t the browser displaying the images?  I tried another browser and saw the same thing.

As I examined the sniffs, something hit me.  All the packets were being sourced with the Don’t Fragment (DF) bit set in the IP header.  Could it be that the IPSec/GRE headers were causing the packets to be large enough to require fragmentation?  And why was Windows setting the DF bit anyways?

As I wasn’t a desktop guy, I left the latter question alone.  I jumped on the router and built a routing policy which cleared the DF bit on incoming packets.  The pages started loading fine.  I left the policy in place and hoped that there would not be any unanticipated consequences.  I never saw any.

Sometimes, it is, indeed, the network.
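
For anyone who wants the arithmetic behind this one, here is a minimal sketch of why pings and the TCP handshake sailed through while full-size data packets vanished.  The 1500-byte MTU and the overhead figures are assumptions for illustration, not numbers taken from the actual routers, and `packet_fate` is a hypothetical helper.

```python
# Assumed values for illustration only; real overhead depends on the
# tunnel configuration and cipher in use.
LINK_MTU = 1500        # assumed MTU of the link carrying the tunnel
GRE_OVERHEAD = 24      # new 20-byte IP header plus 4-byte GRE header
IPSEC_OVERHEAD = 56    # assumed ESP overhead; varies with cipher and mode


def packet_fate(packet_len, df_set,
                link_mtu=LINK_MTU,
                overhead=GRE_OVERHEAD + IPSEC_OVERHEAD):
    """Return what happens to an IP packet entering the tunnel."""
    encapsulated_len = packet_len + overhead
    if encapsulated_len <= link_mtu:
        return "forwarded"
    if df_set:
        # The router may not fragment. It drops the packet and should send
        # an ICMP "fragmentation needed" message back to the sender.
        return "dropped"
    return "fragmented"


if __name__ == "__main__":
    print(packet_fate(84, df_set=True))     # small ping
    print(packet_fate(1500, df_set=True))   # full-size TCP segment, DF set
    print(packet_fate(1500, df_set=False))  # same packet with DF cleared
```

Under these assumptions the small ping is forwarded, the full-size packet with DF set is dropped, and the same packet with DF cleared is fragmented and delivered, which matches what clearing the bit on the router did for me.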

I’ve been thinking about the corporate world, how it operates, and the effects of corporatism on our lives.  If you’re a network engineer and think this is boring, pay attention.  Corporate culture, the influence of Wall Street, and the rise of a non-skilled management class have direct impact on your work and personal life.  The products you use are heavily influenced by corporate culture.  Why vendors release certain products, when, and how, are all controlled by corporate culture.  When a company tries to sell you something that doesn’t work and doesn’t serve your needs, when the company discontinues support for a product you bought after crashing and burning with it, when companies force products down your throat with buzzword messaging that means nothing to you, corporate culture explains it.

If you work in a corporation, the culture creates politics which affect what projects you work on, your career trajectory, and how you interact with your team.  In your personal life, the food you eat and drugs you take are very much explained by corporate culture.

I wrote in a previous post about the lack of anything permanent in the corporate world.  Everything seems to be temporary, everything is always in flux.  Companies are afflicted by short-term thinking, and short-term thinking is killing everyone.

One way this manifests itself is quarter-by-quarter thinking.  We all know sales people are judged on a quarterly basis, but corporations in general are as well.  Publicly traded companies have to present results to analysts, and thus to investors, every single quarter.  The results are compared against the last quarter, against the same quarter the previous year, and against other companies in the industry.  The results have a huge impact on stock price, executive compensation, and even executives’ jobs.

The effect of this trickles down to all levels of a public company.  Business units are judged by the quarterly performance of their products.  This means product managers are judged by the quarter, much like sales people.  Product managers are not commissioned directly like sales people, but they live and die by quarterly numbers.  As a result, they want to do everything possible to ensure quarterly numbers shine.

Now, imagine you are a product manager.  You have a deal worth $20 million on the line if you deliver specific features the customer wants.  You are going to do anything possible to win the deal, so your quarterly numbers look good.  Now it probably is the case that the $20 million customer’s feature requests are specific to their environment.  That is, adding the features will help that one customer, but probably very few others.  So, instead of trying to build a product that caters to a broad range of customers who might bring smaller deals, you end up building a product that caters to a narrow set of customers that make you look good in your quarterly business reviews.

Now this type of short-term thinking might be an obvious problem if you planned to spend twenty years at your company.  But instead you spend two years at a company, so you only have to pull this off for eight quarters.  You can put big happy numbers in your LinkedIn profile (“successfully drove record quarter of $100 million in sales!”) and then exit stage right to repeat the process elsewhere.  And the folks left behind have to clean up the mess.  Keep in mind your success within the company is also being judged by non-technical MBAs who are looking to do the same thing you are.

The companies that do the best long-term are those that eschew short-term thinking.  Apple is a great example of this.  They’ve had some disasters, but have generally taken risks to build products with long-term appeal.  I often mention Zappos founder Tony Hsieh, who while he had serious personal problems, forsook short-term gain for long-term performance.  Even within a company, quarterly thinking can vary by business unit and leader.

At the end of the day, however, it’s Wall Street that encourages this.  Like any metric, execs end up chasing their stock price like a dog chasing its tail.  It doesn’t get you anywhere, however much progress you may think you are making.  Meanwhile, you may get rich, but you leave disaster in your wake.

In my years in the corporate world, I’ve attended many corporate self-help type sessions on how to increase leadership, creativity, and innovation.  There are many young consultants who are starting their careers off helping us to develop new skills in these areas, so I thought I would provide some helpful tips to get started.  Enjoy!

  1. If you are going to do consulting or presentations on innovation and leadership, it’s very important that you have never led anyone or invented anything.  Rather, you simply need to interview people who have done those things.  A lot of them.  Two or three thousand.  Actually, even if it’s only been 10 or 11, just say you’ve interviewed two or three thousand innovators or leaders.  This is called “research.”
  2. It’s especially important, if you are teaching career technology people how to innovate, that you loathe technology and cannot even upgrade your iPhone without help.  Remember, they may understand technology, but you understand how to innovate!  Two different things.
  3. You’re going to be making claims that are either wrong, or so obvious they don’t bear repeating.  Remember that you need to do several things to make those statements credible:
    • Begin by citing unverifiable claims from evolutionary biology as the basis for your statements.  Be sure to mention that we used to live out on open plains where we were at risk for being eaten.  Also be sure to mention “fight or flight.”  Bad:  “Strong leaders need to cultivate loyalty.” Good: “evolutionary biology has shown us that, back when we lived on the plain and were vulnerable to getting eaten by lions, our brains developed a need to be loyal to a leader.”
    • Next, cite the latest neuroscience to substantiate your claims.  In fact, it doesn’t have to be real neuroscience.  Remember, nobody will ever check!  Just say “the latest research on the brain has shown…” and leave it at that.  Bad:  “To be innovative we need time to think.”  Good:  “The latest neuroscience has shown that our brains can’t innovate when they are overwhelmed and don’t have time to reason properly.”
    • Remember, if you’re going to be hired by corporations and paid thousands of dollars in speaking fees, you need to state obvious truths in a technical way that makes you seem smart.  Invent new terminology so when you regurgitate to people what they already know, you sound authoritative.  For example, instead of saying, “criticism hurts people’s feelings and can cause them to leave,” invent a “criticism-despair cycle.”  Make a diagram with arrows showing “criticism->rejection->despair->attrition”.  See how much more impressive you sound already?
  4. It really helps if you are a “Doctor”.  There are many unaccredited diploma mills that will send you a Ph.D. based on your “life experience.”  Or better yet, just start calling yourself “doctor”.  Do you really think anyone will call and verify your doctorate?

Remember, the most lucrative careers don’t involve building skills through years of hard work, but telling people who know better than you how to do their jobs.  I hope you have a rewarding career as a consultant!

I’ve mentioned in the past how my first job in IT (starting in 1995) was as a “systems administrator” for a small company in Marin County, California.  The company designed and built museum exhibits, and its team of around 60 employees was split between fabricators, who built the exhibits, and office workers.  Some of the office workers did administrative work, while others were designers.  So, I was managing a network of around 30 computers, all Macs.

When I got to the company, the computers were networked using LocalTalk, a LAN technology from Apple, and specifically the PhoneNet variation.  PhoneNet was a product from Farallon Networks which enabled you to send the LocalTalk signal down a single pair of ordinary telephone wire.  The common practice was to use an extra pair of phone wires in the same cable that carried the user’s phone line.  In my first Netstalgia piece, I mentioned that my PhoneNet network was entirely passive, and ran into a lot of challenges as a result.

PhoneNet was also slow, and our designers had to transfer large files.  I decided to set up a separate Ethernet network for them.  All I knew about Ethernet was that it was faster, and that the higher-end PowerPC’s used by the designers supported it.  These computers had an AAUI port, a modification of the AUI port commonly in use for Ethernet connectivity at the time.  An AUI port required a transceiver to connect it to the Ethernet network.  Why?  Because we had Thicknet and Thinnet coaxial Ethernet, 10Base-T twisted pair and fiber optic Ethernet as well.  The universal AUI port (and Apple’s AAUI equivalent) gave you a choice of medium.

I didn’t really know how to make this work, and Google was not available at the time.  I had heard that you needed a “hub”, but I wasn’t sure exactly why or what the hub did.  The MacWarehouse catalogs I used to receive at the time advertised a product called an Etherwave, from the same company that made the PhoneNet transceiver.  The Etherwave allowed daisy-chaining of a twisted-pair Ethernet network.  I don’t know why, but this seemed easier and cheaper to me than buying a hub.  It was neither.

Farallon Etherwave Adapter

I bought a bunch of Etherwave adapters, got a ladder, and spent a night running Cat 3 cable in the suspended ceiling, and crimping RJ45 cubes.  Finally, I daisy chained everything together, and switched the computers to the new Ethernet network.  It worked very well–file transfers were screaming!

The designers loved it, but there was a flaw.  The Ethernet network was not connected at all to the LocalTalk network.  The LocalTalk network was where email, printing, and many other services resided.  Their computers had connections to both, but they had to go to a control panel and switch between one or the other.  That meant, if they wanted to do a file transfer, the two designers would have to shout to each other to switch networks, at which point they could do it peer-to-peer.

There was another problem.  Apple’s networking software, called OpenTransport, was notoriously buggy.  The switches between Ethernet and LocalTalk resulted in frequent crashes and reboots.  The initial thrill was wearing off.

I searched through catalogs and primitive websites looking for a solution.  I learned that I could buy a device called a router to connect the Ethernet and LocalTalk into a single unified network.  I desperately looked for the cheapest one.  My go-to vendor, Farallon, made a router but it was way too expensive.  Finally, I found a cheap router called a PathFinder manufactured by Dayna systems.

I went to my boss, the VP of operations.  I showed her the price (maybe $800?) and she balked.  This company ran a tight ship, and she said we couldn’t afford it.

I went back to our head designer and asked her to keep a Post-it on her computer for a day, with a tally mark each time she had to reboot due to the Ethernet to LocalTalk switching.  Then we timed how long it took her to reboot.  I went back to my boss and showed her how much time was being lost each day to a single designer.  Her left hand flew over the buttons on the calculator on her desk, then she looked up at me.  “Buy the router,” she said.

The PathFinder did indeed fix the problem.  And so my first Ethernet network, as well as my first experience configuring a router, came at a company in 1995 with 30 Macs, and I’ve spent decades working with both technologies since those days.
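
The calculation my boss ran on her desk calculator is easy to sketch.  I no longer have the real tally, so the reboot count, reboot time, and hourly rate below are made-up numbers, and `payback_days` is a hypothetical helper; only the roughly $800 price comes from my memory of the story.

```python
def payback_days(router_cost, reboots_per_day, minutes_per_reboot, hourly_rate):
    """Working days until the time saved for one designer pays for the router."""
    hours_lost_per_day = reboots_per_day * minutes_per_reboot / 60
    daily_loss = hours_lost_per_day * hourly_rate
    return router_cost / daily_loss


if __name__ == "__main__":
    # Assumed inputs: 6 reboots a day, 5 minutes each, $40/hour loaded cost.
    print(payback_days(800, reboots_per_day=6, minutes_per_reboot=5, hourly_rate=40))
```

With those invented inputs, one designer loses half an hour a day, and the router pays for itself in 40 working days.  With several designers affected, the payback comes proportionally sooner.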

Some years ago, I worked for a company with a CEO who had a background in marketing.  It was 2010, and he decided to use his marketing skills to launch a huge new campaign called “Mission 10”.  Our goal:  to become the next $10 billion company.  At the time I think revenue was less than five billion.  Slick slides were drawn up, pep rally company meetings were held, and everyone in the company began pivoting their work to fit the new agenda.  Anyone who has worked in the corporate world has been there more than once.  Suddenly every initiative had to have a “Mission 10” theme.

The problem?  Despite the rah-rah of our CEO, we never achieved even close to $10 billion in revenue.  In fact, that company is still below $5 billion last I checked.  The bigger problem?  The CEO moved three years after that, having never really achieved this or any other goal he set.  He later ended up CEO of an even larger and more famous company that has nothing to do with technology.  This is known as “failing upward”.

In light of the “great resignation” I’d like to write a little about permanence, or the lack thereof.  We live in a temporary world.  People pick up a job and stay for two or three years, and then move on.  This was true even before COVID.  I myself have several two-year stints on my resume.  The longest I’ve worked anywhere is six.

Three years is just long enough to kick off some major initiative and get out at the peak, just before the whole thing crashes.  The damage done by corporate executives pursuing this short-term strategy is massive.  It works like this.  An exciting new executive is hired on from a big company.  The new executive launches a new product, architecture, marketing campaign, acquisition, whatever.  Everyone rallies around it because, well he’s the boss, and because if you want funding for anything it needs to be tied to the boss’ initiative.  The new initiative (let’s say it’s a product) is pumped up with cash, the marketing engine kicks in, the company oversells the product, and then customers start snapping it up.  It doesn’t perform as expected.  Things start to crash.  Money dries up.  The executive exits.   And whoever decides to stay is left picking up the pieces of the mess that this guy created.

In Ancient Greece, you faced consequences for this sort of thing, usually exile, sometimes death.  While I’m not advocating the death penalty for corporate screw ups (although in some industries they do cost lives), what’s fascinating is that in the corporate world, the consequences are the opposite.  Said executive who just screwed up royally walks away with huge bonuses, lots of stock, has a nice sabbatical, and begins the cycle again somewhere else.

If you think executives are the only problem, think again.  It happens at every level of the corporate world.  When a junior IT guy messes up a new system and then bolts for another job, you have the same issue at a smaller scale.  He just doesn’t get the bonus and sabbatical.  As a leader of technical marketing engineers, I face all sorts of challenges when an experienced TME leaves and takes knowledge with him.  Features can be stalled when the people who were working on them leave.

In my grandfather’s era, and even my father’s, it was expected that you would start and end your career at the same company.  There was an expectation of permanence.  People were proud of their companies and how they were treated, and bragged about the excellent pension they’d receive when leaving.  Now, we spend three years and jump ship to boost our salary.

Companies are, of course, largely responsible.  Often they don’t create the sort of employment experience that anyone would want to tolerate for long.  People stopped being human beings and started becoming human “resources”.  Executives, under various pressures, began to see their workforce as mere “metrics” to be manipulated as they learned in their B-school classes.  Times are good?  Dial up the workforce.  Times are bad?  Lay off 3%.  People are just numbers on a slide to many execs, and the difficulties of terminating employment are a remote problem to be dealt with by line managers.  As a result, employment is not a long-term commitment but a short-term business transaction on both ends.

The temporary workforce has an interesting effect on longer-term employees as well.  Someone who has worked at the same company for 15 or 20 years sees executives and initiatives come and go, ebbing and flowing like the tide on a beach.  They often develop an apathy and callousness that makes their own work unproductive.  They tend to focus on the day-to-day instead of the long-term, and often dismissively ignore the plans of new leadership, figuring the leaders will just be replaced and the cycle will start over.  Thus, while they have a long-term career, they often have a short-term level of focus.

We all live in a temporary world now, and permanence is in short supply.  If you want to understand why companies build bad products, why executives start disastrous programs and leave, and why there never seem to be consequences, this is a huge part of it.  I don’t really have a solution I’m afraid. Some of the causes are:  greedy hedge-fund finance people who take over corporate boards, an undisciplined corporate press/media, the instant availability of information leading to a lack of deliberation, and the rise of a management class who are not actually experts in anything other than management itself.  We can all point fingers at ourselves for up and going when the going gets tough.

The Greek philosopher Heraclitus famously said that you cannot step in the same river twice.  He meant that, if you cross a river, each time you take a step the water you were originally standing in has passed on, and you’re in new water.  Thus, there’s really no river.  Sometimes the tech world, and the corporate world in general, feel like Heraclitus’ River.  Even if you stand in one place, everything just moves on.

I have to give AWS credit for posting a fairly detailed technical description of the cause of their recent outage.  Many companies rely on crisis PR people to phrase vague and uninformative announcements that do little to inform customers and put their minds at ease.  I must admit, having read the AWS post-mortem a couple times, I don’t fully understand what happened, but it seems my previous article on automation running wild was not far off.  Of course, the point of the article was not to criticize automation.  An operation the size of AWS would be simply impossible without it.  The point was to illustrate the unintended consequences of automation systems.  As a pilot and aviation buff, I can think of several examples of airplanes crashing due to out-of-control automation as well.

AWS tells us that “an automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network.”  What’s interesting here is that the automation event was not itself a provisioning of network devices.  Rather, the capacity increase caused “a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network…”  This is just the old problem of overwhelming link capacity.  I remember one time, when I was at Juniper, and a lab device started sending a flood of traffic to the Internet, crushing the Internet-facing firewalls.  It’s nice to know that an operation like Amazon faces the same challenges.  At the end of the day, bandwidth is finite, and enough traffic will ruin any network engineer’s day.

“This congestion immediately impacted the availability of real-time monitoring data for our internal operations teams, which impaired their ability to find the source of congestion and resolve it.”  This is the age-old problem, isn’t it?  Monitoring our networks requires network connectivity.  How else do we get logs, telemetry, traps, and other information from our devices?  And yet, when our network is down, we can’t get this data.  Most large-scale customers do maintain a separate out-of-band network just for monitoring.  I would assume Amazon does the same, but perhaps somehow this got crushed too?  Or perhaps what they refer to as their “internal network” was the OOB network?  I can’t tell from the post.

“Operators continued working on a set of remediation actions to reduce congestion on the internal network including identifying the top sources of traffic to isolate to dedicated network devices, disabling some heavy network traffic services, and bringing additional networking capacity online. This progressed slowly…”  I don’t want to take pleasure in others’ pain, but this makes me smile.  I’ve spent years telling networking engineers that no matter how good their tooling, they are still needed, and they need to keep their skills sharp.  Here is Amazon, with presumably the best automation and monitoring capabilities of any network operator, and they were trying to figure out top talkers and shut them down.  This reminds me of the first broadcast storm I faced, in the mid-1990’s.  I had to walk around the office unplugging things until I found the source.  Hopefully it wasn’t that bad for AWS!
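
The "top talkers" hunt is simple enough to sketch, which is part of why it remains a basic skill.  This is only an illustration: the flow records are invented, `top_talkers` is a hypothetical helper, and nothing here reflects how AWS actually did it.

```python
from collections import Counter


def top_talkers(flows, n=3):
    """Sum bytes per source address and return the n heaviest senders."""
    totals = Counter()
    for src, nbytes in flows:
        totals[src] += nbytes
    return totals.most_common(n)


if __name__ == "__main__":
    # Made-up (source address, bytes) records, e.g. exported flow data.
    flows = [
        ("10.0.0.5", 500),
        ("10.0.0.9", 1200),
        ("10.0.0.5", 900),
        ("10.0.0.7", 100),
    ]
    for src, total in top_talkers(flows, n=2):
        print(src, total)
```

The catch, as the post-mortem makes clear, is that you need the flow data to reach you in the first place, which is hard when the congested network is also the one carrying your monitoring traffic.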

Outages happen, and Amazon has maintained a high level of service with AWS since the beginning.  The resiliency of such a complex environment should be astounding to anyone who has built and managed complex systems.  Still, at the end of the day, no matter how much you automate (and you should), no matter how much you assure (and you should), sometimes you have to dust off the packet sniffer and figure out what’s actually going down the wire.  For network engineers, that should be a reminder that you’re still relevant in a software-defined world.