
I have to give AWS credit for posting a fairly detailed technical description of the cause of their recent outage.  Many companies rely on crisis PR people to phrase vague and uninformative announcements that do little to inform customers and put their minds at ease.  I must admit, having read the AWS post-mortem a couple times, I don’t fully understand what happened, but it seems my previous article on automation running wild was not far off.  Of course, the point of the article was not to criticize automation.  An operation the size of AWS would be simply impossible without it.  The point was to illustrate the unintended consequences of automation systems.  As a pilot and aviation buff, I can think of several examples of airplanes crashing due to out-of-control automation as well.

AWS tells us that “an automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network.”  What’s interesting here is that the automation event was not itself a provisioning of network devices.  Rather, the capacity increase caused “a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network…”  This is just the old problem of overwhelming link capacity.  I remember one time, when I was at Juniper, a lab device started sending a flood of traffic to the Internet, crushing the Internet-facing firewalls.  It’s nice to know that an operation like Amazon faces the same challenges.  At the end of the day, bandwidth is finite, and enough traffic will ruin any network engineer’s day.

“This congestion immediately impacted the availability of real-time monitoring data for our internal operations teams, which impaired their ability to find the source of congestion and resolve it.”  This is the age-old problem, isn’t it?  Monitoring our networks requires network connectivity.  How else do we get logs, telemetry, traps, and other information from our devices?  And yet, when our network is down, we can’t get this data.  Most large-scale customers do maintain a separate out-of-band network just for monitoring.  I would assume Amazon does the same, but perhaps somehow this got crushed too?  Or perhaps what they refer to as their “internal network” was the OOB network?  I can’t tell from the post.

“Operators continued working on a set of remediation actions to reduce congestion on the internal network including identifying the top sources of traffic to isolate to dedicated network devices, disabling some heavy network traffic services, and bringing additional networking capacity online. This progressed slowly…”  I don’t want to take pleasure in others’ pain, but this makes me smile.  I’ve spent years telling networking engineers that no matter how good their tooling, they are still needed, and they need to keep their skills sharp.  Here is Amazon, with presumably the best automation and monitoring capabilities of any network operator, and they were trying to figure out top talkers and shut them down.  This reminds me of the first broadcast storm I faced, in the mid-1990’s.  I had to walk around the office unplugging things until I found the source.  Hopefully it wasn’t that bad for AWS!
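For what it’s worth, the “top talkers” exercise is simple to express once you have flow data in hand.  Here is a minimal sketch in Python, assuming you have already exported flow records as (source, byte count) pairs.  The function name and the record format are my own invention for illustration, not anything from the AWS post.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_talkers(flows: Iterable[Tuple[str, int]], n: int = 3) -> List[Tuple[str, int]]:
    """Sum bytes per source and return the n heaviest senders.

    `flows` is a hypothetical stream of (source_address, byte_count)
    records, for example parsed from a flow export.
    """
    totals: Counter = Counter()
    for source, byte_count in flows:
        totals[source] += byte_count
    # most_common() sorts by total bytes, largest first.
    return totals.most_common(n)


if __name__ == "__main__":
    sample = [
        ("10.0.0.1", 500),
        ("10.0.0.2", 100),
        ("10.0.0.1", 700),
        ("10.0.0.3", 900),
    ]
    print(top_talkers(sample, n=2))
```

The hard part in a real outage is not the arithmetic.  It is getting the flow data at all when the network carrying it is congested.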

Outages happen, and Amazon has maintained a high level of service with AWS since the beginning.  The resiliency of such a complex environment should be astounding to anyone who has built and managed complex systems.  Still, at the end of the day, no matter how much you automate (and you should), no matter how much you assure (and you should), sometimes you have to dust off the packet sniffer and figure out what’s actually going down the wire.  For network engineers, that should be a reminder that you’re still relevant in a software-defined world.

As I write this, a number of sites out on the Internet are down because of an outage at Amazon Web Services.  Delta Airlines is suffering a major outage.  On a personal note, my wife’s favorite radio app and my Lutron lighting system are not operating correctly.  Of course, this outage is a reminder of the simple principle of not putting one’s eggs in a single basket.  AWS became the dominant web provider early on, but there are multiple viable alternatives now.  Long before the modern cloud emerged, I regularly ran disaster recovery exercises to ensure business continuity when a data center or service provider failed.  Everyone who uses a cloud provider better have a backup, and you better figure out a way to periodically test that backup.  A few startups have emerged to make this easier.

While the cause of the outage is yet unknown, there was an interesting comment in a Newsweek article on the outage.  Doug Madory, director of internet analysis at Kentik Inc, said:  “More and more these outages end up being the product of automation and centralization of administration…”  I’ve been involved in automation in some form or another for my entire six years at Cisco, and one aspect of automation is not talked about enough:  automation gone wild.  Let me give a non-computer example.

Back when I worked at the San Francisco Chronicle, the production department installed a new machine in our Union City printing plant.  The Sunday paper, back then, had a large number of inserts with advertisements and circulars that needed to be stuffed into the paper.  They were doing this manually, if you can believe it.

The new machine had several components.  One part of the process involved grabbing the inserts and carrying them in a conveyor system high above the plant floor, before dropping them down into the inserter.  It’s hard to visualize, so I’ve included a picture of a similar machine.

You can see the inserts coming in via the conveyor, hanging vertically.  This conveyor extended quite far.  One day I was in the plant, working on some networking thing or other, and the insert machine was running.  I looked back and saw the conveyor glitch somehow, and then a giant ball of paper started to form in the corner of the room, before finally exploding and raining paper down on the floor of the plant.  There was a commotion and one of the workers had to shut the machine down.

The point is, automation is great until it doesn’t work.  When it fails, it fails big.  You don’t just get a single problem, but a compounding problem.  It wasn’t just a single insert that got hit by the glitch, but dozens of them, if not more.  When you use manual processes, failures are contained.

Let’s tie this back to networking.  Say you need to configure hundreds of devices with some new code, perhaps adding a new routing protocol.  If you do it by hand in one device, and suddenly routes start dropping out of the routing table, chances are you won’t proceed with the other devices.  You’ll check your config to see what happened and why.  But if you set up, say, a Python script to run around and do this via NETCONF to 100 devices, suddenly you might have a massive outage on your hands.  The same could happen using a tool like Ansible, or even a vendor network management platform.

There are ways to combat this problem, of course.  Automated checks and validation after changes are an important defense, but the problem with this approach is you cannot predict every failure.  If you program 10 checks, it’s going to fail in way #11, and you’re out of luck.
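To make the idea concrete, here is a rough sketch in Python of a rollout that behaves more like the careful human: change a device or two, check, and stop at the first sign of trouble.  Everything here is hypothetical.  `apply_change` stands in for whatever pushes the config (a NETCONF call, say) and `is_healthy` stands in for whatever checks you wrote.  As noted above, it only catches the failures you thought to check for.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    devices: Iterable[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    batch_size: int = 1,
) -> List[str]:
    """Apply a change in small batches, stopping at the first failed check.

    Returns the devices that were changed before the rollout stopped.
    Both callables are placeholders supplied by the caller.
    """
    changed: List[str] = []
    pending = list(devices)
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for device in batch:
            apply_change(device)
            changed.append(device)
        # Validate the whole batch before touching any more devices.
        failed = [d for d in batch if not is_healthy(d)]
        if failed:
            print(f"Halting rollout: checks failed on {failed}")
            break
    return changed


if __name__ == "__main__":
    routers = [f"router{n}" for n in range(1, 101)]
    touched = staged_rollout(
        routers,
        apply_change=lambda d: None,          # stand-in for a real config push
        is_healthy=lambda d: d != "router3",  # pretend router3 drops its routes
    )
    print(f"Changed {len(touched)} of {len(routers)} devices")
```

With a batch size of one, the blast radius is the single device that failed its check rather than all 100.  The trade-off is speed, which is usually the reason you automated in the first place.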

As I said, I’ve spent years promoting automation.  You simply couldn’t build a network like Amazon’s without it.  And it’s critical for network engineers to continue developing skills in this area.  We, as vendors and promoters of automation tools, need to be careful how we build and sell these tools to limit customer risk.

Eventually they got the inserter running again.  Whatever the cause of Amazon’s outage, let’s hope it’s not automation gone wild.

How often have you learned about a new technology, and couldn’t understand it?  How many trainings and presentations have you sat through that left you in a mental fog?  It amazes me how many technologies we are supposed to master in our industry, and how many we never do.

Let me give an example.  When I heard about “Cloud Computing” I could not, for the life of me, understand what it meant.  I went to meeting after meeting where we talked about “the Cloud” without any understanding of what it actually was.  I knew I used clouds in a lot of Visio diagrams, but the MBA-types who were telling me we needed to migrate to the cloud would never be able to understand the Visio diagrams that network engineers make.  It seemed to involve using centralized computing resources, but I’d been doing this for years.  My first ISP accounts were shell accounts.  My email and other services were hosted on their computers.  Nothing was new about this.  In fact, Larry Ellison gave a hilarious talk in which he asked “What the hell is Cloud Computing?”

We all know the “cloud” has in fact made significant changes in how we engineer computing resources, but the truth is, the idea of centralized “compute” is not a new one.  (Side note:  I hate turning verbs into nouns.  “Compute”, “spend”, and “ask” are verbs, not nouns.  The MBAs who invent these terms apparently don’t have to study grammar.)  The scale is certainly different, but we all know that mainframes had both centralized computing and virtualization long before anyone said “cloud.”

SDN is another one.  I was told we needed SDN, but I couldn’t figure out what it meant.  I was a hard-core routing protocols guy.  BGP and OSPF are software.  Ergo, networks are already software defined.

Someone sent me a video from Nicira, later acquired by VMware.  The vague video described slicing networks into pools, or something like that.  I couldn’t understand what this meant.  Like a VLAN?  I finally found a document that described SDN as separation of the control plane from the data plane.  OK, but we already had been doing that in routers and switches for years?  Yes, but SDN was a centralized control plane.  Kind of like BGP route reflectors?  I couldn’t figure it out.  I spent some time getting OpenFlow up and running to try to understand it from the ground up.  What a waste that was.  Whatever SDN has become, it’s certainly not what it was originally defined to be.  And don’t get me started on SASE.

I used to think maybe I was stupid, but now I realize all of these things confused me because they were (a) confusing in themselves, or (b) so badly explained that nobody really understood them.  A little more detail:

  • Some technologies are simply vague marketing terms.  They don’t correspond to anything precise in reality.
  • Some technologies do correspond to reality, but they are simply bucket terms.  That is, the marketers took five, six, ten technologies, and slapped a new label on them.  In this case, you’re looking for some precise definition of term X and you realize term X refers to ten different things at once.
  • Sometimes new technologies are invented, and the inventors don’t want to cough up too much proprietary information.  So they produce vaguely worded marketing content that appeals to “analysts” with MBAs in marketing, but which technical people realize is meaningless.  Said “analysts” now run around creating hype (“You need software-defined cloud secure-access zero trust!”) and now we’re told to implement it.
  • A lot of technical people are really bad explainers.  Sometimes there is a new technology which is clear and well-defined, but the people sent to explain it are completely incapable of explaining anything at all.

My point is, it’s ok to be confused.  A lot of times we’re in the room and everyone seems to be getting it, but we have no idea what is going on.  Chances are, nobody else really understands what is being said either.  Ask questions, drill down, and if you don’t understand something, chances are it’s hot air.  In a world where we prioritize talk over reality, there seems to be an abundance of that.

As a part of my job at Cisco I’ve been looking into Zscaler and their offerings.  It started me thinking back to the early days of remote access, and I figured it would make a good topic for Netstalgia.

I wrote in the past about how bulletin board systems (BBSs) work, and in another article I resurrected my old BBS in an Apple II emulator.  In a nutshell, a computer with a BBS set up had a modem on it and users dialed in using their own modem over dial-up phone lines.  I’m not sure how many readers are young and don’t remember modems, and how many are dinosaurs like me, but as a reminder, modems connect computers to phone lines.  One modem is set to answer any call that comes in, and waits.  Then another user with a modem inputs the phone number of the other end into his software.  His modem dials out, the phone rings, and the other modem answers with a carrier tone.  Then the dialer responds and after some negotiation on the line, a connection is established and data is sent.

Now in my first job, at a small company in Marin, California in the mid-1990’s, we had one computer set up as a dedicated remote access server.  It had a single modem with a single phone line, and ran Apple Remote Access server, since we were a Mac shop.  We only had one user with a laptop, the CEO, so when he traveled he would dial in and be able to access basic functions like email and our file server.  There was no Internet access back then.

When I moved on to a consulting company, I did a few more industrial set ups.  Usually these involved remote access servers that were comprised of a bunch of modems and a LAN port.  The remote access server would accept a bunch of phone lines and then provide TCP/IP or AppleTalk connectivity to the network.  By this time users had Internet connectivity.  The Shiva LanRover is one example of this sort of device.

Shiva LANRover

When I worked at the San Francisco Chronicle, we had an Ascend Max which served this purpose.  The Max had two DS3 lines plugged into it.  It was the first time I had seen a DS3, and I remember being excited to learn the phone company could deliver a circuit over coax.  (It actually entered the building on fiber and went over coax from the MPOE.)  The DS3 was an ISDN PRI, with 24 dial-in phone lines multiplexed over a single digital circuit.  It took me months to find someone who had the password to the Max, and when I finally got in I found out that the second DS3 was unconfigured.  Users had been complaining about busy signals and all I had to do was change a few menu settings.

Ascend Max

Remote access dial-up was heavily used at the Chronicle.  Reporters filed their stories via modem.  VPN was just coming out, and I decided to replace the dial-up with VPN + dial-up.  A company called Fiberlink provided a dialer with a vast database of local Internet dial-up lines from a variety of carriers they contracted with.  Our users would pick a local phone line and then dial into it.  They then launched our Nortel VPN client to establish connectivity.  This saved us a fortune on 800-number charges, but our users hated it.  As a good senior guy, I did the initial design and left implementation to a junior guy.  I’m amazed he still talks to me.  (And he’s not junior anymore!)

Despite being a long-time Cisco guy, I never touched the Cisco remote access stuff.  I did use 2500-series routers with serial ports as terminal servers in the lab, but I never connected modems to them.  Still, when I passed my CCNP, one exam covered remote access and I needed to know a lot about modems.

Nowadays I rarely log into VPN.  Most systems I need to access can authenticate through our Zero Trust/SSO system without the need for a connection to Cisco’s network.  We’ve come a long way since the days of dial-up.  And while I said I missed wiring in another post, I sure don’t miss modem tones!

 

When I worked for the Gold partner I generally serviced clients in the San Francisco Bay Area, but because we were a national partner I was occasionally called to other locations around the country.  Being a double CCIE who had worked in TAC, I had a unique skill set among our engineers, which was often demanded by other field offices.

One day my boss called me and told me he needed my help with a customer out of Des Plaines, Illinois.  The company was a manufacturer of fuses.  They were experiencing a network meltdown and needed troubleshooting help.  Great, I thought, I left TAC and came here precisely to get out of this sort of thing.  I liked doing sales calls and new installations, not fixing buggy messes.

I was assigned to the customer on a Monday and was immediately pulled into what we often call a “shit show”.  (Pardon the language.)  The customer had a large international MPLS network with VPN backup.  Several of the sites were experiencing performance issues.  Sites were unable to perform manufacturing and the previous CIO had been fired.  The interim CIO was an ex-military person who seemed to think he was George S. Patton.  He was scheduling calls from early in the morning until late at night, status updates, live troubleshooting sessions, and pow wows with TAC.

Meanwhile, I was starting to feel ill.  Not because of the case, just sick to my stomach.  I didn’t think much of it at first, but it started to go downhill fast.  Luckily I was working from home due to the crazy hours.

But not for long.  The CIO had set up a troubleshooting session in the middle of the night Saturday, into Sunday morning.  He got my boss and me on the phone and insisted I come to Des Plaines that weekend, in person.  We argued every way we could that I could be just as productive remotely, but Patton was having none of it.  “If this is your best guy,” he said to my boss, “you need to have him on a plane and out here in person.  Otherwise we can take our business somewhere else.”  Not only was it the weekend, and not only did I feel ill, it was also Memorial Day weekend.  My brother, who actually was (and is) in the Army, was paying a rare visit to the Bay Area.  There was no sympathy from the customer, and soon I was booking my ticket to Illinois.  The local account executive booked me a car with GPS and promised to meet me at the airport.  Keep in mind, this was before smartphones, and so you needed to rent a car with a built-in GPS unit if you wanted to get around without maps.

I had a miserable flight and was starting to feel more sick.  There’s nothing worse than being sick to your stomach on a plane.  Being forced to stay in your seat and long lines for tiny bathrooms make for torture.  I ate nothing, and arrived at Chicago airport late.  The account executive was nowhere to be found, and the car he had arranged did not have GPS.  The rental car company didn’t have any GPS-equipped cars, so they provided me with a map and directions.

I drove through Chicago to Des Plaines.  Realizing I needed to eat something, I found a McDonald’s, the only thing open at that time, and managed to choke down half of a Quarter Pounder.  My stomach felt like burning acid.  I continued my drive, through a bad part of the city.  I was on the right road, but I needed to keep pulling over to check the address.  A couple of times when I pulled over, swarms of what I assume were drug dealers would approach.  I’d pull out just as they got to the car.

Eventually I made it to the customer site, and met the general.  I was shown in to a conference room with a raised floor right next to the data center, and we began troubleshooting.

It was nothing I couldn’t have aided with remotely.  Basically, the customer had scoped circuits that were too small for the volume of traffic they were carrying.  They were also having degradation problems on the MPLS.   Some of the sites were performing better on VPN backup circuits, so we were switching them to backups.  We performed tests with the telco.  We also looked into an issue with their core Catalyst 6k switch.  When they had done a circuit switch earlier in the week, all traffic on the network had stopped, according to the customer.  The customer had reloaded the core device and traffic came back.  Because there was no crash or crashdump file, and nothing in the logs, I could not explain this event.  It was a smorgasbord of issues, mostly due to bad design and a little due to bad luck.

The troubleshooting window was supposed to end at 2am, but we worked until 6am.  I had a flight to catch and hadn’t slept all night.  The customer wanted me to stick around but I told him to stuff it and left.  I checked in to the hotel room I had booked, slept one hour, checked out, and got on my plane.

On the flight back, I was seated in the middle seat between two very large people.  I figured out that they were married, but they only spoke Spanish.  I used my rudimentary Spanish to extract myself.  “Yo, a la ventana.  Su esposa, aquí,” I suggested.  They liked the idea.  I moved to the window seat.  When the plane took off, they spread out a massive feast on the tray tables.  I ate a single muffin from Starbucks, one bite at a time, until I landed and went home.

I never determined if it was a norovirus or food poisoning, but I lost twenty pounds in a week.  The customer realized they needed to invest in new circuits, which had a 12-16 week turnaround time.  I think the new CIO got fired as well.  And the account executive never invited me back.  I only wish he had shown up at the airport as I might have thrown up on him.

We’ve moved into a wireless world, which is too bad for me because I love, more than anything, wiring.  I miss the days of good old Cat 3 cable, T1 lines, and ISDN BRIs.  I miss 66 blocks, punch down tools, cross-connect wire, and tone/probe kits.  And butt sets.  Especially butt sets.  Now I just have to add random wall receptacles around my house since, thank goodness, 120 volts cannot be delivered wirelessly.  But phone and network wiring was much more fun.

I first got interested in wiring when I was working at a small museum exhibit design and fabrication company in Marin, California.  One day, I had to have a new phone line installed, and I called in Pacific Bell to do the work.  Under the receptionist’s desk was a wall plate with two RJ-11 jacks, already in use.  “There aren’t any free jacks,” I told the phone guy.

“It doesn’t matter, as long as there is enough wire,” was his cryptic response.  I let him do his work, and when he was done I saw a little surface-mount RJ11 jack on the wall.  It had two blue and yellow wires running from it, going underneath the existing faceplate and somehow into the wall.  How on earth did he do that?  I remember thinking.

I waited until everyone went home.  I unscrewed the faceplate and found that the blue/yellow wire pair was spliced to the brown pair of wires in the cable that serviced the existing jacks.  The splice was a 3M UR-type, which looked like a piece of candy.  I was captivated.  How did he know where the brown wires went?  Where did the incoming phone line enter the building?  How did he connect them up?

The fun thing about the days before ubiquitous Internet was that you couldn’t get answers immediately.  When I found the 66-type punchdown blocks that formed our little MDF, I couldn’t Google the part number to figure out what they were.  Google didn’t exist.  I had no idea how to terminate a wire on one of them.  While thumbing through a Jensen Tools catalog our purchasing agent had, I got an idea of how it worked.  There, I saw listed a “66 punch-down tool” and I had noticed the number “66” on the block.  OK, so they go together and the wire gets in the clip by “punching down.”

I didn’t have the money to afford a punch-down tool.  So I got a pair of pliers and a pair of forceps and started doing terminations myself, guiding the wire into the clip.  Sure, sparks were flying, but phone systems are robust, aren’t they?

It turns out not as robust as you might think.  Back in the day, phone lines provided by the phone company (POTS service) were indeed robust and could handle quite a bit of sparking and shorting.  But the Nitsuko (emphasis on the “suk”) system we used did not take kindly to my laissez-faire approach to wiring.  One day, while doing a move, I shorted out a couple terminals and all the phones in the building went out.  I came out of the closet with pliers in hand and everyone knew what happened.  This being before cell phones, business was done for the day.  Luckily we got our phone company out to replace the bad part, and they did it under warranty.  It was at this point I invested in a proper punch-down tool and learned how to wire correctly.

The author’s tools: Butt set, punch down tools, tone/probe set, and connectors

I saved up a lot of money and eventually got a test set, otherwise known as a “butt set”.  The butt set looks like an oversized telephone handset, and is the official sign of a telco wiring guy.  It also was my passport into telephone wiring closets–when a security guard sees you have a butt set, they just assume you’re a real phone guy.

I practiced my technique in my father’s 1910-era house.  The decades brought with them layers of phone wiring, including two 50-pair feeder cables and a 66-style punch-down block.  Why and how the feeder cables ended up in a house with 3 phone lines is a mystery to this day.  But I used the 66-block for practice and dissected the criss-crossing wires in his house, tracing them out with my trusty tone/probe set.

Wiring skills came in handy many times in my career as a network engineer.  I remember one customer I had in the late nineties who had ordered 8 Centrex lines from Pacific Bell.  The technician showed up to do the cross-connects, but he was not the sharpest knife in the drawer, did two of them, and left.  Using my butt set passport, I got access to the MDF, figured out the right punch-down block that fed to the customer’s suite, and ran the cross-connects myself.

When I worked for the San Francisco Chronicle, we actually had an in-house wiring tech.  Her name was Mona Lesa.  She couldn’t care less if I did my own wiring, as long as I adhered to her standards.  One day I had a frame relay T1 installed in our Sacramento bureau office, and once again, Pac Bell forgot to connect it to the suite.  The bureau was located in the Old Senator Office Building across from the state capitol.  I got the security guard to let me in to the MDF in the bowels of this historic building.  I figured out that one of the interconnects was in the office of a lobbying firm, and the receptionist dutifully let me in to do the wiring.  With my butt set I could have clipped into any of their phone lines and listened to their conversations without them knowing.  I didn’t, but I was amazed that I could go anywhere to do cross-connects.

On another occasion, a customer of mine had two phone lines installed and had a weird problem.  If she picked up one line and dialed the other, she’d hear both ringing and a busy signal at the same time.  I found where the Pacific Bell guy had done the cross-connect, and realized he mixed the tip and ring wires for the two phone lines.  A couple punches and all fixed.

Phone wiring was always the perfect blend of mind and body to me.  To do it you need to work with your hands, but you also need to use your mind.  Some wiring closets were rats’ nests of 24-gauge wires color-coded with more colors than a tie-dyed shirt.  Finding that one pair you needed, and marrying it up to the other end, was always a nice diversion from staring at a screen.

Alas, my tools mostly sit unused.  Regular POTS lines rarely exist now, and even they aren’t delivered on single copper pairs from the CO switch.  PBXs and key systems have gone the way of the dinosaur, replaced by voice-over-IP and now by cellular phones.  Nobody wants to be tied to wires anymore, and yet by un-encumbering ourselves in the name of freedom, we’ve paradoxically lost our freedom.  When you don’t need to be physically tied to a location for connectivity, you can never escape connectivity.  And that’s not entirely a good thing.

I mentioned several weeks ago I would be playing with the themes on this blog as my old one is broken and I don’t have time to fix it.  Pardon the changes in appearance while I play around.  I was getting tired of the tiles anyways, so maybe going back to a linear format will be more readable.

Incidentally, the new themes all lack the star ratings of the previous theme.  Frankly, they didn’t do much for me other than let me know someone was reading.  Feel free to comment if you like/dislike the theme I’m trying.


When I first started at Cisco (the second time), I remember being in a customer meeting where I had no idea what was going on.  As is typical for vendor meetings, Cisco employees outnumbered the customer by 3 to 1.  Someone from our side was presenting, though I don’t really remember about what.  I didn’t say anything because I was still pretty new, and frankly didn’t have much to say.  A Distinguished Engineer, who I know opposed my hiring, pulled me aside after the meeting and said to me with a smile:  “You know at Cisco we judge people on how much they speak in meetings.”  He was obviously implying that by keeping my mouth shut, I was proving my lack of value to the company.  I never really held it against that engineer, who wasn’t a bad guy.  But he reminded me of a problem in the corporate world, this belief that you have to always be talking.

I default to keeping my mouth shut when I don’t have anything important to say.  This probably is the result of being a child of divorce, but regardless I’ve always hated how much noise and talk are valued in our society.  Twitter is just a permanent gripe session, talk radio (whether the in-your-face conservative or sedate NPR variety) is just ceaseless hot air, and the more channels of communication we open up, the worse it becomes.  In the corporate world, decisions are often made in meetings based on the opinions of the most verbally aggressive in the room.  There is an underlying assumption to this approach to decision making:  that the loudest have the most valuable opinions, and the quietest the least.  But isn’t the opposite the case?  How many loudmouths have you known who spout nonsense, and how many super-intelligent quiet people do you know?  Some of the smartest people I’ve met are introverts.  And yet we seem to think if you’re willing to express an opinion loudly, then you’re worth following.

The problem with meeting culture in particular is that it doesn’t value thought.  I once had a VP complain to me that someone couldn’t “think on his feet”.  Most of the time, thinking on your feet doesn’t mean thinking at all.  It is simply reaction.  Thought takes time.  It requires reflection.  In corporate culture, we often prize how quickly you react, not how deeply you think.

This is not to say introverts should never try to overcome their shyness, nor that vocal people are always less intelligent.  However, I think as leaders in the corporate community, we can take steps to improve our meeting culture so that unheard but important voices have their chance to contribute.  This can be done a few ways:

  • Ensure equal air time for participants in meetings.  If someone is talking too much, limit his time.  Call on the quiet folks and introverts explicitly to get their opinions.
  • Don’t make major decisions in meetings unless there is legitimate time pressure to do so.  At the end of a meeting, allow people to go back and reflect on what was discussed, possibly run through it in a chat room, and reconvene later when people have had time to think about the subject at hand.
  • Stop evaluating people simply on how much they speak in meetings.  Realize, particularly if you are a vocal-type, that people contribute in different ways.
  • Try to minimize interruptions in presentations and to save questions for the end.  I like to hear a presenter lay out a story in a logical fashion, and when presenters are constantly interrupted, it disturbs my ability to follow.  For those of us who are more contemplative thinkers, our ability to participate is hampered when presenters can never finish a thought.

Part of the problem is that many, if not most, who rise to leadership positions in the corporate world are the verbally aggressive and highly vocal type.  They often cannot understand how anyone could possibly approach things in a different way, and take quietness as a sign of weakness, indecision, or unintelligence.  For those leaders, recognizing the value of quieter individual contributors and leaders will help them and their organization.

Now that I’m done spouting off, it’s time to log off for some silence of my own.


I’ve written before about my years at the San Francisco Chronicle, my first job that was exclusively network engineering.  It was an interesting environment, as this was back in the years before the Internet totally displaced newspapers.  We had printing plants to support, active newsrooms with reporters and photographers, and a massive circulation operation.

When I first arrived, our Internet was provided by a T1.  At the time, the 1.5 Mbps was considered pretty fast, but with over 1000 users depending on the T1, it was slowing to a crawl.  The T1 was terminated on a good old-fashioned Cisco 2500-series router.  Later I upgraded our service to a T3 and terminated it on a 7204VXR, which led to a dramatic speed improvement.  The 2500 sat in an open, free-standing rack in the corner of our data center.

One day I noticed that the Internet was down.  Even though this was the early 2000s, Internet connectivity was already critical, especially at a newspaper.  The problem quickly escalated to the CIO and I scrambled into action.  I could reach as far as the firewall, but there was no connectivity beyond that.

Sometimes the best thing to do is to physically check on a problem.  The networking team was in the basement and our data center was on the second floor.  I sprinted up the stairs and badged in to the data center.  One of our mainframe operators, who worked in the data center, grabbed me and asked, “Hey, do you know the Internet is…?”

“Yeah, yeah,” I said and ran to the rack with our T1 router and DMZ switches.

I saw a gentleman in white painter’s clothes with a paint roller in the corner.  He was painting the wall right alongside the rack.  There were only a few inches of clearance between the rack and the wall where he was rolling paint.  On the side of the rack, held in place with plastic zip ties, was a power strip with all the rack’s hardware plugged in.  He’d hit it with the roller and knocked out the power.

Network engineers love to solve complex problems, but when people are yelling at you it’s a relief when the problem is simple.  I flipped the switch on the power strip and Internet was restored as soon as the router booted up.  Someone had obviously placed the power strip on the wall-side of the rack figuring it would minimize the chance of it getting bumped.  They had inadvertently created the problem they were trying to solve.  I told the painter to stay away from my rack.

Networks are complex entities, but often the problems we face involve bad splices in holes in the ground, physical obstructions to WiFi signals, loose connections, and rogue paint rollers.

 

We all have weaknesses, and one of mine is that I’m good at starting things and bad at finishing them.  Two years ago (gasp) I started writing a series about technical interviewing.  I wrote two posts (here and here) on the subject and never finished.  A recent commenter asked me to keep writing on the subject, so here goes.

Whenever we, as hiring managers, post a job, we have a specific idea in mind of what the ideal candidate would look like.  We are looking for a combination of skills, personality, experience, and credentials.  The fact of the matter is, we can create an ideal candidate in our head, but that person does not exist.  We will never hire this fictional person, so we have to look for the closest match.  We need to decide which of these skills, credentials, etc., we are willing to compromise on, and which are non-negotiable.  Historically, the process for deciding the match used two pieces of input:  the resume (or CV) and the interview.  The resume provides a rough view of how close the candidate is to the ideal, whereas the interview allows the hiring team to explore more deeply the closeness of the match.  I said “historically” because I’ve seen some companies using newer techniques, such as administering programming tests.  I’ll leave these newer techniques aside to focus on the interview.

The interview is an attempt to evaluate the candidate’s “fit” by asking questions of the candidate.  The largest part of this effort is trying to understand if the candidate’s skills fit with the skills we believe are required for the job.  A secondary part of the interview is assessing the candidate’s personality and “soft” skills.

How do we assess technical skill?  Let’s say the job we are hiring for is a low-level network engineer.  We need someone who has basic skills in configuring Cisco switches, routers, and wireless.  We want someone with a basic understanding of TCP/IP, IOS-XE, NX-OS, and routing protocols like EIGRP and OSPF.  Often all these will be listed at the top of the candidate’s resume.  But anyone can put anything in the alphabet soup.  The interview is our chance to determine whether the alphabet soup is true.  (Hint:  DO NOT put anything in your resume you are not comfortable talking about.)  There are generally two ways of doing this.  First, we can assess the candidate’s hands-on experience with these technologies.  We can use questions like:

  • Did you work with EIGRP in your last job?  If so, what did you do with it?
  • How much hands-on experience do you have with NX-OS?  Tell me the environment where you configured it and what specifically you did.

In this case, I’m trying to understand whether you’ve worked with the technologies and how much you have worked with them.  You might claim experience with NX-OS, but perhaps you only looked at “show interface” outputs in your last job.

The other way I can assess technical skill is by asking theoretical questions about the technologies in question.  For EIGRP, for example:

  • Can you tell me all the different components of the EIGRP metric and which are the defaults?
  • What is an EIGRP SIA condition and how would you go about resolving it?
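
For anyone preparing for the first question, here is a minimal sketch in Python of the classic (non-wide) EIGRP composite metric as it is commonly documented.  The function name and parameter names are my own, invented for illustration.  The defaults assume the standard K values (K1 = K3 = 1, K2 = K4 = K5 = 0), which means only bandwidth and delay count unless someone changes them.  Integer rounding on a real router may differ slightly from this sketch.

```python
def eigrp_classic_metric(min_bandwidth_kbps, total_delay_usec,
                         load=1, reliability=255,
                         k1=1, k2=0, k3=1, k4=0, k5=0):
    """Classic EIGRP composite metric (illustrative sketch, not router code).

    min_bandwidth_kbps: lowest interface bandwidth along the path, in kbps
    total_delay_usec:   sum of interface delays along the path, in microseconds
    load, reliability:  values from 1 to 255, ignored with the default K values
    """
    # Bandwidth term is scaled as 10^7 divided by the slowest link in kbps.
    bw = 10_000_000 // min_bandwidth_kbps
    # Delay term is expressed in tens of microseconds.
    delay = total_delay_usec // 10

    metric = k1 * bw + (k2 * bw) // (256 - load) + k3 * delay

    # The reliability term is applied only when K5 is non-zero.
    if k5 != 0:
        metric = metric * k5 // (reliability + k4)

    return 256 * metric


if __name__ == "__main__":
    # FastEthernet defaults: 100,000 kbps bandwidth, 100 usec delay
    print(eigrp_classic_metric(100_000, 100))
    # T1 serial defaults: 1544 kbps bandwidth, 20,000 usec delay
    print(eigrp_classic_metric(1544, 20_000))
```

With the default K values the formula collapses to 256 × (bandwidth term + delay term), which is why a good answer to the question names all five components (bandwidth, load, delay, reliability, and the K-value weights) and then points out that only two of them matter out of the box.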

Any skill is a combination of theory and practice.  If you don’t understand the theory behind a wing stall in an airplane, you won’t be able to resolve it, but if you don’t go out and do stall practice in the airplane, you also won’t be able to resolve it.  Most interviewers will use a combination of these techniques to assess your competency.

Another technique is to provide a theoretical scenario.  I can get up on the whiteboard and draw a bunch of OSPF areas with different types and ask you how routing would work between them.

As an interviewer, I now need to think about your answers and how they align with what I’m expecting.  What if you know a lot of EIGRP theory, but you’ve never configured it in the real world?  What if you know a lot about switching and routing but nothing about wireless?  And, most importantly, how do you compare to the other candidates I’ve interviewed?  The other strong candidate might know a ton about wireless and nothing about routing.  I also have to consider whether I want a “ready-made” engineer who can just walk in and start working, and how willing I am to provide ramp time to the hire.  If you know your stuff in routing and switching, but don’t understand wireless, I might assume your overall technical competency will enable you to learn wireless with relative ease.

How effective are these techniques?  Well, if I remember to continue with this series, I’ll discuss it in the next article.