Vintage DDoS

With Coronavirus spreading, events shutting down, the Dow crashing, and all the other bad news, how about a little distraction?  Time for some NetStalgia.

Back in the mid-1990’s, I worked at a computer consulting firm called Mann Consulting.  Mann’s clientele consisted primarily of small ad agencies, ranging from a dozen people to a couple hundred.  Most of my clients were on the small side, and I handled everything from desktop support to managing the small networks these customers had.  This was the time when the Internet took the world by storm–venture capitalists poured money into the early dotcoms, who in turn poured it into advertising.  San Francisco ad agencies were at the heart of this, and as they expanded they pulled in companies like Mann to build out their IT infrastructure.

I didn’t particularly like doing desktop support.  For office workers, a computer is the primary tool they use to do their job.  Any time you touch their primary tool, you have the potential to mess something up, and then you are dealing with angry end users.  I loved working on networks, however small they were.  For some of these customers, the network consisted of a single hub (a real hub, not a switch!), but for others it was more complicated, with switches and a router connecting them to the Internet.

Two of my customers went through DDoS episodes.  To understand them, it helps to look at the networks of the time.

Both customers had roughly the same topology.  A stack of switches was connected together via back-stacking.  The entire company, because of its size, was in a single Layer 2/Layer 3 domain.  No VLANs, no subnetting.  To be honest, at the time I had heard of VLANs but didn’t really understand what they were.  Today we all use private RFC1918 addressing for end hosts, except for DMZs.  Back then, our ISP assigned us a block of addresses and we simply applied the public addresses directly to the end-stations themselves.  That’s right, your laptop had a public IP address on it.  We didn’t know a thing about security; both companies had routers connected directly to the Internet, without even a simple ACL.  I think most companies were still figuring out the benefits of firewalls at the time, but we also had a false sense of security because we were Mac-based, and Macs were rarely hacked back then.

One day, I came into work at a now-defunct ad agency called Leagas Delaney.  Users were complaining that nothing was working–they couldn’t access the Internet and even local resources like printing were failing.  Macs didn’t even have ping available, so I tried hitting a few web sites and got the familiar hung browser.  Not good.

I went into Leagas’ server room.  The overhead lights were off, so the first thing I noticed was the lights on the switches.  Each port had a traffic light, and every one was lit solid, not blinking the way they usually did.  When they did occasionally blink, they all blinked in unison.  Not good either.  Something was amiss, but what?

Wireshark didn’t exist at the time.  There was a packet sniffer called Etherpeek available on the Mac, but it was pricey–very pricey.  Luckily, you could download it with a demo license.  It’s been over 20 years, so I don’t quite recall how I managed to acquire it with the Internet down and no cell phone tethering, but I did.  Plugging the laptop into one of the switches, I began a packet capture and immediately saw a problem.

The network was being aggressively inundated with packets destined to the subnet broadcast address.  For illustration, I’ll use one of Cisco’s reserved banks of public IP addresses.  If the subnet was 209.165.200.224/27, then the broadcast address would be 209.165.200.255.  Sending a packet to this address means it would be received by every host in the subnet, just like the broadcast address of 255.255.255.255.  Furthermore, because this address was not generic, but had the subnet prefix, a packet sent to that broadcast address could be sent through the Internet to our site.  This is known as directed broadcast.  Now, imagine you spoof the source address to be somebody else’s.  You send a single packet to a network with, say, 100 hosts, and those 100 hosts reply back to the source address, which is actually not yours but belongs to your attack target.  This was known as a smurf attack, and they were quite common at the time.  There is really no good reason to allow these directed broadcasts, so after I called my ISP, I learned how to shut them down with the “no ip directed-broadcast” command.  Nowadays, this sort of traffic isn’t allowed, most companies have firewalls, and they don’t use public IP addresses, so it wouldn’t work anyhow.
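
To make the amplification concrete, here is a quick sketch (my own illustration, using Python’s standard library and the same Cisco documentation prefix as above) of what the broadcast address works out to and how many hosts a single spoofed packet could trigger:

    import ipaddress

    # The documentation subnet from the example above.
    net = ipaddress.ip_network("209.165.200.224/27")

    # The directed broadcast address: every host in the subnet listens on it.
    print(net.broadcast_address)    # 209.165.200.255

    # One spoofed packet to that address could draw a reply from every live host
    # in the /27 -- up to 30 of them -- all aimed at the spoofed source, i.e. the victim.
    print(net.num_addresses - 2)    # 30 usable host addresses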

My second story is similar.  While still working for Mann, I was asked to fill in for one of our consultants who was permanently stationed at an ad agency as their in-house support guy.  He was going on vacation, and my job was to sit in the server room/IT office and hopefully not do anything at all.  Unfortunately, the day after he left, a panicked executive came into the server room complaining that the network was down.  So much for a quiet week.

As I walked around trying to assess the problem, of course I overheard people saying “see, Jon leaves, they send a substitute, and look what happens!”  People started asking me if I had “done” anything.

A similar emergency download of a packet sniffer immediately led me to the source of the problem.  The network was flooded with broadcast traffic from a single host, a large-format printer.  I tracked it down, unplugged it, and everything started working again.  And yet several employees still seemed suspicious I had “done” something.

Problems such as these led to the invention of new technologies to stop directed broadcasts and contain broadcast storms.  It’s good to remember that there was a time before these things existed, and before we even had free packet sniffers.  We had to improvise a lot back then, but we got the job done.

TAC Tales #19: Butt-in-Chair

This one falls into the category of, “I probably shouldn’t post this, especially now that I’m at Cisco again,” but what the heck.

I’ve often mentioned, in this series, the different practices of “backbone TAC” (or WW-TAC) and High Touch Technical Support (HTTS), the group I was a part of.  WW-TAC was the larger TAC organization, where the vast majority of the cases landed.  HTTS was (and still is) a specialized TAC group dedicated to Cisco’s biggest customers, who generally pay for the additional service.  HTTS was supposed to provide a deeper knowledge of the specifics of customer networks and practices, but generally worked the same as TAC.  We had our own queues, and when a high-touch customer would open a case, Cisco’s entitlement tool would automatically route their case to HTTS based on the contract number.

Unlike WW-TAC, HTTS did not use the “follow the sun” model.  Under follow-the-sun, regular TAC cases would be picked up by a region where it was currently daytime, and when a TAC agent’s shift ended, they would find another agent in the next time zone over to pick up a live (P1/P2) case.  At HTTS, we had only US-based employees at the time, and we had to work P1/P2 cases to resolution.  This meant if your shift ended at 6pm, and a P1 case came in at 5:55, you might be stuck in the office for hours until you resolved it.  We did have a US-based nightshift that came on at 6pm, but they only accepted new cases–we couldn’t hand off a live one to nightshift.

Weekends were covered by a model I hated, called “BIC”.  I asked my boss what it stood for and he explained it was either “Butt In Chair” or “Bullet In the Chamber.”  The HTTS managers would publish a schedule (quarterly, if I recall) assigning each engineer one or two 6-hour shifts during the weekends of that quarter.  During those 6 hours, we had to be online and taking cases.

Why did I hate it?  First, I hated working weekends, of course.  Second, the caseload was high.  A normal day on my queue might see 4 cases per engineer, but on BIC you typically took seven or eight.  Third, you had to take cases on every topic.  During the week, only a voice engineer would pick up a voice case.  But on BIC, I, a routing protocols engineer, might pick up a voice case, a firewall case, a switching case…or whatever.  Fourth, because BIC took place on a weekend, normal escalation channels were not available.  If you had a major P1 outage, you couldn’t get help easily.

Remember that a lot of the cases you accepted took weeks or even months to resolve.  Part of a TAC engineer’s day is working his backlog of cases:  researching, working in the lab to recreate a problem, talking to engineering, etc., all to resolve these cases.  When you picked up seven cases on a weekend, you were slammed for weeks after that.

We did get paid extra for BIC, although I don’t remember how much–hundreds of dollars per shift, if I recall.  Because of this, a number of engineers loaded up on BIC shifts and earned thousands of dollars per quarter.  Thankfully, this meant there were plenty of willing takers when I wanted to give away my shifts, which I almost always did.  (I worked two during my two years at TAC.)  Sometimes, though, I could not find anyone to take my shift, and in that case I would effectively pay to get rid of it, offering an extra hundred dollars of my own on top of the BIC pay.  That’s how much I hated BIC.  Of course, this was done without the company knowing about it, as I’m sure they wouldn’t have approved of me paying someone else to do my work!

We had one CSE on our team, I’ll call him Omar, who loaded up on BICs.  Then he would come into the week so overloaded with cases from the weekend that he would hardly take a new case at all.  The rest of us got burdened with the extra load because Omar was off working his weekend cases.  Finally, as team lead, I called him out on it in our group chat, and Omar blew up at me.  I was right, of course, but I had to let it go.

I don’t know if HTTS still does BIC, although I suspect it’s gone away.  I still work almost every weekend, but it’s to stay on top of my existing work rather than to take on more.

Where does time go?

Two things can almost go without saying:

  1. If you start a blog, you need to commit time to writing it.
  2. When you move up in the corporate world, time becomes a precious commodity.

When I started this blog several years ago, I was a network architect at Juniper with a fair amount of time on my hands.  Then I came to Cisco as a Principal TME, with a lot less time on my hands.  Then I took over a team of TMEs.  And now I have nearly 40 people reporting to me, and responsibility for technical marketing for Cisco’s entire enterprise software portfolio.  That includes ISE, Cisco DNA Center, SD-Access, SD-WAN (Viptela), and more.  With that kind of responsibility and that many people depending on me, writing TAC Tales becomes a lower priority.

In addition, when you advance in the corporate hierarchy, expressing your opinions freely becomes more dangerous.  What if I say something I shouldn’t?  Or, do I really want to bare my soul on a blog when an employee is reading it?  Might they be offended, or afraid I would post something about them?  Such concerns don’t exist when you’re an individual contributor, even at the director level, which I was.

I can take some comfort in the fact that this blog is not widely read.  The handful of people who stumble across it probably will not cause me problems at work.  And, as for baring my soul, well, my team knows I am transparent.  But time is not something I have much of these days, and I cannot sacrifice work obligations for personal fulfillment.  And that’s definitely what the blog is.  I do miss writing it.

Is this a goodbye piece?  By no means.  The blog will stay, and if I can eke out 10 minutes here or there to write or polish an old piece, I will.  Meanwhile, be warned about corporate ladder climbing–it has a way of chewing up your time.

Before the Internet: The Bulletin Board System

It’s inevitable as we get older that we look back on the past with a certain nostalgia.  Nostalgia or not, I do think that computing in the 1980’s was more fun and interesting than it is now.  Personal computers were starting to become common, but were not yet omnipresent the way they are today.  They were quite mysterious boxes.  An error might throw you into a screen that displayed hexadecimal with no apparent meaning.  Each piece of software had its own unique interface, since there were no set standards.  For some, there might be a menu-driven interface.  For others you might use control keys to navigate.  Some programs required text commands.  Even working with machines that had only 64 kilobytes of memory, there was always a sense of adventure.

I got my start in network engineering in high school.  Computer networks as we understand them today didn’t really exist back then.  (There was a rudimentary Internet in some universities and the Defense Department.)  Still, we found ways to connect computers together and get them to communicate, the most common of which was the Bulletin Board System, or BBS.

The BBS was an individual computer equipped with a modem, into which other computer users could dial.  For those who aren’t familiar with the concept of a modem, it was a device that enabled computer data to be sent over analog telephone lines.  Virtually all BBS’s had a single phone line and modem connecting to a single computer.  (A few could handle multiple modems and callers, but these were rare.)  The host computer ran special BBS software which accepted connections from anyone who might dial into it.  Once a user dialed in, he or she could send email, post messages on public message boards, play text-based video games, and do file transfers/downloads.  (Keep in mind, the BBS was text-only, with no graphics, so you were limited in terms of what you could do.)  The operator of a BBS was called a System Operator, or sysop (“sis-op”).  The sysop was the master of his or her domain, and occasionally a petty tyrant.  The sysop could decide who was allowed to log into the board, what messages and files could be posted, and whether to boot a rude user.

Because a BBS had a single modem, dialing in was a pain.  That was especially true for popular BBS’s.  You would set your terminal software to dial the BBS phone number, and you would often get a busy signal because someone else was using the service.  Then you might set your software to auto re-dial the BBS until you heard the musical sound of a ring tone followed by modems chirping to each other.

How did you find the phone numbers for BBS’s in the era before Google?  You might get them from friends, but often you would find them posted as lists on other BBS’s.  When we first bought our modem for my Apple II+, we also bought a subscription to Compuserve, a public multi-user dial-in service.  On one of their message boards, I managed to find a list of BBS’s in the 415 area code, where I resided.  I dialed into each of them.  Some BBS’s on the list had shut down, and I could hear someone saying “Hello??” through the modem speaker.  Others connected; I set up an account, and, after perusing the board, I would download a list of more BBS numbers and go on to try them.

Each sysop configured the board however seemed best to them, so the BBS’s tended to have a lot of variation.  The software I used–the most common among Apple II users–was called GBBS.  GBBS had its own proprietary programming language and compiler called ACOS, allowing heavy customization.  I re-wrote almost the entire stock bulletin board system in the years I ran mine.  It also allowed for easy exchange of modules.  I delegated a lot of the running of my board to volunteer co-sysops, and one of them wanted to run a fantasy football league.  He bought the software, I installed it, and we were good to go.  I had friends who ran BBS’s on other platforms without GBBS, and their boards were far less customizable.

A funny story about that fantasy football sysop.  Back then the software came on floppy disks, and while I insisted on him mailing it to me, he insisted on meeting me in person and handing it over.  I was terrified of meeting this adult and revealing that I was only 14 years old.  I wanted everyone on the board to think I was an adult, not a teenager.  It helped project authority.  He wouldn’t budge, so we agreed to meet at a local sandwich shop.  Imagine my surprise when a 12-year-old walked in carrying the disks!  We had a nice lunch and I at least knew I could be an authority figure for him.  I suspect most of my users were no older than seventeen.

Each user on a BBS had a handle, which was just a screen name.  I’m somewhat embarrassed to admit that mine was “Mad MAn”.  I don’t really recall how I came up with the name, but you always wanted to sound cool, and to a 15-year-old “madman” sounded cool.  This was in the era before school violence, so it wasn’t particularly threatening.  I spelled it as two words because I didn’t know how to spell “madman”, and this was before every spelling mistake was underlined in red.  The second A was capitalized because I was a bad typist and couldn’t get my finger off the shift key fast enough.  Eventually I just adopted that as a quirk.  Because the BBS population consisted largely of nerdy teenage boys, a lot of the handles came from Lord of the Rings and other fantasy and sci-fi works.  I can’t tell you how many Gandalfs were floating around, but there were a lot.  I had a Strider for a co-sysop.  Other handles, like mine, attempted to sound tough.  I had another co-sysop whose handle was Nemesis.

Since each BBS was an island, if someone sent you an email on BBS1, you couldn’t see it on BBS2.  So, if you were active on five BBS’s, you had to log in to all five and check email separately.  At one point a sysop who went by the handle “Oggman” launched a system called OGG-Net.  (His BBS also had a cool name, “Infinity’s Edge”.)  Oggy’s BBS became a central repository for email, and subscribing boards would dial in at night to exchange emails they had queued up.  This of course meant that it could take an entire day for email to propagate from one BBS to another, but it was better than before.

I’m writing this post in my “NetStalgia” series for a couple reasons.  First, it’s always important to look back in order to know where you are going.  Second, I’ve resurrected my old BBS using an Apple II emulator, and in my next post I’m going to share a few screen shots of what this thing actually looked like.  I hope you’ll enjoy them.

TAC Tales #18: All at once

The case came into the routing protocols queue, even though it was simply a line card crash.  The RP queue in HTTS was the dumping ground for anything that did not fit into one of the few other specialized queues we had.  A large US service provider had a Packet over SONET (PoS) line card on a GSR 12000-series router crashing over and over again.

Problem Details: 8 Port ISE Packet Over SONET card continually crashing due to

SLOT 2:Aug  3 03:58:31: %EE48-3-ALPHAERR: TX ALPHA: error: cpu int 1 mask 277FFFFF
SLOT 2:Aug  3 03:58:31: %EE48-4-GULF_TX_SRAM_ERROR: ASIC GULF: TX bad packet header detected. Details=0x4000

A previous engineer had the case, and he did what a lot of TAC engineers do when faced with an inexplicable problem:  he RMA’d the line card.  As I have said before, RMA is the default option for many TAC engineers, and it’s not a bad one.  Hardware errors are frequent and replacing hardware often is a quick route to solving the problem.  Unfortunately the RMA did not fix the problem, the case got requeued to another engineer, and he…RMA’d the line card.  Again.  When that didn’t work, he had them try the card in a different slot, but it continued to generate errors and crash.

The case bounced through two other engineers before getting to me.  Too bad the RMA option was out.  But the simple line card crash and error got even weirder.  The customer had two GSR routers in two different cities that were crashing with the same error.  Even stranger:  the crash was happening at precisely the same time in both cities, down to the second.  It couldn’t be a coincidence, because each crash on the first router was mirrored by a crash at exactly the same time on the second.

The conversations with my fellow engineers produced theories ranging from plausible to ludicrous.  There was a legend in TAC, true or not, that solar flares cause parity errors in memory and hence crashes.  Could a solar flare be triggering the same error on both line cards at the same time?  Some of my colleagues thought it was likely, but I thought it was silly.

Meanwhile, internal emails were going back and forth with the business unit to figure out what the errors meant.  Even for experienced network engineers, Cisco internal emails can read like a foreign language.  “The ALPHA errors are side-effects of the GULF errors,” one development engineer commented, not so helpfully.  “Engine is feeding invalid packets to GULF and that causes the bad header error being detected on GULF,” another replied, only slightly more helpfully.

The customer, meanwhile, had identified a faulty fabric card on a Juniper router in their core.  Apparently the router was sending malformed packets to multiple provider edge (PE) routers all at once, which explained the simultaneous crashing.  Because all the PEs were in the US, forwarding was a matter of milliseconds, and thus there was very little variation in the timing.  How did the packets manage to traverse the several hops of the provider network without crashing any GSRs in between?  Well, the customer was using MPLS, and the corruption was in the IP header of the packets.  The intermediate hops forwarded the packets, without ever looking at the IP header, to the edge of the network, where the MPLS labels get stripped, and IP forwarding kicks in.  It was at that point that the line card crashed due to the faulty IP headers.  That said, when a line card receives a bad packet, it should drop it, not crash.  We had a bug.
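
For the curious, here is a toy Python sketch (my own illustration, not anything from the case) of why the corruption went unnoticed in the core: the label-switched hops forward on the MPLS label alone, and only the edge router that pops the label ever parses the IP header.

    # Toy model of MPLS forwarding: core LSRs swap labels and never look at the
    # IP header; the edge PE pops the label and must then parse the header.

    def parse_ip_header(header: bytes) -> None:
        # Stand-in for real IP header validation.
        if len(header) < 20 or header[0] >> 4 != 4:
            raise ValueError("malformed IP header")

    def lsr_forward(packet: dict, label_map: dict) -> dict:
        # Core hop: swap the label, ignore the payload entirely.
        packet["label"] = label_map[packet["label"]]
        return packet

    def pe_receive(packet: dict) -> None:
        # Edge hop: pop the last label, then IP forwarding parses the header.
        packet.pop("label")
        try:
            parse_ip_header(packet["ip_header"])
        except ValueError:
            print("bad packet dropped at the edge")   # the correct behavior...
            return                                    # ...not a line card crash
        print("packet forwarded by IP")

    # A packet with a corrupt IP header crosses two core hops without incident
    # and is only noticed once the edge strips the label.
    pkt = {"label": 100, "ip_header": b"\x00" * 20}   # version nibble is wrong
    pkt = lsr_forward(pkt, {100: 200})
    pkt = lsr_forward(pkt, {200: 300})
    pe_receive(pkt)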

The development engineers could not determine why the line card was crashing based on the log info.  By this time, the customer had already replaced the faulty Juniper module and the network was stable.  The DEs wanted us to re-introduce the faulty module into the core and load up an engineering special debug image on the GSRs to capture the faulty packet.  This is often where we have a gulf, pun intended, between engineering and TAC.  No major service provider or customer wants to let Cisco engineering experiment on their network.  The customer decided to let it go.  If it came back, at least we could try to blame the issue on sunspots.

Interviewing #2: Why do we interview?

In the last article on technical interviewing, I told the story of how I got my first networking job.  The interview was chaotic and unorganized, and resulted in me getting the job and being quite successful.  In this post, I’d like to start with a very basic question:  Why is it that we interview job candidates in the first place?

This may seem like an obvious question, but if you think about it, face-to-face interviewing is not necessarily the best way to assess a candidate for a networking position.  To evaluate their technical credentials, why don’t we administer a test?  Or force network engineering candidates to configure a small network?  (Some places do!)  What exactly is it that we hope to achieve by sitting down for an hour and talking to this person face-to-face?

Interviewing is fundamentally a subjective process.  Even when an interviewer attempts to bring objectivity to the interview by, say, asking right/wrong questions, interviews are just not structured as objective tests.  The interviewer feedback is usually derived from gut reactions and feelings as much as it is from any objective criteria.  The interviewer has a narrow window into the candidate’s personality and achievements, and frequently an interviewer will make an incorrect assessment in either direction:

  • By turning down a candidate who is qualified for the job.  When I worked at TAC, I remember declining a candidate who didn’t answer some questions about OSPF correctly.  Because he was a friend of a TAC engineer, he got a second chance and did better in his second interview.  He got hired and was quite successful.
  • By hiring a candidate who is unqualified for the job.  This happens all the time.  We pass people through interviews who end up being terrible at the job.  Sometimes we just assess their personality wrong and they end up being complete jerks.  Sometimes, they knew enough technical material to skate through the interview.

Having interviewed hundreds of people in my career, I think I’m a very good judge of people.  I was on the interview team for TAC, and everyone we hired was a successful engineer.  Every TME I’ve hired as a manager has been top notch.  That said, it’s tricky to assess someone in such a short amount of time. As the interviewee, you need to remember that you only have an hour or so to convince this person you are any good, and one misplaced comment could torpedo you unfairly.

I remember when I interviewed for the TME job here at Cisco.  I did really well, and had one final interview with the SVP at the time.  He was very personable, and I felt at ease with him.  He asked me for my proudest accomplishment in my career.  I mentioned how I had hated TAC when I started, but I managed to persevere and left TAC well respected and successful.  He looked at me quizzically.  I realized it was a stupid answer.  I was interviewing for a director-level position.  He wanted to hear about initiative and drive, not that I stuck it out at a crappy job.  I should have told him about how I started the Juniper on Juniper project, for example.  Luckily I got through, but that one answer left him with an impression that took me down a notch.

When you are interviewing, you really need to think about the impression you create.  You need empathy.  You need to feel how your interviewer feels, or at least be self-aware enough to know the impression you are creating.  That’s because this is a subjective process.

I remember a couple of years back I was interviewing a candidate for an open position.  I asked him why he was interested in the job.  The candidate proceeded to give me a depressing account of how bad things were in his current job.  “It’s miserable here,” he said.  “Nobody’s going anywhere in this job.  I don’t like the team; they’re not motivated.”  And so forth.  He claimed he had programming capabilities, so I asked him what his favorite programming language was.  “I hate them all,” he said.  I actually think he was technically fairly competent, but in my opinion working with this guy would’ve been such a downer that I didn’t hire him.

In my next article I’ll take a look at different things hiring managers and interviewers are looking for in a candidate, and how they assess them in an interview.


TAC Tales #17: Escalations

When you open a TAC case, how exactly does the customer support engineer (CSE) figure out how to solve the case?  After all, CSEs are not super-human.  Just like any engineer, in TAC you have a range of brilliant to not-so-brilliant, and everything in between.  Let me give an example:  I worked at HTTS, or high-touch TAC, serving customers who paid a premium for higher levels of support.  When a top engineer at AT&T or Verizon opened a case, how was it that I, who had never worked professionally in a service provider environment, was able to help them at all?  Usually when those guys opened a case, it was something quite complex and not a misconfigured route map!

TAC CSEs have an arsenal of tools at their disposal that customers, and even partners, do not.  One of the most powerful is well known to anyone who has ever worked in TAC:  Topic.  Topic is an internal search engine.  It can do more now, but at the time I was in TAC, Topic could search bugs, TAC cases, and internal mailers.  If you had a weird error message or were seeing inexplicable behavior, popping the message or symptoms into Topic frequently turned up a matching bug.  Failing that, it might pull up another TAC case, which would show the best troubleshooting steps to take.

Topic also searches internal mailers, the email lists used within Cisco.  TAC agents, sales people, TMEs, product managers, and engineering all exchange emails on these mailers, which are then archived.  Oftentimes a problem would show up in the mailer archives with an answer already provided by engineering.  Sometimes, if Topic failed, we would post the symptoms to the mailers in hopes that engineering, a TME, or some other expert would have a suggestion.  I was always careful in doing so:  if you posted something that had already been answered, or asked too often, flames would come your way.

TAC engineers have the ability to file bugs across the Cisco product portfolio.  This is, of course, a powerful way to get engineering attention.  Customer found defects are taken very seriously, and any bug that is opened will get a development engineer (DE) assigned to it quickly.  We were judged on the quality of bugs we filed since TAC does not like to abuse the privilege and waste engineering time.  If a bug is filed for something that is not really a bug, it gets marked “J” for Junk, and you don’t want to have too many junked bugs.  That said, on one or two occasions, when I needed engineering help and the mailers weren’t working, I knowingly filed a Junk bug to get some help from engineering.  Fortunately, I filed a few real bugs that got fixed.

My team was the “routing protocols” team for HTTS, but we were a dumping ground for all sorts of cases.  RP often got crash cases, cable modem problems, and other issues, even though these weren’t strictly RP.  Even within the technical limits of RP, there is a lot of variety among cases.  Someone who knows EIGRP cold may not have a clue about MPLS.  A lot of times, when stuck on a case, we’d go find the “guy who knows that” and ask for help.  We had a number of cases on Asynchronous Transfer Mode (ATM) when I worked at TAC, which was an old WAN (more or less) protocol.  We had one guy who knew ATM, and his job was basically just to help with ATM cases.  He had a desk at the office but almost never came in, never worked a shift, and frankly I don’t know what he did all day.  But when an ATM case came in, day or night, he was on it, and I was glad we had him, since I knew little about the subject.

Some companies have NOCs with tier 1, 2, and 3 engineers, but we just had CSEs.  While we had different pay grades, TAC engineers were not tiered in HTTS.  “Take the case and get help” was the motto.  Backbone (non-HTTS) TAC had an escalation team, with some high-end CSEs who jumped in on the toughest cases.  HTTS did not, and while backbone TAC didn’t always like us pulling on their resources, at the end of the day we were all about killing cases, and a few times I had backbone escalation engineers up in my cube helping me.

The more heated a case gets, the higher the impact, the longer the time to resolve, the more attention it gets.  TAC duty managers can pull in more CSEs, escalation, engineering, and others to help get a case resolved.  Occasionally, a P1 would come in at 6pm on a Friday and you’d feel really lonely.  But Cisco being Cisco, if they need to put resources on an issue, there are a lot of talented and smart people available.

There’s nothing worse than the sinking feeling a CSE gets when realizing he or she has no clue what to do on a case.  When the Topic searches fail, when escalation engineers are stumped, when the customer is frustrated, you feel helpless.  But eventually, the problem is solved, the case is closed, and you move on to the next one.

CCIE Enterprise Infrastructure

There were quite a few big announcements at Cisco Live this year.  One of the big ones was the overhaul of the certification program.  A number of new certifications were introduced (such as the DevNet CCNA/CCNP), and the existing ones were revamped.  I wanted to do a post about this because I was involved with the certification program for quite a while, working on launching these changes.  I’m posting this on my personal blog, so my thoughts here are, of course, personal and not official.

First, the history.  Back when I was at Juniper, I had the opportunity to write questions for the service provider written exams.  It was a great experience, and I got thorough training from the cert program on how to properly write exam questions.  I don’t really remember how I got invited to do it, but it was a good opportunity, as a certified (certifiable?) individual, to give back to the program.  When I came to Cisco, I quickly connected with the cert program here, offering my services as a question writer. I had the training from Juniper, and was an active CCIE working on programmability.  It was a perfect fit, and a nice chance to recertify without taking the test, as writing/reviewing questions gets your CCIE renewed.

As I was managing a team within the business unit that was working on Software-Defined Access and programmability, it seemed logical for me to talk to the program about including those topics on the test.  I can assure you there was a lot of internal debate about this, as the CCIE exam is notoriously complex, and the point of our Intent-Based Networking products is simplicity.  One product manager even suggested a separate CCIE track for SD-Access, an idea I rejected immediately for that very reason.

Still, as I often point out here and elsewhere, SDN technologies do not eliminate the need for network engineers.  SDN products, all SDN products, are complex precisely because they are automated.  Automation enables us to build more complex things, in general.  You wouldn’t want to configure all the components of SD-Access by hand.  But we still need engineers who understand what the automation tools are doing, and how to work with all the components which comprise a complex solution like SD-Access.  Network engineers aren’t going to disappear.

For this reason, we wanted SD-Access, SD-WAN, and device programmability (NETCONF/YANG, for example) to be on the lab.  We want engineers who know and understand these technologies, and the certification program is a fantastic way to help people learn them.  I, and some members of my team, spent several months working with the CCIE program to build a new blueprint, which became the CCIE Enterprise Infrastructure.  The storied CCIE Routing and Switching will be no more.

At the end of the day, the CCIE exam has always adapted to changes in networking.  The R/S exam no longer has ISDN or IPX on it, nor should it.  Customers are looking for more automated solutions, and the exam is keeping pace.  If you’re studying for this exam, the new blueprint may be intimidating.  That said, CCIE exams have always been intimidating.  But think about this:  if you pass this exam, your resume will have skills on it that will make you incredibly marketable.

The new CCIE-EI (we always abbreviate stuff, right?) breaks down like this:

  • 60% is classic networking, the core routing protocols we all know and love.
  • 25% is SDx:  SD-Access and SD-WAN, primarily.
  • 15% is programmability.  NETCONF/YANG, controller APIs, Ansible, etc.

How do you study for this?  Like you study for anything.  Read about it and lab it.  There is quite a bit of material out there on all these subjects, but let me make some suggestions:

Programmability

You are not expected to be a programming expert for this section of the exam.  It’s not about seeing if you can write complex programs, but whether you know the basics well enough to execute some tasks via script/Ansible/etc instead of CLI.  DevNet is replete with examples of how to send NETCONF messages, or read data off a router or switch with programmable interfaces.  Download them, play with them, spend some time learning the fundamentals of Python, and relax about it.

  • Learn:  DevNet is a phenomenal resource.  Hank Preston, an evangelist for DevNet, has put out a wealth of material on programmability.  In addition, there is the book on IOS XE programmability I wrote with some colleagues.
  • Lab:  You can lab programmability easily on your laptop.  Python and ncclient are free, as is Ansible.  If you have any sort of lab setup already, all you need to do is set up a Linux VM or install a few tools onto your laptop.  (See the sketch below for the kind of thing I mean.)
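
To give a sense of how little code is involved, here is a minimal ncclient sketch that pulls the running configuration over NETCONF.  The host, port, and credentials are placeholders for whatever device is in your own lab, not anything tied to the exam.

    from ncclient import manager

    # Placeholder connection details for a hypothetical lab device running
    # NETCONF over SSH (830 is the standard NETCONF port).
    with manager.connect(
        host="10.0.0.1",
        port=830,
        username="admin",
        password="admin",
        hostkey_verify=False,
    ) as m:
        # Retrieve the running configuration as XML and print a slice of it.
        reply = m.get_config(source="running")
        print(reply.xml[:500])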

Software-Defined

This is, as I said before, a tough one to test on.  After all, to add a device to an SD-Access fabric, you select it and click “Add to Fabric.”  What’s there to test?  Well, since these are new products you of course need to understand the components of SD-Access/SDWAN and how they interoperate.  How does policy work?  How do fabric domains talk to non-fabric domains?  There is plenty to study here.

  • Learn:  Again, we’ve written books on SD-Access and SD-WAN.  Also, we are moving a lot of documentation into Cisco Communities.
  • Lab:  Well, this is harder.  We’re working on getting SD-Access into the hands of learning partners, so you’ll have a place to try it out.  We’re also working on virtualizing SD-Access as much as possible, to make it easier for people to run in labs.  I don’t have a timeframe on the latter, but hopefully we can do the former soon.

These are huge but exciting changes. I’ve been very lucky to have landed at a job where I am at the forefront of the changes in the industry, but this new exam will give others the opportunity to move themselves in that direction as well.  Happy labbing!

Cisco Live is over! Long Live Cisco Live!

I think it’s fair to say that all technical marketing engineers are excited for Cisco Live, and happy when it’s over.  Cisco Live is always a lot of fun–I heard one person say “it’s like a family reunion, except I like everyone!”  It’s a great chance to see a lot of folks you don’t get to see very often, to discuss technology you’re passionate about with other like-minded people, to see and learn new things, and, for us TMEs, an opportunity to get up in front of a room full of hundreds of people and teach them something.  We all now wait anxiously for our scores, which are used to judge how well we did, and even whether we get invited back.

It always amazes me that it comes together at all.  In my last post, I mentioned all the work we do to pull together our sessions.  A lot of my TMEs did not do sessions, instead spending their Cisco Live on their feet at demo booths.  I’m also amazed that World of Solutions comes together at all.  Here is a shot of what it looked like at 5:30 PM the night before it opened (at 10 AM).  How the staff managed to clear out the garbage and get the booths together in that time I can’t imagine, but they did.

The WoS mess…

My boss, Carl Solder, got to do a demo in the main keynote.  There were something like 20,000 people in the room and the CEO was sitting there.  I think I would have been nervous, but Carl is ever-smooth and managed it without looking the least bit uncomfortable.

My boss (left) on the big stage!

The CCIE party was at the air and space museum, a great location for aviation lovers such as myself.  A highlight was seeing an actual Apollo capsule.  It seemed a lot smaller than I would have imagined.  I don’t think I would ever have gotten in that thing to go to the moon.  The party was also a great chance to see some of the legends of the CCIE program, such as Bruce Caslow, who wrote the first major book on passing the CCIE exam, and Terry Slattery, the first person to actually pass it.

CCIE Party

I delivered two breakouts this year:  The CCIE in an SDN World, and Scripting the Catalyst.  The first one was a lot of fun, because it was on Monday and the crowd was rowdy, but also because the changes to the program had just been announced and folks were interested in knowing what was going on.  The second session was a bit more focused and deeper, but the audience was attentive and seemed to like it.  If you want to know what it feels like to be a Cisco Live presenter, see the photo below.

My view from the stage

I closed out my week with another interview with David Bombal, as well as the famous Network Chuck.  This was my first time meeting Chuck, who is a bit of a celebrity around Cisco Live and stands out because of his beard.  David and I had already done a two-part interview (part 1, part 2) when he was in San Jose visiting Cisco a couple months back.  We had a good chat about what is going on with the CCIE, and it should be out soon.

As I said, we love CL but we’re happy when it’s over.  This will be the first weekend in a long time I haven’t worked on CL slides.  I can relax, and then…Cisco Live Barcelona!

Inside Cisco Live


While I’m thinking about another TAC Tale, I’m quite busy working on slides for Cisco Live.  I figured this makes for another interesting “inside Cisco” post, since most people who have been to the show don’t know much about how it comes together.  A couple years back I asked a customer if I could schedule a meeting with him after Cisco Live, since I was working on slides.  “I thought the Cisco Live people made the slides and you just showed up and presented them!” he said.  Wow, I wish that were the case.  With hundreds of sessions I’m not sure how the CL team could accomplish that, but it would sure be nice for me.  Unfortunately, that’s not how it works.

If you haven’t been, Cisco Live is a large trade show for network engineers which happens four times globally: in Europe, Australia, the US, and Mexico.  The US event is the largest, but Europe is rather large as well.  Australia and Mexico are smaller but still draw a good crowd.  The Europe and US shows move around.  The last two years Europe was in Barcelona, as it will be next year, but it was in Berlin two years before that.  The US show is in San Diego this year, was in Orlando last year, and was in Las Vegas for two years before that.  Australia is always in Melbourne, and Mexico is always in Cancun.  I went to Cisco Live US twice when I worked for a partner, and I’ve been to every event at least once since I’ve worked at Cisco as a TME.

The show has a number of attractions.  There is a large show floor with booths from Cisco and partners.  There are executive and celebrity keynotes.  The deepest content is delivered in sessions–labs, techtorials, and breakout sessions which can have anywhere from 20 to several hundred attendees.  The sessions are divided into different tracks:  collaboration, security, certification, routing and switching, etc., so attendees can focus on one or more areas.

Most CL sessions are delivered by technical marketing engineers like myself, who work in a business unit, day in and day out, with their given product.  As far as I know anyone in Cisco can submit a session, so some are delivered by people in sales, IT, CX (TAC or AS), and other organizations.  Some are even delivered by partners and customers.

Six months before a given event, a “call for papers” goes out.  I’m always amused that they pulled this term from academia, as the “papers” are mostly PowerPoints and not exactly academic.  If you want to do a session, you need to figure out what you want to present and then write up an abstract, which contains not only the description, but also explains why the session is relevant to attendees, what they can hope to get out of it, and what the prerequisites are.  Each track has a group of technical experts who manage it, called “Session Group Managers”, or SGMs.  They come from anywhere in the business, but have the technical expertise to review the abstracts and sessions to ensure they are relevant and well-delivered.  For about a year, the SGM for the track I usually presented in actually reported to me.  They have a tough job, because they receive far more applications for sessions than they have slots.  They look at the topic, the quality of the abstract, the quality of the speaker, available slots, and other factors in figuring out which sessions get the green light.

Once you have an approved session, you can start making slides.  Other than a standard template, there is not much guidance on how to build a deck for Cisco Live.  My old SGM liked to review each new presentation live, although some SGMs don’t.  Most of us end up making our slides quite close to the event, partly because we are busy, but also because we want to have the latest and most current info in our decks.  It’s actually hard to write up a session abstract six months before the event.  Things change rapidly in our industry, and often your original plan for a session gets derailed by changes in the product or organization.  More than once I’ve had a TME on my team presenting on a topic he is no longer working on!  One of my TMEs was presenting on Nexus switches several months after our team switched to Catalyst only.

At Cisco Live you may run into the “speaker ready room.”  It’s a space for speakers to work on slides, supplied with coffee and food, but there is also a small army of graphic design experts in there who will review the speakers’ slides one last time before they are presented.  They won’t comment on your design choices, but simply review the slides to ensure they are consistent with the template formatting.  We’re required to submit our final deck 24 hours before our session, which gives the CL staff time to post the slides for the attendees.

Standing up in front of a room full of engineers is never easy, especially when they are grading you.  If you rate in the top 10% of speakers, you win a “Distinguished Speaker” award.  If you score below 4.2 you need to take remedial speaker training.  If your score is low more than a couple times, the SGMs might ask you not to come back.  Customers pay a lot of money to come to CL and we don’t want them disappointed.  For a presenter, being scored, and the high stakes associated with the number you receive, make a CL presentation even more stressful.  One thing I’ve had to accept is that some people just won’t like me.  I’ve won Distinguished Speaker before, but I’ve had some sessions with less-than-stellar comments too.

The stress aside, CL is one of the most rewarding things we do.  Most of the audience is friendly and wants to learn.  It’s a fun event, and we make great contacts with others who are passionate about their field.  For my readers who are not Cisco TMEs (most, I suspect), I hope you have a chance to experience Cisco Live at least once in your career.  Now you know the amount of work that goes into it.