
Ah, the joys of being an “expert.”  I had forgotten what happens after you pass an exam like the JNCIE.  One of my colleagues starts grilling me on various topics for which I am unprepared, since they weren’t covered on the exam.  MX architecture, MC-LAG, MX virtual chassis, etc.  Be careful what you wish for.  The second you put “expert” in your title some people will take it as a challenge.  Of course, being in a very non-hands-on position, I am not an expert in many of the things day-to-day engineers know quite well.  I certainly know the topics on the test.  Then again, having obtained a CCIE 10 years ago, I know better than to start parading myself around as an expert on anything.  The JNCIE number goes on the blog and the bottom of my LinkedIn, but I am not putting it in my email signature.  Meanwhile, time to start reading Doug Hanks’ book on MX architecture!

Shortly after I went to work at TAC, my first team, which was dedicated to enterprise customers, was dissolved, and I ended up on the Routing Protocols team.  The RP team supported both enterprise and service provider customers, and I had zero experience on the SP side.  There was quite a learning curve ahead.

One day my phone rang with a P1.  I dreaded P1’s.  When a P1 came in, you were thrown head-first into a potentially huge outage with no knowledge of the case in advance.  Oftentimes a case that had been worked by another engineer got raised to P1 and you had to deal with someone else’s mess.

On this particular day, the case was a line card problem on a 12000-series, or GSR.  GSR was a service provider box and I knew nothing about it.  I didn’t even think it ran IOS (it does).  I had no idea where to start.  You would think they would have given me training on every product we covered, but at HTTS, at least when I worked there, the general approach was to throw you to the wolves.

So, I politely put the customer on hold and started running around the second floor of building K where HTTS is located, shouting:  “Does anyone know GSR?  Anybody around to help me?!”  Finally I stumbled across my teammate Abe in the break room stirring up a cup of coffee.  Abe had been a product manager for the GSR before he came to TAC.  “Abe, you gotta help me!  I took a P1 on a GSR and I’ve never touched one!”

“Way to go, grab the bull by the horns!” was Abe’s response.  We rushed back to my cube and Abe walked me through GSR line card troubleshooting.  Abe and I have been great friends ever since that day.

Remember, when you call for support and the person on the other end of the phone sounds like they might not know what they are doing, they probably don’t.  And if they put you on hold they may be running around screaming for help.  It happened a lot in TAC.

This is a follow-up to my previous post on the JNCIE-SP exam.

Some thoughts about the experience of taking the JNCIE exam, especially versus Cisco:


Back to the blog, now that the JNCIE-SP is finished. I got #2332. The last time I did an expert-level exam was 2008, and I forgot just how challenging it is. I passed my JNCIP in June and it took me until November, working solidly most of the time, to get my number. It’s been a great experience. I work in a director-level architecture role at Juniper, and I am getting more and more removed from day-to-day, hands-on work. When I was in Cisco TAC, it was extremely technical, detailed work every day. Now it is meetings and PowerPoints. However, my ability to contribute at this level is entirely dependent on my technical expertise, and it feels great to refresh the knowledge and hit the CLI again. They say CLI will be dead with automation and SDN–don’t count on it. They can’t change the fundamental way networks operate, and when you look at SDN solutions, they are a lot more complicated than how they are presented. Being acquainted with MPLS and routing protocols in depth is the best preparation for anything to come, and the only way to learn those topics is at the command line. Period.

For the handful of people who come across this blog and have posted comments, thank you very much for the kind words. This blog is on hold for a bit while I finish up my JNCIE-SP, which I am taking in a couple of weeks. I’ve come across a lot of excellent blog post topics from my studying, so I hope to get back into the swing of things soon. Keep an eye out, especially if you are new to Juniper or struggling with some of the less-documented commands.

The case came in P1, and I knew it would be a bad one. One thing you learn as a TAC engineer is that P1 cases are often the easiest. A router is down, send an RMA. But I knew this P1 would be tough because it had been requeued three times. The last engineer who had it was good, very good. And it wasn’t solved. Our hotline gave me a bridge number and I dialed in.

The customer explained to me that he had a 7513 and a 7206, and they had a multilink PPP bundle between them with 8 T1 lines. The MLPPP interface had mysteriously gone down/down and they couldn’t get it back. The member links were all up/down. Why they were connecting them this way was not a question an HTTS engineer was allowed to ask. We were just there to troubleshoot. As I was on the bridge, they were systematically taking each T1 out of the bundle and putting HDLC encapsulation on it, pinging across, and then putting it back into the MLPPP bundle. This bought me time to look over the case notes.

There were multiple RMA’s in the notes. They had RMA’d the line cards and the entire chassis. The 7513 they were shipped had problems and so they RMA’d it a second time. RMA’ing an entire 7513 chassis is a real pain. I perused the configs to see if authentication was configured on the PPP interface, but it wasn’t. It looked like a PPP problem (up/down state) but the interface config was plain-vanilla MLPPP.

They finished testing all of the T1’s individually. One of the engineers said “I think we need another RMA.” I told them to hang on. “Take all of the links out of the bundle and give me an MLPPP bundle with one T1,” I said. “But we tested them all individually!” they replied. “Yes, but you tested them with HDLC. I want to test one link with multilink PPP on it.” They agreed. And with a single link it was still down/down. Now we were getting somewhere. I had them switch which link was the active one. Same problem. Now disable multilink and just run straight PPP on a single link. Same thing.

“Can you turn on debug ppp with all options?” I asked. They were worried about doing it on the 7513, but I convinced them to do it on the 7206. They sent me the logs, and this stood out:

AAA/AUTHOR/LCP: Denied

Authorization failed. But why? Nothing was configured under the interface, but I looked at the top of the config, where the AAA commands are, and saw this:

aaa authorization network default

And there it was. “Guys, could you remove this one line from the config?” I asked. They did. The single PPP link came up. “Let’s do this slowly. Add the single link back into multilink mode.” Up/up. “Now add all the links back.” It was working.

It turns out they had a project to standardize their configs across all their routers and accidentally added that line. They had RMA’d an entire 7513 chassis–twice!–for a single line of config. Replacing a 7513 is a lot of work. I still can’t believe it got that far.
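A check like the following might have caught the change before it went out. This is only a minimal sketch, assuming you keep a copy of each router's config from before and after a standardization push. The function name, the hostname, and the sample configs are hypothetical, and the AAA line is abbreviated exactly as in the story above.

```python
def added_global_aaa_lines(before: str, after: str) -> list[str]:
    """Return global 'aaa ...' lines present in `after` but not in `before`."""

    def global_aaa(config: str) -> set[str]:
        # Global commands start in column 0; interface sub-commands are
        # indented, so they are skipped here.
        return {
            line.rstrip()
            for line in config.splitlines()
            if line.startswith("aaa ")
        }

    return sorted(global_aaa(after) - global_aaa(before))


# Hypothetical before/after snapshots of one router's config.
before = "hostname r1\ninterface Multilink1\n ppp multilink\n"
after = (
    "hostname r1\n"
    "aaa authorization network default\n"
    "interface Multilink1\n"
    " ppp multilink\n"
)

for line in added_global_aaa_lines(before, after):
    print("new global AAA line:", line)
```

The point is the same as in the troubleshooting: a global line can change PPP behavior even when nothing under the interface has been touched, so global AAA changes deserve a second look.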

Some lessons from this story: first, RMAs don’t always fix the problem. Second, even good engineers make stupid mistakes. Third, when troubleshooting, always limit the scope of the problem. Troubleshoot as little as you can. And finally, even hard P1’s can turn out easy.

This article continues to be the most popular one on this blog.  However, I published it back in 2014 while I was working on my JNCIE-SP, and that was a long time ago.  I now work at Cisco and do not have access to Junos, and my memory of Junos is getting spotty.  I am happy if the article helps you, and feel free to leave a comment, but unfortunately I will not be able to help you with specific questions on this or other Juniper topics.


Continuing on the subject of confusing Junos features, I’d like to talk about RIB groups. When I started here at Juniper, I remember being utterly baffled by this feature and its use. RIB groups are confusing both because the official documentation is confusing, and because many people, trying to be helpful, say things that are entirely wrong. I do think there would have been an easier way to design this feature, but RIB groups are what we have, so that’s what I’ll talk about.

Before I worked at TAC, I was pretty careless about how I filled in a TAC case online. For example, when I had to select the technology I was dealing with in the drop-down menu, if I didn’t see exactly what I had then I would go ahead and pick something at random and figure TAC would sort it out. And then I would get frustrated when I didn’t get an answer on my case for hours. Working in TAC showed me why.

When you open a TAC case, and you pick a particular technology, your choice determines into which queue the case is routed. For example, if you pick Catalyst 6500, the case ends up in a queue which is being monitored by engineers who are experts on that platform. Under TAC rules (assuming it is a priority 3 case) the engineers have 20 minutes to pick up the case. If they don’t, it turns blue in their display and their duty manager starts asking questions. (In high touch TAC where I worked, we didn’t have too many blue cases, but in backbone TAC it wasn’t uncommon to see a ton of blue and even black (> 1hr) cases sitting in a busy queue.)

If the customer categorizes his case wrong, it sits in the wrong queue. Now an engineer has to notice the case, review it, determine where it should go, and “punt” it to the appropriate queue, at which point the counters are reset and the case sits waiting again.
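To make the effect of a punt concrete, here is a rough sketch of the aging logic, not the actual TAC tooling. The 20-minute and one-hour thresholds come from the priority 3 description above. The class, the queue names, and the "normal" state are hypothetical labels of my own.

```python
from dataclasses import dataclass

# Thresholds from the post for a priority 3 case. Other priorities
# presumably differ, so treat these as illustrative.
BLUE_AFTER_MIN = 20
BLACK_AFTER_MIN = 60


@dataclass
class Case:
    queue: str
    minutes_in_queue: int = 0  # time since the case landed in its current queue
    total_wait: int = 0        # time the customer has actually been waiting

    def wait(self, minutes: int) -> None:
        self.minutes_in_queue += minutes
        self.total_wait += minutes

    def punt(self, new_queue: str) -> None:
        # A punt moves the case and resets the per-queue counter,
        # but the customer's total wait is untouched.
        self.queue = new_queue
        self.minutes_in_queue = 0

    def color(self) -> str:
        if self.minutes_in_queue > BLACK_AFTER_MIN:
            return "black"
        if self.minutes_in_queue >= BLUE_AFTER_MIN:
            return "blue"
        return "normal"


case = Case(queue="wrong-queue")
case.wait(25)
print(case.color())                   # blue
case.punt("catalyst-6500")
print(case.color(), case.total_wait)  # normal 25
```

After the punt the display looks healthy again, even though the customer has been waiting the whole time.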

Imagine for a moment that you are an overworked TAC engineer with 30 minutes left to go on your shift. You are supposed to clear out your queue and take any cases before the next crew comes on (at least we were in HTTS). You don’t want to take any more cases, however. There is a case sitting in your queue which has turned blue and your colleagues may not be happy to see it sitting there when they come on shift. Well, you’re an experienced TAC engineer and you know what to do: punt the case to another queue, even if it’s the wrong one. If you pick a busy queue, it will take at least 30 minutes for the engineers on that queue to see the “mis-queue” and punt the case back to your queue, at which point you are off shift and it becomes the problem of your colleagues on the next shift.

My recommendation is to be very careful to select the right menu options when you open a case online with any tech support organization. Make sure you route the case to the right place the first time so you don’t have to wait for engineers and managers to look at it and re-categorize it.

When I first started configuring MPLS on Juniper routers, I came across the strange and mysterious inet.3 table.  What could it possibly be?  When I worked in Cisco TAC I handled hundreds of MPLS VPN cases, but I had never encountered anything quite like inet.3 in IOS land.  As I researched inet.3 I found the documentation was sparse and confusing, so when I finally came to understand its purpose I decided to create a clear explanation for those who are searching in vain.  I will focus on the basics of how inet.3 works, leaving details of its use for later posts.

In this post, we’ll be looking at IS-IS inter-area concepts, and hopefully clearing up some of the confusion IS-IS areas create in the minds of engineers who are used to OSPF.  IS-IS handles areas quite differently from OSPF, and if you think about IS-IS areas in OSPF terms you are likely to be confused by some of this behavior.  The good news is that if you configure area numbers and enable IS-IS, it should just work, but if you want to do anything more complex you will need a deeper understanding of how IS-IS areas work.  I’ll assume you know the basics of IS-IS, for example that it is not an IP native protocol, and just focus on the areas for now.  My intention here is not to go into all of the details of IS-IS inter-area operation, but to help you sort out the basics in your mind so you can dive deeper in your studies. All output will be from Juniper routers, but should be self-explanatory enough for those of you using a different platform.