As I mentioned in my last post, I like modeling networks using tools like Cisco Modeling Labs or GNS3. I recalled how, back in TAC, I had access to a Cisco-internal (at the time) tool called IOS on Unix, or IOU. It let me recreate customer environments in minutes, with no need to hunt down hardware. Obviously IOU didn’t work for every case. Oftentimes the issue the customer raised was hardware-specific, even when it was nominally a “routing protocol” issue. However, if I could avoid hardware, I would do the recreate virtually.
When I worked at Juniper (in IT), we did a huge project to refresh the WAN. This was just before SD-WAN came about. We sourced VPLS from two different service providers, and then ran our own layer 3 MPLS on top of it. The VPLS just gave us layer 2 connectivity, like a giant switch. We had two POPs in each region which acted as aggregation points for smaller sites. For these sites we had CE routers deployed on prem, connecting to PE routers in the POPs. This is a basic service provider configuration, with us as a service provider. Larger sites had PE routers on site, with the campus core routers acting as CEs.
We got all the advantages of layer 3 MPLS (traffic engineering, segmentation via VRF) without the headaches (peering at layer 3 with your SP, yuck!).
As the “network architect” for IT, I needed a way to model and test changes to the network. I used a tool called VMM, which was similar to IOU. Using a text file I could define a topology of routers and their interconnections, and then use a Python script to start it up. That gave me a fully functional network model running under a hypervisor, where I could test things out.
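To give a flavor of that workflow, here is a toy sketch of the idea: a plain-text file names the routers and their links, and a script expands it into per-device launch parameters. This is not VMM’s actual syntax or API (that tool was Juniper-internal); the format, device names, and interface naming here are all invented for illustration.

```python
# Toy illustration of a VMM-style workflow: a text file defines routers
# and links, and a script expands it into a launch plan.
# NOTE: this is NOT VMM's real syntax -- it is a made-up sketch.

TOPOLOGY = """\
router pop1
router pop2
router site1
link pop1 pop2
link pop1 site1
"""

def parse_topology(text):
    """Parse 'router NAME' and 'link A B' lines into a simple model."""
    routers, links = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "router":
            routers.append(parts[1])
        elif parts[0] == "link":
            links.append((parts[1], parts[2]))
    return routers, links

def launch_plan(routers, links):
    """Assign each end of each link the next free interface on its router."""
    next_if = {r: 0 for r in routers}   # next unused interface index
    plan = []
    for a, b in links:
        plan.append((f"{a}:ge-0/0/{next_if[a]}", f"{b}:ge-0/0/{next_if[b]}"))
        next_if[a] += 1
        next_if[b] += 1
    return plan

routers, links = parse_topology(TOPOLOGY)
for end_a, end_b in launch_plan(routers, links):
    print(end_a, "<->", end_b)
```

The appeal of the approach is exactly this: the topology lives in a version-controllable text file, and bringing the whole lab up is one script invocation.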
I never recreated the entire network; it wasn’t necessary. I created a virtual version with two simulated POPs, a tier 1 site (PE on prem), and a tier 2 site (PE in POP). I don’t fully remember the details; there may have been one or two other sites in my model.
For testing strictly routing issues on normally functioning devices, my VMM-based model was a dream. Before we rolled out changes, we could test them in my virtual lab. We could apply the configuration exactly as it would be entered into the real device, to see what effect it would have on the network. I just didn’t have the cool marketing term “digital twin,” as it didn’t exist yet.
I remember working on a project to roll out multicast on the WAN using Next Generation Multicast VPN (NGMVPN). NGMVPN was (is?) a complex beast, and as I designed the network and sorted out things like RP placement, I used my virtual lab. I even filed bugs against Juniper’s NGMVPN code, bugs I found while using my virtual devices. I remember the night we did a pilot rollout to two sites. Our Boston office dropped off the network entirely. Luckily we had out-of-band access and rolled back the config. I SSH’d into my virtual lab, applied the config, and spent a short amount of time diagnosing the problem (a duplicate loopback address), and did so without the stress of troubleshooting a live network.
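That class of mistake is also easy to catch mechanically before a change window. Here is a minimal sketch of such a pre-flight check, assuming you can extract each device’s loopback address into a dict; the device names and addresses below are entirely made up.

```python
# Hypothetical pre-flight sanity check: flag loopback addresses assigned
# to more than one device before pushing config. All names/addresses are
# invented for illustration.
from collections import defaultdict

loopbacks = {
    "pop1-pe": "10.255.0.1",
    "pop2-pe": "10.255.0.2",
    "boston-ce": "10.255.0.2",   # duplicate -- the kind of error that bit us
}

def find_duplicate_loopbacks(loopbacks):
    """Return {address: [devices]} for any address used more than once."""
    by_addr = defaultdict(list)
    for device, addr in loopbacks.items():
        by_addr[addr].append(device)
    return {addr: devs for addr, devs in by_addr.items() if len(devs) > 1}

print(find_duplicate_loopbacks(loopbacks))
```

A check like this in the deployment pipeline, or simply applying the candidate config to the virtual lab first, would have flagged the collision before Boston went dark.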
I’ve always been a bit skeptical of the network simulation/modeling approach. This is where you have some software intelligence layer that tries to “think through” the consequences of applied changes. The problem is the variability of networks. So many things can happen in so many ways. Actual devices running actual NOS code in a virtual environment will behave exactly the way real devices will, given their constraints. (Such as: not emulating the hardware precisely, not emulating all the different interface types, etc.) I may be entirely wrong on this one; I’ve spent virtually no time with these products.
The problems I was modeling were protocol issues amongst a friendly group of routers. When you add in campus networking, the complexity increases quite dramatically. Aside from wireless being in the mix, you also have hundreds or thousands of non-network devices like laptops, printers, and phones, which often cause networks to behave unpredictably. I don’t think our AI models are yet at the point where they can predict what comes with that complexity.
Of course, the problem you have is always the one you don’t predict. In TAC, most of the cases I took were bugs. Hardware and software behave unexpectedly. As in the NGMVPN case, if there is a bug in software that is strictly protocol related, you might catch it in an emulation. But many bugs exist only on certain hardware platforms, or in versions of software that don’t run virtually, etc.
As for digital twins, I do think learning to use CML (of course I’m Cisco-centric) or similar tools is very worthwhile. Rehearsing major changes offline in a virtual environment is a fantastic way to prep for the real thing. Don’t forget, though, that things never go as planned, and thank goodness for that, as it gives us all job security.