In a moment of curiosity, I looked at the Wikipedia definition for Software-Defined Networking: “Software-defined networking (SDN) is an approach to network management that uses abstraction to enable dynamic and programmatically efficient network configuration to create grouping and segmentation while improving network performance and monitoring in a manner more akin to cloud computing than to traditional network management.” Clear as a freaking bell, eh? Before the end of the sentence I was about to lock myself in the garage with the car running. It was then that I remembered I have an EV. Damn you, Elon!
The only thing worse than AI slop is human slop. I mean, who writes something like this? Do they even know what it means? Of course not! Nobody knows what it means. I thought about feeding it into ChatGPT for a translation, but I didn’t want to cause a nuclear-scale meltdown of their data centers as the LLM tried to parse that garbage. (Wikipedia helpfully notes that “This article has multiple issues.”) Still, the sentence is a pretty nice representation of the thinking of the SDN era, an era which might finally be coming to a close now that the marketeers have AI to talk about.
I first heard about SDN shortly after I was hired as the network architect for Juniper Networks corporate IT. A bunch of MBA-types were telling me I needed to implement SDN internally. I’d ask them what the heck it was, and I’d get an answer more-or-less like the above. One of the aforementioned MBAs pointed me to a video about a company called Nicira. I can’t find the video now, but from what I recall it was a slick marketing video (read: bullshit) which talked about “slicing” the network into “pools” of resources, or something like that. It made total sense to the MBAs, but no sense to me. How exactly does one “slice” a network? What is a “pool” of networking, actually? Why should I care?
What could software-defined networking be? My first thought was that networking is, by definition, software-defined. BGP is software. OSPF is software. IOS, or Junos, or EOS is software. Then I was confronted with claims like this one, the next sentence in the Wikipedia article: “SDN is meant to improve the static architecture of traditional networks…” Static architecture? What’s static about traditional networking? Most places that have more than four devices use some sort of dynamic routing protocol. Even if they don’t, they have switches which use dynamic MAC learning. Or are we saying the “architecture” is static? So the architecture of the network is made dynamic by SDN? We change the architecture on the fly?
Poking around in the original papers on SDN, I found it had something to do with separating the control plane and data plane. OK, but that’s done by every network device anyways. The FIB is programmed based on information in the RIB. No, no, what they mean is that the control plane is actually physically located somewhere else. The packet-forwarding devices are just dumb boxes, programmed by a central controller.
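That local RIB-to-FIB split can be sketched in a few lines. This is a toy model of my own, not any real stack: candidate routes from multiple protocols sit in the RIB, and the winner per prefix (by conventional administrative distance) gets “programmed” into the FIB.

```python
# Toy sketch of the RIB -> FIB split every router already performs locally.
# Admin distances follow common convention; everything else is invented
# for illustration.
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ospf": 110, "bgp": 200}

def build_fib(rib):
    """Pick the best route per prefix from the RIB and 'program' the FIB."""
    fib = {}
    for prefix, candidates in rib.items():
        best = min(candidates, key=lambda r: ADMIN_DISTANCE[r["protocol"]])
        fib[prefix] = best["next_hop"]
    return fib

rib = {
    "10.0.0.0/24": [
        {"protocol": "ospf", "next_hop": "192.168.1.1"},
        {"protocol": "bgp", "next_hop": "192.168.2.1"},
    ],
    "0.0.0.0/0": [{"protocol": "static", "next_hop": "192.168.1.254"}],
}

fib = build_fib(rib)
print(fib["10.0.0.0/24"])  # OSPF (AD 110) beats BGP (AD 200)
```

The point being: control plane (the RIB and the protocols feeding it) and data plane (the FIB) are already separate, just separate on the same box.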
I sat down with Kireeti Kompella, a man who certainly understands networking and is also a very nice dude. “I’m just an ignorant network engineer,” I said, “but it seems to me that if you put the control plane on a physically separate system, you introduce a serious problem. Either you rely on in-band communication with the controller, where devices talk to the controller over the very network it is provisioning, or you build and maintain a separate network for controller-to-device communication. Either way, you have to deal with that network going down.” Kireeti nodded. “That’s exactly the problem,” he told me.
Of course, nothing is really new in networking. We’ve dealt with this problem when using techniques like BGP route reflectors. BGP route reflectors, however, are communicating with the local BGP process on the device. The control plane still lives on the device. This is different from the idea of a “dumb” device fully programmed from a remote piece of software.
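For the younger readers, an IOS-style route-reflector fragment (AS number and addresses made up for illustration). Note that the RR is just another BGP speaker peering with the local BGP process on each client, not a remote brain programming a dumb box:

```
! Hypothetical IOS-style config; AS number and addresses are invented.
router bgp 65000
 neighbor 10.1.1.1 remote-as 65000
 neighbor 10.1.1.1 route-reflector-client
 neighbor 10.1.2.1 remote-as 65000
 neighbor 10.1.2.1 route-reflector-client
```

The clients still run their own BGP, compute their own best paths, and program their own FIBs; the RR just spares them a full iBGP mesh.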
That very concept introduces another problem: The reason routing protocols like OSPF work so well is because the devices have local knowledge. A router knows right away when a link goes down and quickly shifts to a redundant link if it has one. That’s why we do it that way to begin with. How does moving it all to a controller make sense?
Meanwhile, the meaning of “SDN”, picked up by MBA-marketing types, shifted to include all sorts of things. Large-scale data centers were struggling with two problems: the limited VLAN space and VM mobility. 802.1Q allows 4096 (more-or-less) VLANs, and with virtualization, there was a need to provision any VLAN on any switch at any time, and to deal with the possibility of a VM moving somewhere and needing its VLAN to move with it. Technologies like EVPN/VXLAN were wrapped into the SDN umbrella. SDN now came to mean overlays. I’m not entirely sure what overlays have to do with separating control and data planes. Anyways, it spread outside of the DC. From the Wikipedia article on SD-WAN we learn that SD-WAN “uses software-defined networking technology, such as communicating over the Internet using overlay tunnels.” Overlays. But, erm, communicating over the Internet using tunnels was happening long before anyone was talking about SDN. Isn’t that what GRE is? Didn’t we do that with DMVPN? Was that SDN? We didn’t call it that.
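The arithmetic behind the VLAN squeeze, for what it’s worth: 802.1Q carries a 12-bit VLAN ID, while VXLAN carries a 24-bit VNI, so the overlay buys you 4096 times the segment space.

```python
# 802.1Q VLAN ID is 12 bits; VXLAN VNI is 24 bits.
vlan_ids = 2 ** 12    # 4096 (a few values are reserved in practice)
vxlan_vnis = 2 ** 24  # 16,777,216
print(vlan_ids, vxlan_vnis, vxlan_vnis // vlan_ids)  # 4096 16777216 4096
```

Whether that has anything to do with “software-defined” is, of course, the question.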
When I got to Cisco, I learned we were working on something called Software-Defined Access. If we had SDN in the WAN, we were going to have it in the campus too. SDA is a bunch of technologies bundled together, but it’s basically the aforementioned VXLAN overlay in the campus, with LISP as the control plane instead of BGP, TrustSec for segmentation, and VRF capability. I will leave the debate on the merits of SDA for another time, including the choice of LISP, which I suspect a number of my readers do not like. LISP does use a “control plane node” (MS/MR), which is kind of like a route reflector, but not really. But in SDA the “controller” is actually Catalyst Center (formerly DNA Center), which is not really a controller at all. It’s more of an NMS. It doesn’t participate in the control plane. (Note to marketeers: never name your product with a word that is spelled (spelt?) differently in different countries. “Center” is as grating to British English speakers as “Centre” is to Americans.) This was different from the sense of “controller” in the original SDN documents, which made it out to be the control plane for the network.
(By the way, when I talk to my wireless colleagues, they have yet a third meaning for the word “controller”. Fantastic.)
Of course, then there was OpenFlow, one of the wave of “Open-” products that hit us in the 2000s and 2010s. We had OpenStack, OpenFlow, OpenConfig, OpenDaylight, OpenTable… OK, just kidding on the last one, but we had so many “Opens” I couldn’t remember which did what. At Cisco we had a push to implement OpenFlow in our switches by…one customer. A big customer. With lots of money. OpenFlow was the “true” implementation of SDN, I suppose, the one closest to the original vision. I never configured it, so I don’t really know how it works. (You all know what I mean.) Whatever; as far as I know, it went the way of the dinosaur. (Just wait for the commenter who tells me his entire network is still OpenFlow, damnit!) I don’t even remember if we implemented it.
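Since I admitted I never configured it: the gist, as I understand it, is a flow table of match/action entries pushed down by the controller, which the switch consults per packet, punting table misses back up to the controller. A toy sketch of that model (field names and behavior are mine, nothing like the real wire protocol):

```python
# Toy model of an OpenFlow-style switch: a match/action flow table
# installed by a controller, consulted per packet. Field names and the
# punt-on-miss behavior are simplified for illustration.
flow_table = [
    # (match fields, action) -- "installed" by the controller
    ({"dst_ip": "10.0.0.1"}, "output:port2"),
    ({"dst_ip": "10.0.0.2"}, "output:port3"),
]

def forward(packet):
    """Return the action of the first matching flow entry, else punt."""
    for match, action in flow_table:
        if all(packet.get(k) == v for k, v in match.items()):
            return action
    return "punt-to-controller"  # table miss: ask the central brain

print(forward({"dst_ip": "10.0.0.1"}))  # output:port2
print(forward({"dst_ip": "10.9.9.9"}))  # punt-to-controller
```

Which brings us right back to the problem Kireeti and I discussed: the switch is only as smart as its connection to the controller.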
Given the jumble of stuff thrown under the mantle of “SDN”, I guess we can understand why Wikipedia has such a tortuous definition. SDN is, ya know, separation of the control plane and data plane (sometimes, but usually not), with overlays, segmentation, an external controller (really an NMS usually), with dynamic control of traffic (policy-based routing, but don’t call it that), and, you know, performance/assurance in a closed feedback loop. Or something like that.
Better yet, let’s just all move on to AI. At least we know what that is.