When I worked at the San Francisco Chronicle, I started a project to bring Internet connectivity to a number of sites that had only limited mainframe circuits. To do this, I decided to get DSL lines and run IPSec over them, a relatively new way of doing things at the time. It was a lot cheaper than the Frame Relay we used at larger sites.
After setting up connectivity at one of our sites, the local office manager called me. Web pages, he said, were only loading partially. Some of the text and none of the images would show up.
Everyone blamed the network for everything, so I punted him to desktop support. I could ping across the tunnel, I could send traffic just fine, the latency was minimal, and nothing was obviously wrong. A network is usually either up or down; web pages don’t load partially when everything else is working. Degraded service might cause the pages to load slowly, but not partially.
The desktop guys told me it was my problem. We had a constant battle, as nine times out of ten they blamed the network, and nine times out of ten it was not the network. The office manager was getting angry, so I decided I would do some investigation on site and prove to the desktop guys that they were wrong.
I went to the office and fired up my laptop. Pages were partially loading for me too. Hmmm. I did what every network engineer does and fired up a packet sniffer.
I could see the TCP handshake succeeding, and the browser requests and data exchange. It looked normal, but why wasn’t the browser displaying the images? I tried another browser and saw the same thing.
As I examined the sniffs, something hit me. All the packets were being sourced with the Don’t Fragment (DF) bit set in the IP header. Could it be that the IPSec/GRE headers were making the packets large enough to require fragmentation? And why was Windows setting the DF bit anyway?
As I wasn’t a desktop guy, I left the latter question alone. I jumped on the router and built a routing policy which cleared the DF bit on incoming packets. The pages started loading fine. I left the policy in place and hoped that there would not be any unanticipated consequences. I never saw any.
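For readers who want to check the arithmetic behind that suspicion, here is a rough Python sketch of how much GRE and IPSec encapsulation can add to a packet. The post does not say which IPSec mode, cipher, or link MTU was in use, so every number below is an assumption: ESP in tunnel mode, an 8-byte IV and 8-byte cipher block (as with 3DES-CBC), a 12-byte integrity check value, a base 4-byte GRE header, and a 1500-byte link MTU. The function names are made up for illustration.

```python
IPV4_HEADER = 20   # bytes, IPv4 header without options
GRE_HEADER = 4     # base GRE header (no key, sequence, or checksum fields)
ESP_HEADER = 8     # SPI (4 bytes) + sequence number (4 bytes)
ESP_TRAILER = 2    # pad-length byte + next-header byte


def esp_tunnel_size(inner_len, iv_len=8, block=8, icv_len=12):
    """Size of an ESP tunnel-mode packet carrying inner_len bytes.

    iv_len, block, and icv_len are ASSUMED values (3DES-CBC-like cipher
    with a 96-bit truncated HMAC); the original setup is unknown.
    """
    # The encrypted part is the inner packet plus the 2-byte trailer,
    # padded up to a multiple of the cipher block size.
    plaintext = inner_len + ESP_TRAILER
    padded = -(-plaintext // block) * block  # ceiling to a block multiple
    return IPV4_HEADER + ESP_HEADER + iv_len + padded + icv_len


def gre_over_ipsec_size(inner_len, **esp_params):
    """Size on the wire when the packet is wrapped in GRE, then in ESP."""
    gre_packet = IPV4_HEADER + GRE_HEADER + inner_len
    return esp_tunnel_size(gre_packet, **esp_params)


def needs_fragmentation(inner_len, link_mtu=1500, **esp_params):
    """True if the encapsulated packet exceeds the (assumed) link MTU."""
    return gre_over_ipsec_size(inner_len, **esp_params) > link_mtu


if __name__ == "__main__":
    for size in (64, 1422, 1423, 1500):
        print(size, gre_over_ipsec_size(size), needs_fragmentation(size))
```

Under these assumptions a small ping fits easily, while a full 1500-byte packet grows well past the link MTU. That is consistent with pings and short responses getting through while full-sized packets carrying images, marked DF, could be neither fragmented nor forwarded.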
Sometimes, it is, indeed, the network.
2 Comments
Oh my. Another broken PMTUD. A good network doctor fixes the root cause, not only the symptoms. Here the cause is ICMP “packet too big” messages not reaching the clients (or not being generated) and possibly the MTU settings on the tunnel interfaces. If at all possible, one SHOULD NOT mangle any headers.
Indeed, I was never comfortable with the solution, but it was the first time I had dealt with IPSec/GRE and it worked after I cleared the DF bit. I never got another complaint. Please forgive me, I was young and I needed the money 🙂