Computer Whodunit Detective Story – the Conclusion

­

In part one of our computer detective saga, the story opened with a few users unable to access their emails. Similar to a Hollywood detective story, we followed the clues through several unexpected twists and turns, with each clue answering questions and generating new questions.  Continuing in the style of great whodunit detective mysteries, we eventually uncovered the culprit, a rogue DHCP server.  This changed everything.

And now the conclusion.

DHCP – Dynamic Host Control Protocol – is the reason we can connect our laptops and tablets and smartphones to the Internet.  DHCP servers assign all the attributes our devices need to enable communications.  Think of the Internet as similar to the telephone network, but with one important difference.  In the telephone network, your phone number stays the same no matter where your phone travels. On the Internet, an IP Address defines your device.  But unlike phone numbers, IP Addresses change, depending on where your device is located.  That’s why we need DHCP servers, to assign IP Addresses and other attributes to devices when they attach to an office network or the Internet.

Here is how DHCP works.  When you connect your device to a network, your device sends a broadcast to anyone on the local network who will listen.  It’s essentially a cry for help.  (Help!  Load me with what I need so I can talk to the world.)  The DHCP Server listens to the broadcast and downloads an IP Address and other attributes to the requesting device.  This is called an IP Address lease, and the lease expires after a settable amount of time, called a TTL (Time to Live).  Once the device acquires its IP Address lease, it can interact with the world.

DHCP is a thing of beauty when set up properly and works so well, only a few hard-core IT people think about it anymore.  Except when things go wrong.  And one of the worst things that can go wrong is a rogue DHCP Server wreaking havoc on the network.  When this happens, random devices get the wrong attributes and lose all ability to communicate.  Depending on how long the lease TTLs are set, sometimes the passage of a few hours can cure the problem, or sometimes make it worse.  The problem can “hop” from device to device as leases expire and new leases come online.  Sometimes devices can end up with duplicate IP Addresses that come and go and interfere with communications.  This can be maddening to troubleshoot.

The usual culprit in an office network is a wireless router somebody brought in from home.  This happens all the time as end users decide they want to build their own private wireless networks, but don’t think about the consequences to everyone else as their wireless router hands out home IP Addresses to random devices across the company network.

Obviously, the cure for a rogue DHCP server is to find it and get rid of it.   The challenge is how to find it?

Enter structured cabling.  Essentially, a structured cable plant runs network cables from stations all over the building to a central patch panel in the server room.  Each cable is labeled, preferably with the labels on both ends of the same cable matching.   All buildings should have a structured cabling.  Unfortunately, many don’t.  Fortunately, this one did.  And that proved to be a tremendous aid finding my rogue DHCP server.

Instead of walking the entire building and looking for a device that looked out of place, I set up a laptop near the patch panel and assigned the laptop a hard IP Address to fit the rogue DHCP server scheme.  After warning everyone their network connections may be disrupted briefly, I set up the laptop to continuously ping the rogue DHCP server IP address while I disconnected and reconnected each network cable.

The idea – one of those cables had to lead to the rogue DHCP server.  I would find the cable leading to my rogue DHCP server by watching for pings to stop responding when I disconnected that cable.  Once I found the correct cable, I could walk to the other end of that cable with a hammer and put the rogue DHCP Server on the other end out of its misery.

I eventually found it, chased it to the other end of the cable, and disconnected it.  It turned out, my friend James brought in a wireless router over a weekend to help with some work he needed to do.  He forgot to disconnect it and that was why my users started complaining on Monday morning.

The moral of the story?  These things happen and that’s why good troubleshooting techniques are invaluable.