The Chinese may now have personal information on 4 Million US Government employees

Yet another sensational data breach headline – not even shocking anymore.  Yawn.  But listening to the story on the radio on the way home last night after being slaughtered in softball again, I started thinking.  And I dug a little deeper into the story when I got home.  I was shocked.

The systems penetrated belong to the US Government Office of Personnel Management.  Yep, that’s the United States Federal Government Human Resources Department.  It holds personal information for everyone who works for the US Federal government.  It’s the agency that hands out security clearances.  Think about this.  Let it sink in.

The Chinese broke into the system that US Government investigators use to store information about background checks for people who want security clearances.  That’s right.  If you applied to the US Government for a security clearance, it’s a good bet the Chinese know a lot about you now.  Which means you’ll probably be the target of some finely crafted spear phishing campaigns for the next several years.

And that’s only one system out of 47 operated by the Office of Personnel Management (OPM).  It’s not the only one the Chinese penetrated.

Update:  According to this Washington Post article (PDF here in case the link breaks), the Chinese breached the system managing sensitive information about Federal employees applying for security clearances in March 2014.  The latest OPM breach targeted a different data center housed at the Interior Department.

Update June 12, 2015:  The original reports were bad.  Now it’s even worse.  It now seems the Chinese have detailed information on every US Federal employee.  14 million, not 4 million.  And people may die directly because of this breach. But even now, we don’t know the extent of the damage.  This article from Wired Magazine sums it up nicely.

Reactions from high government officials were typical. They all take the problem seriously.  Bla bla bla.  According to the Wall Street Journal:

“We take very seriously our responsibility to secure the information stored in our systems, and in coordination with our agency partners, our experienced team is constantly identifying opportunities to further protect the data with which we are entrusted,” said Katherine Archuleta, director of the Office of Personnel Management.

Here’s another one, from the New York Times:

“The threat that we face is ever-evolving,” said Josh Earnest, the White House press secretary. “We understand that there is this persistent risk out there. We take this very seriously.”

This one from the same Washington Post article is my favorite:

“Protecting our federal employee data from malicious cyber incidents is of the highest priority at OPM,” Director Katherine Archuleta said in a statement.

Do I really need to ask the question?  Katherine, if it’s such a high priority then why didn’t you address the problem?

As I mentioned in a blog post way back in Feb. 2014 about dealing with disclosures, we’ve heard lots of noise about this breach but very little useful information.  Here’s what we do know.  I want to thank David E. Sanger, lead author of the New York Times article, “U.S. Was Warned of System Open to Cyberattacks,” for sending me a link to the 2014 Federal Information Security Management Act Audit report.  In case that link breaks, here is a PDF.

We know the Chinese penetrated the OPM in fall 2014 and stole at least 4 million records over the next six months.  That’s it. As usual, nobody I can find is forthcoming with details.

The report from the Office of Inspector General (OIG) gives us some clues.  Apparently, the various program offices that owned major computer systems each had their own designated security officers (DSOs) until FY 2011.  The DSOs were not security professionals and they had other jobs, which means security was a bolted-on afterthought.  In FY 2012, OPM started centralizing the security function.  But by 2014, only 17 of the agency’s 47 major systems operated under this tighter structure.

All 47 major systems are supposed to undergo a comprehensive assessment every three years that attests that a system’s security controls meet the security requirements of that system.  It’s a rigorous certification process called Authorization.  Here’s what the report said:

“However, of the 21 OPM systems due for Authorization in FY 2014, 11 were not completed on time and are currently operating without a valid Authorization (re-Authorization is required every three years for major information systems). The drastic increase in the number of systems operating without a valid Authorization is alarming, and represents a systemic issue of inadequate planning by OPM programming offices to authorize the information systems that they own.”

Remote access also had problems.  Apparently the VPN vendor OPM uses claims the ability to terminate VPN sessions after an idle timeout.  But the idle timeout doesn’t work and the vendor won’t supply a patch to fix it.

Identity management was also weak.  Although OPM requires multi-factor authentication to enter the network, none of the application systems do.  So if a Chinese bad guy penetrates the network, he apparently has free rein over everything in it once inside.  And since OPM had no inventory of the systems it owned, where they were, or how they were used, OPM had no way to know the Chinese were plundering their data.

It adds up to a gigantic mess.  And an embarrassment, which probably explains why nobody wants to talk about details.

Wonderful.  So what can a small IT contractor from Minnesota offer the multitrillion-dollar United States Federal Government to address this problem?  Here are some suggestions from an outsider who wrote a book about data breaches.

Three attributes will keep our systems safe.  Sharing, diligence, and topology.

Sharing drives it all.  So first and foremost – move from a culture of hierarchy, secrecy, and “need to know” to a culture of openness, especially around security.  What does that even mean?  For an answer, check out the new book by Red Hat CEO Jim Whitehurst, “The Open Organization,” published by the Harvard Business Review.

The Chinese, and probably others, penetrate our systems because a government culture of secrecy and “need to know” keeps our teams isolated and inhibits collaboration and incentives for excellence.  It’s a traditional approach to a new problem that defies tradition.  I’ll bet the Chinese collaborate with each other, and probably also with the North Koreans.

Instead of a closed approach, adopt an open approach.  Publish source code, build communities around each of those 47 systems, and share them with the world.  To protect it better, share how it all works with the world.

And when breaches happen, don’t tell us how you take security seriously.  You’re supposed to take security seriously.  It’s your job.  Tell us what happened and what steps you’re taking to fix the problem.  Instead of hiding behind press releases, engage with your community.

And use open source tools for all your security.  All of it.  Firewalls, VPN systems, IDS/IPS (Intrusion detection/Intrusion prevention systems), traffic analyzers, everything.  Breaches occur with open source software, just like proprietary software, but when they happen, the open source community fixes them quickly. Why? Because the developers’ names are on the headers and they care about their reputations.  You won’t need to wait years for a VPN patch in the open source world.

Openness doesn’t mean granting access to everyone.  Openness means building communities around the software systems OPM uses and accepting patches and development from the community.  Community members are compensated with recognition and opportunities for paid engagements.  OPM is rewarded with hardened, peer reviewed software driven by some of the smartest people on the planet.

When teams move away from hierarchy to an open culture, diligence and topology will follow.  There is no substitute for diligence and no technology to provide it. Teach everyone to be diligent and practice it often with drills.  Reward the cleverest phishing scheme or simulated attack and reward the cleverest defense.

And topology – put layers of security in front of key databases.  Put in appropriate access and authorization controls for key databases to ensure personal information stays personal.  Consider physically segregating these database systems from the general network and setting up a whitelist for their interactions with the world.
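To make the whitelist idea concrete, here is a minimal sketch in Python.  It is only an illustration: the network range, the port number, and the is_allowed function are all made up for the example, and a real deployment would enforce this in a firewall rather than in application code.

```python
import ipaddress

# Hypothetical values for illustration only: the one subnet where the
# application servers live, and the one port the database listens on.
ALLOWED_CLIENTS = [ipaddress.ip_network("10.20.30.0/28")]
ALLOWED_PORT = 5432


def is_allowed(src_ip: str, dst_port: int) -> bool:
    """Default deny: permit traffic only from a listed network to the listed port."""
    addr = ipaddress.ip_address(src_ip)
    from_listed_network = any(addr in net for net in ALLOWED_CLIENTS)
    return from_listed_network and dst_port == ALLOWED_PORT
```

The point is the default.  Anything not explicitly listed is refused, so an intruder who lands on some other machine inside the network still can't talk to the database.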

None of this proposed culture shift needs to cost a fortune.  In fact, in this era of doing more with less, it might save taxpayer money by igniting passion at the grass roots of the OPM IT staff.

Am I proposing radical change to a government that resists change?   Yup.  So why do it?  I’ll answer that question with my own question – given the recent headlines and your own Inspector General audit reports from the past several years, how’s the current method working out?

What is redundancy anyway?

I’ve been in the IT industry my entire adult life, so sometimes I use words and just assume everyone thinks they mean the same thing I think they mean.  I was recently challenged with the word, “redundancy.”

“What does that even mean?” asked my friend.

“It means you have more than one.”

“So what?”

“So if one breaks, you can use the other one.”

“Yeah, everyone knows that, but what does it mean with IT stuff?”

Seems simple enough to me, but as I think about it, maybe it’s not so simple.  And analyzing how things can fail and how to mitigate those failures is downright complex.

Redundancy is almost everywhere in the IT world.  Almost, because it’s not generally found in user computers or cell phones, which explains why most people don’t think about it and why these systems break so often.  In the back room, nearly all modern servers have at least some redundant components, especially around storage.  IT people are all too familiar with the acronym, RAID, which stands for Redundant Array of Independent Disks.  Depending on the configuration, RAID sets can tolerate one and sometimes two disk failures and still continue operating.  But not always.  I lived through one such failure and documented it in a blog post here.
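For a rough feel of what "depending on the configuration" means, here is a small Python sketch covering a few common RAID levels.  The table and the survives function are just for illustration, and the RAID 1 entry assumes a plain two-disk mirror.

```python
# Disk failures each RAID level can absorb before the array is lost.
# RAID 1 here assumes a simple two-disk mirror.
FAILURES_TOLERATED = {
    "RAID 0": 0,  # striping only, no redundancy at all
    "RAID 1": 1,  # mirror: one copy can die
    "RAID 5": 1,  # single parity
    "RAID 6": 2,  # double parity
}


def survives(level: str, failed_disks: int) -> bool:
    """True if the array keeps running with this many failed disks."""
    return failed_disks <= FAILURES_TOLERATED[level]
```

So survives("RAID 6", 2) is true, while survives("RAID 5", 2) is false.  Note what the table does not cover: a failed controller or a deleted file, which is where the next point comes in.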

Some people use RAID as a substitute for good backups.  The reasoning goes like this:  “Since we have redundant hard drives, we’re still covered if a hard drive dies, so we should be OK.”  It’s a shame people don’t think this through.  Forget about the risk of a second disk failure for a minute.  What happens if somebody accidentally deletes or messes up a critical data file?  What happens if a Cryptolocker type virus sweeps through and scrambles everyone’s files?  What happens if the disk controller in front of that RAID set fails?

Redundancy is only one component in keeping the overall system available.  It’s not a universal cure-all. There will never be a substitute for good backups.

Virtual environments have redundancy all over the place.  A virtual machine is software pretending to be hardware, so it’s not married to any particular piece of hardware.  So if the physical host dies, the virtual machine can run on another host.  I have a whole discussion about highly available clusters and virtual environments here.

With the advent of the cloud, doesn’t the whole discussion about server redundancy become obsolete?  Well, yeah, sort of.  But not really.  It just moves somewhere else.  Presumably all good cloud service providers have a well thought out redundancy plan, even including redundant data centers and replicated virtual machines, so no failure or natural disaster can cripple their customers.

With the advent of the cloud, another area where redundancy will become vital is the boundary between the customer premises and the Internet.  I have a short video illustrating the concept here.

I build systems I like to call SDP appliances.  SDP – Software Defined Perimeter, meaning with the advent of cloud services, company network perimeters won’t really be perimeters any more.  Instead, they’ll be sets of software directing traffic to/from various cloud services to/from the internal network.

Redundancy takes two forms here.  First is the ability to juggle multiple Internet feeds, so when the primary feed goes offline, the company can route via the backup feed. Think of two on-ramps to the Interstate highway system, so when one ramp has problems, cars can still get on with the other ramp.
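The two on-ramps idea boils down to "use the first feed that works."  Here is a minimal Python sketch of that choice.  The pick_feed function and the feed names are hypothetical, and it assumes something else (a ping test, say) has already decided whether each feed is up.

```python
from typing import Optional, Sequence, Tuple


def pick_feed(feeds: Sequence[Tuple[str, bool]]) -> Optional[str]:
    """Return the first feed, in priority order, that is currently up.

    Each entry is (feed_name, is_up).  Returns None when every feed is down.
    """
    for name, is_up in feeds:
        if is_up:
            return name
    return None


# Primary is down, so traffic goes out the backup feed.
route = pick_feed([("primary", False), ("backup", True)])
```

Because the list is in priority order, traffic goes back to the primary feed as soon as it reports healthy again.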

The other area is redundant SDP appliances. The freeway metaphor doesn’t work here. Instead, think of a gateway, or a door through which all traffic passes to/from the Internet.  All gateways, including Infrasupport SDP appliances, use hardware, and all hardware will eventually fail.  So the Infrasupport SDP appliances can be configured in pairs, such that a backup system watches the primary. If the primary fails, the backup assumes the primary role. Once back online, the old primary assumes a backup role.

Deciding when to assume the primary role is also complicated.  Too timid and the customer has no connection to the cloud.  Too aggressive and a disastrous condition where both appliances “think” they’re primary can come up.  After months of tinkering, here is how my SDP appliances do it.  The logic is, well, you’ll see…

If the backup appliance cannot see the primary appliance on the private heartbeat network, and cannot see the primary on the internal network, and cannot see the primary on the external Internet network, but can see the Internet, then and only then assume the primary role.
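That rule is compact enough to sketch in a few lines of Python.  This is only an illustration of the decision, not the actual appliance code.  The Reachability class and its field names are invented for the example, and it assumes the four reachability checks have already been made elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reachability:
    """What the backup appliance can currently see (hypothetical field names)."""
    primary_via_heartbeat: bool  # primary visible on the private heartbeat network
    primary_via_internal: bool   # primary visible on the internal network
    primary_via_external: bool   # primary visible on the external Internet network
    internet: bool               # backup itself can reach the Internet


def should_assume_primary(r: Reachability) -> bool:
    """Take over only if the primary is invisible on all three networks
    and the backup can still reach the Internet."""
    primary_visible = (
        r.primary_via_heartbeat
        or r.primary_via_internal
        or r.primary_via_external
    )
    return (not primary_visible) and r.internet
```

Requiring all three paths to fail guards against the "too aggressive" case: one unplugged heartbeat cable alone won't leave both appliances thinking they're primary.  Requiring Internet access guards against the opposite mistake, where the backup is the one that's actually cut off.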

It took months to test and battle-harden that logic and by now I have several in production.  It works and it’s really cool to watch.  That’s redundancy done right.  If you want to find out more, just contact me right here.