Business Continuity and Disaster Recovery; My Apollo 13 week

I just finished my own disaster recovery operation.  There are still a few loose ends but the major work is done.  I’m fully operational.

Saturday, Dec. 20, 2014 was a bad day.  It could have been worse.  Much worse.  It could have shut my company down forever.  But it didn’t – I recovered, but it wasn’t easy and I made some mistakes.  Here’s a summary of what I did right preparing for it, what I did wrong, how I recovered, and lessons learned.  Hopefully others can learn from my experience.

My business critical systems include a file/print server, an email server, and now my web server.  I also operate an OpenVPN server that lets me connect back here when I’m on the road, but I don’t need it up and running every day.

My file/print server has everything inside it: my QuickBooks data, copies of every proposal I’ve done, how-to documentation, my “Bullseye Breach” book I’ve been working on for the past year, marketing stuff, copies of customer firewall scripts, thousands of pictures and videos, and the list goes on.  My email server is my conduit to the world.  By now, there are more than 20,000 email messages tucked away in various folders and hundreds of customer notes.  When customers call with questions and I look like a genius with an immediate answer, those notes are my secret sauce.  Without those two servers, I’m not able to operate.  There’s too much information about too much technology with too many customers to keep it all in my head.

And then my web server.  Infrasupport has had different web sites over the years, but none were worth much and I never put significant effort into any of them.  I finally got serious in early 2013 when I committed to Red Hat I would have a decent website up and running within 3 weeks.  I wasn’t sure how I would get that done, and it took me more like 2 months to learn enough about WordPress to put something together, but I finally got a nice website up and running.  And I’ve gradually added content, including this blog post right now.  The idea was – and still is – the website would become a repository of how-to information and business experience potential customers could use as a tool.  It builds credibility and hopefully a few will call and use me for some projects.  I’ve sent links to “How to spot a phishy email” and other articles to dozens of potential customers by now.

Somewhere over the past 22 months, my website also became a business critical system.  But I didn’t realize it until after my disaster.  That cost me significant sleep.  But I’m getting ahead of myself.

All those virtual machines live inside a RHEV (Red Hat Enterprise Virtualization) environment.  One physical system holds all the storage for all the virtual machines, and then a couple of other low cost host systems provide CPU power.  This is not ideal.  The proper way to do this is to put the storage in a SAN or something with redundancy.  But, like all customers, I operate on a limited budget, so I took the risk of putting all my eggs in this basket.  I made the choice and given the cost constraints I need to live with, I would make the same choice again.

I have a large removable hard drive inside a PC next to this environment and I use Windows Server Backup every night to back up my servers to directories on this hard drive.  And I have a script that rotates the saveset names and keeps 5 backup copies of each server.
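For readers who want to try the same rotation idea, here is a minimal sketch in Python.  It is not my actual script.  The directory naming scheme (`<server>.1` as the newest copy, up to `<server>.5` as the oldest) and the function name are made up for illustration; the only detail taken from my setup is the count of 5 copies per server.

```python
from pathlib import Path
import shutil

KEEP = 5  # backup generations kept per server, as described above


def rotate_savesets(backup_root: Path, server: str, keep: int = KEEP) -> Path:
    """Age existing savesets by one slot and return an empty slot for tonight.

    Hypothetical naming scheme: "<server>.1" is the newest copy and
    "<server>.<keep>" is the oldest.
    """
    # Drop the oldest generation so the next one can move into its slot.
    oldest = backup_root / f"{server}.{keep}"
    if oldest.exists():
        shutil.rmtree(oldest)

    # Shift the remaining generations up by one, oldest first.
    for n in range(keep - 1, 0, -1):
        src = backup_root / f"{server}.{n}"
        if src.exists():
            src.rename(backup_root / f"{server}.{n + 1}")

    # Create the fresh slot that tonight's backup will write into.
    newest = backup_root / f"{server}.1"
    newest.mkdir(parents=True)
    return newest
```

The backup job would call `rotate_savesets()` once per server before writing the new saveset, so a failed backup never overwrites the most recent good copy.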

Ideally, I should also find a way to keep backups offsite in case my house burns down.  I’m still working on that.  Budget constraints again.  For now – hopefully I’ll be home if a fire breaks out and I can grab that PC with all my backups and bring it outside.  And hopefully, no Minnesota tornado or other natural disaster will destroy my house.  I chose to live with that risk, but I’m open to revisiting that choice if an opportunity presents itself.

So what happened?

[Photo: 20141225_060536Annotated_48pt]

The picture above summarizes it.  **But see the update added later at the end of this post.**  I walked downstairs around 5:30 PM on Saturday and was shocked to find not one, but two of my 750 GB drives showing amber lights, meaning something was wrong with them.  But I could still survive the failures.  The RAID 5 array in the upper shelf with my virtual machine data had a hot spare, so it should have been able to stand up to two failing drives, as long as the rebuild onto the spare finished before the second drive failed.  At this point only one upper shelf drive was offline, so it should have been rebuilding itself onto the hot spare.  The 750 GB drives in the bottom shelf with the system boot drive were mirrored, so that array should (and did) survive one drive failure.

I needed to do some hot-swapping before anything else went wrong.  I have some spare 750 GB drives, so I hot-swapped the failed drive in the upper shelf.  My plan was to let that RAID set rebuild, then swap the lower drive for the mirror set to rebuild.  And I would run diagnostics on the replaced drives to see what was wrong with them.

I bought the two 2 TB drives in slots 3 and 4 of the lower shelf a few months ago and set them up as a mirror set, but they were not in production yet.

Another note.  This turns out to be significant.  It seems my HP 750 GB hotswap drives have a firmware issue.  Firmware level HPG1 has a known issue where the drives declare themselves offline when they’re really fine.  The cure is to update the firmware to the latest version, HPG6.  I stumbled onto that problem a couple months ago when I brought in some additional 750 GB drives and they kept declaring themselves offline.  I updated all my additional drives, but did not update the drives already in place in the upper shelf – they had been running for 4+ years without a problem.  Don’t fix it if it ain’t broke.  This decision would bite me in a few minutes.

After swapping the drive, I hopped in the car to pick up some takeout food for the family.  I wouldn’t see my own dinner until around midnight.

I came back about 1/2 hour later and was shocked to find the drive in the upper shelf, slot 4 also showing an amber light.  And my storage server was hung.  So were all the virtual machines that depended on it.  Poof – just like that, I was offline.  Everything was dead.

In my fictional “Bullseye Breach” book, one of the characters gets physically sick when he realizes the consequences of a server issue in his company.  That’s how I felt.  My stomach churned, my hands started shaking and I felt dizzy.  Everything was dead.  No choice but to power cycle the system.  After cycling the power, that main system status light glowed red, meaning some kind of overall systemic failure.

That’s how fast an entire IT operation can change from smoothly running to a major mess.  And that’s why good IT people are freaks about redundancy – because nobody likes to experience what I went through Saturday night.

Faced with another ugly choice, I pulled the power plug on that server and cold booted it.  That cleared the red light and it booted.  The drive in upper shelf slot 4 declared itself good again – its problem was that old HPG1 firmware.  So now I had a bootable storage server, but the storage I cared about with all my virtual machine images was a worthless pile of scrambled electronic bits.

I tried every trick in the book to recover that upper shelf array.  Nothing worked, and deep down inside, I already knew it was toast.  Two drives failed.  The controller that was supposed to handle it also failed. **This sentence turns out to be wrong.  See the update added later at the end.**   And one of the two drives in the bottom mirror set was also dead.

Time to face facts.

Recovery

I hot swapped a replacement drive for the failed drive in the bottom shelf.  The failed drive already had the new firmware, so I ran a bunch of diagnostics against it on a different system.  The diagnostics suggested this drive really was bad.  Diagnostics also suggested the original drive in upper slot 2 was bad.   That explained the drive failures.  Why the controller forced me to pull the power plug after the multiple failures is anyone’s guess.

I put my  2 TB mirror set into production and built a brand new virtualization environment on it.  The backups for my file/print and email server virtual machines were good and I had both of those up and running by Sunday afternoon.

The website…

Well, that was a different story.  I never backed it up.  Not once.  Never bothered with it.  What a dork!

I had to rebuild the website from scratch.  To make matters worse, the WordPress theme I used is no longer easily available and no longer supported.  And it had some custom CSS commands to give it the exact look I wanted.  And it was all gone.

Fortunately for me, Brewster Kahle’s mom apparently recorded every news program she could get in front of from sometime in the 1970s until her death.  That inspired Brewster Kahle to build a website named web.archive.org.  I’ve never met Brewster, but I am deeply indebted to him.  His archive had copies of nearly all my web pages and pointers to some supporting pictures and videos.

Is my  website a critical business system?  After my Saturday disaster, an email came in Monday morning from a friend at Red Hat, with subject, “website down.”  If my friend at Red Hat was looking for it, so were others.  So, yeah, it’s critical.

I spent the next 3 days and nights rebuilding and by Christmas Eve, Dec. 24, most of the content was back online.   Google’s caches and my memory helped rebuild the rest and by 6 AM Christmas morning, the website was fully functional again.  As of this writing, I am missing only one piece of content.  It was a screen shot supporting a blog post I wrote about the mess at healthcare.gov back in Oct. 2013.   That’s it.  That’s the only missing content.  One screen shot from an old, forgotten blog post.  And the new website has updated plugins for SEO and other functions, so it’s better than the old website.

My headache will no doubt go away soon and my hands don’t shake anymore.  I slept most of last night.  It felt good.

Lessons Learned

Backups are important.  Duh!  I don’t have automated website backups yet, but the pieces are in place and I’ll whip up some scripts soon.  In the meantime, I’ll back up the database and content by hand as soon as I post this blog entry.   And every time I change anything.  I never want to experience the last few days again.  And I don’t want to even think about what shape I would be in without good backups of my file/print and email servers.

Busy business owners should  periodically inventory their systems and update what’s critical and what can wait a few days when disaster strikes.  I messed up here.  I should have realized how important my website has become these past several months, especially since I’m using it to help promote my new book.  Fortunately for me, it’s a simple website and I was able to rebuild it by hand.  If it had been more complex, well, it scares me to think about it.

Finally – disasters come in many shapes.  They don’t have to be fires or tornadoes or terrorist attacks.  This disaster would have been a routine hardware failure in other circumstances and will never make even the back pages of any newspaper.

If this post is helpful and you want to discuss planning for your own business continuity, please contact us and I’ll be glad to talk to you.  I lived through a disaster.  You can too, especially if you plan ahead.

Update from early January, 2015 – I now have a script to automatically back up my website.  I tested a restore from bare virtual metal and it worked – I ended up with an identical website copy.  And I documented the details.

Update several weeks later. After examining the RAID set in that upper shelf in more detail, I found out it was not RAID 5 with a hot spare as I originally thought.  Instead, it was RAID 10, or mirroring with striping.  RAID 10 sets perform better than RAID 5 and can stand up to some cases of multiple drive failures, but if the wrong two drives fail, the whole array is dead.  That’s what happened in this case.  With poor quality 750 GB drives, this setup was an ugly scenario waiting to happen.
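To make “the wrong two drives” concrete, here is a small Python sketch that counts which two-drive failures kill a RAID 10 set.  It assumes drives are mirrored in adjacent pairs (0 and 1, 2 and 3, and so on) with the pairs striped together, which is an illustrative layout and not necessarily how my controller numbered things.  The function name is made up for this example.

```python
from itertools import combinations


def raid10_fatal_pairs(num_drives: int) -> tuple[int, int]:
    """Return (fatal, total) counts of two-drive failure combinations.

    Assumed layout: drives (0,1), (2,3), ... are mirrors, and the mirrors
    are striped together.  The array is lost only when both halves of the
    same mirror fail.
    """
    if num_drives < 4 or num_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")

    mirrors = [{d, d + 1} for d in range(0, num_drives, 2)]
    pairs = list(combinations(range(num_drives), 2))
    fatal = sum(1 for pair in pairs if set(pair) in mirrors)
    return fatal, len(pairs)
```

Under that assumed layout, a 4-drive set has 6 possible two-drive failures and 2 of them are fatal, so a second failure is a roll of the dice and not a guaranteed survival.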

Please pardon my mess

Green is good.  The multiple amber lights from Saturday were bad.  Really bad.

As of Jan. 4, 2015, I have an automated backup script in place.   Every day at 4 PM, I back up the website and supporting files to my Windows server.  From there, it goes to an external backup system every night.  I built another test VM and restored my backups to it and voilà – another identical copy of the Infrasupport website.  So if another outage like this hits, I’ll be ready.
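Here is a rough Python sketch of the same idea: archive the site’s files, then prove the archive restores to an identical copy.  This is not my production script.  The function names, the archive naming, and the `site` folder name inside the archive are all made up for illustration, and the database dump (which would come from a separate tool such as mysqldump) is left out.

```python
import hashlib
import tarfile
import tempfile
from datetime import datetime
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup_site(content_dir: Path, dest_dir: Path) -> Path:
    """Write content_dir into a timestamped .tar.gz under dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"site-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store everything under a fixed top-level folder named "site".
        tar.add(content_dir, arcname="site")
    return archive


def restore_matches(archive: Path, content_dir: Path) -> bool:
    """Restore into a scratch directory and compare it with the live tree."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        return tree_digest(Path(scratch) / "site") == tree_digest(content_dir)
```

A scheduler would run `backup_site()` at the chosen time each day.  The point of `restore_matches()` is the lesson from this whole mess: a backup nobody has tried to restore is only a hope.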

***********************

As of 6AM Thursday, Christmas morning, Dec. 25, 2014, the website is fully back online.  It’s all there.  All the blog posts.  All the pages.  Everything. Except one missing picture I haven’t been able to find yet.  It was a screenshot I took with a blog entry I wrote 14 months ago about the disaster at healthcare.gov and I have no idea what I did with it after copying it to the website.  As far as I can tell, that’s the single, one and only missing piece of content.  55 pages, 25 blog posts, 77 pieces of media.  All recovered by hand either from archive.org, Google caches, or memory.   And I’m missing one screen shot from one forgotten blog post.  Not bad.

Merry Christmas.  Maybe now I can get some sleep.  Backups first this time.  Then sleep.

***********************

As of 4AM Dec. 24, the most important web pages should be back online. The submenu items and some of the blog entries are still missing.

***********************

Pardon the mess while I rebuild the Infrasupport website.  It was a casualty of a catastrophic hardware failure on Saturday night, 12/20/2014.  I’ll have it back up with updated versions of WordPress and the new Responsive Mobile theme from Cyberchimps in a few days.  Watch for minute by minute updates to menus and content while I recover it all and a new blog post with details once I get it back in shape again.  In the meantime, a hearty thanks to the folks at archive.org.  Until I’m fully recovered, you can find a copy of most of the content here.

– Greg Scott
Dec. 22, 2014

Updated Dec. 24, Dec. 25, 2014, and January 6, 2015.

We take your privacy seriously. Really?

By now, we’ve all read and digested the news about the December 2013 Target breach.  In the largest breach in history at that time and the first of many sensational headlines to come, somebody stole 40 million credit card numbers from Target POS (point of sale) systems.  We’ll probably never know details, but it doesn’t take a rocket scientist to connect the dots.   Russian criminals stole credentials from an HVAC contractor in Pennsylvania and used those to snoop around the Target IT network.   Why Target failed to isolate a vendor payment system and POS terminals from the rest of its internal network is one of many questions that may never be adequately answered in public.  The criminals eventually planted a memory scraping program onto thousands of Target POS systems and waited in Russia for 40 million credit card numbers to flow in.  And credit card numbers would still be flowing if the banks, liable for fraudulent charges, hadn’t caught on.  Who says crime doesn’t pay?

It gets worse – here are just a few recent breach headlines:

  • Jimmy John’s
  • Dairy Queen
  • Goodwill Industries
  • KMart
  • Sally Beauty
  • Neiman Marcus
  • UPS
  • Michaels
  • Albertsons
  • SuperValu
  • P.F. Chang’s
  • Home Depot

And that’s just the tip of the iceberg.  According to the New York Times:

The Secret Service estimated this summer that 1,000 American merchants were affected by this kind of attack, and that many of them may not even know that they were breached.

Every one of these retail breaches has a unique story.  But they all have one thing in common: somebody was asleep at the switch.

In a few cases, the POS systems apparently had back doors allowing the manufacturer remote access for support functions.  Think about this for a minute.  If a manufacturer can remotely access a POS system at a customer site, that POS system must somehow be exposed directly to the Internet or a telephone line.  Which means anyone, anywhere in the world, can also remotely access it.

Given the state of IT knowledge among small retailers, the only way that can happen is if the manufacturer or somebody who should know better helps set it up.  These so-called “experts” argue that the back doors are obscure and nobody will find them.  Ask the folks at Jimmy John’s and Dairy Queen how well that reasoning worked out.  Security by obscurity was discredited a long time ago, and trying it now is like playing Russian Roulette.

And that triggers a question.  How does anyone in their right mind expose a POS system directly to the Internet?  I want to grab these people by the shoulders and shake as hard as I can and yell, “WAKE UP!!”

The Home Depot story may be the worst.  Talk about the fox guarding the chicken coop!  According to several articles, including this one from the New York Times, the very engineer Home Depot hired to oversee security systems at Home Depot stores was himself a criminal after sabotaging the servers at his former employer.  You can’t make this stuff up.  Quoting from the article:

In 2012, Home Depot hired Ricky Joe Mitchell, a security engineer, who was swiftly promoted under Jeff Mitchell, a senior director of information technology security, to a job in which he oversaw security systems at Home Depot’s stores. (The men are not related.)

But Ricky Joe Mitchell did not last long at Home Depot. Before joining the company, he was fired by EnerVest Operating, an oil and gas company, and, before he left, he disabled EnerVest’s computers for a month. He was sentenced to four years in federal prison in April.

Somebody spent roughly 6 months inside the Home Depot network and stole 56 million credit card numbers before the banks and law enforcement told Home Depot about it.  And that sums up the sorry state of security today in our corporate IT departments.

I’m picking on retailers only because they’ve generated most of the recent sensational headlines.  But given recent breaches at JP Morgan, the US Postal Service, the US Weather Service, and others, I struggle to find a strong enough word.  FUBAR maybe?  But nothing is beyond repair.

Why is security in such a lousy state?  Home Depot may provide the best answer.  Quoting from the same New York Times article:

Several former Home Depot employees said they were not surprised the company had been hacked. They said that over the years, when they sought new software and training, managers came back with the same response: “We sell hammers.”

Great.  Just great.  What do we do about it?

My answer – go to the top.  It’s up to us IT folks to convince CEOs and boards of directors that IT is an asset, not an expense.  All that data, and all the people and machines that process all that data, are important assets.  Company leaders need to care about its confidentiality, integrity, and availability.

That probably means spending money for education and training.  And equipment.  And professional services for a top to bottom review.  Where’s the ROI?  Just ask some of the companies on the list of shame above about the consequences of ignoring security.  The cost to Target for remediation, lost income, and shareholder lawsuits will be $billions.  The CEO and CIO lost their jobs, and shareholders mounted a challenge to replace many board members.

Granted, IT people speak a different language than you.  Guilty as charged.  But so does your mechanic – does that mean you neglect your car?

One final plug.  I wrote a book on this topic.  It’s a fiction story ripped from real headlines, titled “Bullseye Breach.”  You can find more details about it here.

“Bullseye Breach” is the real deal.  Published with Beaver’s Pond Press, it has an interesting story with realistic characters and a great plot.  Readers will stay engaged and come away more aware of security issues.  Use the book as a teaching tool.  Buy a copy for everyone in your company and use it as a basis for group discussions.

And why do I want a tablet again?

I parted with some money 2 years ago and bought myself a tablet.  My wife – the most un-high-tech person on this planet – said, “Greg, you need to get one of those.  The world is changing and you need to know what’s going on.”  Just one of many reasons why I love her.

So I bought the tablet and a little keyboard that somehow connects to it.  I brought it home, and …

Well, it pretty much sat here for the past two years.  I used it a little bit.  It has a larger screen than my cell phone so I can look at email when I’m traveling.  But the problem is, when I’m traveling and I want to use the tablet to look at email, I have to dig the tablet out of my backpack, power it up, and wait.  If I have to dig stuff out, I may as well dig out my reading glasses and use my cell phone for email.  Or if I’m stationary for a while, why not use my laptop for email?  It’s a richer experience anyway.

It didn’t help that a few months after I bought this tablet, it lost its 4G cellular connection and I had to send it in for repair.  But a curious thing when it came back – I was in no hurry to load any apps on it.

And now the latest – I charge it up under a bedside table, next to my cell phone.  I like my electronics close by – it’s an occupational hazard.  I haven’t used it in something like 3 months.  Until a few days ago, when I wanted to use it as an ebook reader.  So I turned it on and… it hung.  It won’t power up.  The battery is fully charged, but it hangs at boot time.  Fiddling with buttons to reset it back to factory settings is useless.  It has a hardware problem, probably stemming from when my 2 1/2 year old youngest grandson jumped on it while trying to climb on his grandpa.

So for the past few days, I’ve been struggling to figure out the right replacement strategy.  There are some **nice** tablets out there.  I can spend less than $200 for something decent, all the way up to $1000+ for a top of the line model, with something like 800 or so price points in-between.   The available tablet choices are amazing.

But here’s my dilemma – why?  After two years, the only app I care about that the tablet does better than any other electronic device is read ebooks.  Everything else either works better on a full fledged PC, or the portability of the phone trumps the larger screen size of the tablet.

I may not be alone in my dilemma.  Apparently, the decline in PC sales is reversing as tablet sales decline.  And unless I can find a compelling reason to invest in a nice tablet, I’ll probably just get a cheap one to use as a book reader.

Oh – and that keyboard I bought 2 years ago?  I never unpacked it out of its cardboard box.

I’m interested in your thoughts around this.  I’ve been buried with spam comments, so use this Contact Us link to send me what you think and I’ll post it.

Mostly Bad Telemarketing

I’ve taken thousands of telemarketing cold calls over the years.  Many are deceptive, most are awful.  And a few are good.

Here is a common deception.  Many calls originate in call centers in India or the Philippines.  Callers use IP phones and connect via the Internet to a system in the US, which generates a caller-ID in my local area code.   When my phone rings, I see a caller ID that appears local, which makes me want to answer the phone because I think it might be a potential customer who wants to buy my goods and services.

The conversations generally start something like this:

Hello, this is Greg Scott.  (I always answer my phone this way.)

(long pause, sometimes with a series of clicks)

Yes, hello, I am trying to contact Agreg a Scote, uhm, with Infrasupport-a-tech company?

Well, yes, this is Greg Scott

Ah, good morning, sir Greg.  I am calling  because… (and we’re into the script flowchart).

Why do I dread these calls?  After all, the caller is courteous.  And the company she represents is only trying to find customers.  What’s not to like?

Well, plenty.  First, the call is an interruption.  I have to stop what I’m doing, switch gears, and make a decision whether to answer the phone.  After I answer the phone, I have to focus on the caller’s message instead of what I was working on before the phone rang.   I understand callers are trying to find customers and this is part of business.  I’ve done cold calls myself.  But since callers know they interrupted me, they should respect me and my time.

That leads to the next problem.  From the very first ring, overseas callers with automated dialers and IP phones have already disrespected me and my time.  Why is this company trying to fool me into believing it’s a local call?  How am I supposed to trust a company that tries to deceive me with the very first contact?  Why would I ever consider buying anything from such a company?

Focusing on the caller’s message is also challenging.  I speak English as a native language, my hearing is not what it used to be, and I have a terrible time understanding the thick accent on the other end of the phone.   And I am willing to bet, nobody in Bangalore, India is named Gary.  Or Bob, John, Ted, Mary, or any other common English name.

Here are two questions I want to ask these telemarketing firms – not the callers trying to do their jobs, but the boneheads who manage the callers.  If I tried to speak your native language and you heard my American accent, how much time would you need to figure out your language is not my first language?  If I adopt a telephone name native to your language, would it make a difference?   The obvious answers to those questions are about one second and no.  So why do you think I will believe your caller speaks English as a native language simply because you gave him an American telephone name?

The problem is compounded by poor sound quality.  After the packets containing the sound from these calls bounce around dozens of IP routers before flying across the public telephone network to my phone, the sound is often garbled, muffled, and distorted.  Combined with a thick accent, it is always difficult and often impossible to figure out what callers are saying.

For companies using these services – if you have such a low regard for me as a potential customer, what kind of service can I expect if I buy your product or service?

And it gets worse.

Lately, I’ve taken dozens of calls from machines pretending to be people.  Here is a typical call, synthesized from many:

The phone rings, showing a local caller ID.

Hello, this is Greg Scott

(Long pause – this is always the dead-giveaway.)

Why hello!  This is Nancy and I have an exciting offer for you!

Really – wow, thanks Nancy.  Are you a real person?

(Pause)

(Laughing)  Well of course I’m real, why do you ask?

Well, Nancy, you sounded like a machine.

Oh no, I can assure you, I’m a real person.  I’d like to talk to you about a great line of credit we offer.  If you’re interested, I’ll connect you to my manager and he can cover details with you.

Ah – thanks Nancy.  By the way, who won the baseball World Series last year?

I’m sorry, could you repeat that?

Yes – who won the World Series last year?

I’m sorry, but we’re not allowed to give out personal information.

What’s personal about that?

Thank you.  Goodbye.

Let’s see, what was wrong with this call?  After all, it solved the sound quality problem and the caller’s native language matches mine.  I can visualize a team of misguided engineers,  proud of their creative masterpiece, presenting the slideshow bullet items to a bunch of boneheaded executives in a boardroom and congratulating themselves on solving their telemarketing problem.

Here is my question for the clowns who dream up this stuff.  If your time and money are too valuable to use a real person fluent with my language to make real phone calls, why do you think my time and money are any less valuable?  Do you really think I will buy anything from you when you use a machine to waste my time and lie to me?

So after griping about bad calls, what about somebody who did it right?  Well, it happened one day last year when a nice lady from a training company called me.  Let’s call her Dee and her company, Training Inc.  These are both fictional names.  Dee did everything right.  It was obvious she looked at my website before she made contact because she tailored her pitch to meet my unique circumstances.  She asked me a bunch of questions about how I run my business.  She asked me about my training goals.   She was personable.  She spoke the same native language as me.  Instead of trying to fool me into thinking she was from this area, the caller-ID was from a different state.

I liked Dee.  We connected.  I don’t have any business for her right now, but when the time comes and I am able to send business her way, I will do so.  In fact, I liked Dee so much, I spent most of a Saturday updating and fixing my broken Exchange Server indexing so I could find her contact information.

If you are a telemarketer and happen to read this blog entry, first, thanks for reading.  Your company probably needs IT support, so just click or tap right here to contact us.  But more importantly – if you’re spending money for overseas call centers with cheap IP phones, bad connections, and fake caller IDs, or if you’re trying to use machines pretending to be people, save your money.  Nobody in their right mind will buy anything from you when you approach them this way.  Instead, find somebody like Dee who will represent you properly.

Even in today’s high-tech, 24 hour, over stressed environment, the old-fashioned rules still apply.

How to do bad customer service and destroy your reputation

I survived one of the worst customer service experiences ever this week.  We can all draw some lessons from this story.

The end user customer operates branch sites across the Midwest USA and uses Infrasupport firewalls to connect to the Internet and the main office in the St. Paul, MN area.  This branch site is in southern Illinois and uses a nifty new twist on my network failover system.

The firewall has a wireless LAN (WiLAN) card and several wired network interface cards (NICs). The new twist – connect a wired NIC to a low cost DSL connection. When the DSL modem drops, fail over to the WiLAN connection and route through a cell phone carrier. And when the DSL connection comes back, fail back to the DSL wired connection. It works – this is the lowest cost redundancy anyone can buy.
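The failover decision itself boils down to a small state machine.  Here is a Python sketch of the idea, not the code running on my firewalls.  The class name, the `"dsl"` and `"wilan"` labels, and the thresholds of 3 consecutive probes are assumptions for illustration; the thresholds are there to keep one dropped ping from bouncing traffic back and forth.

```python
from dataclasses import dataclass


@dataclass
class UplinkSelector:
    fail_after: int = 3     # assumed: failed DSL probes in a row before failing over
    recover_after: int = 3  # assumed: good DSL probes in a row before failing back
    active: str = "dsl"     # start on the wired DSL connection
    _streak: int = 0        # consecutive probes that disagree with the active uplink

    def observe(self, dsl_probe_ok: bool) -> str:
        """Feed one DSL health-probe result; return the uplink to use now."""
        on_dsl = self.active == "dsl"

        # A probe that agrees with the current choice resets the streak.
        if dsl_probe_ok == on_dsl:
            self._streak = 0
            return self.active

        # Otherwise count it, and switch once enough disagree in a row.
        self._streak += 1
        threshold = self.fail_after if on_dsl else self.recover_after
        if self._streak >= threshold:
            self.active = "wilan" if on_dsl else "dsl"
            self._streak = 0
        return self.active
```

A monitoring loop would probe something on the far side of the DSL modem every few seconds, pass the result to `observe()`, and change the default route whenever the returned uplink changes.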

This site opened about 6 weeks ago and the DSL connection wasn’t ready.  No problem.  We routed via WiLAN over the cell phone carrier and waited more than a month for Ma Bell Internet to provision the DSL service.

Ma Bell Internet is a fictitious name. As are the names of everyone else in this story other than me. Other than the names, every detail is true. As one person involved said, “you can’t make this stuff up.”

The story starts last Wednesday after I did the appropriate adjustments on the firewall to accommodate the new DSL connection.  Tom, the end user customer at the site, connected the patch cable from the DSL modem to the firewall and nothing worked.  Our troubleshooting pointed to a misconfigured or bad DSL modem.

We logged a service call with Ma Bell Internet, which triggered a week long comedy of errors. We called on Thursday and talked to – I’ll call her Misty – from tech support.   Here is a piece of our conversation:

Greg: You have a gateway at this IP Address. We can’t ping it from the site.

Misty: I don’t see why you can’t ping it. I can ping it from here.

Greg: Right – I don’t know why we can’t ping it. That’s why we’re calling. Nobody can route through that DSL modem and we can’t ping it.

Misty: I don’t understand why you can’t ping that modem. I can ping it from here, you should be able to ping it from there.

Greg: Right – we should.  And if it answered it would be even better.

Misty: So how come you can’t ping it?

Greg: Routers have two sides. We’ll call them the inside and outside. You’re on the outside and you can ping it. The site is on the inside and can’t ping it. Seems to me, something is wrong with the inside of that router.

Misty: Well if you can’t ping it, there’s not much I can do.

Greg: Yes there is!  It’s your router.  You need to fix it.

Misty: I tried connecting to it from here remotely to check its configuration but I’m not able to.

Greg: So – doesn’t that suggest to you something is wrong?

Misty: No. We connect those the way we’re ordered to connect them. If the order said no remote diagnostics then we wouldn’t have turned that on.  And I can ping it.

Greg:  So did the order for this one say no remote diagnostics?

Misty:  I don’t know what the order said – I don’t have it.

Greg:  So how do you know the order didn’t call for remote diagnostics?

Misty:  Because I’m not able to connect to it.

Greg: (Exercising patience)  OK, so how do we fix this?

Misty: You can run remote diagnostics from your site. Just hook a real computer up to it and connect to its website.

Greg: The problem with that is, we can’t access the modem. If we could access the modem we wouldn’t need to call you. And the only computer I have onsite is my firewall. Everything else is thin clients.

Misty: If you’re unwilling to run remote diagnostics, I can’t do much to help you.

Greg: Yes you can. Send somebody out there to fix the modem!

Misty: Maybe you can borrow a laptop from somebody and try the diagnostics.

Greg: There are no laptops to borrow at this site. We have my firewall – with no graphics so connecting to a GUI website won’t work anyway – and some thin clients. That’s it.

Misty: I can dispatch a technician, but it will be billable.

Greg: What??? Why is it billable?

Misty: Because you’re unwilling to run remote diagnostics.

Greg: No, not unwilling.  Unable.  We are unable to connect to this modem.  If we can’t connect to it, how do we run diagnostics on it?

Misty: If I dispatch a technician and he finds a building wiring problem or something else not from us, we’ll have to charge you.

Greg: Well of course. The wiring is a 20 foot cat5e patch cable.  That’s it.  That’s the building wiring.  One patch cable.  Send somebody out.

That was on Thursday.  I asked for somebody onsite Friday.  But with nobody available Friday, I had to settle for Monday.  My phone rang on Friday and I confirmed it – send somebody Monday, preferably Monday morning.

Monday morning came and I learned Ma Bell sent somebody to the site on Saturday.  Of course nobody was at the site on Saturday.  After some Monday phone calls, we scheduled another visit for Wednesday – nobody from Ma Bell was available Tuesday.  So now we were a week into this issue.  After opening the support ticket on Thursday, the soonest possible resolution would not happen until next Wednesday.

Wednesday morning about 9:45 my phone rang.  It was the onsite Ma Bell support technician.  Let’s call him BA.  You’ll see why in a minute.

BA told me the customer became angry and sent him upstairs to call me.  I apologized and said there was a systemic problem at Ma Bell Internet and he was the guy onsite who had to hear it all.   BA told me he also got mad at the customer and apparently started throwing boxes around the site.  And that was when emails started pouring into my inbox warning me about this technician and his attitude. Apparently, BA was angry the second he walked in the door and took out his frustrations on Tom, the end user customer.  Tom got tired of the abuse and sent BA upstairs to call me.

BA asked me a bunch of questions I didn’t know how to answer.  He wanted to know how to configure the router password and who to call for onsite tech support.  I asked BA – since he worked for Ma Bell Internet – if he shouldn’t know who to call for support?  BA launched into a diatribe about his employer and his frustrations about training and scheduling and his management.  Here is the portion I remember most vividly:

Greg: So how do you know these connections are good?

BA: We set them up dynamic and then connect them to the Internet. If they work, then we give them their static address and we’re done.

Greg: So how do you know they work when they’re static?

BA: Because they worked when they were dynamic.

Greg: Don’t you think you should test them when they’re static?

BA: That’s not what we do.

Greg: Maybe you should give the Ma Bell guys some feedback about…

BA: (Interrupting) They won’t listen.

Greg: How badly do they want to keep customers?

BA: They won’t listen. They’re managers and they don’t care!

Greg: OK. Well for now, we need to fix this modem. I need it to have the static IP Address assigned to it, no NAT, no DHCP, and no firewall rules…

BA: (Interrupting again) Hold on there – Maybe those were English words but I have no idea what any of that stuff means. I’m not an IT guy.  I don’t know why customers keep trying to get me to solve their IT problems. I’m not an IT guy!

Greg: Well, that’s OK.  I’m an IT guy so I can cover that part.  Hang on a second – (recalling my notes from last week – fortunately I still had my scrap of paper handy and near the top of my pile of scraps of paper with notes from countless other customer engagements) I have the Ma Bell Internet toll free number right here.

I called the number and tried to conference us all together.  The conference call didn’t work.  I told BA I needed to hang up to clear my messed up conference call and I would call him right back and try again.

We hung up and I called the toll free number again. This time it worked and I pressed the phone buttons to talk to somebody in provisioning.  Provisioning sent me to Tech Support and I talked to a helpful (finally!) lady I’ll call Ingrid.

I told Ingrid that BA was standing by onsite and we needed to configure this modem.  Ingrid said they can’t disturb technicians who are onsite working.  I said he’s waiting for us to call right now.  I put Ingrid on hold, called BA onsite and his phone immediately went to voicemail.  And an email came into my inbox from Tom – BA was on the phone with somebody else.

I came back to Ingrid and explained BA was on the phone with somebody else and his phone went to voicemail.  And then my phone beeped.  It was BA.  I put Ingrid on hold again, told BA to hang up and stand by and we would conference him in.  Back to Ingrid – but my cell phone was now tied up because BA’s inbound call tied up my conferencing capability even after BA hung up.  Ingrid would have to do it.  So Ingrid tried to call BA.  Ingrid reported BA’s phone answered and disconnected.  She tried again, same problem.

My cell phone beeped again.  It was BA, reporting that somebody tried to call him twice.  He answered but could not hear anyone so he hung up.  I could hear frustration rising in BA’s voice.  So I explained to BA that Ingrid was trying to conference us all together and to stand by.  “Hang in there, we’ll make this happen.  I promise.”

Back to Ingrid.  I told Ingrid what BA told me and Ingrid said she would hang up with me and try to call BA and conference me in.  I told Ingrid what we needed with that modem.  Give it the block of static addresses it’s supposed to have, turn off DHCP, turn off NAT, turn off all firewall rules.  Bless her heart, Ingrid knew what I was talking about.  She repeated it and we hung up.

Whew!

My phone rang a minute later.  It was Pat from Ma Bell Internet.  Pat had BA on the line and wanted to conference me in.  I don’t know what happened to Ingrid.  I said, “yes, absolutely!”

Finally – we had all the right people together on the same call at the same time. Now we could get to work.

Meantime, my email inbox chirped with requests from the customer main office for status updates.  Like most IT people, I can type and talk at the same time.  It’s a skill we all learn sooner or later.  So I updated the customer via email while I talked to Ma Bell Internet on the phone.

Pat asked if I wanted the modem to be a bridge or router. I told her I didn’t care, as long as this site could get to the Internet.  So Pat decided to try setting it up as a bridge.  Pat talked BA through the steps and suddenly, nobody from inside or outside could ping that gateway address anymore.  Whoops.

So we had to configure it as a router.  BA groaned – “so that means I have to type in that long password string again?”

“Yes”, I said. “Sorry. I wish there was another way to do it, but you’re there onsite and I need your hands and eyes.”

Pat talked BA through the steps to reset the modem and configure it again.  And after all that – after trying to teach a telephone support technician what routers do, after a week of bungled appointments waiting for Ma Bell Internet to send somebody, after reassuring an onsite technician with a bad attitude, after juggling conference calls that refused to conference, after all that, here was the problem. This was why the site could not route to the Internet over that modem.

BA: And I’m setting the subnet mask to 255.255.255.0, right?

Greg: (Listening passively until now) NO!

Pat: Uhm,  no, change that last octet to 248. So the subnet mask should read 255.255.255.248.

BA: Oh – I always set it to 255.255.255.0. I don’t even know what it means.

Greg: Well, it’s important to get that one right.

Pat: (Didn’t say anything.)
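
Why does that last octet matter so much? The mask tells the modem which addresses sit on its own local segment and which ones it has to forward. Here is a small sketch using Python's standard ipaddress module. The addresses come from the 203.0.113.0/24 documentation range and are hypothetical, not the site's real block, and the comments describe one plausible reading of the failure rather than a confirmed diagnosis.

```python
import ipaddress

# Hypothetical static block; 203.0.113.0/24 is reserved for documentation.
correct = ipaddress.ip_network("203.0.113.8/255.255.255.248")
wrong = ipaddress.ip_network("203.0.113.0/255.255.255.0")

print(correct.prefixlen, correct.num_addresses)  # 29 8
print(wrong.prefixlen, wrong.num_addresses)      # 24 256

# An address elsewhere on the provider's network, outside the small block.
outside = ipaddress.ip_address("203.0.113.200")
print(outside in correct)  # False: the modem forwards it upstream
print(outside in wrong)    # True: the modem treats it as a local neighbor
```

With the 255.255.255.0 mask, the modem believes 256 addresses are directly attached when only 8 really are, so traffic for the other 248 never gets handed to the upstream gateway.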

And voilà – it all worked. I started up an SSH session into my nifty onsite firewall and was ready to thank everyone for getting this up and running when everything dropped again.

After some more troubleshooting, Pat said, “I see some upstream errors here.” I said, “I’m pinging in another window and I notice the ping times zoom up from around 60 ms to more than 200 ms when my SSH session drops.”

Pat said, “Hold on a minute, I’ll bring somebody else in.” Pat put us on hold, leaving BA and me together.

After a few seconds, BA said, “Where did she go? Didn’t she say she was getting somebody?  What’s going on here?”

“Just hang in there.  Give her a minute to come back.”

A few seconds later, a man I’ll call Galen came on the line. Galen and Pat confirmed something unusual was going on.

And then BA asked us all to wait a minute. “I’m taking the modem offline for a minute.”

Pat said, “Oh – now I don’t see the modem anymore!” I said, “yes, BA said he was taking it offline for a minute.”

A few seconds later, the modem came back online.  BA explained he pulled the telecom cabling out of the punchdown block and punched the wires down again.

And after BA punched down the wires again, the connection stayed solid.  After watching for about 2 minutes, BA said, “I’m outta here!” and left.  Galen, Pat, and I stayed together for another few minutes and then I agreed Ma Bell Internet could close this case.

Wow!

What lessons can we extract from this experience? I named the onsite technician BA because he really did have a bad attitude.  He had a bad attitude because his company is dysfunctional and he handled the stress poorly, making a touchy situation even worse.  How do I know Ma Bell Internet is dysfunctional?  Look at the evidence.  Poorly trained telephone support technicians (Misty), broken scheduling, poor communications, overburdened and undertrained onsite technicians.  This is fixable if the managers at Ma Bell Internet want to fix it.  If not, plenty of other Internet providers will eat their lunch.

Spying – The pot calling the kettle black

Sometimes when high tech meets international politics, reality really is stranger than fiction.

First, a few enlightened members of our US Congress accused Chinese telecom equipment giant, Huawei, of spying for the Chinese government. Here is one of many press articles, this one from October, 2012.  Here is another article from 2011.  Apparently, much of the fear on this side of the Pacific about Huawei is because Huawei founder and CEO, Ren Zhengfei, was once a telecom technician in the Chinese People’s Liberation Army.  The company CEO served in his own country’s military years ago.  Therefore, today’s Chinese government will use equipment from his company to spy on the United States.

I wonder how many American CEOs once served in the US military?  Does it follow that their companies therefore spy on China?

This article from July, 2013 might be one of the best.   Quoting the first sentence in the article:

Former Central Intelligence Agency chief Michael Hayden said that at a minimum, Huawei had provided Chinese officials with “intimate and extensive knowledge of the foreign telecommunications systems.”

Farther down, we see this nugget:

Hayden currently serves on the board at Motorola Solutions, and is a principal at security consultancy Chertoff Group.

Yup, that’s the same former Homeland Security Secretary, Michael Chertoff, who oversaw the US Government’s not-so-brilliant response to Hurricane Katrina back in 2005.  Now he runs a consulting company, advising governments and big business how to keep their infrastructure safe.  And Michael Hayden works for him.

As for Motorola Solutions, here is how that company describes itself, from its own website at http://www.motorolasolutions.com:

Motorola Solutions provides business- and mission-critical communication products and services to enterprises and governments.

I should disclose a few things before going any further with this.  First, I am an American and proud of it.  By an accident of birth, I am blessed to live in the best country in the world.  I want the United States to compete fiercely and win all the competitive battles.  I don’t like Chinese counterfeiting, I don’t like spam relayed from Chinese email relay services, and I don’t want anyone spying on me.

I like to think I’m one of the good guys.  I want my country to also be one of the good guys.

I also like level playing fields.  I regularly go up against entrenched companies – American and foreign – and it frustrates me beyond belief when I offer superior solutions but lose because the entrenched competition successfully introduces FUD with the potential customer.  Introducing FUD – Fear, Uncertainty, and Doubt – is a time honored tradition in the high tech marketplace.  The conversations start something like this:

Mr. Customer, are you sure you want to look at this new solution?  You have a lot riding on this project, and even though this new upstart might offer some advantages and they’re less expensive than we are, is it really worth the risk?  After all, we’ll be adding that capability sometime in the next 20 years so they don’t really have any advantage anyway.  Doesn’t it make more sense to stick with us and what you already know?

And bla bla bla…

FUD is often no more than a line of BS, but fear is a powerful motivator.  FUD works – that’s why entrenched incumbents use it.

So now, along comes Huawei, a Chinese company, and the guy who sits on the board of a direct US competitor accuses Huawei of spying for the Chinese.  And he made his accusations nearly a year after a US Presidential Commission spent 18 months investigating Huawei and found no evidence to support the accusations.  Read the details right here.

What’s really going on here?  Hayden and his boss are spreading FUD, wrapped up in the US flag and national security.   But it’s not really about national security.  It’s about keeping a competitor out of the US marketplace.  It’s good old fashioned protectionism mixed with a 21st century high tech twist.  It was never about national security, it’s about money.

And now it gets better.

Because the NSA – the organization Hayden used to run – could not keep its own secrets, we find out the NSA hacked into the Huawei internal network and spied on Huawei.  That’s the pot calling the kettle black.

Instead of Huawei spying on us, we spied on Huawei.  And got caught.

In what universe is it possible the Chinese are the good guys in this episode?

Why we all should care about net neutrality

Many people will see the words, “Net neutrality” and groan about yet more tech gobbledygook and geeks who spend too much time pretending to be Mr. Spock and watching Star Trek re-runs.  Nobody on Main Street cares about net neutrality, right?  Isn’t this all just an arcane concept that never intersects with real people on Main Street?

Well, not so fast.

The real story – behind all the tech jargon – is as old as the first antitrust issue ever to come before the US Government more than 100 years ago.  And it will affect everyone who connects to the Internet, which is pretty much everyone these days.  For people who think tech is only for weenies, think money.  $Billions in money.  And all of it comes from your pocket.

Net neutrality means Internet Service Providers (ISPs) are supposed to treat all Internet traffic equally, end to end.  Every data packet should be treated equally to all other data packets, regardless of source or destination.  ISPs should be neutral carriers and not make judgments about favorable or unfavorable traffic.

Here is the issue.  Without net neutrality, large ISPs will have the legal right to mess with your traffic.  Large players will have monopoly power and will control your access to services you care about.

And what happens when any monopoly player offers its own, competing services?  Forget high tech for a minute.

Let’s say Alice runs a restaurant.  But Bob controls all the streets in town.  If Charlie wants to eat at Alice’s restaurant, Charlie has to travel over Bob’s streets to get there. What happens if Bob’s sister, Doris, opens a restaurant that competes with Alice?  Bob wants to make sure money stays in the family, so Bob sets up toll booths for all travelers on his streets. But people who eat at Doris’s new restaurant get their tolls refunded, courtesy Bob.  Of course, this puts Alice at a competitive disadvantage, so Alice eventually closes.  Before long, Bob controls all the restaurants in town.

Now back to high tech.  Today’s large cable companies offer bundles that include phone service, Internet service, cable TV, and premium services such as movies on demand.  These companies control both distribution and content.  They control many of the streets and some of the restaurants. They want to control all the streets so they can encourage you to eat at their restaurants.

If any single ISP becomes your only choice to connect to the Internet, that ISP controls your access to the services you care about.  ISPs can exercise that control with pricing and surcharge gimmicks, much like the antitrust monopolies of old.  But today’s ISPs also have even more powerful tools.   They can prioritize traffic or play other quality of service games to degrade traffic they don’t want to carry.
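
To make "prioritize traffic" concrete, here is a toy sketch of strict priority scheduling in Python. It is an illustration only, not a description of how any real ISP's equipment works. The source names and the slot count are made up, and real quality of service systems are far more elaborate.

```python
import heapq
from itertools import count


def schedule(packets, favored, slots):
    """Send up to `slots` packets, always serving favored sources first.

    packets: list of (source, payload) tuples in arrival order.
    favored: set of source names the carrier chooses to prioritize.
    Returns the sources of the packets that got sent, in send order.
    """
    arrival = count()  # tie-breaker so equal priorities stay in arrival order
    queue = []
    for source, payload in packets:
        priority = 0 if source in favored else 1
        heapq.heappush(queue, (priority, next(arrival), source, payload))

    sent = []
    for _ in range(min(slots, len(queue))):
        _, _, source, _ = heapq.heappop(queue)
        sent.append(source)
    return sent


# Hypothetical traffic: four packets arrive, but the link can carry only two.
arrivals = [("rival-video", "a"), ("isp-video", "b"),
            ("rival-video", "c"), ("isp-video", "d")]

print(schedule(arrivals, favored=set(), slots=2))          # neutral carrier
print(schedule(arrivals, favored={"isp-video"}, slots=2))  # carrier plays favorites
```

With no favorites, the two packets that arrived first get through. Once the carrier favors its own service, both slots go to that service and the rival's packets wait.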

Today’s familiar services such as Amazon, Netflix, Hulu, Facebook, Google, LinkedIn, and others, at their core, are elaborate websites.  The path from your house or business to those services runs through the Internet.  Without net neutrality, ISPs can grant or deny or regulate or tax access to these services as they see fit.  If an ISP decides it wants to offer, say, retail services, what access policies will it set up for Amazon?  Let’s say you put your business in the cloud, but your ISP offers a competing cloud service.  What quality of service will your ISP give you?

This is not hypothetical.  Comcast, for example, blocks traffic coming from email servers located in home networks.   More ominous, thousands of Netflix users are complaining about bad Netflix movie quality when connected to Comcast.  Comcast counters that it has a right to prioritize traffic as it sees fit because it wants to protect occasional Internet users from heavy downloaders.  Following that line of reasoning, I wonder whether Comcast gives its own Movies on Demand service the same priority it gives Netflix, which competes with that service?

Net neutrality is under constant attack.  If open access to Internet services is important to you – and it should be – then familiarize yourself with the details around net neutrality and make your voice heard.  Your livelihood may depend on it.

What is the right way to deal with IT security vulnerability disclosures?

With all the IT security issues in the news lately, suddenly IT security is everyone’s problem.  One natural question behind the headlines is, what is the right way to handle IT security vulnerabilities?

Here are some thoughts.

To keep things simple, let’s limit this discussion to three major players.  The real world is more complicated, but this is enough to illustrate the concepts. The first player is Bob, leader of an organization.  Next is Ingrid who discovers a security vulnerability.   And, of course, Trudy, the evil intruder we all love to hate.  Trudy spends most of her waking hours probing the Internet, looking for weaknesses she can exploit and secrets she can steal.

Let’s say Bob’s business operates a website and Ingrid finds a security vulnerability that exposes sensitive information about Bob’s customers.  How should Ingrid proceed?

Here is a blog post I put together a few months ago with an example of what happens when players proceed the wrong way.

This is what should happen.  When Ingrid finds the vulnerability, she realizes Trudy is already trying to exploit the weakness to steal personal information from Bob’s customers.  The race is on to fix the problem before Trudy exploits it for her own evil purposes.  And Trudy has a head start.

Ingrid has an ethical duty to immediately inform Bob about the problem and make Bob aware of the potential consequences.  Bob, always skeptical about gloom and doom warnings, listens to Ingrid because Ingrid makes a coherent and credible presentation about the problem.  Bob heeds the warning, fixes the problem, and quickly informs his customers and takes remedial action.  A newspaper or popular blog eventually publishes the story, giving credit to Ingrid for her dedication.  Evan, an executive from an influential software company, reads the story and offers Ingrid a job as Director of IT Security.   Everyone lives happily ever after, except Trudy, who was denied the opportunity to steal from somebody.

That’s how things should work.  But it doesn’t always happen that way.

Let’s say Ingrid presents the problem to Bob, but Bob ignores the warnings.  Now what?  Trudy is out there.  When Trudy finds Bob’s vulnerability, she will exploit it and steal from Bob’s customers.  Trudy might even drive Bob out of business.  How does Ingrid respond if Bob fails to respond?

Let’s say Bob uses software from a company named, say, Orange Computer, and Ingrid finds a security problem with that software.  Ingrid contacts the right people at Orange, but Orange sits on the problem and does nothing.  Trudy is out there.  If Orange fails to address the problem, Trudy will exploit it.  What does Ingrid do?

Ingrid’s only course of action in this case is to follow a best practice called responsible disclosure.  After trying to warn Bob.  After contacting Orange.   After taking all reasonable steps to inform the right people, and after waiting a reasonable amount of time for a response, and as a last resort, Ingrid has a duty to disclose the problem publicly.  Ingrid must assume Trudy and her friends are already quietly exploiting the problem, and Trudy will hurt too many people if Ingrid fails in her duty.

Ingrid also has a duty to protect herself.  She should document her attempts to contact Bob and the people at Orange Computer as appropriate because when the problem becomes public, it will ignite a firestorm of controversy with Ingrid in the middle.   This will create an opportunity for Ingrid to educate the public and a threat from people who blame the messenger for creating the problem.

Politicians will weigh in with uninformed opinions and instant experts hungry for publicity will offer canned analysis for gullible press outlets hungry for sensational stories.  The noise will be deafening; real information will be scarce.

Amid all the noise, what about customers, the people who use software from Orange Computer and the people who use Bob’s website?  How do they respond?

Customers should do independent homework and look for the real story.  Security vulnerabilities happen all the time.  Is this one just another sensational story or is it real?  What are the prudent steps to protect against it?  What are the plans from Bob and/or Orange Computer to address the problem?  What are the consequences of not addressing the problem?  Customers need to find credible answers to these questions and make informed choices on how to respond.

After the initial disclosure shock wears off, some other questions are appropriate. Who is Ingrid?  What were her motives?  How did she find the problem?  Before the problem went public, what steps did Ingrid take to contact the right people?

That scenario assumes Ingrid discloses the vulnerability responsibly.   What if Ingrid wants to make a name for herself and she discloses the vulnerability without first informing Bob?  In this case, Ingrid is really a bad guy disguised as a good guy and trying to gain notoriety at the expense of Bob’s company.

Bob learns about the problem on the TV news along with the rest of the world and his company phones start ringing a few seconds later as press outlets everywhere look for comments and controversy.   What does Bob do?

Bob faces multiple threats.  He faces a public relations threat from sensational press stories spawned by Ingrid’s improper disclosure.  Bob and his customers also face a material threat from Trudy, quietly exploiting the vulnerability at the expense of  Bob and his customers.

To meet the PR threat, Bob needs to get in front of a runaway public relations train and slow it down.  This is the time for visible leadership and Bob must get in front of the cameras and take charge.  Provide explanations and frequent progress updates, and answer questions honestly and directly to repair credibility with a skeptical public.

Simultaneously and behind the scenes, Bob must also immediately address the actual vulnerability because Trudy wants to steal from Bob’s customers.  This might mean bringing in outside experts, it may even mean temporarily suspending business.   It will cost money.  Probably lots of money.  But if Bob handles this crisis properly, it can also be an opportunity for Bob’s company to come out of it with more trust and more credibility than before.

What if Bob himself is a bad guy?

In 2005, Mark Russinovich was Ingrid and multibillion dollar Sony Corporation was both Bob and Trudy when Sony compromised thousands of computers around the world by surreptitiously introducing a rootkit when anyone played a Sony BMG music CD on a Windows PC.   A rootkit is illicit software that modifies core system components and is designed to conceal itself from malware countermeasures such as antivirus products.  Bruce Schneier summarized the story here.  Mark Russinovich’s original blog post, with details on his great detective work uncovering the problem, is here.

Russinovich found the problem and reported it publicly in his blog.   This was the right thing to do and Sony eventually paid millions of dollars to settle fines and class action lawsuits.

What if Bob is a government agency and Ingrid discovers a vulnerability or abuse of power?  Now the consequences might be global.  Scenarios like this have spawned long discussions over the generations about ethics and whistle-blowing.  Sometimes, Ingrid is a lonely crusader pursuing justice against powerful forces.  Other times, Ingrid is an egomaniac, pursuing her own interests at the expense of everyone else.  And Trudy is always out there, ready to strike at every opportunity.  Ingrid has a duty to proceed with caution and carefully weigh the consequences of any action.

If you find yourself in a position similar to my hypothetical Ingrid, how do you decide what to do?  Who is harmed, who is helped if you disclose the vulnerability?  And who is harmed, who is helped if you do not disclose it?  If you take action, are you serving justice or your own ego?  Confide in a few people you trust and make your choice based on honest answers to those questions.  Do it responsibly.   Careers and lives may depend on the choices you make.

What should a small business IT security system look like?

Given the recent security breaches all over the news, what would a good Main Street business security solution look like and how much would it cost?  After all, if organizations such as the NSA and large retailers such as Target can’t keep their secrets safe, what chance does Main Street business have?

A pretty good one actually. Keep reading.

First, an assumption. No piece of equipment is hacker proof.  You must assume bad guys want to get inside your devices and use your equipment and your network for their own evil purposes.  They have specs for everything you own and probably know more about the internal workings of your equipment than you’ll ever hope to learn. They’re smart, they’re greedy, they collaborate, and they want what you have.

That’s the nature of the threat.  Here are the pieces to deal with it.

It starts at the firewall.  You need a real firewall with provision for multiple LANs.  A real firewall is a router with multiple segments and some rules to regulate how each segment interacts with the other segments.  Most credible DSL and cable modems can accommodate firewalls behind them if configured properly.  Here is a PDF file you can download with some firewall frequently asked questions.

Your firewall will have at least one public, Internet facing segment.  It might have more public segments if you want multiple Internet feeds from multiple providers so you always have a path out if one feed drops.  Multiple Internet feeds are probably overkill for a business like a Chinese takeout restaurant, unless that restaurant depends on, say, a website to operate hour by hour.

You may choose to have an HA (highly available) firewall system with redundancy at your boundary that can juggle multiple Internet feeds and do automated failover routing in case an Internet feed goes offline.  This may also be overkill for that Chinese food takeout restaurant.  It may not be overkill for a multiple site retail operation that depends on the HQ site always being available.  Start small and scale as the business grows.

It will have a “people” segment where you put your employee computers.  This is where you put in the typical rules you see in most business networks. You’ll want a credible antivirus solution on all your workstations in this segment.  It can also become elaborate. You can put in web filtering appliances to regulate which websites your users visit, for example. If you choose to host your own email or web server(s), you can put in rules to accommodate those, and rules to accommodate spam filtering. This is overkill for small operations and a logical growth path for larger businesses.

If you’re a retailer, your firewall will also need a POS segment for your Point of Sale systems.  A simple POS terminal might interact only with your credit card processors.  Credit card processors all have IP Addresses, so your firewall will have rules to allow anything in the POS network to interact only with those IP Addresses.  The firewall will also have a rule blocking anything between your “people” segment and POS segment.
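
Here is roughly what those POS rules decide, sketched in Python rather than real firewall syntax. Every address below is hypothetical: the LAN ranges are ordinary private blocks I picked for illustration and the processor addresses come from a documentation range, so substitute whatever your own processor publishes.

```python
import ipaddress

# Hypothetical addressing, for illustration only.
POS_NET = ipaddress.ip_network("10.0.20.0/24")     # POS segment
PEOPLE_NET = ipaddress.ip_network("10.0.10.0/24")  # employee segment
PROCESSORS = {                                     # card processor addresses
    ipaddress.ip_address("203.0.113.10"),
    ipaddress.ip_address("203.0.113.11"),
}


def pos_rule(src: str, dst: str) -> str:
    """Return the action the POS rules would take for one connection."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    if s in POS_NET:
        # POS terminals may talk to the processors and nothing else.
        return "ALLOW" if d in PROCESSORS else "DROP"
    if s in PEOPLE_NET and d in POS_NET:
        # Nothing crosses from the employee segment into the POS segment.
        return "DROP"
    return "NO MATCH"  # left to the rules for the other segments


print(pos_rule("10.0.20.5", "203.0.113.10"))  # ALLOW
print(pos_rule("10.0.20.5", "198.51.100.9"))  # DROP
print(pos_rule("10.0.10.7", "10.0.20.5"))     # DROP
```

The point of the allow list is that a compromised POS terminal has nowhere to send stolen card data except to the processors it was already talking to.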

If your POS network is more sophisticated, those POS systems might need to interact with, say, a database server.  That database server, in turn, may need to access servers in your “people” network.  In this case, carefully construct firewall rules to accommodate this traffic and log attempts at any other traffic.  This is overkill for that Chinese restaurant, but might be essential for a franchise of Chinese restaurants or a sophisticated retailer with, say, a loyalty program.

Maybe you want to offer wifi as a convenience for your customers. This is tricky to do properly because of the nature of wireless and because you don’t want your customer wifi to mingle with your employee wifi in your stores.  Isolate the customer wifi from your employee wifi and all your other segments.  The wifi segment is only a convenience for your customers to get to the Internet.  Nothing crosses the border between the customer wifi into the “people” segment or the POS segment.
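
Put together, the segment rules amount to a short allow list with everything else denied and logged. This Python sketch shows the idea. The segment names are labels I made up for illustration, and a real firewall expresses this in its own rule language.

```python
# Hypothetical segment names. Anything not listed here is denied.
ALLOWED = {
    ("people", "internet"),
    ("pos", "processors"),
    ("guest_wifi", "internet"),
}


def check(src_segment: str, dst_segment: str, log: list) -> bool:
    """Allow only listed segment pairs; log and deny everything else."""
    if (src_segment, dst_segment) in ALLOWED:
        return True
    log.append(f"DENY {src_segment} -> {dst_segment}")
    return False


log = []
print(check("guest_wifi", "internet", log))  # True
print(check("guest_wifi", "pos", log))       # False
print(check("guest_wifi", "people", log))    # False
print(log)
```

Starting from "deny everything" and adding only the paths the business needs means a forgotten rule fails closed, and the log shows who tried to cross a border they shouldn't.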

And there you have it in a few short paragraphs.  A topology that does a wonderful job of enabling your business, serving your customers, and keeping bad guys out.  Total investment includes a properly built firewall and either a few physical network switches or a smarter switch with VLAN capability.  Budget a cost of about $4k to start. The actual cost might be a little less for small operations, probably more for larger operations.  The antivirus subscriptions and other support subscriptions will also cost some op-ex each year.

I would love to hear some feedback on this.  Contact us and let me know what you think.