Backups

Alright, so back on the theme of changes we were required to make because of SAS 70: a fairly major change for us was backups. Don’t get me wrong, we were doing backups before, and we were doing a pretty good job of it if I do say so myself. However, as usual, it wasn’t quite formal enough for the auditor’s liking and was missing some desperately needed notifications.

Basically, prior to the audit, our backup plan was the following…

  1. SQL Server did a full backup every night at midnight and transaction log backups every 10 minutes.
  2. A process on another server checked on a regular basis (approximately every 10 minutes) for new backups and would download them from the SQL server, compress, and encrypt them.
  3. This same process would also copy over the file data store associated with our application using rsync (files in the datastore are already compressed and encrypted).
  4. A cron job at our main office (a separate facility from our production network) ran every hour and rsynced the directory on the 2nd server containing a copy of all the file storage as well as the compressed and encrypted SQL backups (sketched just below).
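
Step 4, for example, was nothing more exotic than a cron entry pulling with rsync over ssh. Something along these lines, where the hostname and paths are made up for illustration:

    # Hypothetical hourly cron entry at the main office: pull the
    # staging directory (SQL dumps + filestore copy) from the 2nd
    # production server. Files are already compressed/encrypted,
    # so plain archive mode is enough.
    0 * * * * rsync -a prod2.example.com:/backups/staging/ /office/backups/staging/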

That process worked great for us, but again, it lacked the logging the auditors wanted and also didn’t do any OS/system-level backups. Honestly, we didn’t, and still don’t, see a need for OS/system-level backups, as we use a pretty generic configuration and can have a new install up and running on a new box as fast as we could ever restore one from backup (not to mention we’ve heard of very FEW instances of any OS-level backup of Windows ever working very well). But the auditors want to see it, so we decided to comply and check the box.

The solution we’ve found to make everyone happy is somewhat of a mixture. We decided to use Bacula as our backup server since it’s open source, free, and seemed to have all the features we needed. It also works across platforms, which was a requirement. The server installation and configuration were really fairly simple following the instructions. We were also able to get the clients installed without an issue and OS-level backups working just great (each Windows client is configured to do an ntbackup system-state backup prior to copying any files, as suggested on the Bacula site).
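
As a rough sketch, the Windows client’s job definition looks something like this; the names and paths here are hypothetical, not our actual config:

    # Hypothetical Bacula Job resource for a Windows client.
    # ClientRunBeforeJob dumps the system state with ntbackup before
    # Bacula copies any files; the job's FileSet (defined elsewhere)
    # then picks up the resulting .bkf file along with everything else.
    Job {
      Name = "WinApp-OS"
      Client = winapp-fd
      JobDefs = "DefaultJob"
      ClientRunBeforeJob = "ntbackup backup systemstate /J SystemState /F C:/backup/systemstate.bkf"
    }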

For the application backups described above, we’re taking more of a phased and hybrid approach, mainly using Bacula simply as the scheduling/reporting mechanism. For instance, the SQL backup that was previously kicked off by our application is now a scheduled Bacula job that runs every 10 minutes with a full at midnight. In reality, this backup doesn’t copy any data to the backup server (it’s configured to back up a dummy directory that’s just an empty directory). All it does is run a script before its “backup”: a batch file on the SQL server that does the backup using the sqlcmd command line tool. Based on the exit value of this script, Bacula can tell us whether the backup was successful or not.
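
The batch file itself is just a thin wrapper around sqlcmd. A minimal sketch, assuming a trusted connection and made-up database and path names (the midnight full runs a similar BACKUP DATABASE command):

    @echo off
    REM Hypothetical pre-job script Bacula runs on the SQL server
    REM before its dummy "backup". The -b flag makes sqlcmd return a
    REM non-zero exit code if the T-SQL batch errors, which Bacula
    REM then reports as a failed job.
    sqlcmd -S localhost -E -b -Q "BACKUP LOG [AppDB] TO DISK = 'D:\Backups\AppDB.trn'"
    exit /b %ERRORLEVEL%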

The next piece implemented was replacing the cron job in step 4. Right now, rsync is kicked off by cron every hour to copy these files to our other facility. We will be replacing this with a Bacula job similar to the SQL backup. With this job, Bacula itself will not copy any files over (the datastore is additive: existing files never change, new files just get added, so doing a full backup every week or even every month seemed like overkill); it will just run the rsync script and report success or failure based on the exit code of the script.
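
In Bacula terms, that’s roughly a job like the following; again, the names, schedule, and script path are placeholders, not our actual config:

    # Hypothetical "reporting-only" job: the FileSet points at an empty
    # dummy directory, so no data actually moves. Bacula just schedules
    # the rsync script and records its exit status.
    Job {
      Name = "Offsite-Rsync"
      Client = prod2-fd
      JobDefs = "DefaultJob"
      FileSet = "DummyFileSet"
      Schedule = "Hourly"
      RunScript {
        RunsWhen = Before
        FailJobOnError = Yes
        Command = "/usr/local/bin/offsite-rsync.sh"
      }
    }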

Step 3 will eventually be eliminated. Instead of the datastore being rsynced to the 2nd server in the production network and then rsynced again to the other facility, we will rsync straight from the datastore to the other facility, folding it into the step described above. The datastore is mounted as a shared drive on the 2nd server, so we’ll just rsync the shared mount instead of a copy of the shared mount.

Step 2 will be the last to be replaced (mainly because we haven’t been able to test Bacula’s encryption much yet). This job will be changed to an actual Bacula job that copies data over. It will be similar to how it exists now: when the Bacula job kicks off, a script will run which rsyncs the SQL Server backup directory to this 2nd server in the production network. Bacula will then back up the rsynced folder, compressing and encrypting in the process. Once we have adequately tested Bacula’s encryption and restoring encrypted files, we’ll be good to go with that.
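
For reference, Bacula’s data encryption is configured on the file daemon, while compression lives in the director’s FileSet. A minimal sketch along the lines of the Bacula manual’s data-encryption example, with made-up key paths and directory names:

    # In bacula-fd.conf on the client: enable PKI data encryption.
    FileDaemon {
      Name = prod2-fd
      PKI Signatures = Yes
      PKI Encryption = Yes
      PKI Keypair = "/etc/bacula/prod2-fd.pem"      # client key + cert
      PKI Master Key = "/etc/bacula/master.cert"    # master key can also decrypt
    }

    # In bacula-dir.conf: gzip-compress the staged SQL backups.
    FileSet {
      Name = "SQL-Staging"
      Include {
        Options {
          signature = MD5
          compression = GZIP
        }
        File = /backups/sql
      }
    }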

As for the backup media, all backups are stored at our office facility (where the backup server is located); the data is simply transferred over our private connection to the production facility. Instead of tapes, we opted to use 500GB USB hard drives which we rotate on a given schedule.

We have a daily pool of drives, rotated daily, which just contains incremental backups and other backups that only need to be stored for a week. There is also a weekly pool, rotated once a week, which contains full application backups and differential OS/system backups that get stored for a month. There is also a monthly pool which mainly just holds the full OS/system backups, which get stored for two months. Finally, we have one other pool of hard drives for the rsync of the filestore. Since this is an rsync, we didn’t want any one drive to get too far out of sync, but always wanted to be sure to have a good copy, so we decided to use 3 drives rotated on a daily basis. This way, there are always 2 drives in the safe and one active, and no drive gets more than a couple of days out of sync.

Each USB hard drive is one volume whose name corresponds to the asset tag on the drive. This way, when Bacula requests a given volume for a restore, we know exactly which drive we need to plug in. As for getting the drives to mount to the correct places (daily drives need to mount to /storage/daily, weekly to /storage/weekly, etc.), I did the following:

  1. Created a configuration file which lists every drive’s “name”, the device id generated by the system (and subsequently linked to the device itself in the /dev/disk/by-id directory structure in Fedora 8), and the mount point for that device (so if it’s a daily drive, /storage/daily).
  2. I then created a perl script which reads this file, checks whether a given device is plugged in by looking for its device id in the /dev/disk/by-id directory, and if so, mounts the device to the mount point listed in the configuration file (and then chowns it to give the bacula user write access, since it mounts as root). The unmount script simply unmounts any drives listed in the configuration file that are plugged in. (A sketch of the mount script follows below.)
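
To make that concrete, here’s a minimal sketch of the idea, assuming a whitespace-delimited config file of name, device id, and mount point; the file format, paths, and names are all made up for illustration, and the unmount script is just the inverse:

    #!/usr/bin/perl
    # Hypothetical mount script: mount whichever configured USB drives
    # are currently plugged in to their assigned mount points.
    #
    # Example /etc/backup-drives.conf line (hypothetical format):
    #   daily1  usb-WD_5000AAV_External_57442D5743-part1  /storage/daily
    use strict;
    use warnings;

    my $conf  = '/etc/backup-drives.conf';   # made-up path
    my $by_id = '/dev/disk/by-id';

    open my $fh, '<', $conf or die "Can't open $conf: $!";
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(#|$)/;        # skip comments and blanks
        my ($name, $dev_id, $mount) = split ' ', $line;
        my $dev = "$by_id/$dev_id";
        next unless -e $dev;                 # drive not plugged in
        system('mount', $dev, $mount) == 0
            or warn "mount failed for $name\n" and next;
        # mount runs as root, so hand the tree to the bacula user
        system('chown', '-R', 'bacula:bacula', $mount);
        print "mounted $name at $mount\n";
    }
    close $fh;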

This script structure is very helpful in rotating the drives: now all I have to do is run the unmount script, rotate the drives, and then run the mount script, and all the drives are mounted to the correct directories so they get used properly by Bacula.
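
For what it’s worth, the pools described above map onto ordinary Pool resources in the director’s config. Roughly something like this, with illustrative names and retention values mirroring the rotation I described:

    # Hypothetical Pool resources in bacula-dir.conf.
    Pool {
      Name = Daily
      Pool Type = Backup
      Volume Retention = 6 days     # incrementals kept about a week
      Recycle = Yes
      AutoPrune = Yes
    }
    Pool {
      Name = Weekly
      Pool Type = Backup
      Volume Retention = 1 month    # full app + differential OS backups
      Recycle = Yes
      AutoPrune = Yes
    }
    Pool {
      Name = Monthly
      Pool Type = Backup
      Volume Retention = 2 months   # full OS/system backups
      Recycle = Yes
      AutoPrune = Yes
    }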

Hopefully, this will give you some ideas on how you could use Bacula in your organization. If you’re interested in more details on how I have the USB devices working, or in my general configuration, I’d be happy to share. We’re currently still tweaking the schedule/priority of the jobs, but all in all, it’s working very well. Has anyone else had any luck with Bacula, or is there any other backup software you’d like to share?
