Mar. 31, 2007 at 8:13pm

How To Keep Your Data Forever

We’ve all sustained painful data losses at some point or another. After a number of years, you start to realize that you’re tired of getting sucker punched, and that you have the skill and knowledge to solve the problem once and for all. I’m going to outline here a relatively simple strategy to keep a set of data alive for the rest of your life.

This article is meant to sketch the strategy in broad strokes. I’m going to gloss over some steps that may seem non-trivial, but only because they are well documented elsewhere. Nothing a little googling on your part won’t solve, and leaving them out keeps the information here reasonably platform independent. I will get into more specific technical details about my own implementation in later posts.

Step 1 - Buy 3 drives of the same size

Hard drives. Regular cheap hard drives. The key here is upgradability. Moore’s Law, or rather its hard drive equivalent (capacity per dollar keeps climbing year after year), will help us later.

Buy 3 of the largest drives you can afford at the best price per gigabyte. They really don’t have to be identical, just the same size. When you find a good deal on a large drive, it’s usually best to just jump on it and buy three. When I’m ready to buy, I check the Fry’s ad every week at the LA Times “newspaper ads” site (EDIT: Apparently the LA Times site doesn’t have the Fry’s ad anymore, try the OC Register). Prices on stuff like this go up and down, so watch them for 2-3 weeks to get a feel for the average, and you’ll know when you’re looking at the current best price.
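To make “watch the ads for a few weeks” concrete, here is a small Python sketch that ranks drives by price per gigabyte and flags a price that beats the recent weekly average. The model names and prices are made up for illustration; they are not real ads.

```python
from statistics import mean


def price_per_gb(capacity_gb: float, price: float) -> float:
    """Dollars per gigabyte: the number that matters when comparing drives."""
    return price / capacity_gb


def best_value(candidates: dict) -> str:
    """Return the model with the lowest price per GB.

    `candidates` maps a (hypothetical) model name to (capacity_gb, price).
    """
    return min(candidates, key=lambda model: price_per_gb(*candidates[model]))


def is_good_deal(todays_price: float, recent_prices: list) -> bool:
    """True if today's ad price beats the average of the last few weekly ads."""
    if not recent_prices:
        raise ValueError("need at least one recent price to compare against")
    return todays_price < mean(recent_prices)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    ads = {"drive_a": (500, 120.0), "drive_b": (750, 200.0)}
    print(best_value(ads))                              # drive_a ($0.24/GB vs $0.27/GB)
    print(is_good_deal(99.0, [110.0, 105.0, 100.0]))    # True: 99 is below the 105 average
```

Comparing price per gigabyte, rather than sticker price, is what keeps “largest” and “cheapest” from pulling in opposite directions.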

Unless you have special needs, the underlying technology (interface, spindle speed, cache size) doesn’t matter here. Just go for the largest and least expensive drives that will work with your hardware.

Step 2 - Set up your main computer

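You’re going to put 2 of the drives you just bought into your main computer and keep them nightsync’ed: one is your working drive, and every night its contents get mirrored to the other. Don’t use RAID. I know many of you will be tempted to, but it’s a waste of time and effort. I ran a RAID 1 system for a long time until I realized how dumb it was. You have to accept the fact that user error accounts for a large percentage of data loss cases, and it doesn’t matter how careful you are. RAID 1 cannot protect you there: it mirrors every change instantly, so when you accidentally delete or overwrite a file, the mistake lands on both drives at once. A nightly sync gives you until the next sync to notice and pull the file back off the second drive. Save yourself from yourself: don’t use RAID for backups.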

I’ll save the specifics of my nightsync config for another post. Just find the best (meaning simplest) synchronizing/backup app for your platform and you should be fine. Make sure it creates a perfect mirror of your data, and make sure it runs automatically every night.
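Purely as an illustration of what “a perfect mirror” means, here is a minimal one-way mirror sketch in Python: new and changed files are copied, and files deleted from the source are deleted from the mirror. It is not the nightsync configuration mentioned above, and a mature tool such as rsync handles symlinks, permissions and interrupted copies far better. The crontab line and paths are placeholders.

```python
import filecmp
import shutil
import sys
from pathlib import Path


def mirror(src: Path, dst: Path) -> None:
    """Make `dst` an exact one-way mirror of `src`.

    New and changed files are copied (keeping timestamps, via copy2), and
    anything in `dst` that no longer exists in `src` is deleted.
    Symlinks are skipped to keep the sketch short.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)

    wanted = set()
    for entry in src.iterdir():
        if entry.is_symlink():
            continue  # not handled in this sketch
        wanted.add(entry.name)
        target = dst / entry.name
        if entry.is_dir():
            if target.exists() and not target.is_dir():
                target.unlink()  # a file is in the way of a directory
            mirror(entry, target)
        else:
            if target.is_dir():
                shutil.rmtree(target)  # a directory is in the way of a file
            # shallow=True: identical size and mtime count as unchanged;
            # otherwise filecmp falls back to comparing the bytes.
            if not target.exists() or not filecmp.cmp(entry, target, shallow=True):
                shutil.copy2(entry, target)

    # Remove whatever was deleted from the source since the last run.
    for entry in dst.iterdir():
        if entry.name not in wanted:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical nightly crontab entry (3 a.m.); paths are placeholders:
    #   0 3 * * * python3 /home/you/bin/mirror.py /data /mnt/nightsync
    mirror(Path(sys.argv[1]), Path(sys.argv[2]))
```

Because deletions propagate, a file you delete by mistake survives on the second drive only until the next nightly run, which is exactly the window the nightly schedule buys you.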

Now that the two drives in your main machine are working and synchronizing properly, you can already let a wave of satisfaction wash over you. You are now safe from probably 99% of data loss situations. Think about it. The only things that can still threaten your files are physical disasters that hit the whole machine (like a fire, a power surge, or large amounts of beer) and theft.

Step 3 - Set up the offsite machine

This is where it gets a bit tricky. You need a machine somewhere else that you can connect to. Drop that third drive into an old PC you have lying around (old, cheap Pentium IIs are perfect for this), set up a weekly backup to it over SSH, and put it at a friend’s house or your parents’ house. Everyone’s situation will be different here, but the important thing is to have your data not only on 2+ disks, but also in two different locations. Since this is an additional level of redundancy, you can sync less often, but the syncs must happen automatically.

There are a few small wrinkles to work out with this approach, but none of them are really prohibitive, and you’ll have fun flexing some technical muscle. You have to forward a port on your friend’s router so the SSH transfers can reach the backup machine. You also have to work some dynamic DNS magic (a service like DynDNS) if they’re on a dynamic IP. Depending on your level of trust with your backup partner, you may also want to look into encrypted filesystem options.
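Once the port forward and the dynamic DNS name are in place, a quick way to confirm they work is to open a TCP connection to the backup machine from outside your friend’s network. A minimal Python sketch; the host and port are whatever you configured, and nothing here is specific to any DNS provider.

```python
import socket


def port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Run from outside your friend's network, this checks two things at once:
    that the dynamic DNS name resolves to their current IP, and that the
    router forwards the port to the backup machine. It does not check that
    SSH itself is answering, only that something accepts the connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refused, unreachable, or timed out
        return False
```

Running a check like this right before each weekly sync lets a failed backup report “can’t reach the offsite box” instead of failing silently.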

You can take your time with this step. Get it set up and watch it all working locally for a few weeks before you drop the second machine at the offsite location.
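While the two machines still sit side by side, one way to watch the syncs is to fingerprint every file on both copies and compare. This Python sketch hashes each file with SHA-256 and reports paths that are missing or differ. For the remote copy you would run `tree_digest` on that machine, or point it at a mounted share; how you reach it is an assumption about your setup.

```python
import hashlib
from pathlib import Path


def tree_digest(root) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(a, b) -> list:
    """Return sorted relative paths that are missing on one side or differ."""
    da, db = tree_digest(a), tree_digest(b)
    return sorted(p for p in da.keys() | db.keys() if da.get(p) != db.get(p))
```

An empty list after each sync for a few weeks is a good sign that the offsite copy really is a mirror.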

Step 4 - The infinite loop, your data becomes alive

Fast forward 2 years: you’ve run out of space on your main drive. Guess what? Your friend Gordon Moore has your back, and hard disks are now about twice as large and abundantly cheap. You are probably even thinking of buying a new machine.

Remember that old machine you set up at your friend’s house? Run a final sync to it, then take its drive out and put it in your safe-deposit box, your safe at home, or wherever. Keep it safe. It marks an era of your life. You may never ever plug it in again, but you’ll feel good knowing you can if you have to.

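Now for the cool part. You only have to buy 2 drives this time. Make sure they are exactly twice as large as your previous drives. The two old drives are about to be combined into your offsite storage, so their total capacity caps how much data the offsite machine can hold; a bigger main drive would let your data outgrow the offsite copy. The new drives will replace the drives in your main machine, as the main drive and the nightsync drive. What do you do with the old drives? Well, we have to offer an apology to our old friend, much maligned earlier in the article: RAID.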

We’re going to use what is arguably the most ridiculous RAID configuration, JBOD (Just a Bunch Of Disks), to combine the two old drives into 1 logical drive in our now drive-less offsite backup machine. This lets us back up twice as much data as before, because the disks’ capacities are simply concatenated to form a drive twice as large. The trade-off is that losing either disk takes the whole volume with it, which is acceptable here because this machine only holds a copy. We don’t need any kind of speed or efficiency either, so software RAID is perfectly fine.
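The capacity arithmetic behind this step is short enough to write down. A JBOD (linear) volume’s capacity is just the sum of its members, and because the volume is lost if any member fails, its failure chance grows with the number of disks. The drive sizes and the 8% per-drive figure used in the check are illustrative. On Linux, mdadm’s linear mode (`--level=linear`) is one way to build such a volume in software.

```python
def jbod_capacity_gb(drive_sizes_gb: list) -> float:
    """A JBOD (linear) volume concatenates its members, so its capacity
    is the sum of the drive sizes (before filesystem overhead)."""
    return sum(drive_sizes_gb)


def offsite_can_hold(old_drive_gb: float, new_drive_gb: float, old_drives: int = 2) -> bool:
    """Can the old drives, concatenated, hold a full copy of a new main drive?"""
    return jbod_capacity_gb([old_drive_gb] * old_drives) >= new_drive_gb


def jbod_loss_chance(p_drive: float, drives: int = 2) -> float:
    """Chance the JBOD volume is lost: any single member failing kills it,
    assuming members fail independently with probability p_drive each."""
    return 1.0 - (1.0 - p_drive) ** drives
```

With two 250 GB drives, a 500 GB main drive fits exactly, while a 750 GB one would outgrow the offsite copy, which is why the new drives should be twice, not three times, as large.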

Optionally, if you are buying a new machine at this point, you may just want to turn the old main machine into your offsite machine, which saves you from transplanting the old disks.

Copy all your old data to your new drives before you start working again, and the cycle is complete. Repeat this step every time you run out of drive space and you will never again have to worry about losing data. EVER.

Additional insights

It used to bother me that I had to buy two drives instead of just one every time I wanted to upgrade. At first it seemed like such a waste of money. But when you think about how much more security that extra $100 or $200 buys you, it kinda boggles the mind. Google recently published a study of the drives in its own servers, which found that drives in their second year of service failed at an annualized rate of about 8%. That means that if you keep a single drive in your machine for 2 years, you have roughly a 1 in 12 chance, or worse, of losing all your data to drive failure. That’s scary stuff to me. Drop a second drive in there and run a nightly sync, and that risk all but disappears, because you would need the second drive to die before you get around to replacing the first. Get a bad drive? Replace it under warranty (or not) and you’ve dodged a point-blank bullet. I can’t think of a better or easier way to protect yourself from the worst thing that can happen to your computer.
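A rough way to see why the second drive helps so much is to model both drives as failing independently at a constant rate. The Python sketch below takes the 8% figure as the chance that one drive dies over a two-year period and assumes a failed drive is replaced within a week; both are assumptions. Drives in the same box share a power supply and often a manufacturing batch, so real failures are less independent than this, and the result should be treated as optimistic.

```python
def fail_within(p_period: float, period_days: float, window_days: float) -> float:
    """Chance a drive fails within `window_days`, assuming a constant failure
    rate calibrated so it fails with probability p_period over period_days."""
    return 1.0 - (1.0 - p_period) ** (window_days / period_days)


def single_drive_loss(p_period: float) -> float:
    """With one drive, losing the drive means losing the data."""
    return p_period


def mirrored_loss(p_period: float, period_days: float, replace_days: float) -> float:
    """Rough chance of losing data with two independently failing, synced drives.

    Data is lost only if one drive dies and its partner also dies before the
    first is replaced. Ignores repeat failures after a replacement and the
    up-to-one-day of changes not yet synced.
    """
    p_one_fails = 1.0 - (1.0 - p_period) ** 2  # at least one of the two dies
    return p_one_fails * fail_within(p_period, period_days, replace_days)


if __name__ == "__main__":
    # Assumed inputs: 8% over two years, failed drive replaced within a week.
    print(f"single drive:  {single_drive_loss(0.08):.3%}")
    print(f"mirrored pair: {mirrored_loss(0.08, 730, 7):.4%}")
```

With these inputs the mirrored estimate comes out around 0.012%, versus 8% for a lone drive, and it grows the longer a dead drive sits unreplaced.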

Laptop users are a bit out of luck on that front, because most laptops have no room for a second drive to run nightly syncs to. If you use a laptop as your main machine, the best thing to do is build a cheap file server at home to act as the “main machine” from this article, and nightsync your laptop to it over SSH. The same applies to multi-machine setups: have them all back up to a local file server, which then runs the sync to the offsite machine. You will need bigger disks, but the result will be near optimal.

Online backup services like Amazon S3 are also an option for the offsite component of the strategy, but I prefer the method outlined here because it scales transparently as the data set gets larger. If you crunch the numbers for even a meager 200GB data set with weekly syncs, the online services quickly become too expensive, since you pay for storage every month instead of buying a drive once. Also, it feels good to have final control over all your data. Big companies make mistakes just like the rest of us.
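The comparison is simple arithmetic. Here is a Python sketch with placeholder prices, not any provider’s actual rate card: a per-GB monthly storage fee, a per-GB upload fee, and an assumed amount of changed data pushed by each weekly sync, set against the one-time price of a drive.

```python
def online_cost(data_gb: float, years: float, storage_per_gb_month: float,
                upload_per_gb: float, weekly_changed_gb: float) -> float:
    """Total cost of keeping data_gb with an online service for `years`:
    monthly storage, plus the initial upload, plus the changed data pushed
    by each weekly sync. All prices are inputs, not real quotes."""
    months = 12 * years
    weeks = 52 * years
    storage = data_gb * storage_per_gb_month * months
    uploads = (data_gb + weekly_changed_gb * weeks) * upload_per_gb
    return storage + uploads


def diy_cost(drive_price: float, drives: int = 1) -> float:
    """One-time cost of the drive(s) for the offsite machine."""
    return drive_price * drives


if __name__ == "__main__":
    # Placeholder numbers: 200 GB for 2 years at $0.15/GB-month, $0.10/GB
    # upload, 5 GB changed per week, versus one $100 drive.
    print(round(online_cost(200, 2, 0.15, 0.10, 5), 2))  # 792.0
    print(diy_cost(100))                                 # 100
```

The storage term grows with both the data set and time, while the drive is paid for once, which is the scaling argument in a single line.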

There is another, less obvious benefit to keeping your data forever. You’ll start to notice subtle changes in your directory structures. You’ll start arranging data more elegantly, refined from years of experience with it, combined with the knowledge that it will never be erased unless you want it to be. You’ll start to eliminate clutter naturally, instead of starting over every time you get a new machine. Your project folders will take on a new meaning in your mind, no longer becoming forgotten cruft from dead ideas, but evidence of work done, points of reference for your future. You’ll think of it like carving your work into stone instead of writing in pencil on post-it notes.

Lastly, whatever you do, remember to keep things simple for yourself and have fun learning along the way.
