Quick question: is your data backed up? If I walked into your house right now with a garden hose, grabbed your computer and hurled it out the window, how much data would you lose? If it would be any more than a day’s worth of work, you are in big trouble. In fact I might already be in your house getting ready to do this. Before you ask, the garden hose is there just for misdirection.
My point is that if you do not have backups you will lose data. No one likes to lose data. In fact, most people absolutely hate it. The UN has recently designated the following as the official data loss position:
It is kinda like fetal position, but caused by trauma of data loss. If you see anyone looking like this, be gentle with them – they are in a state of shock.
If you don’t have backups, I would really like to know why. Do you think there is nothing on your computer worth backing up? Well, you are wrong. There is always something irrecoverable on your hard drive. Be it that one elusive bookmark you use once a year to do that one thing, or the one and only picture you have of your ex from before he or she transformed into a giant spider. Maybe it’s that old college essay you wrote that you won’t miss until your kid needs to write one. Or maybe it’s that children’s story you wrote about a boy wizard or a hungry girl archer that you hope to one day publish (sorry, you might be a bit late on this, but I hear you can totally sell fanfics now). There is always a thing on your computer that you will miss when it’s gone. Frequently you won’t even know what it is until it disappears forever.
Throughout my life, I have re-formatted and re-imaged a lot of computers for friends, family and coworkers. I have never, ever met a person who told me to wipe their hard drive without backing it up first, and did not regret that decision. Not a single soul out there has nothing worth preserving on their computer. If there are absolutely no files on your machine that you would like to keep, you might not be human. You are probably a robot, in which case I would implore you to stop spamming my internets (like seriously dude, what is with the robotkind and the penis enlargement emails).
Or maybe you are one of these folks who think data loss is something that happens to other people. Sorry to break it to you but you are dead wrong.
You Need Backups
You need backups because hard drives are designed to fail. Magnetic hard drives are probably the most vulnerable component of your computer. They are one of the very few components that have moving parts. An average drive consists of one or more spinning platters and a read/write head assembly that glides above them. A sudden shock may cause the head to crash into the surface of the platter at any time, typically resulting in catastrophic hardware failure. If the drive ever loses its hermetic seal, the dust buildup on the platter will at best cause read/write errors and at worst result in a head crash and impressive scratches throughout the surface. And even if nothing ever goes wrong, the moving parts of the hard drive are under constant stress, and will eventually wear out. The longer you use a drive the more likely it is that it will fail.

Disks usually don’t catch on fire like this. But the chance your disk will spontaneously combust is higher than the chance it will never fail.
Solid state drives are marginally better, but not by much. They are by no means exempt from the laws of entropy and the memory cells they use have a limited number of writes they can withstand before they cease to function. If anything, solid state drives fail more reliably and predictably than magnetic drives. But fail they do.
Unfortunately, most people never see their storage medium fail, because their computer suffers a catastrophic mechanical failure long before that. Especially laptops. Do you use a laptop? According to a recent poll I conducted, most people do. A laptop is a portable computer you lug around with you in a flimsy shoulder bag or a backpack. Do you know what happens when you take a computer outside?
If I had a quarter for every time a friend or relative of mine got their laptop stolen by a tiger, I would have… about a buck fifty in my pocket right now. Tigers are assholes.
They are not the only cause of data loss. A lot of people simply sit on their laptops. Some drive over them with their cars. Others spill gallons of coffee directly onto the keyboard (despite the positioning trick I teach them). There are so many ways to damage a laptop just by being careless or absent minded that you should never take its physical integrity for granted.
Remember this: any data that has not been backed up is only temporarily not lost.
You Need Automated Backups
I have met a few people in my life who claimed they do backups religiously. Back when I was a kid I used to believe in such fairy tales. But then I grew up, and realized that an overwhelming number of people “do backups” by dragging and dropping files to an external drive whenever the fancy strikes them. That’s most definitely not backup. Do you know what that is called?
Yeah, I know most of you here are probably too young to understand this reference. Trust me though, it’s kinda funny. I’d tell you to google it, but I know you won’t.
The point is that backups need to be automated. They need to happen without your knowledge or intervention. They need to be a background process that kicks in regularly regardless of whether you are at your computer or dead in a ditch somewhere. Please don’t be dead in a ditch. Seriously, stay away from ditches in general. Nothing good has ever come out of ditching.
Any backups that are not automated are only temporarily not forgotten.
Procrastination is like a force of nature. You put away your backups once, twice, then three times. Next thing you know you are in your 40’s with three children, your wife has run off with an Alligator and there are Daleks living in your attic. You don’t want to end up like this. But if you do, just use the “Bad Wolf” cheat code to summon The Doctor. That’s for the Daleks though, not the backups. On that end you are royally screwed no matter what.
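To make the "background process that kicks in regularly" bit concrete, here is a minimal sketch, assuming a Unix-like machine with `tar` and `cron`. The function name `backup_once`, the paths and the schedule are all made up for illustration, so treat this as a starting point and not a finished backup system.

```bash
#!/bin/bash
# Sketch: make one timestamped archive of a directory, meant to be run by cron.
# backup_once, the paths and the schedule below are illustrative assumptions.

backup_once() {
    local src=$1 dest=$2
    local stamp name
    stamp=$(date +%F-%H-%M-%S)
    name=$(basename "$src")
    mkdir -p "$dest" || return 1
    # -C switches to the parent directory so the archive stores relative paths.
    tar -czf "$dest/$name.$stamp.tar.gz" -C "$(dirname "$src")" "$name"
}

# Hypothetical crontab entry (edit with `crontab -e`) that runs a script
# calling backup_once every night at 02:30, whether you remember or not:
#   30 2 * * * /home/you/bin/nightly-backup.sh >> /home/you/backup.log 2>&1
```

The point is only that the schedule lives in cron and not in your head. Where the archives end up is a separate question.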
You Need Offsite Backups
Did I mention that backing up to an external hard drive is not sufficient? Well, it is not. Why? Let me use an animated gif to demonstrate this:
Right, I know what you are going to say. Nuclear explosions are extremely easy to survive by the means of a refrigerator as shown by that one Indiana Jones movie. You do have to keep in mind however that in time of emergency you have to figure out how to stick your entire family into a tiny kitchen appliance, and you may not have time to scurry around collecting external hard drives from every room in your house.
Statistically speaking, the external media people use for backups tend to be kept in the same room and/or carried in the same bag as the computer they are backing up. This means that any disaster, calamity or inter-dimensional rift that will affect the computer, will also likely destroy the backup media.
You can have the best, most regular and thorough backup scheme in the world, but if a tiger jumps out of the bushes and steals your laptop bag, and that bag contains all your backup disks, then you are back to square one. It’s like having no backup plan at all.
Someone may try to advocate backup disk rotation: you could for example always have the current disk in your laptop bag, yesterday’s disk in your house, and a disk from 3 days ago in your car. You could even put a backup disk in a safety deposit box located in a different zip code once a week. But that won’t work. You know why? Because such rotation is manual.
You will eventually get tired of it, get bored, procrastinate, forget and BOOM! Daleks! If you can’t automate it, it’s not backups. It is a willpower exercise and not much else. And one you are positioned to lose every time.
You Need Onsite Backups
You may think to yourself: I got it. I will go out and get me some Cloud Backup. I will get Mozy! I will get Crashplan! I will get Carbonite! And then I will never have to worry about backups again.
Wrong! Cloud backups are only useful when you can get to them. Why wouldn’t you be able to get to your cloud backups?
If you live in the US like me, you have probably noticed that the internet here is complete and utter shit. We are currently behind Antarctica in terms of average broadband speeds. There are fucking penguins out there walking around with gigabit fiber cables hooked up directly to their cloacas (that’s where you install a router on a penguin – I don’t make these rules, geez) whereas over 10% of Americans still can’t get anything better than 56Kbps dialup in their areas.
I am lucky enough to live in a rather densely populated suburban area so I have a choice between using a Comcast “best effort” connection and not using Comcast. Best effort of course means that on any given day they will make their best effort to ensure that you get some internet connectivity at some point during the day, but no promises. Also a kilobyte per second costs about as much as seven gallons of blood plasma, but we make do with what we have.
What I’m trying to say is that networks are unreliable. You are never guaranteed internet access. In emergency situations the Internet always goes down and leaves you stranded. Don’t make Comcast or Verizon be your lifeboat. These guys are barely capable of streaming standard def Youtube clips without buffering at 4am when 90% of their customers are asleep. If you make them your lifeline, you are going to have a bad time.
Cloud backups are great when they work, but it is all too easy to get cut off from them for extended periods of time. Especially if you have deadlines to keep.
Also, sometimes shit like this happens:
This wasn’t really a problem a few years ago, but currently any US based (or based in a country that likes the US) hosting service can be raided, dismantled and sequestered as evidence in a criminal investigation of some sort. This does not necessarily need to be related to piracy. Terrorism and journalism are also potential causes for closure. Look at what happened to the Lavabit email service: it was forced into closure because the feds suspected it was being used by a whistleblower.
Granted, some services are more susceptible to this sort of closure than others. Your best bet is to pick an NSA-friendly service with a thick PRISM pipe back to Washington. For me that’s actually an additional layer of security. If all else fails, you can always try to Freedom of Information Act your lost data from the government. Though I’m told this doesn’t always work since officially we are not supposed to know about it. It has something to do with snow men… I don’t know. I don’t really pay attention.
In either case, while the Cloud is super convenient, it can be volatile. You should never rely solely on remote backups. Having both local and remote copies of your data is the only way to ensure your information is safe from tigers, nukes, Comcast and government raids on data centers.
In Conclusion
Back. Your. Shit. Up.
Make sure it is automated. Data that is not backed up is as good as lost. Backups that are not automated are as good as forgotten. Data that is only backed up locally will go down with your computer. Data that is only backed up remotely can be disconnected or deleted at a whim. If you want to have a slim chance of preserving your data, you must have it in as many places as possible. Back up locally and remotely at the same time. Have more than one backup plan.
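As a rough sketch of "more than one backup plan", the snippet below pushes the same directory to several destinations and complains loudly if any of them fails. The name `backup_to_all` and every path are hypothetical. Plain `cp` stands in for the transport here, and a real remote target would need something like rsync over ssh or a cloud client in its place.

```bash
#!/bin/bash
# Sketch: copy one directory to several destinations and report any failure.
# backup_to_all and the destination paths are hypothetical placeholders.

backup_to_all() {
    local src=$1; shift
    local failed=0 dest
    for dest in "$@"; do
        # Try every destination, even if an earlier one already failed.
        if mkdir -p "$dest" && cp -a "$src" "$dest/"; then
            echo "ok: $dest"
        else
            echo "FAILED: $dest" >&2
            failed=1
        fi
    done
    # Non-zero exit status if at least one copy did not succeed.
    return "$failed"
}

# Hypothetical usage: one local disk, one mounted offsite share.
#   backup_to_all "$HOME/Documents" /mnt/backup /mnt/offsite
```

Looping over every destination, and not stopping at the first error, means one dead disk does not silently cancel the other copies.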
Spread the word.
This is an excellent piece. I totally agree with everything it says. My “backup” is currently dropbox, iCloud, and TimeMachine. I also use several other hard drives for a manual “backup” for the really important stuff. Do you consider this to be sufficiently safe?
Dropbox adds some nice version control, but isn’t really a backup. Same with iCloud, except without the version control.
I use Crashplan, Time Machine, ‘throw the most important files on a thumb drive when I think about it,’ and a full system image that I update periodically. The last two are not serious backups, just an absolute last resort. I don’t consider even a single-tape backup system to be very good. The one day you forget to put in the new tape is the day before the server crashes. (Seen it with my own eyes.)
I’ve never encountered anyone who regretted having too many backups, but plenty of people on the verge of tears from not having enough.
And now for a grand horror story!
Recently my computing center experienced the jackpot of system failures. Our center does high performance computing in academia, so there’s quite a bit (1-2 PB) of expensive data we store (I suppose you could argue most centers store expensive data). Well, we had this nasty little combination of drives/controllers/firmware that caused a catastrophic loss of data on a couple disk trays that stored the home directories for about 75% of our research groups. Supposedly this particular failure has only happened one other time in a production scenario, and we were just LUCKY enough to experience it. Thankfully we did weekly backups with nightly incrementals, but it took three weeks to finish restoring and verifying the data from tape, mostly because the center can’t afford redundant filesystems. And of course, it put research effort at a standstill while all that important data was tied up, which is pretty bad when all of your funding comes from research grants. But that isn’t the complete story.
See, this story is also about how I almost lost all of my personal data. After the aforementioned event, I was like, “Oh SHI— none of my data is backed up”, so I got down to business and set up automatic file syncs, remote backups, physically isolated copies, whatever. I had everything in place to do it except for a couple of drives I wanted to format to ext3 (so that I could get reasonable IO performance out of my raspberry pi file server). Well, one of those drives was where I had single copies of data, and it was late at night and I was being stupid, and accidentally turned that drive into EXT from NTFS and started formatting, thinking I was working on the other nonessential drive. Thankfully, I was alert enough to realize that I had made a major mistake and stopped it, but I had done enough damage that I couldn’t repair the partition. And of course, that was my only copy of certain data that I actually wanted. By this time it was already about 2:00 AM and I was in a panic, and ended up staying up the rest of the night recovering the data and shuffling stuff around on my main desktop so I had enough room to recover the data. Fortunately (and I’m assigning a 1/100,000,000 probability on how lucky I was), I got everything back that I wanted. It took me three days and a fair part of my sanity to get it back though, and I have exactly zero desire to do that again. I now have four copies of everything essential at all times. BACK YOUR DATA UP NAO.
I have been adhering to this philosophy for quite some time now:
* Btsync (used to be unison) to keep media files, coding projects and dotfiles in sync between my laptop and server.
* Backup script to backup cron jobs, package lists and the family calendar (saved from Google Calendar)
* Rdiff-backup for nearly complete incremental backups of the laptop and server to a separate hard drive on the server (/mnt/backup)
* Duplicity for encrypted incremental backups (not including rdiff-backup data) of /mnt/backup to another external hard drive (/mnt/backup2)
* Headless Crashplan on the server to save /mnt/backup (I used to sync the duplicity backups /mnt/backup2 to a shared web host, but they finally decided they didn’t like that)
* Manual upload of all family pictures to Smugmug
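For anyone wondering what the "cron jobs and package lists" item in a list like this might look like, here is a minimal sketch. It is not Scott's actual script: `save_system_state` is a made-up name, and the `dpkg` call assumes a Debian-style system, so swap in your own package manager.

```bash
#!/bin/bash
# Sketch: dump the current user's crontab and the installed package list into
# a directory that the regular backups already cover.
# save_system_state is a hypothetical name; dpkg assumes a Debian-style system.

save_system_state() {
    local out=$1
    mkdir -p "$out" || return 1
    # crontab -l exits non-zero when no crontab exists; keep an empty file then.
    crontab -l > "$out/crontab.txt" 2>/dev/null || : > "$out/crontab.txt"
    if command -v dpkg >/dev/null 2>&1; then
        dpkg --get-selections > "$out/packages.txt"
    else
        echo "dpkg not found; package list not saved" > "$out/packages.txt"
    fi
}

# Hypothetical usage, e.g. from a nightly cron job:
#   save_system_state /mnt/backup/system-state
```

These files are tiny, but they are exactly the kind of thing you only miss while rebuilding a machine from scratch.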
I’ve nagged wife and kids for years to save their stuff to their network folders on the server….yeah right :-) Just yesterday I heard without sympathy the “I loaned my flash drive with my big school project on it to my sister and she lost it….”
Scott
@ Ethan:
Time Machine is great for local backup. Dropbox and iCloud are a bit limited in scope though (as in you don’t want to use them to store your uncompressed vacation pictures for example). A robust cloud backup solution like Crashplan/Mozy/Carbonite might give you extra peace of mind. :)
Also consider scripting/automating your “manual backups” because you will probably forget them right before you need them. True story: my coworker used to religiously back up all her work to a thumb drive at the end of the day. Last week she had to get like 6GB of data from the client so she deleted all her backups to make space for it. Since the drive was almost full, and she didn’t feel like cleaning it up, she did not do her manual backup routine that evening. Next morning the machine would not boot.
We ran the diagnostic on the HD and it lit up red – bad sectors everywhere. Of course the volume was encrypted with Checkpoint FDE so to get any data off of it we had to decrypt it which was of course impossible with cyclic redundancy disk errors popping up everywhere. A live recovery CD from Checkpoint would also BSOD when trying to mount the faulty drive. Checkpoint support dudes were like:
¯\_(ツ)_/¯
To make a long story short – she lost all her work, and also a bunch of personal stuff (pictures, documents, etc.) that she was storing on the company laptop for whatever reason and never thought about backing up to an external drive.
Robert wrote:
This is so true it is not even funny. :P
@ opti:
Haha, I did something very similar. At one point I decided to “upgrade” windows by means of a clean install on a new bigger hard drive, and then transfer my data from the old drive. This is on a desktop mind you, so I just plugged the new drive, popped in the Windows XP CD and started installation. Few hours and 27 reboots later (cause it’s Windows XP) I go to my 2nd drive to transfer data, saved games and etc and Windows is like “LOL, not formatted”.
So yeah, installed Windows on the wrong drive. All data lost.
At first I was like: ⊙▃⊙
Then I was like: (╥﹏╥)
@ Scott Hansen:
TIL about BTSync! Thanks! I’m messing around with it right now. Does it work over firewalls? It claims it does. I guess I’ll find out because the one we have at work is a nasty beast (I can’t even ssh out unless it’s to approved machines).
One downside of this over say Dropbox is that it doesn’t sync if one of the machines is asleep. I dropped some files in my BTSync folder on my MacBook and I’m trying to sync them to my work computer, but it’s not working because the Mac is likely in sleep mode now. Dropbox syncs via cloud so it doesn’t matter if machines are live or not.
Still, this is pretty cool. I’ll definitely have some good uses for it.
Luke Maciak wrote:
There’s some info about working around restrictive firewalls using the tracker server and even a relay server for btsync. I try to keep that stuff turned off when possible, but sometimes you need it! I had to resort to using a VPN connection to privateinternetaccess.com over port 443 from my work because of the massive port blockage going on there. This actually lets btsync work to sync the laptop to my home machines and my phone!
I have an always-on home server and a cheap VPS that I run btsync on. Sort of acts like the Dropbox server + redundancy. The biggest portion of what I sync is media (books, music, video) from laptops to home server. Smaller sync folders that also exist on my phone and the VPS are for syncing phone camera pix, keepass database, ssh keys, wallpapers and dotfiles.
For you late comers here…btsync is NOT a backup solution! Everything sync’d with btsync gets formally backed up with rdiff-backup and Crashplan!
Scott
I don’t have enough fingers on my hand to count the number of people saying “oh, noes, I lost everything I was working on the last 6 months!”. Dude, you have Dropbox, if anything, just use that!
Myself, I’ve considered buying a small NAS for some time now, since the amount of data I store grows steadily and is starting to be a bit big, scattered everywhere and unwieldy for Dropbox.
And since I really like distributed storage solutions (Git, Dropbox,…), there are a few Dropbox-like alternatives you can use, too. We tried Sparkleshare and ownCloud at work, but both have a lot of issues, rendering them pretty much unusable at scale. Seafile sounds interesting, but right now we seem to be headed towards the one built into our Synology NAS.
I’ve also started considering backing up my GMail mails with OfflineIMAP. I’m still not really comfortable with it since, at heart, it’s not really a proper backup solution but rather a sync mechanism. I use `rdiff-backup` to make periodic snapshots of my mails, but if you have a better solution, I’m listening!
I do them manually. Every saturday morning, before breakfast. It’s become sort of a ritual. I know i only skipped one in the last year, because i make hardlinked snapshots named by date, and that one time that i missed was when i was on holiday. But then again, i am a very rigid and routinish person. The worst that has ever happened to me on a saturday morning was waking up 2 hours late because i had had 2 beers on friday and stayed up beyond midnight.
In addition to this, i use a similar script (again rsync plus dated, hardlinked snapshots) that i regularly invoke upon critical folders that i am currently working in, and it makes a second snapshot to the house of my parents way outside the nuclear blast zone. I practically do that one after each paragraph i write, and i now have 1000 versions of my manuscript.
this is what i use, maybe someone will find it useful:
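#!/bin/bash
# Mirror a directory to a remote host with rsync, then create a dated,
# hardlinked snapshot of that mirror on the remote side.
# Usage: pass the directory to back up as the first argument.
remote_host=nsa.gov
snapshot_dir=/media/data/snapshots
source=${1:?usage: $0 SOURCE_DIR}
source=${source%/}   # drop a trailing slash so "$source/" below stays clean
target=$(basename "$source")
# Compute the timestamp once so the message and the snapshot name agree.
stamp=$(date +%F-%H-%M-%S)
# Bail out if the snapshot directory is missing on the remote host.
ssh "$remote_host" test -d "$snapshot_dir" || exit 1
# Mirror the source; --delete removes remote files that are gone locally.
rsync -ahv --delete "$source/" \
"$remote_host:$snapshot_dir/$target" || exit 1
echo "Creating snapshot '$snapshot_dir/$target.$stamp'..."
# cp -al hardlinks instead of copying, so unchanged files cost no extra space.
ssh "$remote_host" cp -al "$snapshot_dir/$target" \
"$snapshot_dir/$target.$stamp"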
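The reason dated snapshots like these stay cheap is the hardlink trick in `cp -al`. Below is a small local sketch of just that step, assuming GNU coreutils. `snapshot_local` is a hypothetical helper and not part of the script above.

```bash
#!/bin/bash
# Sketch: local dated snapshot using hardlinks instead of full copies.
# snapshot_local is a hypothetical helper; cp -al assumes GNU coreutils.

snapshot_local() {
    local mirror=$1
    local snap
    snap="$mirror.$(date +%F-%H-%M-%S)"
    # -a preserves attributes, -l creates hardlinks instead of copying data.
    cp -al "$mirror" "$snap" && echo "$snap"
}

# Hypothetical usage:
#   snapshot_local /media/data/snapshots/manuscript
```

Right after the snapshot, every file in it shares its data with the mirror. By default rsync writes a changed file to a temporary name and renames it into place, so the next sync replaces the mirror's copy while the snapshot keeps pointing at the old contents. Anything that edits files in place (rsync with `--inplace`, for example) would change the snapshot too.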