BackupsWithDuplicityAndS3
From Hinterlands
Backing up using Duplicity and Amazon S3
Introduction
This HOWTO will walk you through using Duplicity and Amazon's S3 service to back up your important data. The task is to create a remote backup protected by strong encryption consisting of a monthly full archive and daily incrementals.
As with all my HOWTOs, I am assuming you are using Debian, specifically squeeze. If you're using another distribution, the principles here should still work for you, but you'll need to adjust the syntax of various commands to match your distro.
Signing up for S3
I did quite a bit of research into the various online storage services available. Amazon's S3 service ticked the boxes of availability, reliability and support from my choice of backup tool. It's also extremely cheap: so far I have been billed less than $0.30 for about 3 GB of data. You can sign up for Amazon S3 on the Amazon Web Services site (see External links below), using your existing Amazon login if you have one. You will need to register a credit card with them for billing.
Once signed up, look for the Security Credentials page and make a note of the Access Key ID and Secret Access Key. How you manage these is up to you; I use a single pair for all my backups.
Creating a GPG Keypair
As we want to encrypt all the data sent to S3, we need to create a GPG keypair. Because the backups will run as root, it makes sense to create the keypair as root, assuming you're not already using GPG as root for other purposes. You will need to pick an email address and passphrase for the key. I chose backups@hinterlands.org and used the excellent pwgen utility to generate a long, strong random passphrase. (Use pwgen -sy 30 for proper paranoia :)
backuphowto:~# gpg --gen-key
Accept the default key type and key length, and decide whether you ever want the key to expire (I don't). Then enter your chosen passphrase. Next, upload your key to a public keyserver. To do this you need the key's ID, which is the part after the slash on the pub line (02E93567 here).
backuphowto:~# gpg --list-keys
/root/.gnupg/pubring.gpg
------------------------
pub 2048R/02E93567 2010-07-06
uid Hinterlands Backups <backups@hinterlands.org>
sub 2048R/FAE5261F 2010-07-06
backuphowto:~# gpg --send-keys 02E93567
gpg: sending key 02E93567 to hkp server keys.gnupg.net
Make a note of the key ID and, of course, your passphrase. You should also consider keeping a copy of your private key somewhere other than this machine: if you lose it, or forget your passphrase, you lose access to your backups.
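One way to take that copy is sketched below. It is a hypothetical helper (export_backup_key is not part of GnuPG) that uses gpg's --export and --export-secret-keys to write ASCII-armoured copies of the public and secret key into a directory of your choosing, assuming that directory lives on removable media or another machine.

```shell
#!/bin/bash
# Hypothetical helper: export the backup keypair so it can be stored away
# from the machine being backed up.
export_backup_key() {
    local key_id="$1" dest_dir="$2"
    if [ -z "$key_id" ] || [ -z "$dest_dir" ]; then
        echo "usage: export_backup_key KEY_ID DEST_DIR" >&2
        return 1
    fi
    # Subshell with umask 077: the directory and both files are created
    # readable by the owner only, and the umask change doesn't leak out.
    (
        umask 077
        mkdir -p "$dest_dir" &&
        gpg --armor --export "$key_id" > "$dest_dir/$key_id-public.asc" &&
        gpg --armor --export-secret-keys "$key_id" > "$dest_dir/$key_id-secret.asc"
    )
}
```

For example, export_backup_key 02E93567 /media/usb/gpg-backup (a placeholder path). On a replacement machine, running gpg --import on both files brings the key back so the backups can be decrypted.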
Installing Duplicity
Duplicity is packaged for Debian, as is the boto library it uses to talk to S3.
backuphowto:~# aptitude install duplicity python-boto
An example backup
Duplicity requires quite a few parameters, including your Amazon S3 access keys and your GPG key details. For this reason, it's easiest to wrap the command in a shell script that sets the environment variables Duplicity looks for. An example script (downloadable version below) is as follows:
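#!/bin/bash
# s3-backup.sh: this file holds secrets, so keep it readable by root only (chmod 700)
export AWS_ACCESS_KEY_ID=FDKJFHADF8YADF7987FEDFIUHAL
export AWS_SECRET_ACCESS_KEY=fgsfdgJOI+khsiuhd+hdsoiuh897HIh
# Passphrase of the GPG secret key (used for signing and for restores)
export PASSPHRASE=AReallyStrongAndSecretPassphrase
export GPG_KEY=02E93567
# Back up /home (and nothing else) to the "olga" prefix of the bucket
/usr/bin/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY} \
--volsize 5 --full-if-older-than 1M incremental --include /home --exclude '**' / s3+http://backups.hinterlands.org/olga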
We've used several options; here's what they all mean:
| Option | Explanation |
|---|---|
| --s3-european-buckets | Locate the backups on European servers |
| --s3-use-new-style | Use new-style (subdomain) bucket addressing; required by --s3-european-buckets |
| --encrypt-key=${GPG_KEY} | Set the GPG key with which to encrypt data |
| --sign-key=${GPG_KEY} | Set the GPG key with which to sign data |
| --volsize 5 | Set the volume size to 5 MB; data is uploaded to S3 in blocks of this size |
| --full-if-older-than 1M | If the last full backup is more than a month old, do a full backup |
| incremental | Perform an incremental backup (overridden by the option above when a full backup is due) |
| --include /home | Include /home in the backup |
| --exclude '**' | Exclude everything else; duplicity applies the first selection option that matches, so the --include above takes precedence |
| / | The directory to back up from; the include and exclude options select what beneath it is backed up |
| s3+http://backups.hinterlands.org/olga | The destination: the bucket backups.hinterlands.org, with files stored under the prefix olga. You don't need to create the bucket in advance. |
When run, this will back up the whole of /home. We can test what will happen, without transferring any data, by adding the --dry-run option. Doing so gives the following:
backuphowto:~# ./s3-backup.sh
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
Last full backup is too old, forcing full backup
--------------[ Backup Statistics ]--------------
StartTime 1278454009.66 (Tue Jul 6 23:06:49 2010)
EndTime 1278454009.83 (Tue Jul 6 23:06:49 2010)
ElapsedTime 0.18 (0.18 seconds)
SourceFiles 85
SourceFileSize 116161 (113 KB)
NewFiles 53
NewFileSize 116161 (113 KB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 53
RawDeltaSize 0 (0 bytes)
TotalDestinationSizeChange 0 (0 bytes)
Errors 0
-------------------------------------------------
As you can see, my HOWTO example machine doesn't have much data in /home, so I'll add /etc to the list of directories to back up (another --include /etc option, placed before --exclude '**') and then run the script for real:
backuphowto:~# ./s3-backup.sh
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
Last full backup is too old, forcing full backup
--------------[ Backup Statistics ]--------------
StartTime 1278454244.54 (Tue Jul 6 23:10:44 2010)
EndTime 1278454247.57 (Tue Jul 6 23:10:47 2010)
ElapsedTime 3.03 (3.03 seconds)
SourceFiles 1038
SourceFileSize 2751115 (2.62 MB)
NewFiles 1038
NewFileSize 2751115 (2.62 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 1038
RawDeltaSize 2093587 (2.00 MB)
TotalDestinationSizeChange 466031 (455 KB)
Errors 0
-------------------------------------------------
To check that incrementals really work, I created a new file in my home directory and ran the backup again:
backuphowto:~# ./s3-backup.sh
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Tue Jul 6 23:10:29 2010
--------------[ Backup Statistics ]--------------
StartTime 1278454440.50 (Tue Jul 6 23:14:00 2010)
EndTime 1278454496.94 (Tue Jul 6 23:14:56 2010)
ElapsedTime 56.44 (56.44 seconds)
SourceFiles 1039
SourceFileSize 11722523 (11.2 MB)
NewFiles 2
NewFileSize 8975504 (8.56 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 2
RawDeltaSize 8971408 (8.56 MB)
TotalDestinationSizeChange 9095375 (8.67 MB)
Errors 0
-------------------------------------------------
Because a full backup now exists, this run was incremental: only new and changed files were uploaded to S3.
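Between the runs above, /etc joined /home in the backup, and --dry-run was used for a trial run. The sketch below shows one way to keep both easy: the directories live in a bash array, and any extra arguments are passed through to duplicity. It assumes the AWS_*, PASSPHRASE and GPG_KEY exports from s3-backup.sh; DUPLICITY, TARGET, BACKUP_DIRS and run_backup are hypothetical names, and the option order mirrors the original command.

```shell
#!/bin/bash
# Sketch of s3-backup.sh with the directory list in an array. DUPLICITY,
# TARGET, BACKUP_DIRS and run_backup are hypothetical names; the AWS_*,
# PASSPHRASE and GPG_KEY variables are assumed to be exported as before.
DUPLICITY=${DUPLICITY:-/usr/bin/duplicity}
TARGET=${TARGET:-s3+http://backups.hinterlands.org/howto}
BACKUP_DIRS=(/home /etc)

run_backup() {
    if [ -z "$GPG_KEY" ]; then
        echo "GPG_KEY is not set" >&2
        return 1
    fi
    local opts=(--s3-european-buckets --s3-use-new-style
                "--encrypt-key=$GPG_KEY" "--sign-key=$GPG_KEY"
                --volsize 5 --full-if-older-than 1M)
    # duplicity acts on the first selection option that matches a path, so
    # every --include has to come before the catch-all --exclude '**'.
    local -a selection=()
    local dir
    for dir in "${BACKUP_DIRS[@]}"; do
        selection+=(--include "$dir")
    done
    selection+=(--exclude '**')
    # Extra arguments (for example --dry-run) go in with the other options.
    "$DUPLICITY" "${opts[@]}" "$@" incremental "${selection[@]}" / "$TARGET"
}
```

With run_backup "$@" as the script's last line, ./s3-backup.sh --dry-run previews a run and ./s3-backup.sh does it for real; backing up another directory is a one-word change to BACKUP_DIRS.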
Checking the backup status
Checking what has been backed up so far is easy. The same Amazon and GPG key details are needed, so again you can wrap the command in a script like the one above. I called mine "s3-listbackups.sh" and it runs this command:
/usr/bin/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY} collection-status s3+http://backups.hinterlands.org/howto
So far, we have this status:
backuphowto:~# ./s3-listbackups.sh
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Tue Jul 6 23:10:29 2010
Collection Status
-----------------
Connecting with backend: BotoBackend
Archive dir: /root/.cache/duplicity/654a260264561170340ee24dd928b83a
Found 0 secondary backup chains.
Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Tue Jul 6 23:10:29 2010
Chain end time: Tue Jul 6 23:13:44 2010
Number of contained backup sets: 2
Total number of contained volumes: 3
Type of backup set: Time: Num volumes:
Full Tue Jul 6 23:10:29 2010 1
Incremental Tue Jul 6 23:13:44 2010 2
-------------------------
No orphaned or incomplete backup sets found.
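This isn't very interesting, but here's the status from a machine that moved to S3 backups over a month ago. Last month's full backup and its daily incrementals now form a secondary chain, and a new primary chain began with the full backup on 1 July: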
olga:~# /usr/bin/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY} collection-status s3+http://backups.hinterlands.org/olga
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Thu Jul 1 00:00:06 2010
Collection Status
-----------------
Connecting with backend: BotoBackend
Archive dir: /root/.cache/duplicity/fdaed45e97c8943b8dd52d90e3e3cfa3
Found 1 secondary backup chain.
Secondary chain 1 of 1:
-------------------------
Chain start time: Mon May 31 18:47:14 2010
Chain end time: Wed Jun 30 00:00:06 2010
Number of contained backup sets: 33
Total number of contained volumes: 217
Type of backup set: Time: Num volumes:
Full Mon May 31 18:47:14 2010 184
Incremental Mon May 31 19:31:23 2010 1
Incremental Mon May 31 19:37:11 2010 1
Incremental Tue Jun 1 00:00:05 2010 1
Incremental Wed Jun 2 00:00:04 2010 1
Incremental Thu Jun 3 00:00:06 2010 1
Incremental Fri Jun 4 00:00:05 2010 1
Incremental Sat Jun 5 00:00:05 2010 1
Incremental Sun Jun 6 00:00:05 2010 1
Incremental Mon Jun 7 00:00:15 2010 1
Incremental Tue Jun 8 00:00:06 2010 1
Incremental Wed Jun 9 00:00:06 2010 1
Incremental Thu Jun 10 00:00:06 2010 1
Incremental Fri Jun 11 00:00:06 2010 1
Incremental Sat Jun 12 00:00:07 2010 2
Incremental Sun Jun 13 00:00:07 2010 1
Incremental Mon Jun 14 00:00:08 2010 1
Incremental Tue Jun 15 00:00:08 2010 1
Incremental Wed Jun 16 00:00:05 2010 1
Incremental Thu Jun 17 00:00:08 2010 1
Incremental Fri Jun 18 00:00:07 2010 1
Incremental Sat Jun 19 00:00:05 2010 1
Incremental Sun Jun 20 00:00:07 2010 1
Incremental Mon Jun 21 00:00:06 2010 1
Incremental Tue Jun 22 00:00:06 2010 1
Incremental Wed Jun 23 00:00:05 2010 1
Incremental Thu Jun 24 00:00:06 2010 1
Incremental Fri Jun 25 00:00:05 2010 1
Incremental Sat Jun 26 00:00:06 2010 1
Incremental Sun Jun 27 00:00:05 2010 1
Incremental Mon Jun 28 00:00:04 2010 1
Incremental Tue Jun 29 00:00:06 2010 1
Incremental Wed Jun 30 00:00:06 2010 1
-------------------------
Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Thu Jul 1 00:00:06 2010
Chain end time: Tue Jul 6 00:00:05 2010
Number of contained backup sets: 6
Total number of contained volumes: 190
Type of backup set: Time: Num volumes:
Full Thu Jul 1 00:00:06 2010 184
Incremental Fri Jul 2 00:00:06 2010 2
Incremental Sat Jul 3 00:00:06 2010 1
Incremental Sun Jul 4 00:00:05 2010 1
Incremental Mon Jul 5 00:00:08 2010 1
Incremental Tue Jul 6 00:00:05 2010 1
-------------------------
No orphaned or incomplete backup sets found.
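Both s3-backup.sh and s3-listbackups.sh need the same four variables. A minimal sketch of keeping them in one place follows: a hypothetical load_backup_config function that sources a root-only file and refuses to use it if group or others can access it. The default path /root/.s3-backup.conf is an assumption; the file would hold the four export lines from the top of s3-backup.sh.

```shell
#!/bin/bash
# Hypothetical helper: read the AWS keys, GPG passphrase and key ID from one
# root-only file instead of repeating them in every script.
load_backup_config() {
    local conf="${1:-/root/.s3-backup.conf}" mode var
    if [ ! -f "$conf" ]; then
        echo "missing config file: $conf" >&2
        return 1
    fi
    # The file holds secrets: refuse it if group or others have any access.
    mode=$(stat -c %a "$conf") || return 1
    if (( 8#$mode & 8#077 )); then
        echo "$conf has mode $mode; run: chmod 600 $conf" >&2
        return 1
    fi
    # shellcheck source=/dev/null
    . "$conf"
    # Stop early if anything duplicity needs is missing.
    for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE GPG_KEY; do
        if [ -z "${!var}" ]; then
            echo "$conf does not set $var" >&2
            return 1
        fi
    done
}
```

Each script would then start with load_backup_config || exit 1 in place of the export lines, so the secrets live in a single file.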
Restoring data
A backup isn't a backup until you've tested you can get your data back. Here's an example of restoring a single file:
olga:~# /usr/bin/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY} \
--file-to-restore=home/martin/dirc/dirc.perl restore s3+http://backups.hinterlands.org/olga /tmp/dirc.perl
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Thu Jul 1 00:00:06 2010
olga:~# ls -l /tmp/dirc.perl
-rw-r--r-- 1 martin martin 124 Jan 18 09:15 /tmp/dirc.perl
Note that there is no leading "/" on the path given to --file-to-restore: the path is relative to the directory that was backed up (/ in this case). If you put that / in, duplicity will not find your file.
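To avoid tripping over that slash, the restore command above can be wrapped in a small function that strips it first. This is a sketch: restore_file, DUPLICITY and TARGET are hypothetical names, and GPG_KEY, PASSPHRASE and the AWS_* variables are assumed to be exported as in s3-backup.sh.

```shell
#!/bin/bash
# Hypothetical wrapper around the restore command. --file-to-restore is
# relative to the backed-up directory (/ here), so leading "/"s are removed.
DUPLICITY=${DUPLICITY:-/usr/bin/duplicity}
TARGET=${TARGET:-s3+http://backups.hinterlands.org/olga}

restore_file() {
    if [ $# -lt 2 ]; then
        echo "usage: restore_file PATH DEST [duplicity options...]" >&2
        return 1
    fi
    local path="$1" dest="$2"
    shift 2
    # "/home/martin/dirc/dirc.perl" -> "home/martin/dirc/dirc.perl"
    while [[ $path == /* ]]; do
        path=${path#/}
    done
    # Any remaining arguments (e.g. -t) are passed to duplicity as options.
    "$DUPLICITY" --s3-european-buckets --s3-use-new-style \
        "--encrypt-key=$GPG_KEY" "--sign-key=$GPG_KEY" \
        "$@" "--file-to-restore=$path" restore "$TARGET" "$dest"
}
```

For example, restore_file /home/martin/dirc/dirc.perl /tmp/dirc.perl now works with or without the slash.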
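To restore a file as it was at a particular date and time, use the -t (--restore-time) option. Give the date as YYYY-MM-DD, because duplicity reads a bare run of digits as seconds since the epoch. Restore to a new path too, as duplicity will not overwrite an existing file unless you add --force. For example:
olga:~# /usr/bin/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY} \
-t "2010-06-28" --file-to-restore=home/martin/dirc/dirc.perl restore s3+http://backups.hinterlands.org/olga /tmp/dirc.perl.20100628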
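Duplicity's time formats accept dates such as 2010-06-28 and intervals such as 3D (three days ago), but treat a plain number as seconds since the epoch, so 20100628 would mean a moment in August 1970. The hypothetical helper below guards against that by rewriting an 8-digit YYYYMMDD string as YYYY-MM-DD and passing anything else through unchanged; it assumes you never mean a genuine 8-digit epoch value.

```shell
#!/bin/bash
# Hypothetical helper for -t values: turn YYYYMMDD into YYYY-MM-DD so that
# duplicity doesn't read it as seconds since the epoch.
normalise_restore_time() {
    local t="$1"
    if [[ $t =~ ^([0-9]{4})([0-9]{2})([0-9]{2})$ ]]; then
        echo "note: reading $t as a date (YYYYMMDD), not epoch seconds" >&2
        t="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}"
    fi
    # Everything else (2010-06-28, 3D, 10-digit epoch values) is unchanged.
    printf '%s\n' "$t"
}
```

Used as -t "$(normalise_restore_time 20100628)", this passes -t "2010-06-28" to duplicity.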
Automation
The last step is to add a cron job that runs your backup script daily, or more often if you wish. Don't forget to send the script's output to an email address where an interested human can check that all is well.
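A sketch of one way to set this up on Debian: a hypothetical write_cron_file function that writes an /etc/cron.d entry running the backup at midnight (the schedule visible in olga's status listing) with MAILTO set, so cron mails any output to that address. The script path and address are placeholders. The function rejects file names containing a dot, since Debian's cron ignores such files in /etc/cron.d.

```shell
#!/bin/bash
# Hypothetical helper: write a cron.d entry for the daily backup.
write_cron_file() {
    local cron_file="$1" script="$2" mail_to="$3"
    if [ -z "$cron_file" ] || [ -z "$script" ] || [ -z "$mail_to" ]; then
        echo "usage: write_cron_file CRON_FILE SCRIPT MAIL_TO" >&2
        return 1
    fi
    # Debian's cron skips /etc/cron.d files whose names contain a dot.
    case "$(basename "$cron_file")" in
        *.*) echo "cron.d file names must not contain dots: $cron_file" >&2
             return 1 ;;
    esac
    cat > "$cron_file" <<EOF
# Daily Duplicity backup to S3; cron mails any output to MAILTO.
MAILTO=$mail_to
# m h dom mon dow user command
0 0 * * * root $script
EOF
}
```

Run as root, for example: write_cron_file /etc/cron.d/s3-backup /root/s3-backup.sh you@example.org.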
External links
- Duplicity's home page - http://www.nongnu.org/duplicity/
- Amazon S3 home page - http://aws.amazon.com/s3/
- The Duplicity/S3 backup script - http://hinterlands.org/howtos/files/s3-backup.sh.txt

