BackupsWithDuplicityAndS3

From Hinterlands


Backing up using Duplicity and Amazon S3

Introduction

This HOWTO walks you through using Duplicity and Amazon's S3 service to back up your important data. The goal is a remote backup, protected by strong encryption, consisting of a monthly full archive and daily incrementals.

As with all my HOWTOs, I am assuming you are using Debian and specifically squeeze. If you're using another distribution, there's no reason why the principles here won't work for you, but you'll need to adjust the syntax for various commands to match your distro.

Signing up for S3

I did quite a bit of research into the various online data repository services available. Amazon's S3 service ticked the boxes of availability, reliability and support from my choice of backup tool. It's also extremely cheap: I have so far been billed less than $0.30 for about 3 GB of data. You can sign up for Amazon S3 on the Amazon Web Services site, using your existing Amazon login if you have one. You will need to register a credit card with them for billing.

Once signed up, look for the Security Credentials page and make a note of the Access Key ID and Secret Access Key. How you manage these is up to you; I just have one pair that I use for all my backups.
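One possible way to manage the key pair is to keep it in a file that only root can read and have the backup script source it. The sketch below is an assumption rather than part of the setup described in this HOWTO: write_credentials and the file path in the usage note are hypothetical names. The two variable names are the ones the S3 backend reads from the environment.

```shell
#!/bin/sh
# write_credentials FILE ACCESS_KEY_ID SECRET_ACCESS_KEY
# Hypothetical helper: store the S3 key pair in FILE, readable by its owner only.
write_credentials() {
    conf_file="$1"
    (
        umask 077   # anything created in this subshell is owner-only
        {
            printf "export AWS_ACCESS_KEY_ID='%s'\n" "$2"
            printf "export AWS_SECRET_ACCESS_KEY='%s'\n" "$3"
        } > "$conf_file"
    )
    chmod 600 "$conf_file"   # also tighten a file that already existed
}
```

After running something like write_credentials /root/.s3-backup.conf with your own key pair, a backup script can load the values with ". /root/.s3-backup.conf" instead of carrying them inline.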

Creating a GPG Keypair

As we want to encrypt all the data sent up to S3, we need to create a GPG keypair. As all the backups will need to be done by root, it makes sense for root to be the user the keypair is created as, assuming you're not using GPG for other purposes as root. You will need to pick an email address and passphrase for the key; I chose backups@hinterlands.org and used the excellent pwgen utility to make a long, strong random passphrase. (Use pwgen -sy 30 for proper paranoia :)

backuphowto:~# gpg --gen-key

Accept the default key type and key length, and decide if you ever want the key to expire (I don't). Then provide your chosen strong passphrase. Next you will need to upload your key to a public keyserver; to do this you need to know the key's ID.

backuphowto:~# gpg --list-keys
/root/.gnupg/pubring.gpg
------------------------
pub   2048R/02E93567 2010-07-06
uid                  Hinterlands Backups <backups@hinterlands.org>
sub   2048R/FAE5261F 2010-07-06

backuphowto:~# gpg --send-keys 02E93567
gpg: sending key 02E93567 to hkp server keys.gnupg.net

Make a note of the key ID and, of course, your passphrase. You should consider copying your private key elsewhere: if you lose it, or forget your passphrase, you will lose access to your backups.
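Copying the private key elsewhere can be done with gpg's own export option. A minimal sketch, assuming the key ID from the listing above; export_backup_key, GPG_BIN and the destination path in the usage note are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# export_backup_key KEY_ID OUT_FILE
# Hypothetical helper: write an ASCII-armoured copy of the private key to OUT_FILE.
# GPG_BIN is an assumed override so the gpg binary can be swapped out.
export_backup_key() {
    key_id="$1"
    out_file="$2"
    (
        umask 077   # the exported key file is created owner-only
        "${GPG_BIN:-gpg}" --armor --export-secret-keys "$key_id" > "$out_file"
    )
}
```

For example, export_backup_key 02E93567 /mnt/usb/backup-key.asc would write the copy to a mounted USB stick, and gpg --import can load it on another machine. The exported key is still protected by its passphrase, so you need both to restore.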

Installing Duplicity

Duplicity is provided as a Debian package, as is the backend required to talk to S3.

backuphowto:~# aptitude install duplicity python-boto
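Before going further it may be worth confirming that the tools ended up on the PATH. A small sketch; check_tools is a hypothetical helper, not something Duplicity provides.

```shell
#!/bin/sh
# check_tools NAME...
# Hypothetical helper: report every named command that cannot be found.
# Returns 0 if all are present and 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A backup script could start with "check_tools duplicity gpg || exit 1" so that it stops early on a machine where the packages were never installed.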

An example backup

Duplicity requires quite a few parameters to work. These include your Amazon S3 access keys and your GPG key details. For this reason, it's easier to wrap the command in a shell script so that you can easily set the variables that Duplicity will be looking for. An example script (downloadable version below) is as follows:

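#!/bin/bash

# Amazon S3 credentials, read from the environment by the S3 backend
export AWS_ACCESS_KEY_ID=FDKJFHADF8YADF7987FEDFIUHAL
export AWS_SECRET_ACCESS_KEY=fgsfdgJOI+khsiuhd+hdsoiuh897HIh
# Passphrase for the GPG key, read from the environment by duplicity
export PASSPHRASE=AReallyStrongAndSecretPassphrase
# ID of the GPG key used to encrypt and sign the backup
export GPG_KEY=02E93567

# The backslash must be the very last character on the line
/usr/bin/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY} \
--volsize 5 --full-if-older-than 1M incremental  --include /home --exclude '**' / s3+http://backups.hinterlands.org/olga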


We've used several options. Here's what they all mean:

Option                                  Explanation
--s3-european-buckets                   Locate the backups on European servers
--s3-use-new-style                      Required for use with the above option
--encrypt-key=${GPG_KEY}                Set the GPG key with which to encrypt data
--sign-key=${GPG_KEY}                   Set the GPG key with which to sign data
--volsize 5                             Set the volume size to 5 megabytes. Data will be uploaded to S3 in blocks of this size
--full-if-older-than 1M                 If the last full backup happened more than 1 month ago, do a full backup
incremental                             Perform an incremental backup (may be overridden by the above)
--include /home                         Include /home in the list of directories to back up
--exclude '**'                          Exclude everything else from the backup; the --include above takes precedence because it comes first
/                                       The path to start backing up from
s3+http://backups.hinterlands.org/olga  The bucket to upload to. You don't need to create this in advance.


When run, this will back up the entirety of /home. We can test what will happen without transferring any data by adding the option --dry-run. Doing so results in the following:

backuphowto:~# ./s3-backup.sh 
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
Last full backup is too old, forcing full backup
--------------[ Backup Statistics ]--------------
StartTime 1278454009.66 (Tue Jul  6 23:06:49 2010)
EndTime 1278454009.83 (Tue Jul  6 23:06:49 2010)
ElapsedTime 0.18 (0.18 seconds)
SourceFiles 85
SourceFileSize 116161 (113 KB)
NewFiles 53
NewFileSize 116161 (113 KB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 53
RawDeltaSize 0 (0 bytes)
TotalDestinationSizeChange 0 (0 bytes)
Errors 0
-------------------------------------------------


As you can see, my HOWTO example machine doesn't have a great deal of data in /home, so I shall add /etc to the list of directories to back up and then run the script for real.

backuphowto:~# ./s3-backup.sh  
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
Last full backup is too old, forcing full backup
--------------[ Backup Statistics ]--------------
StartTime 1278454244.54 (Tue Jul  6 23:10:44 2010)
EndTime 1278454247.57 (Tue Jul  6 23:10:47 2010)
ElapsedTime 3.03 (3.03 seconds)
SourceFiles 1038
SourceFileSize 2751115 (2.62 MB)
NewFiles 1038
NewFileSize 2751115 (2.62 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 1038
RawDeltaSize 2093587 (2.00 MB)
TotalDestinationSizeChange 466031 (455 KB)
Errors 0
-------------------------------------------------
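Adding /etc, as was done for the run above, means one more --include placed before the final --exclude '**'. The ordering matters because the first matching rule wins. The following is a sketch of building that list from the directories to back up, not the script actually used here: backup_dirs and DUPLICITY_BIN are hypothetical names, and the AWS and PASSPHRASE exports from the example script are assumed to be set already.

```shell
#!/bin/sh
GPG_KEY=02E93567   # key ID from the keypair section

# backup_dirs DIR...
# Hypothetical helper: back up only the listed directories.
backup_dirs() {
    count=$#
    for dir in "$@"; do
        set -- "$@" --include "$dir"   # append one --include per directory
    done
    shift "$count"                     # drop the bare directory names
    set -- "$@" --exclude '**'         # must come last: the first match wins
    "${DUPLICITY_BIN:-/usr/bin/duplicity}" --s3-european-buckets --s3-use-new-style \
        --encrypt-key="${GPG_KEY}" --sign-key="${GPG_KEY}" \
        --volsize 5 --full-if-older-than 1M incremental \
        "$@" / s3+http://backups.hinterlands.org/olga
}
```

Calling backup_dirs /home /etc then produces the same selection as editing the script by hand, and setting DUPLICITY_BIN=echo prints the command instead of running it.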


To check that things are really working, I have created a new file in my home directory, and run the backup again.

backuphowto:~# ./s3-backup.sh  
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Tue Jul  6 23:10:29 2010
--------------[ Backup Statistics ]--------------
StartTime 1278454440.50 (Tue Jul  6 23:14:00 2010)
EndTime 1278454496.94 (Tue Jul  6 23:14:56 2010)
ElapsedTime 56.44 (56.44 seconds)
SourceFiles 1039
SourceFileSize 11722523 (11.2 MB)
NewFiles 2
NewFileSize 8975504 (8.56 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 2
RawDeltaSize 8971408 (8.56 MB)
TotalDestinationSizeChange 9095375 (8.67 MB)
Errors 0
-------------------------------------------------

As we have a full backup, only new and changed files will be added to S3.

Checking the backup status

Taking a look at what has been backed up so far is easy. We'll need to provide all the same Amazon and GPG key details, so you can just wrap this in another script, as above. I called my script "s3-listbackups.sh" and it runs this command:

/usr/bin/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY} collection-status s3+http://backups.hinterlands.org/howto

So far, we have this status:

backuphowto:~# ./s3-listbackups.sh 
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Tue Jul  6 23:10:29 2010
Collection Status
-----------------
Connecting with backend: BotoBackend
Archive dir: /root/.cache/duplicity/654a260264561170340ee24dd928b83a

Found 0 secondary backup chains.

Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Tue Jul  6 23:10:29 2010
Chain end time: Tue Jul  6 23:13:44 2010
Number of contained backup sets: 2
Total number of contained volumes: 3
 Type of backup set:                            Time:      Num volumes:
                Full         Tue Jul  6 23:10:29 2010                 1
         Incremental         Tue Jul  6 23:13:44 2010                 2
-------------------------
No orphaned or incomplete backup sets found.


This is not very interesting, but here's the status from a machine moved to S3 backups over a month ago:

olga:~# /usr/bin/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY} collection-status s3+http://backups.hinterlands.org/olga
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Thu Jul  1 00:00:06 2010
Collection Status
-----------------
Connecting with backend: BotoBackend
Archive dir: /root/.cache/duplicity/fdaed45e97c8943b8dd52d90e3e3cfa3

Found 1 secondary backup chain.
Secondary chain 1 of 1:
-------------------------
Chain start time: Mon May 31 18:47:14 2010
Chain end time: Wed Jun 30 00:00:06 2010
Number of contained backup sets: 33
Total number of contained volumes: 217
 Type of backup set:                            Time:      Num volumes:
                Full         Mon May 31 18:47:14 2010               184
         Incremental         Mon May 31 19:31:23 2010                 1
         Incremental         Mon May 31 19:37:11 2010                 1
         Incremental         Tue Jun  1 00:00:05 2010                 1
         Incremental         Wed Jun  2 00:00:04 2010                 1
         Incremental         Thu Jun  3 00:00:06 2010                 1
         Incremental         Fri Jun  4 00:00:05 2010                 1
         Incremental         Sat Jun  5 00:00:05 2010                 1
         Incremental         Sun Jun  6 00:00:05 2010                 1
         Incremental         Mon Jun  7 00:00:15 2010                 1
         Incremental         Tue Jun  8 00:00:06 2010                 1
         Incremental         Wed Jun  9 00:00:06 2010                 1
         Incremental         Thu Jun 10 00:00:06 2010                 1
         Incremental         Fri Jun 11 00:00:06 2010                 1
         Incremental         Sat Jun 12 00:00:07 2010                 2
         Incremental         Sun Jun 13 00:00:07 2010                 1
         Incremental         Mon Jun 14 00:00:08 2010                 1
         Incremental         Tue Jun 15 00:00:08 2010                 1
         Incremental         Wed Jun 16 00:00:05 2010                 1
         Incremental         Thu Jun 17 00:00:08 2010                 1
         Incremental         Fri Jun 18 00:00:07 2010                 1
         Incremental         Sat Jun 19 00:00:05 2010                 1
         Incremental         Sun Jun 20 00:00:07 2010                 1
         Incremental         Mon Jun 21 00:00:06 2010                 1
         Incremental         Tue Jun 22 00:00:06 2010                 1
         Incremental         Wed Jun 23 00:00:05 2010                 1
         Incremental         Thu Jun 24 00:00:06 2010                 1
         Incremental         Fri Jun 25 00:00:05 2010                 1
         Incremental         Sat Jun 26 00:00:06 2010                 1
         Incremental         Sun Jun 27 00:00:05 2010                 1
         Incremental         Mon Jun 28 00:00:04 2010                 1
         Incremental         Tue Jun 29 00:00:06 2010                 1
         Incremental         Wed Jun 30 00:00:06 2010                 1
-------------------------


Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Thu Jul  1 00:00:06 2010
Chain end time: Tue Jul  6 00:00:05 2010
Number of contained backup sets: 6
Total number of contained volumes: 190
 Type of backup set:                            Time:      Num volumes:
                Full         Thu Jul  1 00:00:06 2010               184
         Incremental         Fri Jul  2 00:00:06 2010                 2
         Incremental         Sat Jul  3 00:00:06 2010                 1
         Incremental         Sun Jul  4 00:00:05 2010                 1
         Incremental         Mon Jul  5 00:00:08 2010                 1
         Incremental         Tue Jul  6 00:00:05 2010                 1
-------------------------
No orphaned or incomplete backup sets found.
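Listings like these get long, so a short summary can be easier to scan. The sketch below assumes the output layout shown above and simply counts lines; summarise_status is a hypothetical helper, not a Duplicity feature.

```shell
#!/bin/sh
# summarise_status
# Hypothetical helper: read collection-status output on stdin and print the
# last full backup date plus the number of full and incremental sets listed.
summarise_status() {
    awk '
        /^Last full backup date:/ {
            sub(/^Last full backup date: */, "")
            last = $0
        }
        $1 == "Full"        { full++ }
        $1 == "Incremental" { inc++ }
        END {
            printf "last_full=%s\n", last
            printf "full_sets=%d\n", full
            printf "incremental_sets=%d\n", inc
        }
    '
}
```

It would be used as "./s3-listbackups.sh | summarise_status". Fed the olga listing above, it counts 2 full and 37 incremental sets across the two chains.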

Restoring data

A backup isn't a backup until you've tested that you can get your data back. Here's an example of restoring a single file:

olga:~# /usr/bin/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY} \
--file-to-restore=home/martin/dirc/dirc.perl restore s3+http://backups.hinterlands.org/olga /tmp/dirc.perl
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Thu Jul  1 00:00:06 2010
olga:~# ls -l /tmp/dirc.perl 
-rw-r--r-- 1 martin martin 124 Jan 18 09:15 /tmp/dirc.perl

Note the missing "/" before the name of the file to restore: the path is given relative to the directory the backup started from (here /). If you put that / in, duplicity will not find your file.
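If you tend to paste absolute paths, the leading slash can be stripped automatically. A small sketch; restore_path is a hypothetical helper name.

```shell
#!/bin/sh
# restore_path PATH
# Hypothetical helper: print PATH without any leading "/" characters,
# which is the form --file-to-restore expects.
restore_path() {
    path="$1"
    while [ "${path#/}" != "$path" ]; do
        path="${path#/}"   # remove one leading slash per pass
    done
    printf '%s\n' "$path"
}
```

It could then be used as --file-to-restore="$(restore_path /home/martin/dirc/dirc.perl)" in the restore command.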

To restore a file as it was at a particular date and time, use the -t option. Give the date in a form such as YYYY-MM-DD; a bare run of digits is read as a number of seconds since the epoch. For example:

olga:~# /usr/bin/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key=${GPG_KEY} --sign-key=${GPG_KEY} \
-t "2010-06-28" --file-to-restore=home/martin/dirc/dirc.perl restore s3+http://backups.hinterlands.org/olga /tmp/dirc.perl

Automation

The last thing to do is to add a cron job to execute your backup script daily, or perhaps even more frequently if you wish. Don't forget to send the script output to an email account where an interested human can check to see all is well.
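Cron already mails any output of a job to the crontab's MAILTO address, which may be all you need. If you would rather see success or failure in the subject line, a wrapper along these lines is one option. It is only a sketch: run_and_report, MAIL_CMD and the script path in the crontab comment are hypothetical names, and it assumes a mail command that accepts -s for the subject.

```shell
#!/bin/sh
# run_and_report SCRIPT RECIPIENT
# Hypothetical helper: run SCRIPT, mail its output to RECIPIENT with the
# outcome in the subject line, and return the script's exit status.
run_and_report() {
    script="$1"
    recipient="$2"
    if output="$("$script" 2>&1)"; then
        status=0
        subject="Backup OK on $(uname -n)"
    else
        status=$?
        subject="Backup FAILED on $(uname -n) (exit $status)"
    fi
    printf '%s\n' "$output" | "${MAIL_CMD:-mail}" -s "$subject" "$recipient"
    return "$status"
}

# Example crontab entry for root, running daily at midnight.
# /root/s3-backup-report.sh is an assumed script that calls
# run_and_report /root/s3-backup.sh with your address:
# 0 0 * * * /root/s3-backup-report.sh
```

Returning the backup's own exit status means anything else that calls the wrapper can still tell whether the run succeeded.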
