Following my latest indulgence in online/offsite backups (Amazon S3 and s3sync), I thought I should update the situation.  It is painfully slow when running a backup.  It should only be uploading changed files, but crawling many files (~30,000) across numerous directories and comparing each one with Amazon just doesn’t work nicely.  It would be fine for smaller numbers of files, but unfortunately not in my case.
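For reference, this is roughly the sort of s3sync invocation involved – the bucket name and paths here are placeholders, not my actual setup:

```shell
# Recursively mirror a local directory into an S3 bucket with s3sync (Ruby).
# -r recurses into subdirectories; --ssl encrypts the transfer;
# --delete removes remote files that no longer exist locally.
# Every run walks the whole local tree and compares it against S3,
# which is what makes it so slow with ~30,000 files.
ruby s3sync.rb -r --ssl --delete /home/backup/data/ mybucket:backups/data
```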

I then tried s3cmd (Python based) with the sync option – quicker, but not great: it could still take 4–6 hours to complete a backup run, even when only a handful of files had changed.
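The s3cmd equivalent is a one-liner along these lines (again, bucket and paths are placeholders):

```shell
# One-way sync of a local tree up to S3; only new/changed files are uploaded.
# --delete-removed also deletes remote copies of files removed locally.
s3cmd sync --delete-removed /home/backup/data/ s3://mybucket/backups/data/
```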

I needed to think about whether I was doing it the right way – backing up files natively.  I then came across tarsnap, which uses S3 for storage – whilst I like the principle, the added costs work out at nearly three times the cost of raw Amazon S3, as backups are proxied through their servers to maintain the tars.  Nice idea, but too costly for me.

I then found duplicity – whilst still beta, it looks promising.  I am trialling it with a small data set for the moment, but initial impressions are good: it holds files in an index which is uploaded as a separate file, and handles incremental backups very well.  I have encrypted using GPG, so there is some compression happening here as well.  I will update when I have run it for a month.
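For anyone curious, my duplicity runs look something like the sketch below – the GPG key ID, passphrase handling and bucket name are placeholders for illustration:

```shell
# GPG-encrypted incremental backup to S3 with duplicity.
# --encrypt-key selects the GPG key used to encrypt the volumes;
# --full-if-older-than forces a fresh full backup once a month,
# with incrementals in between.
export PASSPHRASE='my-gpg-passphrase'   # placeholder; duplicity reads this env var
duplicity --encrypt-key ABCD1234 \
    --full-if-older-than 1M \
    /home/backup/data s3+http://mybucket/backups/data
unset PASSPHRASE
```

Because only the changed blocks go into each incremental volume, a run where nothing much has changed finishes in minutes rather than the hours the sync-based tools were taking.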