Following my latest indulgence in online/offsite backups (Amazon S3 and s3sync), I thought I should update the situation. s3sync is painfully slow when running a backup. It should only be uploading changed files, but crawling many files (~30,000) across numerous directories and comparing each one against Amazon just doesn't scale. It would be fine for smaller numbers of files, but unfortunately not in my case.
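For reference, the s3sync run looked roughly like this – the bucket name and paths here are placeholders, not my real setup, and the exact flags may differ by s3sync version:

```shell
# Recursively mirror a local tree to S3 over SSL with s3sync (Ruby tool).
# Every file is stat'd and compared against S3, which is what makes this
# slow over ~30,000 files. Paths and bucket are illustrative only.
ruby s3sync.rb -r --ssl --delete /home/me/data mybucket:backups/data
```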
I then tried s3cmd (Python based) with the sync option – quicker, but not great – it could still take 4-6 hours to do the entire backup run, even if only a handful of files had changed.
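For anyone trying the same thing, an s3cmd sync run is along these lines – again, the paths and bucket are placeholders:

```shell
# Sync a local directory to S3; s3cmd compares size/checksum and only
# uploads changed files. --delete-removed mirrors local deletions to
# the bucket. Placeholder paths; requires credentials from `s3cmd --configure`.
s3cmd sync --delete-removed /home/me/data/ s3://mybucket/backups/
```

The trailing slashes matter to s3cmd: they control whether the directory itself or just its contents are copied.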
I needed to think about whether I was doing it the right way – backing up files natively. I then came across tarsnap, which uses S3 for storage. Whilst I like the principle, the added costs work out at nearly three times the price of raw Amazon S3, as backups are proxied through their servers to maintain the tar archives. Nice idea, but too costly for me.
I then found duplicity – whilst still beta, it looks promising. I am trialling it with a small data set for the moment, but initial impressions are good – it holds files in an index which is uploaded as a separate file, and handles incremental backups very well. I have encrypted using GPG, so there is some compression happening here as well. I will update when I have run it for a month.
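A sketch of the kind of duplicity invocation I am trialling – the GPG key ID, passphrase, bucket and paths are all placeholders, and duplicity also expects AWS credentials in the environment:

```shell
# Assumed environment: AWS credentials for duplicity's S3 backend.
export AWS_ACCESS_KEY_ID='...'
export AWS_SECRET_ACCESS_KEY='...'
export PASSPHRASE='my-gpg-passphrase'   # placeholder; used for the GPG key

# GPG-encrypted backup to S3; duplicity does a full backup first,
# then incrementals on subsequent runs, using its uploaded index
# to work out what changed. Key ID and paths are illustrative.
duplicity --encrypt-key ABCD1234 /home/me/data s3+http://mybucket/backups

# Restore the most recent backup to a local directory:
duplicity s3+http://mybucket/backups /home/me/restore

unset PASSPHRASE
```

Because duplicity keeps its own index, it does not need to crawl and compare every file against S3 on each run, which is exactly the bottleneck that made s3sync and s3cmd so slow for me.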