Support Amazon S3 as a file storage backend.

There's a haskell library that looks good. Not yet in Debian.

Multiple ways of using S3 are possible. The current plan is to have an S3BUCKET backend, derived from Backend.File, so it caches files locally and can transfer files between systems too, without involving S3.

* get will try to get the file from S3 or from a remote. An annex.s3.cost setting can configure the cost of S3 vs the cost of other remotes.
* add will always upload a copy to S3.
* Each file in the S3 bucket is assumed to be in the annex. So unused will show files in the bucket that nothing points to, and dropunused will remove them.
* For numcopies counting, S3 will count as 1 copy (or maybe more?). So if numcopies=2, you're saying you don't fully trust S3, and asking git-annex to ensure one other copy exists.
* drop will remove a file locally, but keep it in S3. drop --force *might* remove it from S3. TBD.

annex.s3.bucket would configure the bucket to use. (And an environment variable or something would configure the password.) The bucket would also be encoded in the keys, though. So the configured bucket would be used when adding new files, and a system could move from one bucket to another over time while still having legacy files in an earlier one; perhaps you move to Europe and want new files to be put in that region. And git annex `migrate --backend=S3BUCKET --force` could move files between datacenters!

Problem: Then the only way for unused to know what buckets are in use is to see what keys point to them -- but once the last key pointing at a bucket is deleted, it would no longer be able to say that the files remaining in that bucket are all unused. Need a cached list of recently seen S3 buckets?

-----

One problem with this is what key metadata to include. Should it be like WORM? Or like SHA1? Or just a new unique identifier for each file? It might be worth having S3 variants of *all* the Backend.File derived backends.

More blue-sky, it might be nice to be able to union or stack multiple backends together, so S3BUCKET+SHA1 or S3BUCKET+WORM. That would likely be hard to get right.

Less blue-sky, if the S3 capability were added directly to Backend.File, and the bucket name were configured by annex.s3.bucket, then any existing annexed file could be upgraded to also store on S3.

## alternate approach

The above assumes S3 should be a separate backend somehow. What if, instead, an S3 bucket were treated as a separate **remote**?

* Could "git annex add" while offline, and "git annex push --to S3" when online.
* No need to choose whether a file goes to S3 at add time; no need to migrate to move files there.
* numcopies counting Just Works.
* Could have multiple S3 buckets as desired.

The bucket name could map 1:1 to its annex.uuid, so not much configuration would be needed when cloning a repo to get it using S3 -- just configure the S3 access token(s) to use for the various UUIDs.

Implementing this might not be as conceptually nice as making S3 a separate backend. It would need some changes to the remotes code, perhaps lifting some of it into backend-specific hooks. Then the S3 backend could be implicitly stacked in front of a backend like WORM.
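
-----

Some rough sketches of the ideas above follow. First, how annex.s3.cost might fit into get: the S3 bucket is treated as just another source of the content, and sources are tried cheapest-first, so the configured cost decides whether S3 is tried before or after ordinary remotes. The types and names here are illustrative, not real git-annex code.

    module S3CostSketch where

    import Data.List (sortBy)
    import Data.Ord (comparing)

    -- A content source: an ordinary remote, or the S3 bucket itself.
    data Source = Source
            { sourceName :: String
            , sourceCost :: Int      -- e.g. the value of annex.s3.cost for the bucket
            , retrieve   :: IO Bool  -- attempt to fetch the content from this source
            }

    -- get tries sources cheapest-first until one succeeds.
    getFrom :: [Source] -> IO Bool
    getFrom sources = go (sortBy (comparing sourceCost) sources)
      where
            go [] = return False
            go (s : rest) = do
                    ok <- retrieve s
                    if ok then return True else go rest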
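
Next, the "bucket encoded in the key" idea and the open question about key metadata (WORM-style, SHA1-style, or a fresh unique identifier), sketched with hypothetical types rather than git-annex's real Key representation:

    module S3KeySketch where

    -- The bucket name travels with the key, so legacy files can stay in an
    -- old bucket while annex.s3.bucket points new files at a different one.
    data S3Key = S3Key
            { keyBucket :: String  -- bucket the content was uploaded to
            , keyId     :: KeyId   -- how the content itself is identified
            }
            deriving (Show, Eq)

    -- The undecided part: what metadata identifies the content?
    data KeyId
            = WormStyle FilePath Integer Integer  -- filename, size, mtime
            | Sha1Style String                    -- hex digest of the content
            | UniqueStyle String                  -- identifier minted at add time
            deriving (Show, Eq)

    -- Object name to use inside the bucket for a given key.
    s3ObjectName :: S3Key -> String
    s3ObjectName k = case keyId k of
            WormStyle f sz mt -> "WORM:" ++ show sz ++ ":" ++ show mt ++ ":" ++ f
            Sha1Style d       -> "SHA1:" ++ d
            UniqueStyle u     -> "UUID:" ++ u

Stacking S3BUCKET in front of SHA1 or WORM would amount to picking one of these KeyId flavors per underlying backend.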
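
The "cached list of recently seen S3 buckets" problem, sketched with plain strings (all names are assumptions): the buckets unused can discover from live keys, unioned with the cache, give the set of buckets that need to be scanned for unused objects.

    module S3UnusedSketch where

    import qualified Data.Set as S

    type Bucket = String
    type ObjectName = String

    -- Buckets still referenced by at least one key in the annex.
    bucketsInUse :: [(Bucket, ObjectName)] -> S.Set Bucket
    bucketsInUse keys = S.fromList (map fst keys)

    -- Without the cache, a bucket whose last key was dropped would be
    -- invisible here, and its leftover objects never reported as unused.
    bucketsToScan :: S.Set Bucket -> [(Bucket, ObjectName)] -> S.Set Bucket
    bucketsToScan cachedBuckets keys = cachedBuckets `S.union` bucketsInUse keys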
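
Finally, for the alternate approach, a minimal sketch of the 1:1 mapping between bucket name and annex.uuid, plus per-UUID credentials; the bucket naming scheme and the credential lookup are assumptions for illustration only.

    module S3RemoteSketch where

    type UUID = String

    data S3Creds = S3Creds
            { accessKey :: String
            , secretKey :: String
            }

    -- Derive the bucket name from the remote's annex.uuid, so a fresh
    -- clone needs no per-bucket configuration.
    bucketForUUID :: UUID -> String
    bucketForUUID u = "git-annex-" ++ u

    -- Credentials are all a clone has to supply, keyed by UUID
    -- (in practice looked up from git config or the environment).
    credsFor :: [(UUID, S3Creds)] -> UUID -> Maybe S3Creds
    credsFor known u = lookup u known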