author | https://www.google.com/accounts/o8/id?id=AItOawmUJBh1lYmvfCCiGr3yrdx-QhuLCSRnU5c <Justin@web> | 2014-10-22 16:25:54 +0000 |
---|---|---|
committer | admin <admin@branchable.com> | 2014-10-22 16:25:54 +0000 |
commit | 36c4ddcdf14e7f9fd51acbe68ad86d551f3d58ff (patch) | |
tree | 02e6f0a89ad7bf4a0c8453bb496f7a751a77bf19 | |
parent | 25d59fe55f6d7b34213abde2083644422bb416ab (diff) |
-rw-r--r-- | doc/bugs/Issue_fewer_S3_GET_requests.mdwn | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/doc/bugs/Issue_fewer_S3_GET_requests.mdwn b/doc/bugs/Issue_fewer_S3_GET_requests.mdwn
new file mode 100644
index 000000000..8bbcfa179
--- /dev/null
+++ b/doc/bugs/Issue_fewer_S3_GET_requests.mdwn
@@ -0,0 +1,9 @@
+It appears that git-annex issues one GET request to S3 / Google cloud for every file it tries to copy, if you don't pass --fast. (I could be wrong; I'm basing this on the fact that each "checking <remote name>" takes about the same amount of time, and that it's slow enough to be hitting the network.)
+
+Amazon lets you GET 1000 objects in one GET request, and afaict a request that returns 1000 objects costs just as much as a request that returns 1 object. The cost of GET'ing every file in my annex is nontrivial -- Google charges 0.01 per 1000 GETs, and my repo has 130k objects, so that's $1.3, compared to a monthly cost for storage of under $10. This means that if I want to back up my files more than, say, once a week, I need to write a script that parses the JSON output of git annex whereis and uploads with --fast only the files that aren't present in the cloud. It also means that I have to trust the output of whereis.
+
+All those GETs also slow down the non-fast copy, and this also applies to other kinds of remotes.
+
+There are a number of ways one could implement this. One way would be to have a command that updates the whereis data from the remote and then to add a parameter (maybe you already have it) to copy that's like --fast but skips files that are already present (maybe this is what --fast already does, but I did a quick check and it doesn't seem to). Because of the way git annex names files, I think it would be hard to coalesce GETs during a copy command, but it could be done.
+
+Anyway, please don't consider this a high-priority request; I can get by as-is, and I <3 git annex.
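
The cost estimate in the report can be checked with a short calculation. The sketch below is illustrative only and is not part of the commit. The function name `request_cost` and its parameters are hypothetical. It takes the report's figures (130k objects, 0.01 per 1000 requests) and the report's own unverified assumption that a request returning 1000 objects is billed like a request returning one; a real provider may bill listing requests at a different rate.

```python
import math


def request_cost(num_objects, price_per_1000_requests, objects_per_request=1):
    """Return (request count, total cost) for checking num_objects.

    objects_per_request=1 models one GET per file; a larger value models
    a batched listing that covers many objects per request.
    """
    requests = math.ceil(num_objects / objects_per_request)
    cost = requests * price_per_1000_requests / 1000
    return requests, cost


# Figures taken from the report: 130k objects, 0.01 per 1000 requests.
per_file = request_cost(130_000, 0.01)
# Assumption from the report: up to 1000 objects per request, same price.
batched = request_cost(130_000, 0.01, objects_per_request=1000)

print("per-file requests and cost:", per_file)
print("batched requests and cost:", batched)
```

Under those assumptions, batching reduces the request count by a factor of 1000 (130,000 requests down to 130), and the cost shrinks by the same factor.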