author Joey Hess <joey@kitenet.net> 2013-05-22 15:20:45 -0400
committer Joey Hess <joey@kitenet.net> 2013-05-22 15:20:45 -0400
commit f1ee72b8098e487b036cc1f35bb6aa08a8f8f948
tree da02f1d3dd6b8107d1c19eb38941873011bbae7c
parent 186434797dc41c815a07825072a63c9de1b47a25
parent 70b6dc8296c0c4d9e66b3c06198245e09eec40bf
Merge branch 'master' of ssh://git-annex.branchable.com
-rw-r--r-- doc/bugs/Glacier_remote_uploads_duplicates.mdwn | 31
-rw-r--r-- doc/special_remotes/S3/comment_9_7ad757b3865b04967c79af0a263bb3b0._comment | 10
-rw-r--r-- doc/special_remotes/comment_15_95ccfdd22a2391daa99e0beb04adedd6._comment | 11
-rw-r--r-- doc/special_remotes/comment_16_b9d238fb15ad7628e33c90b071e07bb0._comment | 12
-rw-r--r-- doc/special_remotes/comment_17_cc21b81a8f809f6efa5f5b6332513fc3._comment | 12
-rw-r--r-- doc/special_remotes/glacier/comment_4_0c92cc82c7ac513130f862391a02d329._comment | 8
-rw-r--r-- doc/todo/Build_for_Synology_DSM.mdwn | 1
7 files changed, 85 insertions, 0 deletions
diff --git a/doc/bugs/Glacier_remote_uploads_duplicates.mdwn b/doc/bugs/Glacier_remote_uploads_duplicates.mdwn
new file mode 100644
index 000000000..bcbd94815
--- /dev/null
+++ b/doc/bugs/Glacier_remote_uploads_duplicates.mdwn
@@ -0,0 +1,31 @@
+### Please describe the problem.
+
+Other references:
+
+https://github.com/basak/glacier-cli/pull/19
+http://git-annex.branchable.com/special_remotes/glacier/#comment-a2b05b8dc2d640ee498d90398f02931c
+
+#### Background
+
+ * Unlike S3, Glacier doesn't let the client choose an archive's key. When you upload to Glacier, Glacier assigns the unique ID, not the client.
+ * Glacier does support an "archive description" which is immutable. It also provides this "archive description" in an inventory listing, together with the unique IDs.
+ * An "archive description" is not a unique key. It's perfectly possible to upload multiple archives to Glacier with the same "archive description".
+ * glacier-cli uses the "archive description" field as an upload identifier, since the unique IDs are unfriendly to users. However, since descriptions are potentially ambiguous identifiers, it also supports disambiguation using the archive ID itself. See "Addressing Archives" in README.md for details.
+
+#### The Problem
+
+This is what I believe is happening in the two reports referenced above. When git-annex is used without `--trust-glacier`, it can end up uploading the same data multiple times. From git-annex's point of view, it cannot verify that the data is already in Glacier, so it uploads again, expecting an overwrite if the key is already there. Since glacier-cli maps the key to an "archive description", which need not be unique, that is not what happens: instead, a second archive is uploaded.
+
+When git-annex later does a "checkpresent" operation, glacier-cli fails because the request is ambiguous: there are two archives in Glacier with the same "key". The error message could be better here, but I believe that the behaviour is correct.
+
+#### Discussion
+
+glacier-cli can find out what data Glacier claims to have using an inventory retrieval. However, this retrieval takes about four hours and can be out of date (e.g. if someone else recently deleted the archive from another client). Thus, I can understand git-annex's desire not to trust this data or a cache of it.
+
+However, whatever we do, it is impossible to map an "upload or overwrite on key X" type command to Glacier. We'll always end up with duplicates. Even if git-annex stored the Glacier archive IDs, there is no API to replace an existing archive with the same ID, and inventories are out of date even before we retrieve them.
+
+#### Workaround
+
+If the problem is as I think it is, always applying `--trust-glacier` should prevent the problem from occurring in most cases, since git-annex will run "checkpresent" and glacier-cli will confirm that the archive exists.
+
+To fix the problem after it has occurred, it should be sufficient to delete duplicates using glacier-cli, since they _should_ be identical to each other. Some enhancement of the `glacier-cli archive list` command would help here.
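The mismatch described in this report can be illustrated with a toy model. This is not glacier-cli or git-annex code: `ToyVault`, `upload`, `check_present` and `remove_duplicates` are hypothetical names, and the model assumes only the behaviour stated above. Glacier assigns the archive ID, descriptions need not be unique, a lookup by description fails when it matches more than one archive, and duplicates are assumed to hold identical data, as the workaround suggests.

```python
import uuid


class AmbiguousDescription(Exception):
    """Raised when a description matches more than one archive."""


class ToyVault:
    """Hypothetical in-memory model of a Glacier vault (not a real API)."""

    def __init__(self):
        # archive ID -> (description, data); the service, not the client, picks the ID
        self.archives = {}

    def upload(self, description, data):
        # Every upload creates a new archive, even if the description already exists:
        # there is no "overwrite the archive with this description" operation.
        archive_id = uuid.uuid4().hex
        self.archives[archive_id] = (description, data)
        return archive_id

    def ids_for(self, description):
        return [aid for aid, (desc, _) in self.archives.items() if desc == description]

    def check_present(self, description):
        # A lookup by description is refused when it is ambiguous.
        matches = self.ids_for(description)
        if len(matches) > 1:
            raise AmbiguousDescription(f"{len(matches)} archives named {description!r}")
        return len(matches) == 1


def remove_duplicates(vault, description):
    # Keep one archive and delete the rest, assuming the duplicates are identical.
    for aid in vault.ids_for(description)[1:]:
        del vault.archives[aid]


vault = ToyVault()
key = "SHA256E-s5--example"  # hypothetical git-annex key used as the description
vault.upload(key, b"hello")
# Without a trusted inventory the client cannot tell the key is present,
# so it uploads again, expecting an overwrite; a second archive appears instead.
vault.upload(key, b"hello")
try:
    vault.check_present(key)
except AmbiguousDescription as err:
    print("checkpresent failed:", err)

remove_duplicates(vault, key)
print("present after cleanup:", vault.check_present(key))
```

The model shows why "upload or overwrite on key X" cannot be expressed against a store whose only client-controlled label is non-unique: each retry adds an archive rather than replacing one.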
diff --git a/doc/special_remotes/S3/comment_9_7ad757b3865b04967c79af0a263bb3b0._comment b/doc/special_remotes/S3/comment_9_7ad757b3865b04967c79af0a263bb3b0._comment
new file mode 100644
index 000000000..51b7ab16a
--- /dev/null
+++ b/doc/special_remotes/S3/comment_9_7ad757b3865b04967c79af0a263bb3b0._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="basak"
+ ip="2001:8b0:1c8::2"
+ subject="Recovering from a clone"
+ date="2013-05-22T18:32:05Z"
+ content="""
+How do I recover a special remote from a clone, please? I see that `remote.log` has most of the details, but my remote is not configured on my clone and I see no obvious way to do it. Also, I used `embedcreds`, but the only credentials I can see are stored in `.git/annex/creds/`, so they did not survive a clone. I'm confused because the documentation here for `embedcreds` says that clones should have access.
+
+As a workaround, copying the remote's configuration over from `.git/config`, along with the credentials from `.git/annex/creds/`, seems to work. Is there some other way I'm supposed to do this, or is this the intended way?
+"""]]
diff --git a/doc/special_remotes/comment_15_95ccfdd22a2391daa99e0beb04adedd6._comment b/doc/special_remotes/comment_15_95ccfdd22a2391daa99e0beb04adedd6._comment
new file mode 100644
index 000000000..3e2ea9948
--- /dev/null
+++ b/doc/special_remotes/comment_15_95ccfdd22a2391daa99e0beb04adedd6._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="http://yarikoptic.myopenid.com/"
+ nickname="site-myopenid"
+ subject="remotes costs"
+ date="2013-05-22T18:33:11Z"
+ content="""
+Thank you -- that is nice!
+
+Could costs be shown in the 'whereis' and 'status' commands? E.g. the way apt-cache policy shows APT repository priorities. Currently I do not see them (at least in 4.20130501... updating to sid's 0521 now)
+
+"""]]
diff --git a/doc/special_remotes/comment_16_b9d238fb15ad7628e33c90b071e07bb0._comment b/doc/special_remotes/comment_16_b9d238fb15ad7628e33c90b071e07bb0._comment
new file mode 100644
index 000000000..8b1fcd831
--- /dev/null
+++ b/doc/special_remotes/comment_16_b9d238fb15ad7628e33c90b071e07bb0._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="http://yarikoptic.myopenid.com/"
+ nickname="site-myopenid"
+ subject="compression -- storage and transfer"
+ date="2013-05-22T18:48:59Z"
+ content="""
+Is there any remote which would not only compress during transfer (I believe rsync does that, right?) but also store objects compressed?
+
+I thought bup would do both -- but it seems that git annex receives data uncompressed from a bup remote, and a bup remote requires ssh access.
+
+In my case I want to make publicly available files which are binary blobs that could be compressed very well. It would be a pity to waste storage on my end and also incur significant traffic that could be avoided if the data were transferred compressed. Maybe HTTP compression (http://en.wikipedia.org/wiki/HTTP_compression) could be used efficiently for this purpose (I'm not sure whether the data could then be stored already compressed, to avoid spending server time re-compressing it)?
+"""]]
diff --git a/doc/special_remotes/comment_17_cc21b81a8f809f6efa5f5b6332513fc3._comment b/doc/special_remotes/comment_17_cc21b81a8f809f6efa5f5b6332513fc3._comment
new file mode 100644
index 000000000..f576e2723
--- /dev/null
+++ b/doc/special_remotes/comment_17_cc21b81a8f809f6efa5f5b6332513fc3._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="http://yarikoptic.myopenid.com/"
+ nickname="site-myopenid"
+ subject="Re: compression -- storage and transfer"
+ date="2013-05-22T19:17:33Z"
+ content="""
+Ha -- apparently it is trivial to configure Apache to serve pre-compressed files (e.g. see http://stackoverflow.com/questions/75482/how-can-i-pre-compress-files-with-mod-deflate-in-apache-2-x), and they arrive at the client compressed, with
+
+Content-Encoding: gzip
+
+but unfortunately git-annex doesn't like those (it fails to \"verify\") -- do you think it could be implemented for web \"special remotes\"? That would be really nice: then I could store such data on another website, and addurl links to the compressed content.
+"""]]
diff --git a/doc/special_remotes/glacier/comment_4_0c92cc82c7ac513130f862391a02d329._comment b/doc/special_remotes/glacier/comment_4_0c92cc82c7ac513130f862391a02d329._comment
new file mode 100644
index 000000000..2de6632eb
--- /dev/null
+++ b/doc/special_remotes/glacier/comment_4_0c92cc82c7ac513130f862391a02d329._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="basak"
+ ip="2001:8b0:1c8::2"
+ subject="comment 4"
+ date="2013-05-22T18:10:32Z"
+ content="""
+Let's discuss this in a bug. I've created http://git-annex.branchable.com/bugs/Glacier_remote_uploads_duplicates/
+"""]]
diff --git a/doc/todo/Build_for_Synology_DSM.mdwn b/doc/todo/Build_for_Synology_DSM.mdwn
new file mode 100644
index 000000000..be45ea631
--- /dev/null
+++ b/doc/todo/Build_for_Synology_DSM.mdwn
@@ -0,0 +1 @@
+It would be wonderful if a pre-built package were available for Synology NAS. Basically, this is an ARM-based Linux. It has most of the required shell commands either out of the box or easily available (through ipkg). But I think it would be difficult to install the Haskell compiler and all the required modules, so it would probably be better to cross-compile targeting ARM.