author	Joey Hess <joey@kitenet.net>	2014-08-02 17:25:50 -0400
committer	Joey Hess <joey@kitenet.net>	2014-08-02 17:25:50 -0400
commit	d645129f6e573b60e54fb7c35bfe98a87d2eb9d0 (patch)
tree	4ed3f157970605e42fe457bfa67efc825a39ee84 /doc
parent	0eaed261ea11060fc9644400c7f31f8c3ec1052b (diff)
parent	3beefc3b4bc54e0d2a0cc7a4cc0745af13d8014c (diff)
Merge branch 'master' into newchunks
Diffstat (limited to 'doc')
-rw-r--r--  doc/bugs/Permission_problem_in_second_user_account_on_Android.mdwn  15
-rw-r--r--  doc/bugs/direct_command_leaves_repository_inconsistent_if_interrupted.mdwn  43
-rw-r--r--  doc/bugs/googlemail/comment_1_5614fa85029f9f97be03cb74899a7099._comment  19
-rw-r--r--  doc/bugs/runs_of_of_memory_adding_2_million_files/comment_8_7264b57f309d6e824c612eed8a088327._comment  8
-rw-r--r--  doc/bugs/sync_does_not_commit_with_alwasycommit___61___false/comment_1_e6dc7fa1b0a131bb7533f8407e1b5510._comment  8
-rw-r--r--  doc/bugs/whereis_does_not_work_in_direct_mode.mdwn  84
-rw-r--r--  doc/chunking.mdwn  2
-rw-r--r--  doc/design/assistant/chunks.mdwn  50
-rw-r--r--  doc/design/assistant/deltas.mdwn  24
-rw-r--r--  doc/design/roadmap.mdwn  9
-rw-r--r--  doc/devblog/day_206__zap.mdwn  83
-rw-r--r--  doc/devblog/day_207__at_last.mdwn  34
-rw-r--r--  doc/devblog/day_208__testremote.mdwn  10
-rw-r--r--  doc/forum/Central_git_annex_server_that_always_keeps_one_copy.mdwn  1
-rw-r--r--  doc/forum/Central_git_annex_server_that_always_keeps_one_copy/comment_1_e786c8df6e48d88cf15b555af1b8639a._comment  10
-rw-r--r--  doc/forum/Making_Firefox_not_dereference_symlinks_on_open.mdwn  3
-rw-r--r--  doc/forum/Making_Firefox_not_dereference_symlinks_on_open/comment_1_a7b092f2291fa515279cf7dce23df20d._comment  8
-rw-r--r--  doc/forum/local_subtree_and_broken_symlinks/comment_1_779cc4e49cb4da8aea7f5743e6257f21._comment  10
-rw-r--r--  doc/forum/remote_server_client_repositories_are_bare__33____63__/comment_3_32bf10cf837db16566dcc99d0b9aaf67._comment  10
-rw-r--r--  doc/forum/shared_cipher_for_S3_attempting_to_decrypt_a_non-encrypted_file.mdwn  20
-rw-r--r--  doc/forum/shared_cipher_for_S3_attempting_to_decrypt_a_non-encrypted_file/comment_1_b42ff37be172ba841980c17ad6223e06._comment  8
-rw-r--r--  doc/forum/usability:_what_are_those_arrow_things__63__.mdwn  21
-rw-r--r--  doc/special_remotes.mdwn  18
-rw-r--r--  doc/tips/metadata_driven_views.mdwn  2
-rw-r--r--  doc/todo/Speed_up___39__import_--clean-duplicates__39__.mdwn  7
-rw-r--r--  doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_1_9268c639d3d21cce4ca7b60d08e9cb65._comment  10
-rw-r--r--  doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_2_9c6688901ef20badd834419202627d5c._comment  21
-rw-r--r--  doc/todo/wishlist:_Parity_files_for_encrypted_remotes.mdwn  7
-rw-r--r--  doc/todo/wishlist:_add_repository_name_to_commit_messages.mdwn  3
29 files changed, 531 insertions, 17 deletions
diff --git a/doc/bugs/Permission_problem_in_second_user_account_on_Android.mdwn b/doc/bugs/Permission_problem_in_second_user_account_on_Android.mdwn
new file mode 100644
index 000000000..8b7308223
--- /dev/null
+++ b/doc/bugs/Permission_problem_in_second_user_account_on_Android.mdwn
@@ -0,0 +1,15 @@
+I get the following error message upon starting git-annex in a second user account on Android:
+
+ Falling back to hardcoded app location: cannot find expected files in /data/app-lib
+ git annex webapp
+ lib/lib.runshell.so: line 133: git: Permission denied
+
+ [Terminal session finished]
+
+The same version of git-annex works just fine for the primary user.
+(The primary user has root access which unfortunately can't be enabled for other user accounts.)
+
+### What version of git-annex are you using? On what operating system?
+
+ * git-annex: 5.20140710
+ * OS: CyanogenMod 10.1.3-p3110
diff --git a/doc/bugs/direct_command_leaves_repository_inconsistent_if_interrupted.mdwn b/doc/bugs/direct_command_leaves_repository_inconsistent_if_interrupted.mdwn
new file mode 100644
index 000000000..0d81c6778
--- /dev/null
+++ b/doc/bugs/direct_command_leaves_repository_inconsistent_if_interrupted.mdwn
@@ -0,0 +1,43 @@
+### Please describe the problem.
+
+When `git annex direct` is interrupted (either through a power outage or deliberate `control-c`) it may leave the repository in an inconsistent state.
+
+A typical situation is `git-annex` believing that the repo is in `indirect` mode while the files are not symlinks anymore.
+
+I believe I have described this problem here before, but the bug report was deleted as part of the May 29th purge (222f78e9eadd3d2cc40ec94ab22241823a7d50d9, [[bugs/git_annex_indirect_can_fail_catastrophically]]).
+
+### What steps will reproduce the problem?
+
+`git annex direct` on a large repository, `control-c` before it finishes.
+
+Observe how a lot of files are now considered to be in the famous [[typechange status|forum/git-status_typechange_in_direct_mode/]] in git.
+
+### What version of git-annex are you using? On what operating system?
+
+5.20140717 on Debian Jessie, ext4 filesystem.
+
+### Please provide any additional information below.
+
+I wish I could resume the `git annex direct` command, but this will do a `git commit -a` and therefore commit all those files to git directly. It still seems to me that `git annex` should never run `git commit -a` for exactly that kind of situation.
+
+I think that's it for now. -- [[anarcat]]
+
+Update: I was able to get rid of the `typechange` situation by running `git annex lock` on the repository, but then all files are found to be missing by `git annex fsck`:
+
+[[!format txt """
+fsck films/God Hates Cartoons/VIDEO_TS/VTS_15_0.BUP (fixing location log)
+ ** Based on the location log, films/God Hates Cartoons/VIDEO_TS/VTS_15_0.BUP
+ ** was expected to be present, but its content is missing.
+
+ Only 1 of 2 trustworthy copies exist of films/God Hates Cartoons/VIDEO_TS/VTS_15_0.BUP
+ Back it up with git-annex copy.
+"""]]
+
+Oddly enough, the repo still uses hundreds of gigs, because all the files ended up in `.git/annex/misctmp`. Not sure I remember what happened there.
+
+Similar issues and discussions:
+
+* [[bugs/direct_mode_merge_interrupt/]]
+* [[forum/Cleaning_up_after_aborted_sync_in_direct_mode/]]
+* [[bugs/failure_to_return_to_indirect_mode_on_usb/]]
+* [[forum/git-status_typechange_in_direct_mode/]]
diff --git a/doc/bugs/googlemail/comment_1_5614fa85029f9f97be03cb74899a7099._comment b/doc/bugs/googlemail/comment_1_5614fa85029f9f97be03cb74899a7099._comment
new file mode 100644
index 000000000..eb3c38811
--- /dev/null
+++ b/doc/bugs/googlemail/comment_1_5614fa85029f9f97be03cb74899a7099._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawm8BAEUyzYhORZmMuocRTk4M-3IumDm5VU"
+ nickname="luciusf0"
+ subject="Bug still valid"
+ date="2014-07-31T08:35:29Z"
+ content="""
+The bug is still valid. A lot of German users had to use the @googlemail.com extension as Google couldn't get the gmail domain in Germany.
+So it might be bothering not just a few people, but a whole country! Now, if that doesn't count ...
+
+ Mac OSX 10.9.4
+ Version: 5.20140717-g5a7d4ff
+ Build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV FsEvents XMPP DNS Feeds Quvi TDFA CryptoHash
+
+This is the message I get
+
+ Unable to connect to the Jabber server. Maybe you entered the wrong password? (Error message: host xmpp.l.google.com.:5222 failed: AuthenticationFailure (Element {elementName = Name {nameLocalName = \"failure\", nameNamespace = Just \"urn:ietf:params:xml:ns:xmpp-sasl\", namePrefix = Nothing}, elementAttributes = [], elementNodes = [NodeElement (Element {elementName = Name {nameLocalName = \"not-authorized\", nameNamespace = Just \"urn:ietf:params:xml:ns:xmpp-sasl\", namePrefix = Nothing}, elementAttributes = [], elementNodes = []})]}); host alt2.xmpp.l.google.com.:5222 failed: AuthenticationFailure (Element {elementName = Name {nameLocalName = \"failure\", nameNamespace = Just \"urn:ietf:params:xml:ns:xmpp-sasl\", namePrefix = Nothing}, elementAttributes = [], elementNodes = [NodeElement (Element {elementName = Name {nameLocalName = \"not-authorized\", nameNamespace = Just \"urn:ietf:params:xml:ns:xmpp-sasl\", namePrefix = Nothing}, elementAttributes = [], elementNodes = []})]}); host alt1.xmpp.l.google.com.:5222 failed: AuthenticationFailure (Element {elementName = Name {nameLocalName = \"failure\", nameNamespace = Just \"urn:ietf:params:xml:ns:xmpp-sasl\", namePrefix = Nothing}, elementAttributes = [], elementNodes = [NodeElement (Element {elementName = Name {nameLocalName = \"not-authorized\", nameNamespace = Just \"urn:ietf:params:xml:ns:xmpp-sasl\", namePrefix = Nothing}, elementAttributes = [], elementNodes = []})]}); host alt4.xmpp.l.google.com.:5222 failed: AuthenticationFailure (Element {elementName = Name {nameLocalName = \"failure\", nameNamespace = Just \"urn:ietf:params:xml:ns:xmpp-sasl\", namePrefix = Nothing}, elementAttributes = [], elementNodes = [NodeElement (Element {elementName = Name {nameLocalName = \"not-authorized\", nameNamespace = Just \"urn:ietf:params:xml:ns:xmpp-sasl\", namePrefix = Nothing}, elementAttributes = [], elementNodes = []})]}); host alt3.xmpp.l.google.com.:5222 failed: AuthenticationFailure (Element {elementName = Name {nameLocalName = \"failure\", nameNamespace = Just \"urn:ietf:params:xml:ns:xmpp-sasl\", namePrefix = Nothing}, elementAttributes = [], elementNodes = [NodeElement (Element {elementName = Name {nameLocalName = \"not-authorized\", nameNamespace = Just \"urn:ietf:params:xml:ns:xmpp-sasl\", namePrefix = Nothing}, elementAttributes = [], elementNodes = []})]}))
+
+
+"""]]
diff --git a/doc/bugs/runs_of_of_memory_adding_2_million_files/comment_8_7264b57f309d6e824c612eed8a088327._comment b/doc/bugs/runs_of_of_memory_adding_2_million_files/comment_8_7264b57f309d6e824c612eed8a088327._comment
new file mode 100644
index 000000000..c4d772104
--- /dev/null
+++ b/doc/bugs/runs_of_of_memory_adding_2_million_files/comment_8_7264b57f309d6e824c612eed8a088327._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawmwjQzWgiD7_I3zw-_91rMRf_6qoThupis"
+ nickname="Mike"
+ subject="comment 8"
+ date="2014-07-30T20:33:44Z"
+ content="""
+Great work Joeyh :-) I will install the new version soon. It is fantastic that you fixed this so thoroughly.
+"""]]
diff --git a/doc/bugs/sync_does_not_commit_with_alwasycommit___61___false/comment_1_e6dc7fa1b0a131bb7533f8407e1b5510._comment b/doc/bugs/sync_does_not_commit_with_alwasycommit___61___false/comment_1_e6dc7fa1b0a131bb7533f8407e1b5510._comment
new file mode 100644
index 000000000..ee9957648
--- /dev/null
+++ b/doc/bugs/sync_does_not_commit_with_alwasycommit___61___false/comment_1_e6dc7fa1b0a131bb7533f8407e1b5510._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://svario.it/gioele"
+ nickname="gioele"
+ subject="comment 1"
+ date="2014-07-29T14:25:19Z"
+ content="""
+For the record, the solution Joey suggested works, but the correct option to pass to `sync` is `-c annex.alwayscommit=true`.
+"""]]
diff --git a/doc/bugs/whereis_does_not_work_in_direct_mode.mdwn b/doc/bugs/whereis_does_not_work_in_direct_mode.mdwn
new file mode 100644
index 000000000..1dbbbbba7
--- /dev/null
+++ b/doc/bugs/whereis_does_not_work_in_direct_mode.mdwn
@@ -0,0 +1,84 @@
+### Please describe the problem.
+
+`git annex whereis` says that there are no copies of any of the files annexed in repositories running in direct mode.
+
+This is the error received:
+
+ $ git annex whereis
+ whereis fileA (0 copies) failed
+ whereis fileB (0 copies) failed
+ git-annex: whereis: 2 failed
+
+### What steps will reproduce the problem?
+
+The following script (available at <https://gist.github.com/gioele/dde462df89edfe17c5e3>) will reproduce this problem.
+
+[[!format sh """
+#!/bin/sh -x
+
+set -e ; set -u
+export LC_ALL=C
+
+direct=true # set to false to make the problem disappear
+
+h=${h:-localhost}
+dr="/tmp/annex"
+
+sync='sync -c annex.alwayscommit=true'
+
+chmod a+rwx -R pc1 pc2 || true
+rm -Rf pc1 pc2
+
+# create central git repo
+ssh $h "chmod a+rwx -R ${dr}/Docs.git" || true
+ssh $h "rm -Rf ${dr}/Docs.git"
+ssh $h "mkdir -p ${dr}/Docs.git"
+ssh $h "cd ${dr}/Docs.git ; git init --bare"
+
+d=$(pwd)
+
+# populate repo in PC1
+mkdir -p pc1/Docs
+cd pc1/Docs
+echo AAA > fileA
+echo BBB > fileB
+
+git init
+git remote add origin $h:$dr/Docs.git
+git fetch --all
+
+# simulate a host without git-annex
+git config remote.origin.annex-ignore true
+
+git annex init "pc1"
+git annex info
+
+$direct && git annex direct
+
+git annex add .
+git annex $sync origin
+
+# re-create repo on PC2
+cd $d
+mkdir -p pc2
+cd pc2
+git clone $h:$dr/Docs.git
+cd Docs
+
+git config remote.origin.annex-ignore true
+
+git annex init "pc2"
+git annex info
+
+git annex whereis || true
+echo "I was expecting location info to be available after info (press Enter)" ; read enter
+
+git annex $sync origin
+
+git annex whereis || true
+echo "Why isn't location info available even after sync? (press Enter)"
+"""]]
+
+### What version of git-annex are you using? On what operating system?
+
+git-annex version: 5.20140708-g42df533
diff --git a/doc/chunking.mdwn b/doc/chunking.mdwn
index 87408f8e1..119a85c77 100644
--- a/doc/chunking.mdwn
+++ b/doc/chunking.mdwn
@@ -10,7 +10,7 @@ Note that git-annex has to buffer chunks in memory before they are sent to
a remote. So, using a large chunk size will make it use more memory.
To enable chunking, pass a `chunk=nnMiB` parameter to `git annex
-initremote, specifying the chunk size.
+initremote`, specifying the chunk size.
Good chunk sizes will depend on the remote, but a good starting place
is probably `1MiB`. Very large chunks are problimatic, both because
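As a concrete illustration of the `chunk=` parameter this hunk documents, a chunked remote could be created like so (a sketch only; the remote name and other settings are placeholders, not part of this commit):

    git annex initremote mys3 type=S3 encryption=shared chunk=1MiB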
diff --git a/doc/design/assistant/chunks.mdwn b/doc/design/assistant/chunks.mdwn
index 454f15f9e..a9709a778 100644
--- a/doc/design/assistant/chunks.mdwn
+++ b/doc/design/assistant/chunks.mdwn
@@ -160,17 +160,11 @@ Instead of storing the chunk count in the special remote, store it in
the git-annex branch.
The location log does not record locations of individual chunk keys
-(too space-inneficient).
-Instead, look at git-annex:aaa/bbb/SHA256-s12345--xxxxxxx.log.cnk to get
-the chunk count and size for a key.
+(too space-inefficient). Instead, look at a chunk log in the
+git-annex branch to get the chunk count and size for a key.
-Note that a given remote uuid might have multiple chunk sizes logged, if a
-key was stored on it twice using different chunk sizes. Also note that even
-when this file exists for a key, the object may be stored non-chunked on
-the remote too.
-
-`hasKey` would check if any one (chunksize, chunkcount) is satisfied by
-the files on the remote. It would also check if the non-chunked key is
+`hasKey` would check if any of the logged sets of chunks is
+present on the remote. It would also check if the non-chunked key is
present, as a fallback.
When dropping a key from the remote, drop all logged chunk sizes.
@@ -185,6 +179,31 @@ remote doesn't know anything about chunk sizes. It uses a little more
data in the git-annex branch, although with care (using the same timestamp
as the location log), it can compress pretty well.
+## chunk log
+
+Stored in the git-annex branch, this provides a mapping `Key -> [[Key]]`.
+
+Note that a given remote uuid might have multiple sets of chunks (with
+different sizes) logged, if a key was stored on it twice using different
+chunk sizes. Also note that even when the log indicates a key is chunked,
+the object may be stored non-chunked on the remote too.
+
+For fixed size chunks, there's no need to store the list of chunk keys;
+instead the log only records the number of chunks (needed because the size
+of the parent Key may not be known), and the chunk size.
+
+Example:
+
+ 1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:10240 9
+
+Later, might want to support other kinds of chunks, for example ones made
+using a rsync-style rolling checksum. It would probably not make sense to
+store the full [Key] list for such chunks in the log. Instead, it might be
+stored in a file on the remote.
+
+To support such future developments, when updating the chunk log,
+git-annex should preserve unparsable values (the part after the colon).
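As a rough sketch of handling that format (illustrative code only, not git-annex's actual parser; the type and function names are made up), a chunk log line like the example above could be read while keeping unparsable method fields verbatim:

[[!format haskell """
data ChunkMethod = FixedSizeChunks Integer | UnknownChunks String
        deriving (Show, Eq)

-- Parses "<timestamp> <uuid>:<chunksize> <chunkcount>", preserving the part
-- after the colon as-is when it is not a plain chunk size, so that future
-- chunking schemes round-trip through the log unchanged.
parseChunkLogLine :: String -> Maybe (String, String, ChunkMethod, Integer)
parseChunkLogLine l = case words l of
        [ts, field, count] -> do
                (uuid, method) <- splitField field
                n <- readMaybe' count
                return (ts, uuid, method, n)
        _ -> Nothing
  where
        splitField f = case break (== ':') f of
                (uuid, ':':rest) -> Just (uuid, parseMethod rest)
                _ -> Nothing
        parseMethod s = maybe (UnknownChunks s) FixedSizeChunks (readMaybe' s)
        readMaybe' s = case reads s of
                [(x, "")] -> Just x
                _ -> Nothing
"""]]

Running it on the example line above would yield the timestamp, the remote's uuid, `FixedSizeChunks 10240`, and a chunk count of 9.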
+
## chunk then encrypt
Rather than encrypting the whole object 1st and then chunking, chunk and
@@ -239,3 +258,14 @@ checking hasKey.
Note that this is safe to do only as long as the Key being transferred
cannot possibly have 2 different contents in different repos. Notably not
necessarily the case for the URL keys generated for quvi.
+
+Both **done**.
+
+## parallel
+
+If 2 remotes both support chunking, uploading could upload different chunks
+to them in parallel. However, the chunk log does not currently allow
+representing the state where some chunks are on one remote and others on
+another remote.
+
+Parallel downloading of chunks from different remotes is a bit more doable.
diff --git a/doc/design/assistant/deltas.mdwn b/doc/design/assistant/deltas.mdwn
index ff4185a18..0f7d308b8 100644
--- a/doc/design/assistant/deltas.mdwn
+++ b/doc/design/assistant/deltas.mdwn
@@ -4,6 +4,24 @@ One simple way is to find the key of the old version of a file that's
being transferred, so it can be used as the basis for rsync, or any
other similar transfer protocol.
-For remotes that don't use rsync, a poor man's version could be had by
-chunking each object into multiple parts. Only modified parts need be
-transferred. Sort of sub-keys to the main key being stored.
+For remotes that don't use rsync, use a rolling checksum based chunker,
+such as BuzHash. This will produce [[chunks]], which can be stored on the
+remote as regular Keys -- where unlike the fixed size chunk keys, the
+SHA256 part of these keys is the checksum of the chunk they contain.
+
+Once that's done, it's easy to avoid uploading chunks that have been sent
+to the remote before.
+
+When retrieving a new version of a file, there would need to be a way to get
+the list of chunk keys that constitute the new version. Probably best to
+store this list on the remote. Then there needs to be a way to find which
+of those chunks are available in locally present files, so that the locally
+available chunks can be extracted, and combined with the chunks that need
+to be downloaded, to reconstitute the file.
+
+To find which chunks are locally available, here are 2 ideas:
+
+1. Use a single basis file, eg an old version of the file. Re-chunk it, and
+ use its chunks. Slow, but simple.
+2. Some kind of database of locally available chunks. Would need to be kept
+ up-to-date as files are added, and as files are downloaded.
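To make the rolling-checksum idea above concrete, here is a toy sketch of content-defined chunking (hypothetical code, not git-annex's; a real implementation would use BuzHash or similar rather than this sliding byte sum). Because chunk boundaries depend only on nearby bytes, unchanged regions of a modified file keep producing the same chunks, which is what makes reusing already-uploaded chunks possible:

[[!format haskell """
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B

windowSize, targetSize :: Int
windowSize = 64      -- bytes covered by the rolling sum
targetSize = 8192    -- rough average chunk size

-- Split content into chunks at positions decided by the rolling sum.
rollChunks :: B.ByteString -> [B.ByteString]
rollChunks bs
        | B.null bs = []
        | otherwise = chunk : rollChunks rest
  where
        (chunk, rest) = B.splitAt (boundary bs) bs

-- Position just after the first rolling-sum boundary, or the end of input.
boundary :: B.ByteString -> Int
boundary bs = go 0 0
  where
        len = B.length bs
        go !h !i
                | i >= len = len
                | isBoundary h' && i + 1 >= windowSize = i + 1
                | otherwise = go h' (i + 1)
          where
                h' = h + fromIntegral (B.index bs i) - dropped
                dropped
                        | i >= windowSize = fromIntegral (B.index bs (i - windowSize))
                        | otherwise = 0
        isBoundary h = h `mod` targetSize == 0 && h /= 0
"""]]

Each resulting chunk would then be hashed to form its key, and only chunks whose keys are not yet present on the remote would need to be uploaded.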
diff --git a/doc/design/roadmap.mdwn b/doc/design/roadmap.mdwn
index 631280828..7a3fa06fe 100644
--- a/doc/design/roadmap.mdwn
+++ b/doc/design/roadmap.mdwn
@@ -14,5 +14,10 @@ Now in the
* Month 8 [[!traillink git-remote-daemon]]
* Month 9 Brazil!, [[!traillink assistant/sshpassword]]
* Month 10 polish [[assistant/Windows]] port
-* **Month 11 [[!traillink assistant/chunks]], [[!traillink assistant/deltas]], [[!traillink assistant/gpgkeys]] (pick 2?)**
-* Month 12 [[!traillink assistant/telehash]]
+* Month 11 [[!traillink assistant/chunks]]
+* **Month 12** user-driven features and polishing
+
+Deferred until later:
+
+* Month XX [[!traillink assistant/deltas]], [[!traillink assistant/gpgkeys]]
+* Month XX [[!traillink assistant/telehash]]
diff --git a/doc/devblog/day_206__zap.mdwn b/doc/devblog/day_206__zap.mdwn
new file mode 100644
index 000000000..eccee2464
--- /dev/null
+++ b/doc/devblog/day_206__zap.mdwn
@@ -0,0 +1,83 @@
+Zap! ... My internet gateway was [destroyed by lightning](https://identi.ca/joeyh/note/xogvXTFDR9CZaCPsmKZipA).
+Limping along regardless, and replacement ordered.
+
+Got resuming of uploads to chunked remotes working. Easy!
+
+----
+
+Next I want to convert the external special remotes to have these nice
+new features. But there is a wrinkle: The new chunking interface works
+entirely on ByteStrings containing the content, but the external special
+remote interface passes content around in files.
+
+I could just make it write the ByteString to a temp file, and pass the temp
+file to the external special remote to store. But then, when chunking is
+not being used, it would pointlessly read a file's content, only to write
+it back out to a temp file.
+
+Similarly, when retrieving a key, the external special remote saves it to a
+file. But we want a ByteString. Except, when not doing chunking or
+encryption, letting the external special remote save the content directly
+to a file is optimal.
+
+One approach would be to change the protocol for external special
+remotes, so that the content is sent over the protocol rather than in temp
+files. But I think this would not be ideal for some kinds of external
+special remotes, and it would probably be quite a lot slower and more
+complicated.
+
+Instead, I am playing around with some type class trickery:
+
+[[!format haskell """
+{-# LANGUAGE Rank2Types, TypeSynonymInstances, FlexibleInstances, MultiParamTypeClasses #-}
+
+type Storer p = Key -> p -> MeterUpdate -> IO Bool
+
+-- For Storers that want to be provided with a file to store.
+type FileStorer a = Storer (ContentPipe a FilePath)
+
+-- For Storers that want to be provided with a ByteString to store
+type ByteStringStorer a = Storer (ContentPipe a L.ByteString)
+
+class ContentPipe src dest where
+ contentPipe :: src -> (dest -> IO a) -> IO a
+
+instance ContentPipe L.ByteString L.ByteString where
+ contentPipe b a = a b
+
+-- This feels a lot like I could perhaps use pipes or conduit...
+instance ContentPipe FilePath FilePath where
+ contentPipe f a = a f
+
+instance ContentPipe L.ByteString FilePath where
+ contentPipe b a = withTmpFile "tmpXXXXXX" $ \f h -> do
+ L.hPut h b
+ hClose h
+ a f
+
+instance ContentPipe FilePath L.ByteString where
+ contentPipe f a = a =<< L.readFile f
+"""]]
+
+The external special remote would be a FileStorer, so when a non-chunked,
+non-encrypted file is provided, it just runs on the FilePath with no extra
+work. When a ByteString is provided instead, it's written out to a temp file
+and the temp file provided. And many other special remotes are
+ByteStringStorers, so they will just pass the provided ByteString through,
+or read in the content of a file.
+
+I think that would work, though it is not optimal for external special
+remotes that are chunked but not encrypted. For that case, it might be worth
+extending the special remote protocol with a way to say "store a chunk of
+this file from byte N to byte M".
+
+---
+
+Also, talked with ion about what would be involved in using rolling checksum
+based chunks. That would allow for rsync- or zsync-like behavior, where,
+when a file changes, git-annex uploads only the chunks that changed, and the
+unchanged chunks are reused.
+
+I am not ready to work on that yet, but I made some changes to the parsing
+of the chunk log, so that additional chunking schemes like this can be added
+to git-annex later without breaking backwards compatibility.
diff --git a/doc/devblog/day_207__at_last.mdwn b/doc/devblog/day_207__at_last.mdwn
new file mode 100644
index 000000000..936ac98f0
--- /dev/null
+++ b/doc/devblog/day_207__at_last.mdwn
@@ -0,0 +1,34 @@
+It took 9 hours, but I finally got to make [[!commit c0dc134cded6078bb2e5fa2d4420b9cc09a292f7]],
+which both removes 35 lines of code, and adds chunking support to all
+external special remotes!
+
+The groundwork for that commit involved taking the type scheme I sketched
+out yesterday, completely failing to make it work with such high-ranked
+types, and falling back to a simpler set of types that both I and GHC seem
+better at getting our heads around.
+
+Then I also had more fun with types, when it turned out I needed to
+run encryption in the Annex monad. So I had to go convert several parts of
+the utility libraries to use MonadIO and exception lifting. Yurk.
+
+The final and most fun stumbling block caused git-annex to crash when
+retrieving a file from an external special remote that had neither
+encryption nor chunking. Amusingly it was because I had not put in an
+optimisation (namely, just renaming the file that was retrieved in this case,
+rather than unnecessarily reading it in and writing it back out). It's
+not often that a lack of an optimisation causes code to crash!
+
+So, fun day, great result, and it should now be very simple to convert
+the bup, ddar, gcrypt, glacier, rsync, S3, and WebDAV special remotes
+to the new system. Fingers crossed.
+
+But first, I will probably take half a day or so and write a
+`git annex testremote` that can be run in a repository and does live
+testing of a special remote including uploading and downloading files.
+There are quite a lot of cases to test now, and it seems best to get
+that in place before I start changing a lot of remotes without a way to
+test everything.
+
+----
+
+Today's work was sponsored by Daniel Callahan.
diff --git a/doc/devblog/day_208__testremote.mdwn b/doc/devblog/day_208__testremote.mdwn
new file mode 100644
index 000000000..87c497fcc
--- /dev/null
+++ b/doc/devblog/day_208__testremote.mdwn
@@ -0,0 +1,10 @@
+Built `git annex testremote` today.
+
+That took a little bit longer than expected, because it actually found
+several fence post bugs in the chunking code.
+
+It also found a bug in the sample external special remote script.
+
+I am very pleased with this command. Being able to run 640 tests against
+any remote, without any possibility of damaging data already stored in the
+remote, is awesome. Should have written it a looong time ago!
diff --git a/doc/forum/Central_git_annex_server_that_always_keeps_one_copy.mdwn b/doc/forum/Central_git_annex_server_that_always_keeps_one_copy.mdwn
new file mode 100644
index 000000000..166a22dee
--- /dev/null
+++ b/doc/forum/Central_git_annex_server_that_always_keeps_one_copy.mdwn
@@ -0,0 +1 @@
+Is there a way to configure a central git repository that keeps track of large files with git-annex so that multiple users can clone the repository, but no clone can drop files from the server? Essentially, I'm looking for a way to have one repository that is always populated with at least one copy of each file. Other users shouldn't be able to tell that repository to drop any files (but would be able to add files to it). The term "user" in that last sentence really refers to other clones...
diff --git a/doc/forum/Central_git_annex_server_that_always_keeps_one_copy/comment_1_e786c8df6e48d88cf15b555af1b8639a._comment b/doc/forum/Central_git_annex_server_that_always_keeps_one_copy/comment_1_e786c8df6e48d88cf15b555af1b8639a._comment
new file mode 100644
index 000000000..19339a3aa
--- /dev/null
+++ b/doc/forum/Central_git_annex_server_that_always_keeps_one_copy/comment_1_e786c8df6e48d88cf15b555af1b8639a._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawn-TDneVW-8kwb1fyTRAJfH3l1xs2VSEmk"
+ nickname="James"
+ subject="comment 1"
+ date="2014-07-30T20:37:27Z"
+ content="""
+It might not suit all your needs, but you could try using gitolite and setting permissions on the git-annex branch of your repository:
+http://gitolite.com/gitolite/conf.html#write-types
+
+"""]]
diff --git a/doc/forum/Making_Firefox_not_dereference_symlinks_on_open.mdwn b/doc/forum/Making_Firefox_not_dereference_symlinks_on_open.mdwn
new file mode 100644
index 000000000..5680cbd02
--- /dev/null
+++ b/doc/forum/Making_Firefox_not_dereference_symlinks_on_open.mdwn
@@ -0,0 +1,3 @@
+Firefox has the nasty habit of force-dereferencing symlinks when locally opening files (i.e., opening an annexed document will cause it to be opened in .git/annex/objects/…). Since this breaks relative links within HTML files, it makes Firefox pretty useless when working with a git-annex repository containing HTML files. (Apparently this behavior is [desired](https://bugzilla.mozilla.org/show_bug.cgi?id=803999) upstream and might not be fixed.)
+
+Seems I’m not the only one who would like to work with annexed HTML files, though. On the [Debian bugtracker](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=691099), another user shared a handy shim which can be used in LD_PRELOAD and which will force Firefox to open symlinks in-place. Thought I’d share this here in case it’s of use to anyone.
diff --git a/doc/forum/Making_Firefox_not_dereference_symlinks_on_open/comment_1_a7b092f2291fa515279cf7dce23df20d._comment b/doc/forum/Making_Firefox_not_dereference_symlinks_on_open/comment_1_a7b092f2291fa515279cf7dce23df20d._comment
new file mode 100644
index 000000000..b4addb92d
--- /dev/null
+++ b/doc/forum/Making_Firefox_not_dereference_symlinks_on_open/comment_1_a7b092f2291fa515279cf7dce23df20d._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="zardoz"
+ ip="78.48.163.229"
+ subject="comment 1"
+ date="2014-07-31T11:43:16Z"
+ content="""
+Sorry, it escaped my attention there’s a dedicated tips forum. Maybe this should be moved there.
+"""]]
diff --git a/doc/forum/local_subtree_and_broken_symlinks/comment_1_779cc4e49cb4da8aea7f5743e6257f21._comment b/doc/forum/local_subtree_and_broken_symlinks/comment_1_779cc4e49cb4da8aea7f5743e6257f21._comment
new file mode 100644
index 000000000..e6ccb814f
--- /dev/null
+++ b/doc/forum/local_subtree_and_broken_symlinks/comment_1_779cc4e49cb4da8aea7f5743e6257f21._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="24.159.78.125"
+ subject="comment 1"
+ date="2014-07-30T14:55:16Z"
+ content="""
+Known bug: [[bugs/Git_annexed_files_symlink_are_wrong_when_submodule_is_not_in_the_same_path]]
+
+I don't think there's much likelihood of a fix though. Using direct mode probably works around the problem. Or you can use something like myrepos instead of git subtrees.
+"""]]
diff --git a/doc/forum/remote_server_client_repositories_are_bare__33____63__/comment_3_32bf10cf837db16566dcc99d0b9aaf67._comment b/doc/forum/remote_server_client_repositories_are_bare__33____63__/comment_3_32bf10cf837db16566dcc99d0b9aaf67._comment
new file mode 100644
index 000000000..bf302f7c8
--- /dev/null
+++ b/doc/forum/remote_server_client_repositories_are_bare__33____63__/comment_3_32bf10cf837db16566dcc99d0b9aaf67._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkftzaCvV7EDKVDfJhsQZ3E1Vn-0db516w"
+ nickname="Edward"
+ subject="One snag"
+ date="2014-07-28T19:37:04Z"
+ content="""
+I set up a non-bare repo on a server by following the above steps (git init, git annex init, then add it as a Remote Server from elsewhere and combine repos). It worked, but I hit a snag and needed to add another step.
+
+After git init, you're not sitting on any branch yet, and that seems to have prevented the assistant from doing anything to synchronize the working tree on the server. After I did \"git checkout synced/master\", it started working.
+"""]]
diff --git a/doc/forum/shared_cipher_for_S3_attempting_to_decrypt_a_non-encrypted_file.mdwn b/doc/forum/shared_cipher_for_S3_attempting_to_decrypt_a_non-encrypted_file.mdwn
new file mode 100644
index 000000000..bd172b56e
--- /dev/null
+++ b/doc/forum/shared_cipher_for_S3_attempting_to_decrypt_a_non-encrypted_file.mdwn
@@ -0,0 +1,20 @@
+I am trying to use S3 as a file store for git annex. I have set up the remote via the following command:
+
+ git annex initremote xxx-s3 type=S3 encryption=shared embedcreds=yes datacenter=EU bucket=xxx-git-annex fileprefix=test/
+
+The remote gets set up correctly, creates the directory I want, and adds an annex-uuid file.
+
+Now when I try to copy a file to the xxx-s3 remote, I get the following error:
+
+ $ git annex add ssl-success-and-failure-with-tl-logs.log
+ add ssl-success-and-failure-with-tl-logs.log ok
+ (Recording state in git...)
+ $ git annex copy ssl-success-and-failure-with-tl-logs.log --to xxx-s3
+ copy ssl-success-and-failure-with-tl-logs.log (gpg) gpg: no valid OpenPGP data found.
+ gpg: decrypt_message failed: eof
+
+ git-annex: user error (gpg ["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--batch","--passphrase-fd","10","--decrypt"] exited 2)
+ failed
+ git-annex: copy: 1 failed
+
+Any ideas what might be wrong? Is shared cipher broken somehow?
diff --git a/doc/forum/shared_cipher_for_S3_attempting_to_decrypt_a_non-encrypted_file/comment_1_b42ff37be172ba841980c17ad6223e06._comment b/doc/forum/shared_cipher_for_S3_attempting_to_decrypt_a_non-encrypted_file/comment_1_b42ff37be172ba841980c17ad6223e06._comment
new file mode 100644
index 000000000..1268d8cd0
--- /dev/null
+++ b/doc/forum/shared_cipher_for_S3_attempting_to_decrypt_a_non-encrypted_file/comment_1_b42ff37be172ba841980c17ad6223e06._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawmAINLSovhWM_4_KrbngOcxduIbBuKv8ZA"
+ nickname="Nuutti"
+ subject="comment 1"
+ date="2014-08-01T09:28:21Z"
+ content="""
+Sorry, this should probably be in bugs.
+"""]]
diff --git a/doc/forum/usability:_what_are_those_arrow_things__63__.mdwn b/doc/forum/usability:_what_are_those_arrow_things__63__.mdwn
new file mode 100644
index 000000000..ac8343040
--- /dev/null
+++ b/doc/forum/usability:_what_are_those_arrow_things__63__.mdwn
@@ -0,0 +1,21 @@
+I want to relate a usability story that happens fairly regularly when I show git-annex to people. The story goes like this.
+
+----
+
+Antoine sat down at his computer saying, "i have this great movie collection I want to share with you, my friend, because the fair use provisions allow for that, and I use this great git-annex tool that allows me to sync my movie collection between different places". His friend Charlie, a Linux user only vaguely familiar with the internals of how his operating system or legal system actually works, reads this as "yay free movies" and wholeheartedly agrees to lend himself to the experiment.
+
+Antoine creates a user account for Charlie on his home computer, because he doesn't want to have to do everything himself. "That way you can choose which movies you want, because you probably don't want my complete movie collection!" Charlie emphatically responds: "right, I only have my laptop and this USB key here, so I don't think I can get it all".
+
+Charlie logs into Antoine's computer, named `marcos`. Antoine shows Charlie where the movies are located (`/srv/video`) through the file browser (Thunar, for the record). Charlie inserts his USB key into `marcos` and a new icon for the USB key shows up. Then Charlie finds a video he likes, copies and pastes it into the USB key. But instead of a familiar progress bar, Charlie is prompted with a dialog that says "Le système de fichiers ne gère pas les liens symboliques." (Antoine is french, so excuse him, this weird message says that the filesystem doesn't support symbolic links.) Puzzled, Charlie tries to copy the file to his home directory instead. This works better, but the file has a little arrow on it, which seems odd to Charlie. He then asks Antoine for advice.
+
+Antoine then has no solution but to convert the git-annex repository into direct mode, something which takes a significant amount of time and is actually [[designated as "untrusted"|direct_mode]] in the documentation. In fact, so much so that he actually did [[screw up his repository magnificently|bugs/direct_command_leaves_repository_inconsistent_if_interrupted]] because he freaked out when `git-annex direct` started and interrupted it because he thought it would take too long.
+
+----
+
+Now I understand it is not necessarily `git-annex`'s responsibility if Thunar (or Nautilus, for that matter) doesn't know how to properly deal with symlinks (hint: just dereference the damn thing already). Maybe I should file a bug about this against thunar? I also understand that symlinks are useful to ensure the security of the data hosted in `git-annex`, and that I could have used direct mode in the first place. But I like to track changes in git to those files, and direct mode makes that really difficult.
+
+I didn't file this as a bug because I want to start the conversation, but maybe it should qualify as a usability bug. As things stand, this is one of the biggest hurdles in teaching people about git annex.
+
+(The other being "how do i actually use git annex to sync those files instead of just copying them by hand", but that's for another story!)
+
+-- [[anarcat]]
diff --git a/doc/special_remotes.mdwn b/doc/special_remotes.mdwn
index 2ef98c40d..dd82d23f1 100644
--- a/doc/special_remotes.mdwn
+++ b/doc/special_remotes.mdwn
@@ -81,3 +81,21 @@ on special remotes, instead use `git annex unused --from`. Example:
(To remove unwanted data: git-annex dropunused --from mys3 NUMBER)
$ git annex dropunused --from mys3 1
dropunused 12948 (from mys3...) ok
+
+## Testing special remotes
+
+To make sure that a special remote is working correctly, you can use the
+`git annex testremote` command. This expects you to have set up the remote
+as usual, and it then runs a lot of tests, using random data. It's
+particularly useful to test new implementations of special remotes.
+
+By default it will upload and download files of around 1MiB to the remote
+it tests; the `--size` parameter can make it test using smaller files.
+
+It's safe to use this command even when you're already storing data in a
+remote; it won't touch your existing files stored on the remote.
+
+For most remotes, it also won't bloat the remote with any data, since
+it cleans up the stuff it uploads. However, the bup, ddar, and tahoe
+special remotes don't support removal of uploaded files, so be careful
+with those.
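For example, something like the following would exercise a remote with smaller test files (the remote name here is just a placeholder):

    git annex testremote mys3 --size=100KiB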
diff --git a/doc/tips/metadata_driven_views.mdwn b/doc/tips/metadata_driven_views.mdwn
index 5128c18e2..1826ed1ce 100644
--- a/doc/tips/metadata_driven_views.mdwn
+++ b/doc/tips/metadata_driven_views.mdwn
@@ -6,7 +6,7 @@ keeps track of.
One nice way to use the metadata is through **views**. You can ask
git-annex to create a view of files in the currently checked out branch
-that have certian metadata. Once you're in a view, you can move and copy
+that have certain metadata. Once you're in a view, you can move and copy
files to adjust their metadata further. Rather than the traditional
hierarchical directory structure, views are dynamic; you can easily
refine or reorder a view.
diff --git a/doc/todo/Speed_up___39__import_--clean-duplicates__39__.mdwn b/doc/todo/Speed_up___39__import_--clean-duplicates__39__.mdwn
new file mode 100644
index 000000000..34c21ab01
--- /dev/null
+++ b/doc/todo/Speed_up___39__import_--clean-duplicates__39__.mdwn
@@ -0,0 +1,7 @@
+I'm currently in the process of gutting old (some broken) git-annex repositories and cleaning out download directories from before I started using git-annex.
+
+To do this, I am running `git annex import --clean-duplicates $PATH` on the directories I want to clear out, but sometimes this takes an unnecessarily long time.
+
+For example, git-annex will calculate the digest for a huge file (30GB+) in $TARGET, even though there are no files in the annex of that size.
+
+It's a common shortcut to check for duplicate sizes first to eliminate definite non-matches really quickly. Can this be added to git-annex's `import` in some way or is this a no-go due to the constant memory constraint?
diff --git a/doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_1_9268c639d3d21cce4ca7b60d08e9cb65._comment b/doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_1_9268c639d3d21cce4ca7b60d08e9cb65._comment
new file mode 100644
index 000000000..8584d5ae8
--- /dev/null
+++ b/doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_1_9268c639d3d21cce4ca7b60d08e9cb65._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="24.159.78.125"
+ subject="interesting idea"
+ date="2014-07-30T15:03:46Z"
+ content="""
+This could be done in constant space using a bloom filter of known file sizes. Files with wrong sizes would sometimes match, but no problem, it would then just do the work it does now.
+
+However, to build such a filter, git-annex would need to do a scan of all keys it knows about. This would take approximately as long to run as `git annex unused` does. It might make sense to only build the filter if it runs into a fairly large file. Alternatively, a bloom filter of file sizes could be cached and updated on the fly as things change (but this gets pretty complex).
+"""]]
diff --git a/doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_2_9c6688901ef20badd834419202627d5c._comment b/doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_2_9c6688901ef20badd834419202627d5c._comment
new file mode 100644
index 000000000..e372b405e
--- /dev/null
+++ b/doc/todo/Speed_up___39__import_--clean-duplicates__39__/comment_2_9c6688901ef20badd834419202627d5c._comment
@@ -0,0 +1,21 @@
+[[!comment format=mdwn
+ username="Xyem"
+ ip="81.111.193.130"
+ subject="comment 2"
+ date="2014-08-01T09:05:45Z"
+ content="""
+Could be tested out with an additional flag `--with-size-bloom` on import?
+
+It would then build a bloom (and use a cached one with --fast) and do the usual import.
+
+So I could do this:
+
+ # Bloom is created and the import is done using it
+ git annex import --clean-duplicates --with-size-bloom $TARGET
+
+ # Previously created bloom is used
+ git annex import --clean-duplicates --with-size-bloom --fast $TARGET2
+ git annex import --clean-duplicates --with-size-bloom --fast $TARGET3
+
+I can implement this behaviour in Perl with Bloom::Filter and let you know how it performs if that would be useful to you..?
+"""]]
diff --git a/doc/todo/wishlist:_Parity_files_for_encrypted_remotes.mdwn b/doc/todo/wishlist:_Parity_files_for_encrypted_remotes.mdwn
new file mode 100644
index 000000000..34f06ad63
--- /dev/null
+++ b/doc/todo/wishlist:_Parity_files_for_encrypted_remotes.mdwn
@@ -0,0 +1,7 @@
+I have data that has accompanying parity files. This is supposed to add some
+security to file integrity; however, it only works as long as the files are
+available unencrypted. In case of encrypted special remotes the existing parity files
+won't be of any use if the encrypted versions of files get corrupted in the remote location.
+
+Would it be worthwhile for git-annex to generate its own
+parity files for the encrypted data in encrypted special remotes?
diff --git a/doc/todo/wishlist:_add_repository_name_to_commit_messages.mdwn b/doc/todo/wishlist:_add_repository_name_to_commit_messages.mdwn
new file mode 100644
index 000000000..1c37cc18b
--- /dev/null
+++ b/doc/todo/wishlist:_add_repository_name_to_commit_messages.mdwn
@@ -0,0 +1,3 @@
+The commit messages made by git-annex are quite spartan, especially in direct mode where one cannot enter one's own commit messages. This means that all the messages say is "branch created", "git-annex automatic sync", "update", "merging", or little more.
+
+It would be nice if git-annex could add at least the name of the repository/remote to the commit message. This would make the log a lot more clear, especially when dealing with problems or bugs.