author Joey Hess <joey@kitenet.net> 2014-08-01 18:00:47 -0400
committer Joey Hess <joey@kitenet.net> 2014-08-01 18:00:47 -0400
commit ddbf5df3c9940473663a6e562f8ee3583867046e
tree 40dc81d0c2693b5fd5fc3e5b9cd8df17e0505ec5 /doc
parent 154cb13180fbe877d2030d83a415b30150ac7298
parent d0a8e3d6217f2924b864393d425b6d7582370d07
Merge branch 'newchunks'
I am happy enough with this to make it live!
Diffstat (limited to 'doc')
-rw-r--r-- doc/chunking.mdwn 31
-rw-r--r-- doc/design/assistant/chunks.mdwn 9
-rw-r--r-- doc/design/external_special_remote_protocol.mdwn 9
-rw-r--r-- doc/git-annex.mdwn 31
-rw-r--r-- doc/internals/hashing.mdwn 5
-rw-r--r-- doc/special_remotes/directory.mdwn 12
-rwxr-xr-x doc/special_remotes/external/example.sh 15
-rw-r--r-- doc/special_remotes/webdav.mdwn 12
-rw-r--r-- doc/tips/using_box.com_as_a_special_remote.mdwn 6
9 files changed, 103 insertions, 27 deletions
diff --git a/doc/chunking.mdwn b/doc/chunking.mdwn
new file mode 100644
index 000000000..87408f8e1
--- /dev/null
+++ b/doc/chunking.mdwn
@@ -0,0 +1,31 @@
+Some [[special_remotes]] have support for breaking large files up into
+chunks that are stored on the remote.
+
+This can be useful to work around limitations on the size of files
+on the remote.
+
+Chunking also allows for resuming interrupted downloads and uploads.
+
+Note that git-annex has to buffer chunks in memory before they are sent to
+a remote. So, using a large chunk size will make it use more memory.
+
+To enable chunking, pass a `chunk=nnMiB` parameter to `git annex
+initremote`, specifying the chunk size.
+
+Good chunk sizes will depend on the remote, but a good starting place
+is probably `1MiB`. Very large chunks are problematic, both because
+git-annex needs to buffer one chunk in memory when uploading, and because
+a larger chunk will make resuming interrupted transfers less efficient.
+On the other hand, when a file is split into a great many chunks,
+there can be increased overhead of making many requests to the remote.
+
+To disable chunking of a remote that was using chunking,
+pass `chunk=0` to `git annex enableremote`. Any content already stored on
+the remote using chunks will continue to be accessed via chunks; this
+just prevents using chunks when storing new content.
+
+To change the chunk size, pass a `chunk=nnMiB` parameter to
+`git annex enableremote`. This only affects the chunk size used when
+storing new content.
+
+See also: [[design document|design/assistant/chunks]]
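The trade-off described in the new chunking page (larger chunks mean more data buffered in memory per upload; smaller chunks mean more requests to the remote) can be put in numbers. The sketch below is plain shell arithmetic, not git-annex code: the `chunk_count` helper and the 1 GiB file are hypothetical, and it assumes the number of chunks is simply the file size divided by the chunk size, rounded up.

```shell
#!/bin/sh
# Illustrative arithmetic only (not git-annex code). Assumption: a file is
# split into ceiling(size / chunksize) chunks, each one a separate request,
# and roughly one chunk is buffered in memory while uploading.

MiB=1048576

# chunk_count SIZE CHUNKSIZE -> ceiling(SIZE / CHUNKSIZE), both in bytes
chunk_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

file_size=$((1024 * MiB))   # a hypothetical 1 GiB file

for chunk_mib in 1 10 100; do
    chunk=$((chunk_mib * MiB))
    n=$(chunk_count "$file_size" "$chunk")
    echo "chunk=${chunk_mib}MiB: $n chunks, about ${chunk_mib}MiB buffered per upload"
done
```

For the 1 GiB file this gives 1024 chunks at `chunk=1MiB`, 103 at 10MiB and 11 at 100MiB, which is why `1MiB` is a reasonable default unless per-request overhead on the remote is high.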
diff --git a/doc/design/assistant/chunks.mdwn b/doc/design/assistant/chunks.mdwn
index 48a1876e4..a9709a778 100644
--- a/doc/design/assistant/chunks.mdwn
+++ b/doc/design/assistant/chunks.mdwn
@@ -231,6 +231,15 @@ cannot check exact file sizes.
If padding is enabled, gpg compression should be disabled, to not leak
clues about how well the files compress and so what kind of file it is.
+## chunk key hashing
+
+A chunk key should hash into the same directory structure as its parent
+key. This will avoid lots of extra hash directories when using chunking
+with non-encrypted keys.
+
+This won't happen when the key is encrypted, but that is good; hashing to the
+same bucket then would allow statistical correlation.
+
## resuming interupted transfers
Resuming interrupted downloads, and uploads are both possible.
diff --git a/doc/design/external_special_remote_protocol.mdwn b/doc/design/external_special_remote_protocol.mdwn
index 6fe09ff7c..01ffe7fd4 100644
--- a/doc/design/external_special_remote_protocol.mdwn
+++ b/doc/design/external_special_remote_protocol.mdwn
@@ -101,12 +101,14 @@ The following requests *must* all be supported by the special remote.
Tells the special remote it's time to prepare itself to be used.
Only INITREMOTE can come before this.
* `TRANSFER STORE|RETRIEVE Key File`
- Requests the transfer of a key. For Send, the File is the file to upload;
- for Receive the File is where to store the download.
+ Requests the transfer of a key. For STORE, the File is the file to upload;
+ for RETRIEVE the File is where to store the download.
Note that the File should not influence the filename used on the remote.
The filename will not contain any whitespace.
+ Note that, while a Key is being stored, CHECKPRESENT must not indicate
+ that it's present until all the data has been transferred.
Multiple transfers might be requested by git-annex, but it's fine for the
- program to serialize them and only do one at a time.
+ program to serialize them and only do one at a time.
* `CHECKPRESENT Key`
Requests the remote check if a key is present in it.
* `REMOVE Key`
@@ -286,7 +288,6 @@ start a new process the next time it needs to use a remote.
the remote. However, \n and probably \0 need to be escaped somehow in the
file data, which adds complication.
* uuid discovery during INITREMOTE.
-* Support for splitting files into chunks.
* Support for getting and setting the list of urls that can be associated
with a key.
* Hook into webapp. Needs a way to provide some kind of prompt to the user
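The new CHECKPRESENT requirement for `TRANSFER STORE` is easiest to meet by writing content under a temporary name and renaming it into place only once it is complete, which is what the updated example.sh does. Below is a minimal standalone sketch of the two handlers, assuming a hypothetical `STORE_DIR` that holds one file per key; the function names are illustrative, and only the reply strings come from the protocol.

```shell
#!/bin/sh
# Minimal sketch, not git-annex's example.sh. Assumption: a hypothetical
# STORE_DIR holding one file per key. STORE writes to a temp file and
# renames it, so CHECKPRESENT (which only looks at the final name) never
# reports a half-written key as present.

STORE_DIR=${STORE_DIR:-/tmp/annex-sketch-remote}

# handle_store KEY FILE: copy FILE into the remote under KEY.
handle_store() {
    key=$1
    file=$2
    mkdir -p "$STORE_DIR/tmp" || { echo "TRANSFER-FAILURE STORE $key"; return; }
    tmp="$STORE_DIR/tmp/$key"
    # tmp/ is inside STORE_DIR, so mv is a rename on the same filesystem:
    # $STORE_DIR/$key appears at once, after cp has written every byte.
    if cp "$file" "$tmp" && mv -f "$tmp" "$STORE_DIR/$key"; then
        echo "TRANSFER-SUCCESS STORE $key"
    else
        rm -f "$tmp"   # do not leave a partial temp file behind
        echo "TRANSFER-FAILURE STORE $key"
    fi
}

# handle_checkpresent KEY: report only keys whose final file exists.
handle_checkpresent() {
    key=$1
    if [ -e "$STORE_DIR/$key" ]; then
        echo "CHECKPRESENT-SUCCESS $key"
    else
        echo "CHECKPRESENT-FAILURE $key"
    fi
}
```

The rename is only atomic when the temporary directory is on the same filesystem as the final location, which is why both live under `STORE_DIR` here.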
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 8ba3558d3..ba851eef8 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -949,19 +949,42 @@ subdirectories).
Merge conflicts between two files that are not annexed will not be
automatically resolved.
+* `remotedaemon`
+
+ Detects when network remotes have received git pushes and fetches from them.
+
+* `xmppgit`
+
+ This command is used internally to perform git pulls over XMPP.
+
+# TESTING COMMANDS
+
* `test`
This runs git-annex's built-in test suite.
There are several parameters, provided by Haskell's tasty test framework.
+ Pass --help for details.
-* `remotedaemon`
+* `testremote remote`
- Detects when network remotes have received git pushes and fetches from them.
+ This tests a remote by generating some random objects and sending them to
+ the remote, then redownloading them, removing them from the remote, etc.
-* `xmppgit`
+ It's safe to run in an existing repository (the repository contents are
+ not altered), although it may perform expensive data transfers.
- This command is used internally to perform git pulls over XMPP.
+ The --size option can be used to tune the size of the generated objects.
+
+ Testing a single remote uses the remote's configuration, automatically
+ varying the chunk size and testing with simple shared encryption both
+ enabled and disabled.
+
+* `fuzztest`
+
+ Generates random changes to files in the current repository,
+ for use in testing the assistant. This is dangerous, so it will not
+ do anything unless run with --force.
# OPTIONS
diff --git a/doc/internals/hashing.mdwn b/doc/internals/hashing.mdwn
index cc4bc6456..bdc259b63 100644
--- a/doc/internals/hashing.mdwn
+++ b/doc/internals/hashing.mdwn
@@ -36,3 +36,8 @@ string, but where that would normally encode the bits using the 16 characters
0-9a-f, this instead uses the 32 characters "0123456789zqjxkmvwgpfZQJXKMVWGPF".
The first 2 letters of the resulting string are the first directory, and the
second 2 are the second directory.
+
+## chunk keys
+
+The same hash directory is used for a chunk key as would be used for the
+key that it's a chunk of.
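The effect of this rule is that every chunk of a key lands in its parent key's hash directory. The sketch below assumes a lower-case layout that takes the first three and next three hex digits of the key's md5 as the two directory levels, and uses a made-up `<key>.chunk<N>` name for chunk keys; neither detail is taken from this patch. It needs `md5sum` (GNU coreutils).

```shell
#!/bin/sh
# Sketch only. Assumed (not from git-annex): the two hash directory levels
# are the first 3 and next 3 hex digits of md5(key), and chunk N of a key
# is named "<key>.chunk<N>" (a hypothetical naming scheme).

# hashdir KEY -> e.g. "900/150" for the key "abc"
hashdir() {
    h=$(printf '%s' "$1" | md5sum | cut -c1-6)
    # ${h%???} drops the last 3 characters, ${h#???} drops the first 3.
    echo "${h%???}/${h#???}"
}

# chunk_path KEY N -> storage path of chunk N of KEY. The directory is
# derived from the parent KEY, not from the chunk's own name, so all
# chunks of one file share a single hash directory.
chunk_path() {
    echo "$(hashdir "$1")/$1.chunk$2"
}

for n in 1 2 3; do
    chunk_path "SHA256-s3000000--example" "$n"
done
```

Hashing each chunk name separately would instead scatter the chunks of one file across many directories, which is the extra directory growth the design note wants to avoid for non-encrypted keys.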
diff --git a/doc/special_remotes/directory.mdwn b/doc/special_remotes/directory.mdwn
index 96d593821..6279024ec 100644
--- a/doc/special_remotes/directory.mdwn
+++ b/doc/special_remotes/directory.mdwn
@@ -25,13 +25,11 @@ remote:
* `keyid` - Specifies the gpg key to use for [[encryption]].
-* `chunksize` - Avoid storing files larger than the specified size in the
- directory. For use on directories on mount points that have file size
- limitations. The default is to never chunk files.
- The value can use specified using any commonly used units.
- Example: `chunksize=100 megabytes`
- Note that enabling chunking on an existing remote with non-chunked
- files is not recommended; nor is changing the chunksize.
+* `chunk` - Enables [[chunking]] when storing large files.
+
+* `chunksize` - Deprecated version of chunk parameter above.
+ Do not use for new remotes. It is not safe to change the chunksize
+ setting of an existing remote.
Setup example:
diff --git a/doc/special_remotes/external/example.sh b/doc/special_remotes/external/example.sh
index 5152ccc28..8fed9f4aa 100755
--- a/doc/special_remotes/external/example.sh
+++ b/doc/special_remotes/external/example.sh
@@ -128,14 +128,25 @@ while read line; do
STORE)
# Store the file to a location
# based on the key.
- # XXX when possible, send PROGRESS
+ # XXX when at all possible, send PROGRESS
calclocation "$key"
mkdir -p "$(dirname "$LOC")"
- if runcmd cp "$file" "$LOC"; then
+ # Store in temp file first, so that
+ # CHECKPRESENT does not see it
+ # until it is all stored.
+ mkdir -p "$mydirectory/tmp"
+ tmp="$mydirectory/tmp/$key"
+ if runcmd cp "$file" "$tmp" \
+ && runcmd mv -f "$tmp" "$LOC"; then
echo TRANSFER-SUCCESS STORE "$key"
else
echo TRANSFER-FAILURE STORE "$key"
fi
+
+ mkdir -p "$(dirname "$LOC")"
+ # The file may already exist, so
+ # make sure we can overwrite it.
+ chmod 644 "$LOC" 2>/dev/null || true
;;
RETRIEVE)
# Retrieve from a location based on
diff --git a/doc/special_remotes/webdav.mdwn b/doc/special_remotes/webdav.mdwn
index 871540a97..64eed5d0b 100644
--- a/doc/special_remotes/webdav.mdwn
+++ b/doc/special_remotes/webdav.mdwn
@@ -29,13 +29,11 @@ the webdav remote.
be created as needed. Use of a https URL is strongly
encouraged, since HTTP basic authentication is used.
-* `chunksize` - Avoid storing files larger than the specified size in
- WebDAV. For use when the WebDAV server has file size
- limitations. The default is to never chunk files.
- The value can use specified using any commonly used units.
- Example: `chunksize=75 megabytes`
- Note that enabling chunking on an existing remote with non-chunked
- files is not recommended, nor is changing the chunksize.
+* `chunk` - Enables [[chunking]] when storing large files.
+
+* `chunksize` - Deprecated version of chunk parameter above.
+ Do not use for new remotes. It is not safe to change the chunksize
+ setting of an existing remote.
Setup example:
diff --git a/doc/tips/using_box.com_as_a_special_remote.mdwn b/doc/tips/using_box.com_as_a_special_remote.mdwn
index ac59834f5..149d1f824 100644
--- a/doc/tips/using_box.com_as_a_special_remote.mdwn
+++ b/doc/tips/using_box.com_as_a_special_remote.mdwn
@@ -5,9 +5,9 @@ for providing 50 gb of free storage if you sign up with its Android client.
git-annex can use Box as a [[special remote|special_remotes]].
Recent versions of git-annex make this very easy to set up:
- WEBDAV_USERNAME=you@example.com WEBDAV_PASSWORD=xxxxxxx git annex initremote box.com type=webdav url=https://dav.box.com/dav/git-annex chunksize=75mb encryption=shared
+ WEBDAV_USERNAME=you@example.com WEBDAV_PASSWORD=xxxxxxx git annex initremote box.com type=webdav url=https://dav.box.com/dav/git-annex chunk=50mb encryption=shared
-Note the use of chunksize; Box has a 100 mb maximum file size, and this
+Note the use of [[chunking]]; Box has a 100 mb maximum file size, and this
breaks up large files into chunks before that limit is reached.
# old davfs2 method
@@ -58,7 +58,7 @@ Create the special remote, in your git-annex repository.
** This example is non-encrypted; fill in your gpg key ID for a securely
encrypted special remote! **
- git annex initremote box.com type=directory directory=/media/box.com chunksize=2mb encryption=none
+ git annex initremote box.com type=directory directory=/media/box.com chunk=2mb encryption=none
Now git-annex can copy files to box.com, get files from it, etc, just like
with any other special remote.