author: Joey Hess <joey@kitenet.net> 2011-11-04 15:21:45 -0400
committer: Joey Hess <joey@kitenet.net> 2011-11-04 15:51:01 -0400
commit: ef3457196ace3669ddfa93039f2d3c15baf54713
tree: 391787de35537c71068cdd8e2fc882109a2c3b79 /doc
parent: 1089e85d48a0d3c455fc2f4139b82484b94b5bbe
use SHA256 by default
To get old behavior, add a .gitattributes containing:

    * annex.backend=WORM

I feel that SHA256 is a better default for most people, as long as their
systems are fast enough that checksumming their files isn't a problem.
git-annex should default to preserving the integrity of data as well as git
does. Checksum backends also work better with editing files via unlock/lock.

I considered just using SHA1, but since that hash is believed to be somewhat
near to being broken, and git-annex deals with large files which would be a
perfect exploit medium, I decided to go to a SHA-2 hash.

SHA512 is annoyingly long when displayed, and git-annex displays it in a few
places (and notably it is shown in ls -l), so I picked the shorter hash.
Considered SHA224 as it's even shorter, but feel it's a bit weird.

I expect git-annex will use SHA-3 at some point in the future, but probably
not soon!

Note that systems without a sha256sum (or sha256) program will fall back to
defaulting to SHA1.
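
As a rough illustration of the trade-off described above, the following Python sketch builds keys shaped like the ones in the walkthrough output changed by this commit (`SHA256-s<size>--<hexdigest>` and `WORM-s<size>-m<mtime>--<basename>`). It is not git-annex's implementation: the function names `sha256_key` and `worm_key` are hypothetical, and the key layouts are inferred only from the example output in the diff. The point it shows is that the checksum key must read the whole file, while the WORM key needs only a `stat` call.

```python
import hashlib
import os


def sha256_key(path, chunk_size=1 << 20):
    """Hypothetical helper: build a key shaped like SHA256-s<size>--<hexdigest>.

    The whole file is read, which is why checksum backends cost more
    on large files or slow systems.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return "SHA256-s{}--{}".format(size, digest.hexdigest())


def worm_key(path):
    """Hypothetical helper: build a key shaped like WORM-s<size>-m<mtime>--<basename>.

    Only file metadata is consulted, so the content is never read
    and cannot be verified later.
    """
    info = os.stat(path)
    return "WORM-s{}-m{}--{}".format(
        info.st_size, int(info.st_mtime), os.path.basename(path)
    )
```

Two files with identical content always get the same `sha256_key`, which is what lets a checksum backend verify content and avoid duplicates; `worm_key` differs whenever the basename or modification time differs, even if the content is the same.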
Diffstat (limited to 'doc')
-rw-r--r-- doc/backends.mdwn 32
-rw-r--r-- doc/walkthrough/adding_files.mdwn 4
-rw-r--r-- doc/walkthrough/moving_file_content_between_repositories.mdwn 2
-rw-r--r-- doc/walkthrough/unused_data.mdwn 14
-rw-r--r-- doc/walkthrough/using_ssh_remotes.mdwn 2
5 files changed, 29 insertions, 25 deletions
diff --git a/doc/backends.mdwn b/doc/backends.mdwn
index ebcdedc2a..2030d107a 100644
--- a/doc/backends.mdwn
+++ b/doc/backends.mdwn
@@ -5,17 +5,19 @@ to retrieve the file's content (its value).
Multiple pluggable key-value backends are supported, and a single repository
can use different ones for different files.
-* `WORM` ("Write Once, Read Many") This assumes that any file with
- the same basename, size, and modification time has the same content.
- This is the default, and the least expensive backend.
-* `SHA1` -- This uses a key based on a sha1 checksum. This allows
+* `SHA256` -- The default backend for new files. This allows
verifying that the file content is right, and can avoid duplicates of
files with the same content. Its need to generate checksums
- can make it slower for large files.
-* `SHA512`, `SHA384`, `SHA256`, `SHA224` -- Like SHA1, but larger
- checksums. Mostly useful for the very paranoid, or anyone who is
- researching checksum collisions and wants to annex their colliding data. ;)
-* `SHA1E`, `SHA512E`, etc -- Variants that preserve filename extension as
+ can make it slower for large files.
+* `WORM` ("Write Once, Read Many") This assumes that any file with
+ the same basename, size, and modification time has the same content.
+ This is the least expensive backend, recommended for really large
+ files or slow systems.
+* `SHA512` -- Best currently available hash, for the very paranoid.
+* `SHA1` -- Smaller hash than `SHA256` for those who want a checksum
+ but are not concerned about security.
+* `SHA384`, `SHA224` -- Hashes for people who like unusual sizes.
+* `SHA256E`, `SHA1E`, etc -- Variants that preserve filename extension as
part of the key. Useful for archival tasks where the filename extension
contains metadata that should be preserved.
@@ -27,9 +29,11 @@ For finer control of what backend is used when adding different types of
files, the `.gitattributes` file can be used. The `annex.backend`
attribute can be set to the name of the backend to use for matching files.
-For example, to use the SHA1 backend for sound files, which tend to be
-smallish and might be modified or copied over time, you could set in
-`.gitattributes`:
+For example, to use the SHA256 backend for sound files, which tend to be
+smallish and might be modified or copied over time,
+while using the WORM backend for everything else, you could set
+in `.gitattributes`:
- *.mp3 annex.backend=SHA1
- *.ogg annex.backend=SHA1
+ * annex.backend=WORM
+ *.mp3 annex.backend=SHA256
+ *.ogg annex.backend=SHA256
diff --git a/doc/walkthrough/adding_files.mdwn b/doc/walkthrough/adding_files.mdwn
index 77a7fbc15..d1b5a04f7 100644
--- a/doc/walkthrough/adding_files.mdwn
+++ b/doc/walkthrough/adding_files.mdwn
@@ -2,8 +2,8 @@
# cp /tmp/big_file .
# cp /tmp/debian.iso .
# git annex add .
- add big_file ok
- add debian.iso ok
+ add big_file (checksum...) ok
+ add debian.iso (checksum...) ok
# git commit -a -m added
When you add a file to the annex and commit it, only a symlink to
diff --git a/doc/walkthrough/moving_file_content_between_repositories.mdwn b/doc/walkthrough/moving_file_content_between_repositories.mdwn
index 27dffe913..3ffcc1175 100644
--- a/doc/walkthrough/moving_file_content_between_repositories.mdwn
+++ b/doc/walkthrough/moving_file_content_between_repositories.mdwn
@@ -9,5 +9,5 @@ makes it very easy.
move my_cool_big_file (to usbdrive...) ok
# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
move video/hackity_hack_and_kaxxt.mov (from fileserver...)
- WORM-s86050597-m1274316523--hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
+ SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e 100% 82MB 199.1KB/s 07:02
ok
diff --git a/doc/walkthrough/unused_data.mdwn b/doc/walkthrough/unused_data.mdwn
index e142b576c..bd6c39871 100644
--- a/doc/walkthrough/unused_data.mdwn
+++ b/doc/walkthrough/unused_data.mdwn
@@ -1,8 +1,8 @@
-It's possible for data to accumulate in the annex that no files point to
-anymore. One way it can happen is if you `git rm` a file without
-first calling `git annex drop`. And, when you modify an annexed file, the old
-content of the file remains in the annex. Another way is when migrating
-between key-value [[backends|backend]].
+It's possible for data to accumulate in the annex that no files in any
+branch point to anymore. One way it can happen is if you `git rm` a file
+without first calling `git annex drop`. And, when you modify an annexed
+file, the old content of the file remains in the annex. Another way is when
+migrating between key-value [[backends|backend]].
This might be historical data you want to preserve, so git-annex defaults to
preserving it. So from time to time, you may want to check for such data and
@@ -12,8 +12,8 @@ eliminate it to save space.
unused . (checking for unused data...)
Some annexed data is no longer used by any files in the repository.
NUMBER KEY
- 1 WORM-s3-m1289672605--file
- 2 WORM-s14-m1289672605--file
+ 1 SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e
+ 2 SHA1-s14--f1358ec1873d57350e3dc62054dc232bc93c2bd1
(To see where data was previously used, try: git log --stat -S'KEY')
(To remove unwanted data: git-annex dropunused NUMBER)
ok
diff --git a/doc/walkthrough/using_ssh_remotes.mdwn b/doc/walkthrough/using_ssh_remotes.mdwn
index fbbbbe070..60011a200 100644
--- a/doc/walkthrough/using_ssh_remotes.mdwn
+++ b/doc/walkthrough/using_ssh_remotes.mdwn
@@ -13,7 +13,7 @@ Now you can get files and they will be transferred (using `rsync` via `ssh`):
# git annex get my_cool_big_file
get my_cool_big_file (getting UUID for origin...) (from origin...)
- WORM-s2159-m1285650548--my_cool_big_file 100% 2159 2.1KB/s 00:00
+ SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e 100% 2159 2.1KB/s 00:00
ok
When you drop files, git-annex will ssh over to the remote and make