author Joey Hess <joey@kitenet.net> 2011-11-04 15:21:45 -0400
committer Joey Hess <joey@kitenet.net> 2011-11-04 15:51:01 -0400
commit ef3457196ace3669ddfa93039f2d3c15baf54713
tree 391787de35537c71068cdd8e2fc882109a2c3b79
parent 1089e85d48a0d3c455fc2f4139b82484b94b5bbe
use SHA256 by default

To get old behavior, add a .gitattributes containing: * annex.backend=WORM

I feel that SHA256 is a better default for most people, as long as their
systems are fast enough that checksumming their files isn't a problem.
git-annex should default to preserving the integrity of data as well as git
does. Checksum backends also work better with editing files via unlock/lock.

I considered just using SHA1, but since that hash is believed to be somewhat
near to being broken, and git-annex deals with large files which would be a
perfect exploit medium, I decided to go to a SHA-2 hash.

SHA512 is annoyingly long when displayed, and git-annex displays it in a few
places (and notably it is shown in ls -l), so I picked the shorter hash.
Considered SHA224 as it's even shorter, but feel it's a bit weird.

I expect git-annex will use SHA-3 at some point in the future, but
probably not soon!

Note that systems without a sha256sum (or sha256) program will fall back to
defaulting to SHA1.

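The fallback described in the last paragraph follows from list ordering: if the first available backend acts as the default, putting 256 ahead of 1 in `sizes` makes SHA256 the default and leaves SHA1 next in line. The following is a minimal sketch of that idea, not the actual git-annex code. The `Backend` newtype, the `available` predicate, and `defaultBackend` are hypothetical stand-ins; the real `genBackend` consults build-time configuration rather than taking a predicate.

```haskell
import Data.Maybe (catMaybes)

-- Hypothetical stand-in for git-annex's backend type: only a name.
newtype Backend = Backend { backendName :: String }
  deriving (Eq, Show)

type SHASize = Int

-- Same order as the new `sizes` list in Backend/SHA.hs.
sizes :: [SHASize]
sizes = [256, 1, 512, 224, 384]

-- Assumption: availability is passed in as a predicate so the sketch
-- stays self-contained. A size with no checksum program yields Nothing.
genBackend :: (SHASize -> Bool) -> SHASize -> Maybe Backend
genBackend available size
  | available size = Just (Backend ("SHA" ++ show size))
  | otherwise      = Nothing

-- Unavailable sizes drop out; the survivors keep the order of `sizes`.
shaBackends :: (SHASize -> Bool) -> [Backend]
shaBackends available = catMaybes $ map (genBackend available) sizes

-- Assumption: the first available backend is the default, with WORM
-- after all SHA backends, as in the reordered list in Backend.hs.
defaultBackend :: (SHASize -> Bool) -> Backend
defaultBackend available =
  case shaBackends available of
    (b:_) -> b
    []    -> Backend "WORM"
```

With every size available, `defaultBackend (const True)` picks SHA256; with `(/= 256)` as the predicate, modelling a system that lacks a sha256sum program, it picks SHA1.
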
-rw-r--r-- Backend.hs 4
-rw-r--r-- Backend/SHA.hs 6
-rw-r--r-- debian/changelog 3
-rw-r--r-- doc/backends.mdwn 32
-rw-r--r-- doc/walkthrough/adding_files.mdwn 4
-rw-r--r-- doc/walkthrough/moving_file_content_between_repositories.mdwn 2
-rw-r--r-- doc/walkthrough/unused_data.mdwn 14
-rw-r--r-- doc/walkthrough/using_ssh_remotes.mdwn 2
8 files changed, 37 insertions, 30 deletions
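
The walkthrough changes in this commit swap key names such as `WORM-s3-m1289672605--file` for names such as `SHA1-s14--f1358ec1873d57350e3dc62054dc232bc93c2bd1`. Both appear to share one layout: a backend name, optional `s` (size) and `m` (mtime) fields separated by single dashes, then `--` and a backend-specific name. The following is a minimal sketch of reading that layout, inferred only from the examples in this diff. The `Key` record and `parseKey` are hypothetical and are not git-annex's own `Types.Key` code.

```haskell
import Data.Char (isDigit)
import Data.List (isPrefixOf)

-- Hypothetical record for the parts visible in the example keys.
data Key = Key
  { keyBackend :: String
  , keySize    :: Maybe Integer
  , keyMtime   :: Maybe Integer
  , keyName    :: String
  } deriving (Eq, Show)

-- Split at the first "--", returning the header and the name after it.
splitHeader :: String -> Maybe (String, String)
splitHeader = go ""
  where
    go _ [] = Nothing
    go acc rest@(c:cs)
      | "--" `isPrefixOf` rest = Just (reverse acc, drop 2 rest)
      | otherwise              = go (c : acc) cs

-- Split a string on a single separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])     -> [chunk]
  (chunk, _:rest) -> chunk : splitOn sep rest

-- Find a numeric field by its one-letter tag, e.g. 's' in "s14".
field :: Char -> [String] -> Maybe Integer
field tag fs =
  case [ds | (t:ds) <- fs, t == tag, not (null ds), all isDigit ds] of
    (ds:_) -> Just (read ds)
    []     -> Nothing

parseKey :: String -> Maybe Key
parseKey s = do
  (header, name) <- splitHeader s
  case splitOn '-' header of
    (backend:fields) | not (null backend) ->
      Just (Key backend (field 's' fields) (field 'm' fields) name)
    _ -> Nothing
```

Under these assumptions, the WORM key carries its identity in the size, mtime, and basename, while the SHA keys carry the size and a checksum of the content, which is why the new default can verify file content.
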
diff --git a/Backend.hs b/Backend.hs
index a09fc0e99..9a40e5459 100644
--- a/Backend.hs
+++ b/Backend.hs
@@ -26,12 +26,12 @@ import Types.Key
import qualified Types.Backend as B
-- When adding a new backend, import it here and add it to the list.
-import qualified Backend.WORM
import qualified Backend.SHA
+import qualified Backend.WORM
import qualified Backend.URL
list :: [Backend Annex]
-list = Backend.WORM.backends ++ Backend.SHA.backends ++ Backend.URL.backends
+list = Backend.SHA.backends ++ Backend.WORM.backends ++ Backend.URL.backends
{- List of backends in the order to try them when storing a new key. -}
orderedList :: Annex [Backend Annex]
diff --git a/Backend/SHA.hs b/Backend/SHA.hs
index 3a54a8871..d44982117 100644
--- a/Backend/SHA.hs
+++ b/Backend/SHA.hs
@@ -16,12 +16,12 @@ import qualified Build.SysConfig as SysConfig
type SHASize = Int
+-- order is slightly significant; want SHA256 first, and more general
+-- sizes earlier
sizes :: [Int]
-sizes = [1, 256, 512, 224, 384]
+sizes = [256, 1, 512, 224, 384]
backends :: [Backend Annex]
--- order is slightly significant; want sha1 first, and more general
--- sizes earlier
backends = catMaybes $ map genBackend sizes ++ map genBackendE sizes
genBackend :: SHASize -> Maybe (Backend Annex)
diff --git a/debian/changelog b/debian/changelog
index e59b4f404..e74a190ba 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,5 +1,8 @@
git-annex (3.20111026) UNRELEASED; urgency=low
+ * The default backend used when adding files to the annex is changed
+ from WORM to SHA256.
+ To get old behavior, add a .gitattributes containing: * annex.backend=WORM
* Sped up some operations on remotes that are on the same host.
* copy --to: Fixed leak when copying many files to a remote on the same
host.
diff --git a/doc/backends.mdwn b/doc/backends.mdwn
index ebcdedc2a..2030d107a 100644
--- a/doc/backends.mdwn
+++ b/doc/backends.mdwn
@@ -5,17 +5,19 @@ to retrieve the file's content (its value).
Multiple pluggable key-value backends are supported, and a single repository
can use different ones for different files.
-* `WORM` ("Write Once, Read Many") This assumes that any file with
- the same basename, size, and modification time has the same content.
- This is the default, and the least expensive backend.
-* `SHA1` -- This uses a key based on a sha1 checksum. This allows
+* `SHA256` -- The default backend for new files. This allows
verifying that the file content is right, and can avoid duplicates of
files with the same content. Its need to generate checksums
- can make it slower for large files.
-* `SHA512`, `SHA384`, `SHA256`, `SHA224` -- Like SHA1, but larger
- checksums. Mostly useful for the very paranoid, or anyone who is
- researching checksum collisions and wants to annex their colliding data. ;)
-* `SHA1E`, `SHA512E`, etc -- Variants that preserve filename extension as
+ can make it slower for large files.
+* `WORM` ("Write Once, Read Many") This assumes that any file with
+ the same basename, size, and modification time has the same content.
+ This is the least expensive backend, recommended for really large
+ files or slow systems.
+* `SHA512` -- Best currently available hash, for the very paranoid.
+* `SHA1` -- Smaller hash than `SHA256` for those who want a checksum
+ but are not concerned about security.
+* `SHA384`, `SHA224` -- Hashes for people who like unusual sizes.
+* `SHA256E`, `SHA1E`, etc -- Variants that preserve filename extension as
part of the key. Useful for archival tasks where the filename extension
contains metadata that should be preserved.
@@ -27,9 +29,11 @@ For finer control of what backend is used when adding different types of
files, the `.gitattributes` file can be used. The `annex.backend`
attribute can be set to the name of the backend to use for matching files.
-For example, to use the SHA1 backend for sound files, which tend to be
-smallish and might be modified or copied over time, you could set in
-`.gitattributes`:
+For example, to use the SHA256 backend for sound files, which tend to be
+smallish and might be modified or copied over time,
+while using the WORM backend for everything else, you could set
+in `.gitattributes`:
- *.mp3 annex.backend=SHA1
- *.ogg annex.backend=SHA1
+ * annex.backend=WORM
+ *.mp3 annex.backend=SHA256
+ *.ogg annex.backend=SHA256
diff --git a/doc/walkthrough/adding_files.mdwn b/doc/walkthrough/adding_files.mdwn
index 77a7fbc15..d1b5a04f7 100644
--- a/doc/walkthrough/adding_files.mdwn
+++ b/doc/walkthrough/adding_files.mdwn
@@ -2,8 +2,8 @@
# cp /tmp/big_file .
# cp /tmp/debian.iso .
# git annex add .
- add big_file ok
- add debian.iso ok
+ add big_file (checksum...) ok
+ add debian.iso (checksum...) ok
# git commit -a -m added
When you add a file to the annex and commit it, only a symlink to
diff --git a/doc/walkthrough/moving_file_content_between_repositories.mdwn b/doc/walkthrough/moving_file_content_between_repositories.mdwn
index 27dffe913..3ffcc1175 100644
--- a/doc/walkthrough/moving_file_content_between_repositories.mdwn
+++ b/doc/walkthrough/moving_file_content_between_repositories.mdwn
@@ -9,5 +9,5 @@ makes it very easy.
move my_cool_big_file (to usbdrive...) ok
# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
move video/hackity_hack_and_kaxxt.mov (from fileserver...)
- WORM-s86050597-m1274316523--hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
+ SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e 100% 82MB 199.1KB/s 07:02
ok
diff --git a/doc/walkthrough/unused_data.mdwn b/doc/walkthrough/unused_data.mdwn
index e142b576c..bd6c39871 100644
--- a/doc/walkthrough/unused_data.mdwn
+++ b/doc/walkthrough/unused_data.mdwn
@@ -1,8 +1,8 @@
-It's possible for data to accumulate in the annex that no files point to
-anymore. One way it can happen is if you `git rm` a file without
-first calling `git annex drop`. And, when you modify an annexed file, the old
-content of the file remains in the annex. Another way is when migrating
-between key-value [[backends|backend]].
+It's possible for data to accumulate in the annex that no files in any
+branch point to anymore. One way it can happen is if you `git rm` a file
+without first calling `git annex drop`. And, when you modify an annexed
+file, the old content of the file remains in the annex. Another way is when
+migrating between key-value [[backends|backend]].
This might be historical data you want to preserve, so git-annex defaults to
preserving it. So from time to time, you may want to check for such data and
@@ -12,8 +12,8 @@ eliminate it to save space.
unused . (checking for unused data...)
Some annexed data is no longer used by any files in the repository.
NUMBER KEY
- 1 WORM-s3-m1289672605--file
- 2 WORM-s14-m1289672605--file
+ 1 SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e
+ 2 SHA1-s14--f1358ec1873d57350e3dc62054dc232bc93c2bd1
(To see where data was previously used, try: git log --stat -S'KEY')
(To remove unwanted data: git-annex dropunused NUMBER)
ok
diff --git a/doc/walkthrough/using_ssh_remotes.mdwn b/doc/walkthrough/using_ssh_remotes.mdwn
index fbbbbe070..60011a200 100644
--- a/doc/walkthrough/using_ssh_remotes.mdwn
+++ b/doc/walkthrough/using_ssh_remotes.mdwn
@@ -13,7 +13,7 @@ Now you can get files and they will be transferred (using `rsync` via `ssh`):
# git annex get my_cool_big_file
get my_cool_big_file (getting UUID for origin...) (from origin...)
- WORM-s2159-m1285650548--my_cool_big_file 100% 2159 2.1KB/s 00:00
+ SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e 100% 2159 2.1KB/s 00:00
ok
When you drop files, git-annex will ssh over to the remote and make