author: Joey Hess <joey@kitenet.net> 2013-03-31 20:13:49 -0400
committer: Joey Hess <joey@kitenet.net> 2013-03-31 20:13:49 -0400
commit: e1d64aa423f4881e239e0025d34b2988ddcfe29a
tree: aac23b23681288943df17cdab705abdbdf1bc79e /doc/internals/hashing.mdwn
parent: f979683d030861b3ef8475f440857dadfcc874b2
document directory hashes
Diffstat (limited to 'doc/internals/hashing.mdwn')
-rw-r--r-- doc/internals/hashing.mdwn 34
1 files changed, 34 insertions, 0 deletions
diff --git a/doc/internals/hashing.mdwn b/doc/internals/hashing.mdwn
new file mode 100644
index 000000000..3c1d86b0c
--- /dev/null
+++ b/doc/internals/hashing.mdwn
@@ -0,0 +1,34 @@
+In both the .git/annex directory and the git-annex branch, two levels of
+hash directories are used, to avoid issues with too many files in one
+directory.
+
+Two separate hash methods are used. One, the old hash format, is only used
+for non-bare git repositories. The other, the new hash format, is used for
+bare git repositories, the git-annex branch, and on special remotes as
+well.
+
+## new hash format
+
+This uses two directories, each with a three-letter name, such as "f87/4d5".
+
+The directory names come from the md5sum of the [[key|key_format]].
+
+Note that feeding the key to the `md5sum` utility from coreutils with a plain
+`echo` generates a different hash, because `echo` appends a newline that gets
+hashed too; use `printf %s` instead. The md5 hash libraries for programming languages also work.
+
+For example:
+
+    python3 -c 'import hashlib, sys; print(hashlib.md5(sys.argv[1].encode()).hexdigest())'
+
+## old hash format
+
+This uses two directories, each with a two-letter name, such as "pX/1J".
+
+It takes the md5sum of the key, but rather than a string, represents it as
+four 32-bit words. Only the first word is used. It is converted into a string
+by the same mechanism that would be used to encode a normal md5sum value into
+a string, but where that would normally encode the bits using the 16
+characters 0-9a-f, this instead uses the 32 characters
+"0123456789zqjxkmvwgpfZQJXKMVWGPF". The first 2 letters of the resulting
+string are the first directory, and the next 2 are the second directory.
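
Note on the new hash format described in the patch above: a minimal Python sketch of how the two three-letter directories might be derived. It rests on assumptions the patch does not state: that the key is hashed as UTF-8 bytes with no trailing newline, and that the directories are the first six hex digits of the digest split into two groups of three. The function name `new_hash_dirs` and the sample key are hypothetical.

```python
import hashlib


def new_hash_dirs(key: str) -> str:
    """Return the two-level directory prefix for a key (illustrative only).

    Assumption: the directories are hex digits 0-2 and 3-5 of the MD5
    digest of the key, encoded as UTF-8 with no trailing newline.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:3]}/{digest[3:6]}"


if __name__ == "__main__":
    # Hypothetical key, used only to show the call.
    print(new_hash_dirs("SHA256-s0--example"))
```

The old hash format is not sketched here, because the patch does not specify the bit-level encoding in enough detail to reproduce it reliably.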