[[!comment format=mdwn
 username="http://joeyh.name/"
 ip="4.252.11.120"
 subject="comment 4"
 date="2012-11-14T17:31:38Z"
 content="""
The new hash directory tree is generated in a way that is simple to explain. Take the md5sum of the key: its first 3 hex characters are the first directory, and the next 3 characters are the second directory.

The old hash directory tree is rather harder to explain. It takes the md5sum of the key, but rather than treating it as a string, represents it as four 32-bit words. Only the first word is used. It is converted into a string by the same mechanism that would normally be used to encode an md5sum value into a string, except that where that would encode the bits using the 16 characters 0-9a-f, this instead uses the 32 characters \"0123456789zqjxkmvwgpfZQJXKMVWGPF\". The first 2 letters of the resulting string are the first directory, and the next 2 are the second directory.

There's probably a 1:1 mapping between this special md5 encoding and a regular md5 encoding. But it's certainly easier just to use the existing Haskell implementation of the hash. The following program, which needs to be built inside a git-annex source tree, reads keys on stdin and writes their old hash directory tree values, followed by their new values, to stdout.

<pre>
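import Locations
import Types.Key

-- For each key read from stdin (one per line), print its old
-- (mixed-case) hash directory followed by its new (lowercase) one.
main :: IO ()
main = interact $ unlines . map convert . lines
  where
        convert l = case file2key l of
                Nothing -> \"bad key\"
                Just k -> hashDirMixed k ++ \" \" ++ hashDirLower k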
</pre>

<pre>
joey@gnu:~/src/git-annex>ghc --make convert.hs
joey@gnu:~/src/git-annex>echo WORM--foo | ./convert
jq/8w/ 2b1/ba3/
</pre>
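
If building inside the git-annex tree is not an option, the two layouts can also be sketched from the md5 hex string alone (for example, the output of `md5sum`). This is only a rough sketch, not git-annex's own code: the names `newDirFromHex` and `oldDirFromHex` are made up for illustration, and the bit layout of the old encoding is an assumption not spelled out above. It assumes the first word is read little-endian from the digest, that 5-bit groups are taken at 6-bit strides, and that each pair of output characters is swapped. Those assumptions do reproduce the sample output shown above.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (digitToInt, isHexDigit)
import Data.Word (Word32)

-- New layout: the first three hex digits, then the next three.
newDirFromHex :: String -> Maybe String
newDirFromHex hex
  | length ds == 6 && all isHexDigit ds =
      Just (take 3 ds ++ "/" ++ drop 3 ds ++ "/")
  | otherwise = Nothing
  where
    ds = take 6 hex

-- Old layout (assumed bit layout, see the note above).
-- Only bits 0..22 of the first word are consulted, so the
-- first six hex digits of the digest are enough.
oldDirFromHex :: String -> Maybe String
oldDirFromHex hex
  | length ds == 6 && all isHexDigit ds =
      Just (take 2 dir ++ "/" ++ drop 2 dir ++ "/")
  | otherwise = Nothing
  where
    ds = take 6 hex
    -- Digest bytes in the order they appear in the hex string.
    bytes = pairs ds
    pairs (a:b:rest) = (digitToInt a * 16 + digitToInt b) : pairs rest
    pairs _ = []
    -- Little-endian: the first digest byte is the least significant.
    w :: Word32
    w = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0 bytes
    -- The 32-character alphabet quoted in the comment above.
    alphabet = ['0'..'9'] ++ "zqjxkmvwgpfZQJXKMVWGPF"
    -- Four 5-bit groups, taken at 6-bit strides.
    cs = [ alphabet !! fromIntegral ((w `shiftR` (6 * i)) .&. 31)
         | i <- [0 .. 3] ]
    dir = swapPairs cs
    swapPairs (x:y:rest) = y : x : swapPairs rest
    swapPairs _ = []
```

Loaded into ghci, `oldDirFromHex "2b1ba3"` gives `Just "jq/8w/"` and `newDirFromHex "2b1ba3"` gives `Just "2b1/ba3/"`, which agrees with the `WORM--foo` line above. Under these assumptions the old directories depend only on the first six hex digits of the md5sum, that is, on the same digits that make up the new directories.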
"""]]