author | Joey Hess <joey@kitenet.net> | 2014-07-28 13:00:46 -0400 |
---|---|---|
committer | Joey Hess <joey@kitenet.net> | 2014-07-28 13:00:46 -0400 |
commit | 494eecccecd5676f2f48c610aad99db6466dbe43 | |
tree | de12157d74a14fee95820561005f4cbfec33aeba | |
parent | 05430a956cfb6228949735673ca4ef61b50d23e5 | |
chunk log format should be extensible to allow for eg, logging when rolling hash chunks are used
-rw-r--r-- | doc/design/assistant/chunks.mdwn | 39 |
1 files changed, 29 insertions, 10 deletions
diff --git a/doc/design/assistant/chunks.mdwn b/doc/design/assistant/chunks.mdwn
index 51fd72177..52ddf07c8 100644
--- a/doc/design/assistant/chunks.mdwn
+++ b/doc/design/assistant/chunks.mdwn
@@ -160,17 +160,11 @@ Instead of storing the chunk count in the special remote, store it in
 the git-annex branch.
 
 The location log does not record locations of individual chunk keys
-(too space-inneficient).
-Instead, look at git-annex:aaa/bbb/SHA256-s12345--xxxxxxx.log.cnk to get
-the chunk count and size for a key.
+(too space-inneficient). Instead, look at a chunk log in the
+git-annex branch to get the chunk count and size for a key.
 
-Note that a given remote uuid might have multiple chunk sizes logged, if a
-key was stored on it twice using different chunk sizes. Also note that even
-when this file exists for a key, the object may be stored non-chunked on
-the remote too.
-
-`hasKey` would check if any one (chunksize, chunkcount) is satisfied by
-the files on the remote. It would also check if the non-chunked key is
+`hasKey` would check if any of the logged sets of chunks is
+present on the remote. It would also check if the non-chunked key is
 present, as a fallback.
 
 When dropping a key from the remote, drop all logged chunk sizes.
@@ -185,6 +179,31 @@ remote doesn't know anything about chunk sizes. It uses a little more
 data in the git-annex branch, although with care (using the same
 timestamp as the location log), it can compress pretty well.
 
+## chunk log
+
+Stored in the git-annex branch, this provides a mapping `Key -> [[Key]]`.
+
+Note that a given remote uuid might have multiple sets of chunks (with
+different sizes) logged, if a key was stored on it twice using different
+chunk sizes. Also note that even when the log indicates a key is chunked,
+the object may be stored non-chunked on the remote too.
+
+For fixed size chunks, there's no need to store the list of chunk keys,
+instead the log only records the number of chunks (needed because the size
+of the parent Key may not be known), and the chunk size.
+
+Example:
+
+	1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:10240 9
+
+Later, might want to support other kinds of chunks, for example ones made
+using a rsync-style rolling checksum. It would probably not make sense to
+store the full [Key] list for such chunks in the log. Instead, it might be
+stored in a file on the remote.
+
+To support such future developments, when updating the chunk log,
+git-annex should preserve unparsable values (the part after the colon).
+
 ## chunk then encrypt
 
 Rather than encrypting the whole object 1st and then chunking, chunk and
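The chunk log line format added in this diff can be illustrated with a small parser. The following is a hypothetical Python sketch, not git-annex's own code. It assumes each line has the three whitespace-separated fields seen in the example (a timestamp, `uuid:value`, and the chunk count), reads a plain decimal `value` as the fixed chunk size, and keeps any other value verbatim so that updating the log does not lose chunk methods it cannot parse. All names (`ChunkLogEntry`, `parse_line`, `update_log`, and so on), the `rolling-v1` method string, and the extra timestamps in the demo are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class FixedSize:
    """Fixed-size chunking: the value after the colon is a chunk size in bytes."""
    size: int


@dataclass(frozen=True)
class Unknown:
    """A chunk method this sketch does not understand, kept verbatim."""
    raw: str


ChunkMethod = Union[FixedSize, Unknown]


@dataclass(frozen=True)
class ChunkLogEntry:
    timestamp: str       # e.g. "1287290776.765152s", kept as opaque text
    uuid: str            # uuid of the remote the chunks were stored on
    method: ChunkMethod  # the part after the colon
    count: int           # number of chunks


def _is_decimal(s: str) -> bool:
    # ASCII digits only, so int() is guaranteed to accept the string.
    return s.isascii() and s.isdigit()


def parse_method(value: str) -> ChunkMethod:
    # Assumption: a plain decimal number means fixed-size chunks; anything
    # else is treated as some future chunk method and left unparsed.
    return FixedSize(int(value)) if _is_decimal(value) else Unknown(value)


def format_method(method: ChunkMethod) -> str:
    return str(method.size) if isinstance(method, FixedSize) else method.raw


def parse_line(line: str) -> Optional[ChunkLogEntry]:
    """Parse "<timestamp> <uuid>:<method> <count>", or return None if malformed."""
    fields = line.split()
    if len(fields) != 3:
        return None
    timestamp, uuid_and_method, count = fields
    uuid, sep, method = uuid_and_method.partition(":")
    if not sep or not uuid or not _is_decimal(count):
        return None
    return ChunkLogEntry(timestamp, uuid, parse_method(method), int(count))


def format_line(entry: ChunkLogEntry) -> str:
    return f"{entry.timestamp} {entry.uuid}:{format_method(entry.method)} {entry.count}"


def update_log(lines: List[str], new: ChunkLogEntry) -> List[str]:
    """Record `new`, dropping any older line for the same (uuid, method).

    Every other line, including ones with an unknown method or that fail
    to parse at all, is copied through as raw text, unchanged.
    """
    kept = []
    for line in lines:
        old = parse_line(line)
        if old is not None and (old.uuid, old.method) == (new.uuid, new.method):
            continue  # superseded by `new`
        kept.append(line)
    kept.append(format_line(new))
    return kept


if __name__ == "__main__":
    log = [
        "1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:10240 9",
        # "rolling-v1" is a made-up method name standing in for a future chunk type.
        "1287290777.000000s e605dca6-446a-11e0-8b2a-002170d25c55:rolling-v1 12",
    ]
    updated = update_log(
        log,
        ChunkLogEntry("1287290800.000000s", "e605dca6-446a-11e0-8b2a-002170d25c55",
                      FixedSize(10240), 9),
    )
    print("\n".join(updated))
```

Lines this sketch does not replace are carried through as raw text rather than re-serialized, so a value it cannot interpret is written back exactly as it was read. Replacing an older line for the same uuid and chunk method is also an assumption; the design note does not say how superseded entries are handled.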