author: Joey Hess <joey@kitenet.net>, 2014-07-28 13:00:46 -0400
committer: Joey Hess <joey@kitenet.net>, 2014-07-28 13:00:46 -0400
commit: 494eecccecd5676f2f48c610aad99db6466dbe43
tree: de12157d74a14fee95820561005f4cbfec33aeba /doc/design
parent: 05430a956cfb6228949735673ca4ef61b50d23e5
chunk log format should be extensible to allow for eg, logging when rolling hash chunks are used
Diffstat (limited to 'doc/design')
-rw-r--r-- doc/design/assistant/chunks.mdwn | 39
1 files changed, 29 insertions, 10 deletions
diff --git a/doc/design/assistant/chunks.mdwn b/doc/design/assistant/chunks.mdwn
index 51fd72177..52ddf07c8 100644
--- a/doc/design/assistant/chunks.mdwn
+++ b/doc/design/assistant/chunks.mdwn
@@ -160,17 +160,11 @@ Instead of storing the chunk count in the special remote, store it in
the git-annex branch.
The location log does not record locations of individual chunk keys
-(too space-inneficient).
-Instead, look at git-annex:aaa/bbb/SHA256-s12345--xxxxxxx.log.cnk to get
-the chunk count and size for a key.
+(too space-inneficient). Instead, look at a chunk log in the
+git-annex branch to get the chunk count and size for a key.
-Note that a given remote uuid might have multiple chunk sizes logged, if a
-key was stored on it twice using different chunk sizes. Also note that even
-when this file exists for a key, the object may be stored non-chunked on
-the remote too.
-
-`hasKey` would check if any one (chunksize, chunkcount) is satisfied by
-the files on the remote. It would also check if the non-chunked key is
+`hasKey` would check if any of the logged sets of chunks is
+present on the remote. It would also check if the non-chunked key is
present, as a fallback.
When dropping a key from the remote, drop all logged chunk sizes.
@@ -185,6 +179,31 @@ remote doesn't know anything about chunk sizes. It uses a little more
data in the git-annex branch, although with care (using the same timestamp
as the location log), it can compress pretty well.
+## chunk log
+
+Stored in the git-annex branch, this provides a mapping `Key -> [[Key]]`.
+
+Note that a given remote uuid might have multiple sets of chunks (with
+different sizes) logged, if a key was stored on it twice using different
+chunk sizes. Also note that even when the log indicates a key is chunked,
+the object may be stored non-chunked on the remote too.
+
+For fixed size chunks, there's no need to store the list of chunk keys,
+instead the log only records the number of chunks (needed because the size
+of the parent Key may not be known), and the chunk size.
+
+Example:
+
+ 1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:10240 9
+
+Later, might want to support other kinds of chunks, for example ones made
+using a rsync-style rolling checksum. It would probably not make sense to
+store the full [Key] list for such chunks in the log. Instead, it might be
+stored in a file on the remote.
+
+To support such future developments, when updating the chunk log,
+git-annex should preserve unparsable values (the part after the colon).
+
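As an illustration of the chunk log section added in this hunk (not part of the patch itself), here is a minimal Python sketch of reading and rewriting one log line while preserving values it cannot parse. It assumes the line layout shown in the example, `timestamp uuid:value`, and that a fixed-size entry has the value `chunksize chunkcount`. The names `ChunkLogEntry`, `parse_line`, and `format_line` are hypothetical, and git-annex itself is written in Haskell, so this is not its implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ChunkLogEntry:
    timestamp: str  # kept as text so it is written back unchanged
    uuid: str       # uuid of the remote
    raw_value: str  # everything after the first colon, verbatim
    # (chunk size, chunk count) when the value is understood, else None
    fixed: Optional[Tuple[int, int]]


def parse_line(line: str) -> ChunkLogEntry:
    """Split one log line; never fail on an unknown value format."""
    timestamp, rest = line.strip().split(" ", 1)
    uuid, raw_value = rest.split(":", 1)
    fields = raw_value.split()
    fixed = None
    # Assumed fixed-size layout: two plain decimal integers.
    if len(fields) == 2 and all(f.isascii() and f.isdigit() for f in fields):
        fixed = (int(fields[0]), int(fields[1]))
    return ChunkLogEntry(timestamp, uuid, raw_value, fixed)


def format_line(entry: ChunkLogEntry) -> str:
    """Write the entry back using the raw value, so unparsed data survives."""
    return f"{entry.timestamp} {entry.uuid}:{entry.raw_value}"


example = "1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:10240 9"
entry = parse_line(example)
```

Because `format_line` always emits `raw_value` rather than re-serializing the parsed fields, a value written by a later version (say, a hypothetical rolling-hash marker) passes through a log update untouched, which is the behavior the last paragraph of the section asks for.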
## chunk then encrypt
Rather than encrypting the whole object 1st and then chunking, chunk and