author | Joey Hess <joey@kitenet.net> | 2014-08-02 17:25:50 -0400 |
---|---|---|
committer | Joey Hess <joey@kitenet.net> | 2014-08-02 17:25:50 -0400 |
commit | d645129f6e573b60e54fb7c35bfe98a87d2eb9d0 (patch) | |
tree | 4ed3f157970605e42fe457bfa67efc825a39ee84 /doc/design/assistant/chunks.mdwn | |
parent | 0eaed261ea11060fc9644400c7f31f8c3ec1052b (diff) | |
parent | 3beefc3b4bc54e0d2a0cc7a4cc0745af13d8014c (diff) |
Merge branch 'master' into newchunks
Diffstat (limited to 'doc/design/assistant/chunks.mdwn')
-rw-r--r-- | doc/design/assistant/chunks.mdwn | 50 |
1 file changed, 40 insertions, 10 deletions
diff --git a/doc/design/assistant/chunks.mdwn b/doc/design/assistant/chunks.mdwn
index 454f15f9e..a9709a778 100644
--- a/doc/design/assistant/chunks.mdwn
+++ b/doc/design/assistant/chunks.mdwn
@@ -160,17 +160,11 @@ Instead of storing the chunk count in the special remote, store it
 in the git-annex branch.
 
 The location log does not record locations of individual chunk keys
-(too space-inneficient).
-Instead, look at git-annex:aaa/bbb/SHA256-s12345--xxxxxxx.log.cnk to get
-the chunk count and size for a key.
+(too space-inneficient). Instead, look at a chunk log in the
+git-annex branch to get the chunk count and size for a key.
 
-Note that a given remote uuid might have multiple chunk sizes logged, if a
-key was stored on it twice using different chunk sizes. Also note that even
-when this file exists for a key, the object may be stored non-chunked on
-the remote too.
-
-`hasKey` would check if any one (chunksize, chunkcount) is satisfied by
-the files on the remote. It would also check if the non-chunked key is
+`hasKey` would check if any of the logged sets of chunks is
+present on the remote. It would also check if the non-chunked key is
 present, as a fallback.
 
 When dropping a key from the remote, drop all logged chunk sizes.
@@ -185,6 +179,31 @@ remote doesn't know anything about chunk sizes. It uses a little more
 data in the git-annex branch, although with care (using the same timestamp
 as the location log), it can compress pretty well.
 
+## chunk log
+
+Stored in the git-annex branch, this provides a mapping `Key -> [[Key]]`.
+
+Note that a given remote uuid might have multiple sets of chunks (with
+different sizes) logged, if a key was stored on it twice using different
+chunk sizes. Also note that even when the log indicates a key is chunked,
+the object may be stored non-chunked on the remote too.
+
+For fixed size chunks, there's no need to store the list of chunk keys,
+instead the log only records the number of chunks (needed because the size
+of the parent Key may not be known), and the chunk size.
+
+Example:
+
+	1287290776.765152s e605dca6-446a-11e0-8b2a-002170d25c55:10240 9
+
+Later, might want to support other kinds of chunks, for example ones made
+using a rsync-style rolling checksum. It would probably not make sense to
+store the full [Key] list for such chunks in the log. Instead, it might be
+stored in a file on the remote.
+
+To support such future developments, when updating the chunk log,
+git-annex should preserve unparsable values (the part after the colon).
+
 ## chunk then encrypt
 
 Rather than encrypting the whole object 1st and then chunking, chunk and
@@ -239,3 +258,14 @@ checking hasKey.
 Note that this is safe to do only as long as the Key being transferred
 cannot possibly have 2 different contents in different repos. Notably not
 necessarily the case for the URL keys generated for quvi.
+
+Both **done**.
+
+## parallel
+
+If 2 remotes both support chunking, uploading could upload different chunks
+to them in parallel. However, the chunk log does not currently allow
+representing the state where some chunks are on one remote and others on
+another remote.
+
+Parallel downloading of chunks from different remotes is a bit more doable.
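
The chunk log line format added in this diff, and the rule that unparsable values be preserved, can be illustrated with a small sketch. This is not git-annex's implementation (git-annex is written in Haskell). The field layout `<timestamp> <uuid>:<value>` is an assumption inferred from the single example line above, and `ChunkLogEntry`, `parse_line`, `format_line`, and `fixed_chunks` are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed layout, inferred from the one example line in the diff:
#   <timestamp> <remote uuid>:<value>
# For fixed size chunks the value is "<chunk size> <chunk count>".
FIXED_SIZE_VALUE = re.compile(r"([0-9]+) ([0-9]+)")


@dataclass(frozen=True)
class ChunkLogEntry:
    timestamp: str  # kept verbatim, e.g. "1287290776.765152s"
    uuid: str       # uuid of the remote
    value: str      # everything after the colon, never rewritten

    def fixed_chunks(self) -> Optional[Tuple[int, int]]:
        """Return (chunk size, chunk count), or None for an unknown value."""
        match = FIXED_SIZE_VALUE.fullmatch(self.value)
        if match is None:
            return None
        return int(match.group(1)), int(match.group(2))


def parse_line(line: str) -> ChunkLogEntry:
    """Split a log line without interpreting the part after the colon."""
    timestamp, rest = line.rstrip("\n").split(" ", 1)
    uuid, value = rest.split(":", 1)
    return ChunkLogEntry(timestamp, uuid, value)


def format_line(entry: ChunkLogEntry) -> str:
    """Write the entry back out, with the value exactly as it was read."""
    return f"{entry.timestamp} {entry.uuid}:{entry.value}"
```

Because the value is stored as raw text and only interpreted on demand, a line whose value this sketch does not understand (for example, one written for a future chunking method) would round-trip through `parse_line` and `format_line` unchanged.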