author    Joey Hess <joey@kitenet.net>  2014-07-28 17:11:37 -0400
committer Joey Hess <joey@kitenet.net>  2014-07-28 17:11:37 -0400
commit    92154d3401963469c6cd251c98194690241055b6 (patch)
tree      84e5ea1e4b28c63e829089622cef71390b9ab51e /doc/design
parent    8e6025ff897345a9824575d0afd9510cbf1572f1 (diff)
expand to rolling hash based design
Diffstat (limited to 'doc/design')
-rw-r--r--  doc/design/assistant/deltas.mdwn | 24
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/doc/design/assistant/deltas.mdwn b/doc/design/assistant/deltas.mdwn
index ff4185a18..0f7d308b8 100644
--- a/doc/design/assistant/deltas.mdwn
+++ b/doc/design/assistant/deltas.mdwn
@@ -4,6 +4,24 @@ One simple way is to find the key of the old version of a file that's
being transferred, so it can be used as the basis for rsync, or any
other similar transfer protocol.
-For remotes that don't use rsync, a poor man's version could be had by
-chunking each object into multiple parts. Only modified parts need be
-transferred. Sort of sub-keys to the main key being stored.
+For remotes that don't use rsync, use a rolling-checksum-based chunker,
+such as BuzHash. This will produce [[chunks]], which can be stored on the
+remote as regular Keys -- where, unlike the fixed-size chunk keys, the
+SHA256 part of these keys is the checksum of the chunk they contain.
+
+Once that's done, it's easy to avoid uploading chunks that have been sent
+to the remote before.
+
+When retrieving a new version of a file, there would need to be a way to get
+the list of chunk keys that constitute the new version. Probably best to
+store this list on the remote. Then there needs to be a way to find which
+of those chunks are available in locally present files, so that the locally
+available chunks can be extracted, and combined with the chunks that need
+to be downloaded, to reconstitute the file.
+
+To find which chunks are locally available, here are 2 ideas:
+
+1. Use a single basis file, eg an old version of the file. Re-chunk it, and
+ use its chunks. Slow, but simple.
+2. Some kind of database of locally available chunks. Would need to be kept
+ up-to-date as files are added, and as files are downloaded.
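
The rolling-hash chunking described in the patch above could look roughly
like the following. This is a minimal sketch, not git-annex code: it uses a
simple polynomial rolling hash in place of BuzHash, the window size, mask,
and chunk-size limits are arbitrary, and the `SHA256-s<size>--<digest>` key
naming is only meant to echo the idea that the SHA256 part of a chunk key is
the checksum of the chunk it contains.

```python
import hashlib

WINDOW = 48            # bytes in the rolling window
BASE = 257             # polynomial base for the toy rolling hash
MOD = 1 << 32
BW = pow(BASE, WINDOW, MOD)   # BASE**WINDOW, used to drop the oldest byte
MASK = 0x1FFF          # cut where the low 13 bits are zero
MIN_CHUNK = 2048       # avoid degenerate tiny chunks
MAX_CHUNK = 64 * 1024  # force a boundary eventually

def chunk(data):
    """Yield content-defined chunks of `data` (bytes)."""
    start = 0
    h = 0
    for i in range(len(data)):
        h = (h * BASE + data[i]) % MOD
        if i - start >= WINDOW:
            # Remove the byte that just left the window.
            h = (h - data[i - WINDOW] * BW) % MOD
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]

def chunk_key(c):
    """Name a chunk after the SHA256 of its contents (illustrative key format)."""
    return "SHA256-s%d--%s" % (len(c), hashlib.sha256(c).hexdigest())
```

Because keys are derived purely from chunk contents, avoiding re-uploads
falls out naturally: chunk the new version, and skip any chunk whose key the
remote already has.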
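
Idea 1 from the list (re-chunk a single basis file) could then reconstitute
a new version along these lines. `chunk` and `chunk_key` are the sketch
functions above; `download_chunk` is a hypothetical callback standing in for
fetching one chunk key from the remote, and `chunk_keys` would be the
per-version list of chunk keys that the design suggests storing on the
remote.

```python
def reconstitute(chunk_keys, basis_path, out_path, download_chunk):
    """Rebuild out_path from chunk_keys, reusing chunks found in basis_path."""
    local = {}
    with open(basis_path, "rb") as f:
        for c in chunk(f.read()):
            local[chunk_key(c)] = c
    with open(out_path, "wb") as out:
        for key in chunk_keys:
            if key in local:
                out.write(local[key])           # chunk already present locally
            else:
                out.write(download_chunk(key))  # only these hit the remote
```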
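
Idea 2 (a database of locally available chunks) might be a small index like
this, updated whenever a file is added or downloaded; `chunk` and
`chunk_key` are again the sketch functions above. The SQLite layout is
purely illustrative, not git-annex's schema, and entries would also have to
be removed or revalidated when local files change or are dropped.

```python
import sqlite3

def open_chunk_db(path):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS chunks
                  (key TEXT PRIMARY KEY, file TEXT, offset INTEGER, size INTEGER)""")
    return db

def index_file(db, filepath):
    """Record every chunk of filepath, so later downloads can reuse it."""
    with open(filepath, "rb") as f:
        data = f.read()
    offset = 0
    for c in chunk(data):
        db.execute("INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?)",
                   (chunk_key(c), filepath, offset, len(c)))
        offset += len(c)
    db.commit()

def find_local_chunk(db, key):
    """Return (file, offset, size) if the chunk is locally available, else None."""
    return db.execute("SELECT file, offset, size FROM chunks WHERE key = ?",
                      (key,)).fetchone()
```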