 doc/design/assistant/deltas.mdwn | 24 +++++++++++++++++++++---
 doc/devblog/day_206__zap.mdwn    | 83 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+), 3 deletions(-)
diff --git a/doc/design/assistant/deltas.mdwn b/doc/design/assistant/deltas.mdwn
index ff4185a18..0f7d308b8 100644
--- a/doc/design/assistant/deltas.mdwn
+++ b/doc/design/assistant/deltas.mdwn
@@ -4,6 +4,24 @@ One simple way is to find the key of the old version of a file that's
being transferred, so it can be used as the basis for rsync, or any
other similar transfer protocol.
-For remotes that don't use rsync, a poor man's version could be had by
-chunking each object into multiple parts. Only modified parts need be
-transferred. Sort of sub-keys to the main key being stored.
+For remotes that don't use rsync, use a rolling-checksum-based chunker,
+such as BuzHash. This will produce [[chunks]], which can be stored on the
+remote as regular Keys -- where, unlike the fixed-size chunk keys, the
+SHA256 part of these keys is the checksum of the chunk they contain.
+
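+Roughly what such a chunker looks like, as a minimal sketch: this uses a
+simple Rabin-style rolling hash standing in for BuzHash, and the window
+size, multiplier, and boundary mask are illustrative choices, not settled
+parameters.
+
+[[!format haskell """
+{-# LANGUAGE BangPatterns #-}
+
+import qualified Data.ByteString as B
+import Data.Bits ((.&.))
+import Data.Word (Word32)
+
+windowSize :: Int
+windowSize = 48
+
+prime, primePow, boundaryMask :: Word32
+prime = 31
+primePow = prime ^ windowSize -- p^w; Word32 arithmetic wraps mod 2^32
+boundaryMask = 0x1fff         -- yields an average chunk size around 8 KiB
+
+-- Split content wherever the low bits of the rolling window hash are
+-- all zero, so chunk boundaries follow the content rather than fixed
+-- offsets, and an edit only disturbs the chunks near it.
+rollingChunks :: B.ByteString -> [B.ByteString]
+rollingChunks b
+    | B.null b = []
+    | otherwise = c : rollingChunks rest
+  where
+    (c, rest) = B.splitAt (boundary b) b
+
+-- Number of bytes up to and including the first content-defined
+-- boundary (or all of the input, if no boundary is found).
+boundary :: B.ByteString -> Int
+boundary b = go 0 0
+  where
+    len = B.length b
+    go :: Word32 -> Int -> Int
+    go !h !i
+        | i >= len = len
+        | otherwise =
+            let inb = fromIntegral (B.index b i)
+                outb = if i >= windowSize
+                    then fromIntegral (B.index b (i - windowSize)) * primePow
+                    else 0
+                h' = h * prime + inb - outb
+            in if i >= windowSize && h' .&. boundaryMask == 0
+                then i + 1
+                else go h' (i + 1)
+"""]]
+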
+Once that's done, it's easy to avoid uploading chunks that have been sent
+to the remote before.
+
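+For example, skipping already-sent chunks is just a presence check per
+chunk key. A sketch, where ChunkKey, hasChunk, and storeChunk are
+hypothetical stand-ins for the real key type and remote operations:
+
+[[!format haskell """
+import qualified Data.ByteString as B
+import Control.Monad (forM_, unless)
+
+-- Hypothetical stand-ins for the remote's presence check and upload.
+type ChunkKey = String
+
+hasChunk :: ChunkKey -> IO Bool
+hasChunk _ = return False
+
+storeChunk :: ChunkKey -> B.ByteString -> IO ()
+storeChunk _ _ = return ()
+
+-- Upload only the chunks that are not already on the remote.
+uploadMissing :: [(ChunkKey, B.ByteString)] -> IO ()
+uploadMissing chunks = forM_ chunks $ \(k, b) -> do
+    present <- hasChunk k
+    unless present $
+        storeChunk k b
+"""]]
+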
+When retrieving a new version of a file, there would need to be a way to
+get the list of chunk keys that constitute the new version. Probably best
+to store this list on the remote. Then there needs to be a way to find
+which of those chunks are available in locally present files, so that the
+locally available chunks can be extracted, and combined with the chunks
+that need to be downloaded, to reconstitute the file (see the sketch
+after the list below).
+
+To find which chunks are locally available, here are two ideas:
+
+1. Use a single basis file, e.g. an old version of the file. Re-chunk it,
+   and use its chunks. Slow, but simple.
+2. Some kind of database of locally available chunks. Would need to be
+   kept up-to-date as files are added, and as files are downloaded.
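+
+Either way, reconstitution would then look roughly like this. A sketch,
+where ChunkKey is a hypothetical stand-in, and the lookup and download
+actions are passed in rather than pinned down here:
+
+[[!format haskell """
+import qualified Data.ByteString.Lazy as L
+
+type ChunkKey = String -- hypothetical stand-in
+
+-- Reconstitute a file from its chunk key list, extracting chunks from
+-- locally present files where possible and downloading the rest.
+reconstitute
+    :: (ChunkKey -> IO (Maybe L.ByteString)) -- find chunk in local files
+    -> (ChunkKey -> IO L.ByteString)         -- download chunk from remote
+    -> [ChunkKey]                            -- chunks of the new version
+    -> FilePath
+    -> IO ()
+reconstitute findlocal download ks dest =
+    L.writeFile dest . L.concat =<< mapM get ks
+  where
+    get k = maybe (download k) return =<< findlocal k
+"""]]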
diff --git a/doc/devblog/day_206__zap.mdwn b/doc/devblog/day_206__zap.mdwn
new file mode 100644
index 000000000..eccee2464
--- /dev/null
+++ b/doc/devblog/day_206__zap.mdwn
@@ -0,0 +1,83 @@
+Zap! ... My internet gateway was [destroyed by lightning](https://identi.ca/joeyh/note/xogvXTFDR9CZaCPsmKZipA).
+Limping along regardless, and replacement ordered.
+
+Got resuming of uploads to chunked remotes working. Easy!
+
+----
+
+Next I want to convert the external special remotes to have these nice
+new features. But there is a wrinkle: The new chunking interface works
+entirely on ByteStrings containing the content, but the external special
+remote interface passes content around in files.
+
+I could just make it write the ByteString to a temp file, and pass the temp
+file to the external special remote to store. But then, when chunking is
+not being used, it would pointlessly read a file's content, only to write
+it back out to a temp file.
+
+Similarly, when retrieving a key, the external special remote saves it to a
+file. But we want a ByteString. Except, when not doing chunking or
+encryption, letting the external special remote save the content directly
+to a file is optimal.
+
+One approach would be to change the protocol for external special
+remotes, so that the content is sent over the protocol rather than in temp
+files. But I think this would not be ideal for some kinds of external
+special remotes, and it would probably be quite a lot slower and more
+complicated.
+
+Instead, I am playing around with some type class trickery:
+
+[[!format haskell """
+{-# LANGUAGE Rank2Types, TypeSynonymInstances, FlexibleInstances, MultiParamTypeClasses #-}
+
+import qualified Data.ByteString.Lazy as L
+import System.IO (hClose)
+import Utility.Tmp (withTmpFile) -- git-annex's own temp file helper
+-- (Key and MeterUpdate also come from git-annex's own modules.)
+
+type Storer p = Key -> p -> MeterUpdate -> IO Bool
+
+-- For Storers that want to be provided with a file to store.
+type FileStorer a = ContentPipe a FilePath => Storer a
+
+-- For Storers that want to be provided with a ByteString to store.
+type ByteStringStorer a = ContentPipe a L.ByteString => Storer a
+
+class ContentPipe src dest where
+ contentPipe :: src -> (dest -> IO a) -> IO a
+
+instance ContentPipe L.ByteString L.ByteString where
+ contentPipe b a = a b
+
+-- This feels a lot like I could perhaps use pipes or conduit...
+instance ContentPipe FilePath FilePath where
+ contentPipe f a = a f
+
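+-- Spool the ByteString out to a temp file, and provide that file.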
+instance ContentPipe L.ByteString FilePath where
+ contentPipe b a = withTmpFile "tmpXXXXXX" $ \f h -> do
+ L.hPut h b
+ hClose h
+ a f
+
+instance ContentPipe FilePath L.ByteString where
+ contentPipe f a = a =<< L.readFile f
+"""]]
+
+The external special remote would be a FileStorer, so when a non-chunked,
+non-encrypted file is provided, it just runs on the FilePath with no extra
+work. When a ByteString is provided, it's spooled out to a temp file, and
+the temp file is provided. And many other special remotes are
+ByteStringStorers, so they will just pass a provided ByteString through,
+or read in the content of a provided file.
+
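+For example, an external special remote's Storer could be written once
+against FilePath and used both ways. A hypothetical sketch, where
+sendStoreRequest stands in for however the content actually gets handed
+to the external program:
+
+[[!format haskell """
+-- Works whether git-annex hands over the original file (src = FilePath,
+-- no copying at all) or a chunk of content (src = L.ByteString,
+-- spooled to a temp file by contentPipe).
+externalStore :: ContentPipe src FilePath => Storer src
+externalStore key content meterupdate =
+    contentPipe content $ \f ->
+        sendStoreRequest key f meterupdate
+
+-- Hypothetical stand-in for the actual protocol request.
+sendStoreRequest :: Key -> FilePath -> MeterUpdate -> IO Bool
+sendStoreRequest _ _ _ = return True
+"""]]
+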
+I think that would work. Though it is not optimal for external special
+remotes that are chunked but not encrypted. For that case, it might be
+worth extending the special remote protocol with a way to say "store a
+chunk of this file from byte N to byte M".
+
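+The chunker already knows each chunk's extent within the file, so the
+offsets such a protocol extension would need are cheap to compute. A
+small sketch over the chunk list:
+
+[[!format haskell """
+import qualified Data.ByteString.Lazy as L
+
+-- Pair each chunk with its (offset, length) within the original file,
+-- ready to be expressed as "store bytes N through M".
+chunkExtents :: [L.ByteString] -> [(Integer, Integer)]
+chunkExtents = go 0
+  where
+    go _ [] = []
+    go off (c:cs) =
+        let len = fromIntegral (L.length c)
+        in (off, len) : go (off + len) cs
+"""]]
+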
+---
+
+Also, talked with ion about what would be involved in using
+rolling-checksum-based chunks. That would allow for rsync- or zsync-like
+behavior, where when a file changed, git-annex uploads only the chunks
+that changed, and the unchanged chunks are reused.
+
+I am not ready to work on that yet, but I made some changes to the parsing
+of the chunk log, so that additional chunking schemes like this can be added
+to git-annex later without breaking backwards compatibility.