summaryrefslogtreecommitdiff
path: root/doc/devblog/day_204__mowing.mdwn
blob: b34f1ba380be69661acf9474611ceebe2719172a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
Remained frustratingly stuck until 3 pm on the same stuff that puzzled
me yesterday. However, 6 hours later, I have the directory
special remote 100% working with both new chunk= and legacy chunksize=
configuration, both with and without encryption.

----

So, the root of why this is was hard, since I thought about it a lot today
in between beating my head into the wall: git-annex's internal API for remotes
is really, really simple. It basically comes down to:

[[!format haskell """
	Remote
		{ storeKey :: Key -> AssociatedFile -> MeterUpdate -> Annex Bool
		, retrieveKeyFile :: Key -> AssociatedFile -> FilePath -> MeterUpdate -> Annex Bool
		, removeKey :: Key -> Annex Bool
		, hasKey :: Key -> Annex (Either String Bool)
		}
"""]]

This simplicity is a Good Thing, because it maps very well to REST-type
services. And it allows for quite a lot of variety in implementations of
remotes. Ranging from reguar git remotes, that rsync files around without
git-annex ever loading them itself, to remotes like webdav that load
and store files themselves, to remotes like tahoe that intentionally do not
support git-annex's built-in encryption methods.

However, the simplicity of that API means that lots of complicated stuff,
like handling chunking, encryption, etc, has to be handled on a per-remote
basis. Or, more generally, by `Remote -> Remote` transformers that take
a remote and add some useful feature to it.

One problem is that the API is so simple that a remote transformer that adds
encryption is not feasible. In fact, every encryptable remote has
had its own code that loads a file from local disk, encrypts it, and sends
it to the remote. Because there's no way to make a remote transformer that
converts a `storeKey` into an encrypted `storeKey`. (Ditto for retrieving
keys.)

I almost made the API more complicated today. Twice. But both times
I ended up not, and I think that was the right choice, even though
it meant I had to write some quite painful code.

----

In the end, I instead wrote a little module that pulls together supporting
both encryption and chunking. I'm not completely happy because those
two things should be independent, and so separate. But, 120 lines of
code that don't keep them separate is not the end of the world.

That module also contains some more powerful, less general APIs, 
that will work well with the kinds of remotes that will use it.

The really nice result, is that the implementation of the directory
special remote melts down from 267 lines of code to just 172! (Plus some
legacy code for the old style chunking, refactored out into a file I can
delete one day.) It's a lot cleaner too.

With all this done, I expect I can pretty easily add the new style chunking
to most git-annex remotes, and remove code from them while doing it!

----

Today's work was sponsored by Mark Hepburn.