doc/design/assistant/blog/day_83__3-way.mdwn


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73

Syncing works well when the graph of repositories is strongly connected.
Now I'm working on making it work reliably with less connected graphs.

I've been focusing on and testing a doubly-connected list of repositories,
such as: `A <-> B <-> C`

----

I was seeing a lot of git-annex branch push failures occuring in
this line of repositories topology. Sometimes was is able to recover from
these, but when two repositories were trying to push to one-another at the
same time, and both failed, both would pull and merge, which actually keeps
the git-annex branch still diverged. (The two merge commits differ.)

A large part of the problem was that it pushed directly into the git-annex
branch on the remote; the same branch the remote modifies. I changed it to
push to `synced/git-annex` on the remote, which avoids most push failures.
Only when A and C are both trying to push into `B/synced/git-annex` at the
same time would one fail, and need to pull, merge, and retry.

-----

With that change, git syncing always succeeded in my tests, and without
needing any retries. But with more complex sets of repositories, or more
traffic, it could still fail.

I want to avoid repeated retries, exponential backoffs, and that kind of
thing. It'd probably be good enough, but I'm not happy with it because
it could take arbitrarily long to get git in sync.

I've settled on letting it retry once to push to the synced/git-annex
and synced/master branches. If the retry fails, it enters a fallback mode,
which is guaranteed to succeed, as long as the remote is accessible.

The problem with the fallback mode is it uses really ugly branch names.
Which is why Joachim Breitner and I originally decided on making `git annex
sync` use the single `synced/master` branch, despite the potential for
failed syncs. But in the assistant, the requirements are different,
and I'm ok with the uglier names.

It does seem to make sense to only use the uglier names as a fallback,
rather than by default. This preserves compatability with `git annex sync`,
and it allows the assistant to delete fallback sync branches after it's
merged them, so the ugliness is temporary.

---

Also worked some today on a bug that prevents C from receiving files
added to A.

The problem is that file contents and git metadata sync independantly. So C
will probably receive the git metadata from B before B has finished
downloading the file from A. C would normally queue a download of the
content when it sees the file appear, but at this point it has nowhere to
get it from.

My first stab at this was a failure. I made each download of a file result
in uploads of the file being queued to every remote that doesn't have it
yet. So rather than C downloading from B, B uploads to C. Which works fine,
but then C sees this download from B has finished, and proceeds to try to
re-upload to B. Which rejects it, but notices that this download has
finished, so re-uploads it to C...

The problem with that approach is that I don't have an event when a download
succeeds, just an event when a download ends.  Of course, C could skip
uploading back to the same place it just downloaded from, but loops are
still possible with other network topologies (ie, if D is connected to both
B and C, there would be an upload loop 'B -> C -> D -> B`). So unless I can
find a better event to hook into, this idea is doomed.

I do have another idea to fix the same problem. C could certianly remember
that it saw a file and didn't know where to get the content from, and then
when it receives a git push of a git-annex branch, try again.