Diffstat (limited to 'doc/devblog/day_-4__forgetting.mdwn')
-rw-r--r-- doc/devblog/day_-4__forgetting.mdwn 80
1 files changed, 80 insertions, 0 deletions
diff --git a/doc/devblog/day_-4__forgetting.mdwn b/doc/devblog/day_-4__forgetting.mdwn
new file mode 100644
index 000000000..9cec51475
--- /dev/null
+++ b/doc/devblog/day_-4__forgetting.mdwn
@@ -0,0 +1,80 @@
+Yesterday I spent making a release, and shopping for a new laptop, since
+this one is dying. (Soon I'll be able to compile git-annex fast-ish! Yay!)
+And thinking about [[todo/wishlist:_dropping_git-annex_history]].
+
+Today, I added the `git annex forget` command. So far it's been lightly
+tested, seems to work, and is living in the `forget` branch until I gain
+confidence with it. It should be perfectly safe to use, even if it's buggy,
+because you can use `git reflog git-annex` to pull out and revert to an old
+version of your git-annex branch. So if you've been wanting this feature,
+please beta test!
+
+----
+
+I actually implemented something more generic than just forgetting git
+history. There's now a whole mechanism for git-annex doing distributed
+transitions of whatever sort is needed.
+
+There were several subtleties involved in distributed transitions:
+
+First is how to tell when a given transition has already been done on a
+branch. At first I was thinking that the transition log should include the
+sha of the first commit on the old branch that got rewritten. However, that
+would mean that after a single transition had been done, every git-annex
+branch merge would need to look up the first commit of the current branch,
+to see if it's done the transition yet. That's slow! Instead, transitions
+are logged with a timestamp, and as long as a branch contains a transition
+with the same timestamp, it's been done.
+
+A really tricky problem is what to do if the local repository has
+transitioned, but a remote has not, and changes keep being made to the
+remote. What it does so far is incorporate the changes from the remote into
+the index, and re-run the transition code over the whole thing to yield a
+single new commit. This might not be very efficient (once I write the more
+full-featured transition code), but it lets the local repo keep up with
+what's going on in the remote, without directly merging with it (which
+would revert the transition). And once the remote repository has its
+git-annex upgraded to one that knows about transitions, it will finish up
+the transition on its side automatically, and the two branches will once
+again merge.
+
+Related to the previous problem, we don't want to keep trying to merge
+from a remote branch when it's not yet transitioned. So git-annex keeps a
+blacklist of untransitioned commits that have already been integrated.
+
+One really subtle thing is that when the user does a transition more
+complicated than `git annex forget`, like the `git annex forget --dead`
+that I need to implement to forget dead remotes, they're not just telling
+git-annex to forget whatever dead remotes it knows right now. They're
+actually telling git-annex to perform the transition one time on every
+existing clone of the repository, at some point in the future. Repositories
+with unfinished transitions could hang around for years, and at some future
+point when git-annex runs in the repository again, it would merge in the
+current state of the world, and re-do the transition. So you might tell it
+to forget dead remotes today, and then the very repository you ran that in
+later becomes dead, and a long-slumbering repo wakes up and forgets about
+the repo that started the whole process! I hope users don't find this
+massively confusing, but that's how the implementation works right now.
+
+----
+
+I think I have at least two more days of work to do to finish up this
+feature.
+
+* I still need to add some extra features like forgetting about dead remotes,
+ and forgetting about keys that are no longer present on any remote.
+
+* After `git annex forget`, `git annex sync`
+ will fail to push the synced/annex branch to remotes, since the branch
+ is no longer a fast-forward of the old one. I will probably fix this by
+ making `git annex sync` do a fallback push of a unique branch in this case,
+ like the assistant already does. Although I may need to adjust that code
+ to handle this case, too.
+
+* For some reason the automatic transitioning code triggers
+ a "(recovery from race)" commit. This is certainly a bug somewhere,
+ because you can't have a race with only 1 participant.
+
+----
+
+Today's work was sponsored by Richard Hartmann.
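
The post describes two bookkeeping ideas: a transition counts as done on a branch once the branch's log contains an entry with the same timestamp, and untransitioned remote commits that have already been folded in are blacklisted so they are not integrated again. The sketch below is only a toy model of that logic, not git-annex's actual code. `Transition`, `Branch`, `integrate`, and the returned status strings are all hypothetical names invented for illustration, and the real work of rewriting the branch is reduced to a comment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    """One logged transition; the timestamp is what identifies it."""
    name: str          # e.g. "forget" (illustrative only)
    timestamp: float   # when the transition was first requested


@dataclass
class Branch:
    """Toy stand-in for a git-annex branch (hypothetical model)."""
    transitions: set = field(default_factory=set)  # logged Transition entries
    logs: dict = field(default_factory=dict)       # path -> log content


def integrate(local, remote, remote_commit, blacklist):
    """Fold a remote branch into the local one; return what happened."""
    if local.transitions <= remote.transitions:
        # The remote's log already holds every transition we have done,
        # so a plain merge cannot revert anything. (Assumption: performing
        # transitions that only the remote has is ignored in this sketch.)
        local.logs.update(remote.logs)
        local.transitions |= remote.transitions
        return "merged"
    if remote_commit in blacklist:
        # Untransitioned commit that was already integrated: nothing to do.
        return "skipped"
    # Untransitioned remote: take its changes without merging its history.
    # Real code would re-run the transition over the combined state here
    # to produce a single new commit.
    local.logs.update(remote.logs)
    blacklist.add(remote_commit)
    return "re-transitioned"
```

In this model, calling `integrate` twice with the same untransitioned commit does the work once and then skips it, and once the remote's log carries the same timestamped entry the two branches merge normally again.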