Consider two use cases:

1. Using a v6 repo with locked files on a crippled filesystem not
   supporting symlinks. For the files to be usable, they need to be
   unlocked. But, the user may not want to unlock the files everywhere,
   just on this one crippled system.
2. [[todo/hide_missing_files]]

Both of these could be met by making `git-annex sync` maintain an adjusted
version of the original branch, eg `adjusted/master`.

There would be a filter function. For #1 above it would simply convert all
annex symlinks to annex file pointers. For #2 above it would omit files
whose content is not currently in the annex. Sometimes, both #1 and #2 would
be wanted.
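
Here is a minimal sketch of the two filters as pure functions over a tree. It is written in Haskell to match the filter interface later on this page. The sketch assumes a simplified model: a tree is a list of `(path, entry)` pairs, and each annexed file is identified by its key. `Entry`, `Tree`, and the function names are hypothetical, not git-annex's own types. The presence check is passed in because it depends on the local annex.

```haskell
-- Hypothetical model of a tree: each path maps to an entry, and an
-- annexed file is identified by its key.
data Entry
  = AnnexSymlink String -- locked annexed file (key)
  | AnnexPointer String -- unlocked annexed file (key)
  | OtherFile           -- anything not managed by git-annex
  deriving (Eq, Show)

type Tree = [(FilePath, Entry)]

-- Filter #1: convert every annex symlink into an annex pointer file.
unlockFilter :: Tree -> Tree
unlockFilter = map (fmap unlock)
  where
    unlock (AnnexSymlink k) = AnnexPointer k
    unlock e = e

-- Filter #2: omit annexed files whose content is not present locally.
hideMissingFilter :: (String -> Bool) -> Tree -> Tree
hideMissingFilter present = filter (keep . snd)
  where
    keep (AnnexSymlink k) = present k
    keep (AnnexPointer k) = present k
    keep OtherFile = True

-- Wanting both adjustments is just the composition of the two filters.
unlockHideMissing :: (String -> Bool) -> Tree -> Tree
unlockHideMissing present = unlockFilter . hideMissingFilter present
```

Filter #1 only changes how a file is stored, and filter #2 only decides whether the file appears. So in this model, applying them in either order gives the same tree.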

[Alternatively, it could stay on the master branch, and only adjust the
work tree and index. See WORKTREE notes below for how this choice would
play out.]

[[!toc]]

## filtering

	master           adjusted/master
	A
	|--------------->A'
	|                |

When generating commit A', reuse the date of A and use a standard author,
committer, and message. This means that two users with the adjusted branch
checked out and using the same filters will get identical shas for A', and
so can collaborate on them.
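
To see why this makes the shas reproducible, here is a sketch that renders the raw commit object for A'. Git names a commit by the SHA-1 of a `commit <size>` header, a NUL byte, and this text. Two clones that produce the same text therefore get the same sha. The identity and message below are placeholders, not necessarily what git-annex uses. The date is passed in in git's `<unix-seconds> <tz>` form.

```haskell
import Data.List (intercalate)

-- Placeholder fixed identity and message for adjusted commits
-- (hypothetical; git-annex's actual values may differ).
adjustedIdent :: String
adjustedIdent = "git-annex <git-annex@localhost>"

adjustedMessage :: String
adjustedMessage = "git-annex adjusted branch"

-- Render a raw git commit object for an adjusted commit. The date of
-- the original commit is reused for author and committer, so the text
-- depends only on the tree, the parents, and that date.
adjustedCommitObject :: String -> [String] -> String -> String
adjustedCommitObject treeSha parentShas origDate =
  intercalate "\n" $
    ["tree " ++ treeSha]
      ++ map ("parent " ++) parentShas
      ++ [ "author " ++ adjustedIdent ++ " " ++ origDate
         , "committer " ++ adjustedIdent ++ " " ++ origDate
         , ""
         , adjustedMessage
         , "" -- git commit messages end with a newline
         ]
```

Nothing in the rendered text depends on who runs the filter or when. If the local user's name or the current time were used instead, every clone would produce a different A'.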

## commit

When committing changes, a commit is made as usual to the adjusted branch.
So, the user can `git commit` as usual. This does not touch the
original branch yet. 

Then we need to get from that commit to one with the filters reversed,
which should be the same as if the adjusted branch had not been used.
This commit gets added onto the original branch.

So, the branches would look like this:

	master           adjusted/master
	A
	|--------------->A'
	|                |
	|                C (new commit)
	B < - - - - - - -
	|                
	|--------------->B'
	|                |

Note particularly that B does not have A' or C in its history;
the adjusted branch is not evident from outside.

Also note that B gets filtered and the adjusted branch is rebased on top of
it, so C does not remain in the adjusted branch history either. This will
make other checkouts that are in the same adjusted branch end up with the
same B' commit when they pull B.

It might be useful to have a post-commit hook that generates B and B'
and updates the branches. And/or `git-annex sync` could do it.

There may be multiple commits made to the adjusted branch before any get
applied back to the original branch. This is handled by reverse filtering
commits one at a time and rebasing the others on top.
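
Here is a sketch of that loop. It assumes each pending commit is represented by its change (the paths it modifies, with `Nothing` for a deletion), and that the forward and reverse filters are supplied as functions. `Tree`, `Change`, `applyChange`, and `propagateAll` are hypothetical names. A real implementation would work with git trees and commits rather than lists.

```haskell
import Data.List (sortOn)

-- Hypothetical model: a tree maps paths to contents; a change is one
-- commit's edits, where Nothing marks a deleted path.
type Tree = [(FilePath, String)]
type Change = [(FilePath, Maybe String)]

-- Apply a change to a tree, keeping the result sorted by path.
applyChange :: Change -> Tree -> Tree
applyChange ch tree = sortOn fst (foldl step tree ch)
  where
    step t (p, new) =
      let rest = filter ((/= p) . fst) t
      in maybe rest (\c -> (p, c) : rest) new

-- Propagate pending adjusted commits C1..Cn (oldest first) one at a
-- time. Each Ci is reverse filtered into a change on the original
-- branch, giving Bi; Bi is filtered forwards to give Bi'. Because
-- pending commits are represented as changes, rebasing Ci+1 onto Bi'
-- leaves its change unaltered in this model.
propagateAll
  :: (Tree -> Tree)     -- forward filter
  -> (Change -> Change) -- reverse filter for one commit's change
  -> Tree               -- tree of the original branch tip (A)
  -> [Change]           -- pending adjusted commits, oldest first
  -> [(Tree, Tree)]     -- (Bi, Bi') for each pending commit
propagateAll adjust unadjust a cs = go a cs
  where
    go _ [] = []
    go b (c : rest) =
      let b1 = applyChange (unadjust c) b
      in (b1, adjust b1) : go b1 rest
```

The last Bi' in the result is the new tip of the adjusted branch.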

	master           adjusted/master
	A
	|--------------->A'
	|                |
	|                C1
	|                |
	|                C2


	master           adjusted/master
	A
	|--------------->A'
	|                |
	|                C1
	B1< - - - - - - -
	|
	|--------------->B1'
	|                |
	|                C2'
	B2< - - - - - - -
	|
	|--------------->B2'


[WORKTREE: A pre-commit hook would be needed to update the staged changes, 
reversing the filter before the commit is made. All the other complications
above are avoided.]

## merge

This would be done by `git annex merge` and `git annex sync`, with the goal
of merging origin/master into master, and updating adjusted/master.

Note that the adjusted files db needs to be updated to reflect the changes
that are merged in, for object add/remove to work as described below.

When merging, there should never be any commits present on the
adjusted/master branch that have not yet been filtered over to the master
branch. If there are any such commits, just filter them into master before
beginning the merge. There may be staged changes, or changes in the work tree.

First filter the new commit:

	origin/master    adjusted/master
	A
	|--------------->A'
	|                |
	|                |
	B
	|                
	|---------->B'

Then, merge that into adjusted/master:

	origin/master    adjusted/master
	A
	|--------------->A'
	|                |
	|                |
	B                |
	|                |
	|----------->B'->B''

That merge will take care of updating the work tree.
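
Merging B' into adjusted/master is an ordinary three-way merge with A' as the base. The sketch below shows only the per-path decision and treats each file's contents as an opaque string. Git additionally merges file contents line by line, which is not modelled here. `mergeAdjusted` and `Merged` are hypothetical names.

```haskell
import Data.List (nub, sort)

type Tree = [(FilePath, String)]

data Merged = Clean Tree | Conflicts [FilePath]
  deriving (Eq, Show)

-- Three-way merge per path, with base = A', ours = adjusted/master,
-- theirs = B'. If only one side changed a path, take that side; if
-- both made the same change, take it; otherwise it is a conflict.
mergeAdjusted :: Tree -> Tree -> Tree -> Merged
mergeAdjusted base ours theirs
  | null conflicts = Clean [ (p, c) | (p, Right (Just c)) <- results ]
  | otherwise = Conflicts conflicts
  where
    paths = sort (nub (map fst (base ++ ours ++ theirs)))
    results = [ (p, mergePath (lookup p base) (lookup p ours) (lookup p theirs)) | p <- paths ]
    conflicts = [ p | (p, Left ()) <- results ]
    -- Nothing means the path is absent on that side.
    mergePath b o t
      | o == t = Right o
      | o == b = Right t
      | t == b = Right o
      | otherwise = Left ()
```

A clean result here corresponds to B'' being made without user intervention. A `Conflicts` result is the case the parenthetical below is concerned with.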

(What if there is a merge conflict between A' and B'? Normally such a merge
conflict should only affect the work tree/index, so it can be resolved
without making a commit, but B'' may end up being made to resolve a merge
conflict.)

------

FIXME: When the content of a file has been gotten in an adjusted unlocked
branch, and a new commit that does not touch that file is then merged in,
there is a false merge conflict on the file. It's auto-resolved by creating
a .variant file. This is probably a bug in the auto-resolve code for v6 files.

Test case:

	git clone ~/lib/tmp
	cd tmp
	git annex upgrade
	git annex adjust
	git annex get t/foo
	# make change in ~/lib/tmp and commit
	git annex sync
	# t/foo.variant-* is there

------



Once the merge is done, we have a commit B'' on adjusted/master. To finish,
adjust that commit so it does not have adjusted/master as its parent.

	origin/master    adjusted/master
	A
	|--------------->A'
	|                |
	|                |
	B
	|                
	|--------------->B''
	|                |

Finally, update master by reverse filtering B''. TODO

Notice how similar this is to the commit graph. So, "fast-forward" 
merging the same B commit from origin/master will lead to an identical
sha for B' as the original committer got.

Since the adjusted/master branch is not present on the remote, if the user
does a `git pull`, it won't merge in changes from origin/master. That is
good, because the filter needs to be applied first.

However, if the user does `git merge origin/master`, they'll get into a
state where the filter has not been applied. The post-merge hook could be
used to clean up after that. Or, let the user foot-shoot this way; they can
always reset back once they notice the mistake.

[WORKTREE: `git pull` would update the work tree, and may lead to conflicts
between the adjusted work tree and pulled changes. A post-merge hook would
be needed to re-adjust the work tree, and there would be a window where eg,
not present files would appear in the work tree.]

## annex object add/remove

When objects are added/removed from the annex, the associated file has to
be looked up, and the filter applied to it. So, dropping a file with the
missing file filter would cause it to be removed from the adjusted branch,
and receiving a file's content would cause it to appear in the adjusted
branch.
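
Here is a sketch of that update for the hide-missing filter. It assumes the original branch's tree is available as a list of `(path, key)` pairs. The sketch scans that tree to find the files associated with the key; a real implementation would presumably use a key-to-files lookup instead, so treat the scan as a placeholder. `onPresenceChange` is a hypothetical name.

```haskell
import Data.List (sortOn)

type Key = String

-- Hypothetical model: a tree lists each annexed file with its key.
type Tree = [(FilePath, Key)]

-- Called when the content of a key has just been received (present =
-- True) or dropped (present = False). Only the files associated with
-- that key change; every other entry of the adjusted tree is kept.
onPresenceChange
  :: Tree -- original branch tree
  -> Key  -- key whose content just arrived or was dropped
  -> Bool -- whether the content is present now
  -> Tree -- current adjusted tree
  -> Tree -- updated adjusted tree, to be committed
onPresenceChange original key present adjusted =
  sortOn fst (unaffected ++ (if present then associated else []))
  where
    -- Files associated with the key, found by scanning the original
    -- tree (placeholder for a proper key-to-files lookup).
    associated = [ (p, k) | (p, k) <- original, k == key ]
    unaffected = [ e | e@(_, k) <- adjusted, k /= key ]
```

The returned tree is what would then be committed to the adjusted branch, so that `git diff` stays clean.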

These changes would need to be committed to the adjusted branch, otherwise
`git diff` would show them.

[WORKTREE: Simply adjust the work tree (and index) per the filter.]

## reverse filtering

Reversing filter #1 would mean only converting pointer files to
symlinks when the file was originally a symlink. This is problematic when a
file is renamed. Would it be ok, if foo is renamed to bar and bar is
committed, for it to be committed as an unlocked file, even if foo was
originally locked? Probably.
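
Here is the policy described above, sketched for a single adjusted commit's tree. A pointer file is turned back into a symlink only when the same path was a symlink on the original branch. Under this rule, a locked file `foo` renamed to `bar` is committed unlocked, which is the outcome the paragraph above judges acceptable. `Entry`, `reverseUnlock`, and `reverseUnlockTree` are hypothetical names.

```haskell
-- Hypothetical model of tree entries; keys are plain strings.
data Entry = AnnexSymlink String | AnnexPointer String | OtherFile String
  deriving (Eq, Show)

type Tree = [(FilePath, Entry)]

-- Reverse filter #1 for one path. A pointer file becomes a symlink
-- again only if the original branch had a symlink at that same path.
-- Everything else, including a renamed file, is kept as committed.
reverseUnlock :: Tree -> FilePath -> Entry -> Entry
reverseUnlock original path (AnnexPointer k)
  | Just (AnnexSymlink _) <- lookup path original = AnnexSymlink k
reverseUnlock _ _ e = e

reverseUnlockTree :: Tree -> Tree -> Tree
reverseUnlockTree original = map (\(p, e) -> (p, reverseUnlock original p e))
```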

Reversing filter #2 would mean not deleting removed files whose content was
not present. When the commit includes deletion of files that were removed
due to their content not being present, those deletions are not propagated.
When the user deletes an unlocked file, the content is still
present in the annex, so reversing the filter should propagate the file
deletion. 

What if an object was sent to the annex (or removed from the annex)
after the commit and before the reverse filtering? This would cause the
reverse filter to draw the wrong conclusion. Maybe look at a list of what
objects were not present when applying the filter, and use that to decide
which to not delete when reversing it?

So, a reverse filter may need some state that was collected when running
the filter forwards, in order to decide what to do.
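
Here is one way to carry that state, sketched under the assumption that the forward filter records the paths it hid. The reverse filter then restores exactly those paths and lets every other deletion through, regardless of what content was gotten or dropped in between. The `Hidden` type and both functions are hypothetical.

```haskell
import Data.List (sortOn)

type Tree = [(FilePath, String)]

-- State recorded by the forward filter: the paths it hid because their
-- content was not present at the time it ran.
newtype Hidden = Hidden [FilePath]

-- Forward filter #2, also returning the state needed to reverse it.
hideMissing :: (String -> Bool) -> Tree -> (Tree, Hidden)
hideMissing present t =
  ( [ e | e@(_, k) <- t, present k ]
  , Hidden [ p | (p, k) <- t, not (present k) ]
  )

-- Reverse filter #2 for a commit made on the adjusted branch. Paths
-- the forward filter hid are put back from the original tree. Any
-- other path missing from the commit was deleted by the user, so that
-- deletion propagates. Current content presence is never consulted.
unhideMissing :: Tree -> Hidden -> Tree -> Tree
unhideMissing original (Hidden hidden) commit =
  sortOn fst (commit ++ restored)
  where
    inCommit p = p `elem` map fst commit
    restored = [ e | e@(p, _) <- original, p `elem` hidden, not (inCommit p) ]
```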

Alternatively, instead of reverse filtering the whole adjusted tree,
look at just the new commit that's being propagated back from the
adjusted branch to the master branch. Diff it against its parent to get
the changes that were made, de-adjust those changes, and apply them to
the master branch.
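
Here is a sketch of this diff-based alternative for the unlock filter. Only paths that the new commit actually changed are de-adjusted and applied to master. Files hidden by the adjustment therefore never appear in the diff and cannot be deleted by accident. The `Entry` model and all function names are hypothetical. De-adjustment looks at master's entry at the same path, following the reverse-filter rule for #1 above.

```haskell
import Data.List (nub, sortOn)

data Entry = AnnexSymlink String | AnnexPointer String
  deriving (Eq, Show)

type Tree = [(FilePath, Entry)]

-- One changed path; Nothing means the path was deleted.
type Change = (FilePath, Maybe Entry)

-- The changes made by the new adjusted commit relative to its parent.
diffTrees :: Tree -> Tree -> [Change]
diffTrees old new =
  [ (p, n) | p <- nub (map fst (old ++ new))
           , let n = lookup p new
           , n /= lookup p old ]

-- De-adjust one change: a pointer file becomes a symlink again if
-- master has a symlink at that path. Deletions pass through unchanged.
deAdjust :: Tree -> Change -> Change
deAdjust master (p, Just (AnnexPointer k))
  | Just (AnnexSymlink _) <- lookup p master = (p, Just (AnnexSymlink k))
deAdjust _ c = c

-- Apply changes to a tree, keeping it sorted by path.
applyChanges :: [Change] -> Tree -> Tree
applyChanges cs tree = sortOn fst (foldl step tree cs)
  where
    step t (p, new) = maybe rest (\e -> (p, e) : rest) new
      where rest = filter ((/= p) . fst) t

-- The new master tree for one adjusted commit.
propagateCommit :: Tree -> Tree -> Tree -> Tree
propagateCommit master parentAdj newAdj =
  applyChanges (map (deAdjust master) (diffTrees parentAdj newAdj)) master
```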

## push

The new master branch can then be pushed out to remotes. The
adjusted/master branch is not pushed to remotes. `git-annex sync` should
automatically push master when adjusted/master is checked out.

When push.default is "simple" (the new default), running `git push` when in
adjusted/master won't push anything. It would with "matching". Pity. (I
continue to feel git picked the wrong default here.) Users may find that
surprising. Users of `git-annex sync` won't need to worry about it though.

[WORKTREE: push works as usual]

## acting on filtered-out files

If a file is filtered out due to not existing, there should be a way
for `git annex get` to get it. Since the filtered out file is not in the
index, that would not normally work. What to do?

Maybe instead of making a branch where the file is deleted, it would be
better to delete it from the work tree, but keep the branch as-is. Then
`git annex get` would see the file, as it's in the index. 

But, not maintaining an adjusted branch complicates other things. See
WORKTREE notes throughout this page. Overall, the WORKTREE approach seems
too problematic.

Ah, but we know that when filter #2 is in place, any file that `git annex
get` could act on is not in the index. So, it could look at the master branch
instead. (Same for `git annex move --from` and `git annex copy --from`)

OTOH, if filter #1 is in place and not #2, a file might be renamed in the
index, and `git annex get $newname` should work. So, it should look at the
index in that case.
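
The rule above amounts to choosing which tree to consult based on the active filter. Here is a sketch, with the index and master-branch lookups passed in as functions that return a key. `lookupForGet` is a hypothetical name, and `Filter` mirrors the type in the filter interface below.

```haskell
data Filter = UnlockFilter | HideMissingFilter | UnlockHideMissingFilter
  deriving (Eq, Show)

-- Decide where `git annex get FILE` looks FILE up. When filter #2 is
-- in place, files it could act on are not in the index, so consult
-- the master branch. With only filter #1, consult the index, so that
-- renames made in the adjusted branch are honoured.
lookupForGet
  :: Filter
  -> (FilePath -> Maybe String) -- lookup in the index
  -> (FilePath -> Maybe String) -- lookup in the master branch tree
  -> FilePath
  -> Maybe String
lookupForGet f inIndex inMaster path
  | hidesMissing f = inMaster path
  | otherwise = inIndex path
  where
    hidesMissing HideMissingFilter = True
    hidesMissing UnlockHideMissingFilter = True
    hidesMissing UnlockFilter = False
```

The same choice would apply to `git annex move --from` and `git annex copy --from`.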

## problems

Using `git checkout` when in an adjusted branch is problematic, because a
non-adjusted branch would then be checked out. But, we can just say, if
you want to get into an adjusted branch, you have to run some command.
Or, could make a post-checkout hook.

Tags are a bit of a problem. If the user tags an adjusted branch, the tag
includes the local adjustments.  
[WORKTREE: not a problem]

If the user refers to commit shas (eg, in commit messages), those won't be
visible to anyone else.  
[WORKTREE: not a problem]

When a pull modifies a file, its content won't be available, and so it
would be hidden temporarily by filter #2. So the file would seem to vanish,
and come back later, which could be confusing. Could be fixed as discussed
in [[todo/deferred_update_mode]]. Arguably, it's just as confusing for the
file to remain visible but have its content temporarily replaced with an
annex pointer.

## integration with view branches

Entering a view from an adjusted branch should probably carry the filtering
over into the creation/updating of the view branch.

Could go a step further, and implement view branches as another branch
adjusting filter, albeit an extreme one. This might improve view branches.
For example, it's not currently possible to update a view branch with
changes fetched from a remote, and this could get us there.

This would need the reverse filter to be able to change metadata.

[WORKTREE: Wouldn't be able to integrate, unless view branches are changed
into adjusted view worktrees.]

## filter interface

Distilling all of the above, the filter interface needs to be something
like this, at its most simple:

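	-- Which adjustment(s) the adjusted branch applies.
	data Filter = UnlockFilter | HideMissingFilter | UnlockHideMissingFilter

	-- Get and set the filter in use (Annex is git-annex's monad).
	getFilter :: Annex Filter

	setFilter :: Filter -> Annex ()

	-- What to do with one annexed file when generating the adjusted branch.
	data FilterAction
		= UnchangedFile FilePath
		| UnlockFile FilePath
		| HideFile FilePath

	-- What the filter needs to know about one annexed file.
	data FileInfo = FileInfo
		{ originalBranchFile :: FileStatus -- how the original branch stores it
		, isContentPresent :: Bool -- whether its content is in the local annex
		}

	data FileStatus = IsAnnexSymlink | IsAnnexPointer
		deriving (Eq)

	-- When a guard fails, matching falls through to the final
	-- equation, so any file not explicitly adjusted is left unchanged.
	filterAction :: Filter -> FilePath -> FileInfo -> FilterAction
	filterAction UnlockFilter f fi
		| originalBranchFile fi == IsAnnexSymlink = UnlockFile f
	filterAction HideMissingFilter f fi
		| not (isContentPresent fi) = HideFile f
	filterAction UnlockHideMissingFilter f fi
		| not (isContentPresent fi) = HideFile f
		| otherwise = filterAction UnlockFilter f fi
	filterAction _ f _ = UnchangedFile f

	-- Generate the adjusted version of a commit on the original branch.
	filteredCommit :: Filter -> Git.Commit -> Git.Commit

	-- Generate a version of the commit made on the filter branch
	-- with the filtering of modified files reversed.
	unfilteredCommit :: Filter -> Git.Commit -> Git.Commit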
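
The pure part of this interface compiles on its own. The standalone copy below adds `Show` instances so that results can be inspected. It also adds a hypothetical `filteredTree`, which stands in for the tree-building half of `filteredCommit`. The `Annex` and `Git.Commit` parts are left out because they need git-annex itself.

```haskell
data Filter = UnlockFilter | HideMissingFilter | UnlockHideMissingFilter
  deriving (Eq, Show)

data FilterAction
  = UnchangedFile FilePath
  | UnlockFile FilePath
  | HideFile FilePath
  deriving (Eq, Show)

data FileStatus = IsAnnexSymlink | IsAnnexPointer
  deriving (Eq, Show)

data FileInfo = FileInfo
  { originalBranchFile :: FileStatus
  , isContentPresent :: Bool
  }

-- Same definition as in the interface above.
filterAction :: Filter -> FilePath -> FileInfo -> FilterAction
filterAction UnlockFilter f fi
  | originalBranchFile fi == IsAnnexSymlink = UnlockFile f
filterAction HideMissingFilter f fi
  | not (isContentPresent fi) = HideFile f
filterAction UnlockHideMissingFilter f fi
  | not (isContentPresent fi) = HideFile f
  | otherwise = filterAction UnlockFilter f fi
filterAction _ f _ = UnchangedFile f

-- Hypothetical stand-in for the tree part of filteredCommit: run
-- filterAction on every annexed file and list what the adjusted
-- branch would contain.
filteredTree :: Filter -> [(FilePath, FileInfo)] -> [(FilePath, FileStatus)]
filteredTree flt = concatMap apply
  where
    apply (f, fi) = case filterAction flt f fi of
      UnchangedFile p -> [(p, originalBranchFile fi)]
      UnlockFile p -> [(p, IsAnnexPointer)]
      HideFile _ -> []
```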

## TODOs

* Need a better command-line interface than `git annex adjust`,
  that allows picking adjustments.
* Interface in webapp to enable adjustments.
* Entering an adjusted branch can race with commits to the current branch,
  and so the assistant should not be running, or at least should have
  commits disabled when entering it.
* When the adjusted branch unlocks files, behave as if annex.addunlocked is
  set, so git annex add will add files unlocked.