author Joey Hess <joeyh@joeyh.name> 2015-11-24 11:39:47 -0400
committer Joey Hess <joeyh@joeyh.name> 2015-11-24 11:39:47 -0400
commit 026d9f537a9f95d6752d6afc13d149cb594796ea
tree 927126377c418afda4d49f073a705b4742b52160 /doc/todo
parent e0319618f008f4838a0636a3289c2d489fda7cf5
file map analysis
Diffstat (limited to 'doc/todo')
-rw-r--r-- doc/todo/smudge.mdwn 81
1 files changed, 65 insertions, 16 deletions
diff --git a/doc/todo/smudge.mdwn b/doc/todo/smudge.mdwn
index 335f69be8..aea0c9b98 100644
--- a/doc/todo/smudge.mdwn
+++ b/doc/todo/smudge.mdwn
@@ -162,8 +162,8 @@ Data:
be a message saying that the file's content is not currently available.
An annex pointer file is checked into the git repository the same way
that an annex symlink is checked in.
-* file2key maps are maintained by git-annex, to keep track of
- what files are pointers at keys.
+* A file map is maintained by git-annex, to keep track of the keys
+ that are used by files in the working tree.
Configuration:
@@ -206,16 +206,16 @@ git-annex clean:
also drop that copy once the object gets uploaded to another repo ...
But that gets complicated quickly.
- Update file2key map.
+ Update file map.
Output the pointer file content to stdout.
git-annex smudge:
-* Run by eg `git checkout` and passed the filename, as well as fed
- the pointer file content on stdin.
+* Run by eg `git checkout`
+ and passed the filename, as well as fed the pointer file content on stdin.
- Updates file2key map.
+ Update file map.
When an object is present in the annex, outputs its content to stdout.
Otherwise, outputs the file pointer content.
@@ -242,16 +242,65 @@ git annex lock/unlock:
itself to break such a hard link. Always finish by locking down the
permissions of the annex object.
-All other git-annex commands that look at annex symlinks to get keys will
-need fall back to checking if a given work tree file is stored in git as
-pointer file. This can be done by checking the file2key map (or by looking
-it up in the index).
-
-Note that I have not verified if file2key maps can be maintained
-consistently using the smudge/clean filters. Seems likely to work,
-based on when I see smudge/clean filters being run. The file2key
-optimisation may not be needed though, looking at the index
-might be fast enough.
+#### file map
+
+The file map needs to map from `Key -> [File]`. `File -> Key`
+seems useful to have, but in practice is not worthwhile.
+
+Drop and get operations need to know what files in the work tree use a
+given key in order to update the work tree.
+
+git-annex commands that look at annex symlinks to get keys to act on will
+need to fall back to either consulting the file map, or looking at the staged
+file to see if it's a pointer to a key. So a `File -> Key` map is a possible
+optimisation.
+
+Question: If the smudge/clean filters update the file map incrementally
+based on the pointer files they generate/see, will the result
+always be consistent with the content of the working tree?
+
+This depends on when git calls the smudge/clean filters and on what.
+In particular:
+
+* Does the clean filter always get called when adding a relevant
+ file to git? Yes.
+* Is the clean filter called at any other time? Yes, for example
+ git diff will clean relevant modified files to generate the diff.
+ So, the clean filter may see file versions that have not yet been staged
+ in git.
+* Is the clean filter ever passed content not in the work tree?
+ I don't think so, but not 100% sure.
+* Is the smudge filter always called when git updates a relevant file
+ in the work tree? Yes.
+* Is the smudge filter called at any other time? Seems unlikely but then
+ there could be situations with a detached work tree or such.
+* Does git call any useful hooks when removing a file from the work tree,
+ or converting it to not be annexed?
+ No!
+
+From this analysis, any file map generated by the smudge/clean filters
+is necessarily potentially inaccurate. It may list deleted files.
+It may or may not reflect current unstaged changes from the work tree.
+
+It follows that any use of the file map needs to verify the info from it,
+and throw out bad cached info (updating the map to match reality).
+
+When downloading a key, check if the files listed in the file map are
+still pointer files in the work tree, and only replace them with the
+content if so.
+
+When dropping a key, check if the files listed for it in the file map are
+unmodified in the work tree, and are staged as pointers to the key,
+and only reset them to the pointers if so. Note that this means that
+a modified work tree file that has not yet been staged, but that
+corresponds to a key, won't be reset when the key is dropped.
+This is probably not a big deal; the user will either add the
+file, which will add the key back, or reset the file.
+
+Does the `File -> Key` map have any benefits given this inaccuracy?
+Answer seems to be no; any answer that map gives may be inaccurate and
+needs to be verified by looking at actual repo content, so might as well
+just look at the repo content in the first place.
#### Upgrading
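
The "file map" section added by this patch treats the `Key -> [File]` map as a cache that must be verified before use. A minimal Python sketch of the download-side check follows. It is an illustration only, not git-annex's implementation. The names `POINTER_PREFIX`, `parse_pointer` and `files_to_populate` are hypothetical, the pointer format is made up for the example, and the work tree is modelled as an in-memory dict of path to file content.

```python
from typing import Dict, List, Optional

# Hypothetical pointer format, for illustration only; the real
# pointer file format is not specified in this patch.
POINTER_PREFIX = "annex-pointer:"


def parse_pointer(content: bytes) -> Optional[str]:
    """Return the key if content looks like a pointer file, else None."""
    try:
        text = content.decode("utf-8")
    except UnicodeDecodeError:
        return None
    if text.startswith(POINTER_PREFIX):
        return text[len(POINTER_PREFIX):].strip()
    return None


def files_to_populate(
    file_map: Dict[str, List[str]],
    worktree: Dict[str, bytes],
    key: str,
) -> List[str]:
    """List the files that may safely be replaced with the key's content.

    The file map is only a cache: it may list files that were deleted
    or modified since the filters last ran. Each entry is checked
    against the work tree, and stale entries are pruned from the map.
    """
    verified = []
    for path in file_map.get(key, []):
        content = worktree.get(path)  # None means the file is gone
        if content is not None and parse_pointer(content) == key:
            verified.append(path)
    # Update the map to match what was actually observed.
    if verified:
        file_map[key] = verified
    else:
        file_map.pop(key, None)
    return verified
```

The drop-side check described in the patch would be stricter, since it also requires that the file is unmodified and staged as a pointer to the key; that needs access to the index and is left out of this sketch.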