author: Joey Hess <joey@kitenet.net> 2011-06-29 20:22:19 -0400
committer: Joey Hess <joey@kitenet.net> 2011-06-29 20:22:19 -0400
commit: 8725fde5c66984d9769558a07612361b112be58f
tree: dc8d49f131e3c7a5175267bc8c4ab767c8a262c4 /doc/todo/speed_up_fsck.mdwn
parent: 06a1f5f74286795708b219de8fb080077ff134a7
new plan
Diffstat (limited to 'doc/todo/speed_up_fsck.mdwn')
-rw-r--r-- doc/todo/speed_up_fsck.mdwn 20
1 files changed, 20 insertions, 0 deletions
diff --git a/doc/todo/speed_up_fsck.mdwn b/doc/todo/speed_up_fsck.mdwn
index aceb5868c..e22c01766 100644
--- a/doc/todo/speed_up_fsck.mdwn
+++ b/doc/todo/speed_up_fsck.mdwn
@@ -5,6 +5,8 @@ are slightly slower but are swamped by the normal runtime.
For fsck though, it has to pull each file's location log info out of git.
And, it's typically run on the entire tree.
+Another slow one is `git annex copy --from`.
+
It would be possible to run a single `git cat-file --batch` and pass it
sha1s of the location logs for the files that are going to be fscked (obtained via
`read-tree`). Then just read its output until the next requested sha1 to
@@ -16,3 +18,21 @@ provide the info on a side channel of some sort.
If this is implemented, the same infrastructure could be used for other
commands like whereis and add. --[[Joey]]
+
+> Updated plan:
+>
+> Run `git cat-file --batch`, and cache its stdin and stdout handles in Branch
+> state.
+>
+> To read a git-annex branch file, send it something like
+> "git-annex:uuid.log", and read the content from the stdout handle.
+>
+> To detect the end of content, send "TOKEN\n", and look for
+> "TOKEN missing" in its output. A good choice for TOKEN is anything
+> that will never exist in the repo; 40 0's would be a fairly good choice,
+> but even better seems to be something completely invalid and impossible
+> to have as a sha1 or filename or ref: "".
+>
+> Hmm, except that's actually an error message sent to stderr. Unless
+> stderr is connected to stdout, it might be better to look for a known,
+> empty object. Could just add a git-annex:empty file to that end.
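
A minimal sketch of the cached-handle idea from the updated plan, written in Python purely for illustration (git-annex itself is Haskell, and none of the names below come from its source). It swaps in a different end-of-content technique than the TOKEN approach: `git cat-file --batch` starts each reply with a `<sha1> <type> <size>` header line, so a reader can consume exactly `size` bytes plus the trailing newline, and a name git cannot resolve produces a single `<name> missing` line. `CatFile` and `read_batch_reply` are hypothetical names, and whether size-based framing suits the plan better than a sentinel is left open here.

```python
import io
import subprocess
from typing import BinaryIO, Optional


def read_batch_reply(out: BinaryIO) -> Optional[bytes]:
    """Read one reply from a `git cat-file --batch` output stream.

    Returns the object's content, or None if git reported it as missing.
    """
    line = out.readline()
    if not line:
        raise EOFError("cat-file output stream closed")
    header = line.rstrip(b"\n")
    if header.endswith(b" missing"):
        return None
    # Header format: "<sha1> <type> <size>"
    _sha, _objtype, size = header.rsplit(b" ", 2)
    content = out.read(int(size))
    out.read(1)  # consume the newline git appends after the content
    return content


class CatFile:
    """Hypothetical wrapper that keeps one cat-file process and its handles."""

    def __init__(self, repo: str = ".") -> None:
        self.proc = subprocess.Popen(
            ["git", "cat-file", "--batch"],
            cwd=repo,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )

    def get(self, name: str) -> Optional[bytes]:
        # e.g. name = "git-annex:uuid.log"
        self.proc.stdin.write(name.encode() + b"\n")
        self.proc.stdin.flush()
        return read_batch_reply(self.proc.stdout)

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()
```

Under these assumptions, `CatFile(repo).get("git-annex:uuid.log")` would return the file's bytes, or `None` when the branch has no such file, and the same process can be reused for every subsequent lookup.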