Diffstat (limited to 'doc/todo')
-rw-r--r--doc/todo/smudge.mdwn49
1 files changed, 32 insertions, 17 deletions
diff --git a/doc/todo/smudge.mdwn b/doc/todo/smudge.mdwn
index f78b215ac..2f5d21d7e 100644
--- a/doc/todo/smudge.mdwn
+++ b/doc/todo/smudge.mdwn
@@ -41,18 +41,17 @@ efficiently using the existing code in `Branch.catFile`.
### efficiency
+#### clean
+
The trick is doing it efficiently. Since git a2b665d, v1.7.4.1,
something like this works to provide a filename to the clean script:
git config --global filter.huge.clean 'huge-clean %f'
-This avoids it needing to read all the current file content from stdin
+This could spare it from reading all the current file content from stdin
when doing eg, a git status or git commit. Instead it is passed the
filename that git is operating on, in the working directory.
-(The smudge script can also be provided a filename with %f, but it
-cannot directly write to the file or git gets unhappy.)
-
So, WORM could just look at that file and easily tell if it is one
it already knows (same mtime and size). If so, it can short-circuit and
do nothing, file content is already cached.
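A minimal sketch of that short-circuit, written as a shell function that a `huge-clean %f` wrapper could call with `"$1"`. Everything here is an assumption for illustration: the function name `huge_worm_clean`, the `HUGE_WORM_CACHE` file listing known keys, and the key layout, which is only loosely modelled on a WORM key (size, mtime, file name). It relies on GNU `stat -c`, and it still drains stdin in case git writes the content there anyway.

```shell
#!/bin/sh
# Hypothetical WORM-style clean filter. huge_worm_clean and HUGE_WORM_CACHE
# are made-up names; stat -c assumes GNU coreutils.
# A wrapper script invoked as "huge-clean %f" would run: huge_worm_clean "$1"

huge_worm_clean() {
    file="$1"
    cache="${HUGE_WORM_CACHE:-$HOME/.huge-worm-cache}"

    # git may still write the whole content to stdin even when given %f;
    # drain it so git never writes into a closed pipe.
    cat >/dev/null

    # The key is built only from cheap stat info, never from the content.
    size=$(stat -c %s "$file")
    mtime=$(stat -c %Y "$file")
    key="WORM-s${size}-m${mtime}--$(basename "$file")"

    touch "$cache"
    if grep -qxF "$key" "$cache"; then
        # Same size and mtime as a file seen before: its content is
        # already cached, so do nothing beyond emitting the key.
        echo "known: $file" >&2
    else
        # New or modified file: remember the key (a real filter would
        # also store the content at this point).
        echo "$key" >>"$cache"
        echo "new: $file" >&2
    fi

    # The key is what git stores in the repository.
    echo "$key"
}
```

Since the key records only stat information, a same-size rewrite within the same second would be treated as unchanged; that is tolerable only under WORM's assumption that content is not modified in place.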
@@ -61,6 +60,21 @@ SHA1 has a harder job. Would not want to re-sha1 the file every time,
probably. So it'd need a local cache of file stat info, mapped to known
objects.
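One possible shape for that local cache, as a sketch only: the function name `huge_sha1_clean` and the `HUGE_SHA1_CACHE` directory are hypothetical, and GNU `stat -c` plus `sha1sum` are assumed. A stamp of size, mtime, inode and path is mapped to the SHA1 computed last time, so the file is re-hashed only when its stamp changes.

```shell
#!/bin/sh
# Hypothetical SHA1 clean filter with a stat-info cache. huge_sha1_clean and
# HUGE_SHA1_CACHE are made-up names; assumes GNU stat and sha1sum.

huge_sha1_clean() {
    file="$1"
    cachedir="${HUGE_SHA1_CACHE:-$HOME/.huge-sha1-cache}"
    mkdir -p "$cachedir"

    # Drain stdin, since git may still send the content even with %f.
    cat >/dev/null

    # Stat info that changes when the file is rewritten or replaced:
    # size, mtime (whole seconds), inode, plus the path itself.
    stamp="$(stat -c '%s %Y %i' "$file") $file"

    # Name the cache entry after a hash of the short stamp string;
    # hashing the stamp is cheap, unlike hashing a huge file.
    entry="$cachedir/$(printf '%s' "$stamp" | sha1sum | cut -d' ' -f1)"

    if [ -f "$entry" ]; then
        # Cache hit: reuse the known object id without reading the file.
        sha1=$(cat "$entry")
    else
        # Cache miss: do the expensive hash once and remember it.
        sha1=$(sha1sum <"$file" | cut -d' ' -f1)
        printf '%s\n' "$sha1" >"$entry"
    fi
    echo "$sha1"
}
```

The mtime used here has one-second resolution, so a same-size rewrite within the same second would be missed; GNU `stat` can report a finer timestamp with `%y` if that matters.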
+But: even with %f, git still passes the full file content to the clean
+filter on stdin, and if the filter fails to consume all of it, git will
+crash (this may only happen when the file is larger than some chunk size;
+tried with a 500 MB file and saw a SIGPIPE). This means unnecessary work
+has to be done, and it slows down *everything*, from `git status` to
+`git commit`. **showstopper** I have sent a patch to the git mailing list
+to address this.
+
+#### smudge
+
+The smudge script can also be provided a filename with %f, but it
+cannot directly write to the file or git gets unhappy.
+
+
+
### dealing with partial content availability
The smudge filter cannot be allowed to fail, that leaves the tree and
@@ -82,13 +96,13 @@ huge-smudge:
<pre>
#!/bin/sh
-read sha1
+read f
file="$1"
-echo "smudging $sha1" >&2
-if [ -e ~/$sha1 ]; then
- cat ~/$sha1 # possibly expensive copy here
+echo "smudging $f" >&2
+if [ -e ~/"$f" ]; then
+ cat ~/"$f" # possibly expensive copy here
else
- echo "$sha1 not available"
+ echo "$f not available"
fi
</pre>
@@ -96,16 +110,17 @@ huge-clean:
<pre>
#!/bin/sh
-temp="$1"
-if grep -q 'not available' "$temp"; then
- awk '{print $1}' "$temp" # provide what we would if the content were avail!
+file="$1"
+# in real life, this should be done more efficiently, not trying to read
+# the whole file content!
+if grep -q 'not available' "$file"; then
+ awk '{print $1}' "$file" # provide what we would if the content were avail!
exit 0
fi
-sha1=`sha1sum "$temp" | cut -d' ' -f1`
-echo "cleaning $sha1" >&2
-ls -l "$temp" >&2
-ln -f "$temp" ~/$sha1 # can't delete temp file
-echo $sha1
+echo "cleaning $file" >&2
+ls -l "$file" >&2
+ln -f "$file" ~/"$file" # can't delete temp file
+echo "$file"
</pre>
.gitattributes: