Diffstat (limited to 'doc')
-rw-r--r-- | doc/todo/smudge.mdwn | 46 |
1 files changed, 35 insertions, 11 deletions
diff --git a/doc/todo/smudge.mdwn b/doc/todo/smudge.mdwn
index c51662b28..f78b215ac 100644
--- a/doc/todo/smudge.mdwn
+++ b/doc/todo/smudge.mdwn
@@ -19,6 +19,26 @@ add` files, and just being able to use `git add` or `git commit -a`,
 and have it use git-annex when .gitattributes says to. Also, annexed
 files can be directly modified without having to `git annex unlock`.
 
+### design
+
+In .gitattributes, the user would put something like "* filter=git-annex".
+This way they could control which files are annexed vs added normally.
+
+(git-annex could have further controls to allow eg, passing small files
+through to regular processing. At least .gitattributes is a special case,
+it should never be annexed...)
+
+For files not configured this way, git-annex could continue to use
+its symlink method -- this would preserve backwards compatability,
+and even allow mixing the two methods in a repo as desired.
+
+To find files in the repository that are annexed, git-annex would do
+`ls-files` as now, but would check if found files have the appropriate
+filter, rather than the current symlink checks. To determine the key
+of a file, rather than reading its symlink, git-annex would need to
+look up the git blob associated with the file -- this can be done
+efficiently using the existing code in `Branch.catFile`.
+
 ### efficiency
 
 The trick is doing it efficiently. Since git a2b665d, v1.7.4.1,
@@ -30,12 +50,16 @@ This avoids it needing to read all the current file content from stdin
 when doing eg, a git status or git commit. Instead it is passed the
 filename that git is operating on, in the working directory.
 
+(The smudge script can also be provided a filename with %f, but it
+cannot directly write to the file or git gets unhappy.)
+
 So, WORM could just look at that file and easily tell if it is one
 it already knows (same mtime and size). If so, it can short-circuit and
 do nothing, file content is already cached.
 
 SHA1 has a harder job. Would not want to re-sha1 the file every time,
-probably. So it'd need a cache of file stat info, mapped to known objects.
+probably. So it'd need a local cache of file stat info, mapped to known
+objects.
 
 ### dealing with partial content availability
 
@@ -59,9 +83,10 @@ huge-smudge:
 <pre>
 #!/bin/sh
 read sha1
+file="$1"
 echo "smudging $sha1" >&2
 if [ -e ~/$sha1 ]; then
-	cat ~/$sha1
+	cat ~/$sha1 # possibly expensive copy here
 else
 	echo "$sha1 not available"
 fi
@@ -71,16 +96,15 @@ huge-clean:
 
 <pre>
 #!/bin/sh
-cat >temp
-if grep -q 'not available' temp; then
-	awk '{print $1}' temp # provide what we would if the content were avail!
-	rm temp
+temp="$1"
+if grep -q 'not available' "$temp"; then
+	awk '{print $1}' "$temp" # provide what we would if the content were avail!
 	exit 0
 fi
-sha1=`sha1sum temp | cut -d' ' -f1`
+sha1=`sha1sum "$temp" | cut -d' ' -f1`
 echo "cleaning $sha1" >&2
-ls -l temp >&2
-mv temp ~/$sha1
+ls -l "$temp" >&2
+ln -f "$temp" ~/$sha1 # can't delete temp file
 echo $sha1
 </pre>
 
@@ -94,6 +118,6 @@ in .git/config:
 
 <pre>
 [filter "huge"]
-	clean = huge-clean
-	smudge = huge-smudge
+	clean = huge-clean %f
+	smudge = huge-smudge %f
 <pre>
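
The efficiency section of the patched page says a SHA1 backend would need "a local cache of file stat info, mapped to known objects" so that the clean filter does not re-hash a file every time. The following is only a rough sketch of that idea and is not code from this commit. It assumes GNU `stat` and `sha1sum`; the names `cached_sha1`, `CACHE`, and `SMUDGE_STAT_CACHE` are hypothetical, as is the flat-file cache format.

```shell
#!/bin/sh
# Hypothetical stat cache: one line per known file, in the form
#   <40-char sha1> <mtime> <size> <path>
# The cache location is an assumption made for this sketch.
CACHE="${SMUDGE_STAT_CACHE:-${TMPDIR:-/tmp}/smudge-stat-cache}"

# cached_sha1 FILE
# Print the SHA1 of FILE, re-hashing only when no cache entry matches
# its current mtime, size, and path.
cached_sha1 () {
	file="$1"
	# GNU stat: %Y is the mtime in seconds since the epoch, %s the size in bytes
	sig=$(stat -c '%Y %s' "$file") || return 1

	if [ -f "$CACHE" ]; then
		# The sha1 occupies columns 1-40, so the key starts at column 42.
		# The key is passed through the environment to avoid awk -v escapes.
		hit=$(KEY="$sig $file" awk \
			'substr($0, 42) == ENVIRON["KEY"] { print substr($0, 1, 40); exit }' \
			"$CACHE")
		if [ -n "$hit" ]; then
			echo "$hit"
			return 0
		fi
	fi

	# Cache miss: hash the content and remember the result.
	sha1=$(sha1sum "$file" | cut -d' ' -f1)
	echo "$sha1 $sig $file" >>"$CACHE"
	echo "$sha1"
}
```

A clean filter configured with `%f`, as in the final hunk, could call `cached_sha1 "$1"` in place of the unconditional `sha1sum`. Like the WORM shortcut described in the page, this trusts mtime and size, so a file rewritten with identical size and mtime would not be noticed.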