Diffstat (limited to 'doc')
4 files changed, 58 insertions, 9 deletions
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index b3d671bb8..2d0d2597e 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -241,15 +241,20 @@ subdirectories).
 
 * find [path ...]
 
-  Outputs a list of annexed files whose content is currently present.
-  Or, if a file matching option is specified, outputs a list of all
-  matching files, whether or not their content is currently present.
+  Outputs a list of annexed files in the specified path. With no path,
+  finds files in the current directory and its subdirectories.
 
-  With no parameters, defaults to finding all files in the current directory
-  and its subdirectories.
+  By default, only lists annexed files whose content is currently present.
+  This can be changed by specifying file matching options. To list all
+  annexed files, present or not, specify --include "*". To list all
+  annexed files whose content is not present, specify --not --in "."
 
   To output filenames terminated with nulls, for use with xargs -0,
-  specify --print0.
+  specify --print0. Or, a custom output format can be specified using
+  --format. The default output format is the same as --format='${file}\\n'.
+
+  These variables are available for use in formats: file, key, backend,
+  bytesize, humansize.
 
 * whereis [path ...]
 
@@ -427,6 +432,16 @@ subdirectories).
   are in the annex, their backend is known and this option is not
   necessary.
 
+* --format=value
+
+  Specifies a custom output format. The value is a format string,
+  in which '${var}' is expanded to the value of a variable. To right-justify
+  a variable with whitespace, use '${var;width}'; to left-justify
+  a variable, use '${var;-width}'; to escape unusual characters in a variable,
+  use '${escaped_var}'.
+
+  Also, '\\n' is a newline, '\\000' is a NULL, etc.
+
 * -c name=value
 
   Used to override git configuration settings. May be specified multiple times.
@@ -447,7 +462,16 @@ file contents are present at either of two repositories.
 * --exclude=glob
 
   Skips files matching the glob pattern. The glob is matched relative to
-  the current directory. For example: --exclude='*.mp3' --exclude='subdir/*'
+  the current directory. For example:
+
+	--exclude='*.mp3' --exclude='subdir/*'
+
+* --include=glob
+
+  Skips files not matching the glob pattern. (Same as --not --exclude.)
+  For example, to include only mp3 and ogg files:
+
+	--include='*.mp3' --or --include='*.ogg'
 
 * --in=repository
 
diff --git a/doc/tips/finding_duplicate_files.mdwn b/doc/tips/finding_duplicate_files.mdwn
new file mode 100644
index 000000000..94fc85400
--- /dev/null
+++ b/doc/tips/finding_duplicate_files.mdwn
@@ -0,0 +1,21 @@
+Maybe you had a lot of files scattered around on different drives, and you
+added them all into a single git-annex repository. Some of the files are
+surely duplicates of others.
+
+While git-annex stores the file contents efficiently, it would still
+help in cleaning up this mess if you could find, and perhaps remove,
+the duplicate files.
+
+Here's a command line that will show duplicate sets of files grouped together:
+
+	git annex find --include '*' --format='${file} ${escaped_key}\n' | \
+		sort -k2 | uniq --all-repeated=separate -f1 | \
+		sed 's/ [^ ]*$//'
+
+Here's a command line that will remove one of each duplicate set of files:
+
+	git annex find --include '*' --format='${file} ${escaped_key}\n' | \
+		sort -k2 | uniq --repeated -f1 | sed 's/ [^ ]*$//' | \
+		xargs -d '\n' git rm
+
+--[[Joey]]
diff --git a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates.mdwn b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates.mdwn
index eda17aea6..933653578 100644
--- a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates.mdwn
+++ b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates.mdwn
@@ -24,3 +24,5 @@ I want this because I have copies of various of mine (photos, in particular) sca
 (Another way to do this would be to "git annex add" them all, and then use a "git annex remove-duplicates" that could prompt me about which files are duplicates of each other, and then I could pipe that command's output into xargs git rm.)
 
 (As I write this, I realize it's possible to parse the destination of the symlink in a way that does this..)
+
+> [[done]]; see [[tips/finding_duplicate_files]] --[[Joey]]
diff --git a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_6_f24541ada1c86d755acba7e9fa7cff24._comment b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_6_f24541ada1c86d755acba7e9fa7cff24._comment
index 93d3d41f4..5d8ac8e61 100644
--- a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_6_f24541ada1c86d755acba7e9fa7cff24._comment
+++ b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_6_f24541ada1c86d755acba7e9fa7cff24._comment
@@ -8,7 +8,9 @@ My main concern with putting this in git-annex is that finding duplicates necess
 
 So I would rather come at this from a different angle.. like providing a way to output a list of files and their associated keys, which the user can then use in their own shell pipelines to find duplicate keys:
 
-	git annex find --include '*' --format=\"%f %k\n\" | sort foo --key 2 | uniq --all-repeated --skip-fields=1
+	git annex find --include '*' --format='${file} ${key}\n' | sort --key 2 | uniq --all-repeated --skip-fields=1
 
-(Making that properly handle filenames with spaces is left as an exercise for the reader..)
+Which is implemented now!
+
+(Making that pipeline properly handle filenames with spaces is left as an exercise for the reader..)
 """]]
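The comment above leaves making the duplicate-finding pipeline handle filenames with spaces as an exercise. One possible approach is to put the key first in the format string, so that everything after the first space is the filename, and then group lines by key with `awk` instead of `uniq -f1`. The following is a sketch only, not part of git-annex. `find_dup_groups` is a hypothetical helper name. The sketch assumes that escaped keys contain no whitespace and that filenames contain no newlines.

```shell
# find_dup_groups (hypothetical helper, not part of git-annex):
# reads "KEY FILENAME" lines on stdin and prints each set of filenames
# that share a key, with a blank line between sets.
# Assumes KEY contains no whitespace and FILENAME contains no newline;
# FILENAME may contain spaces, because everything after the first
# space is treated as the filename.
find_dup_groups() {
	# LC_ALL=C makes sort compare raw bytes, so only byte-identical keys
	# compare equal and each group of lines stays contiguous.
	LC_ALL=C sort -k1,1 | awk '
		{
			key = $1
			file = substr($0, length(key) + 2)  # text after "KEY "
			if (key != prev) {
				flush()
				prev = key
				n = 0
			}
			files[++n] = file
		}
		# Print the buffered group only if it holds two or more files.
		function flush(   i) {
			if (n > 1) {
				if (printed) print ""
				for (i = 1; i <= n; i++) print files[i]
				printed = 1
			}
		}
		END { flush() }
	'
}
```

To use it, feed it output from a format string that puts the key first, for example `git annex find --include '*' --format='${escaped_key} ${file}\n' | find_dup_groups`. The key-first ordering avoids the field-splitting problem of `sort -k2` and `uniq -f1`, which miscount fields as soon as a filename contains a space.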