Diffstat (limited to 'doc')
-rw-r--r-- doc/git-annex.mdwn | 38
-rw-r--r-- doc/tips/finding_duplicate_files.mdwn | 21
-rw-r--r-- doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates.mdwn | 2
-rw-r--r-- doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_6_f24541ada1c86d755acba7e9fa7cff24._comment | 6
4 files changed, 58 insertions, 9 deletions
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index b3d671bb8..2d0d2597e 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -241,15 +241,20 @@ subdirectories).
* find [path ...]
- Outputs a list of annexed files whose content is currently present.
- Or, if a file matching option is specified, outputs a list of all
- matching files, whether or not their content is currently present.
+ Outputs a list of annexed files in the specified path. With no path,
+ finds files in the current directory and its subdirectories.
- With no parameters, defaults to finding all files in the current directory
- and its subdirectories.
+ By default, only lists annexed files whose content is currently present.
+ This can be changed by specifying file matching options. To list all
+ annexed files, present or not, specify --include "*". To list all
+ annexed files whose content is not present, specify --not --in "."
To output filenames terminated with nulls, for use with xargs -0,
- specify --print0.
+ specify --print0. Or, a custom output format can be specified using
+ --format. The default output format is the same as --format='${file}\\n'.
+
+ These variables are available for use in formats: file, key, backend,
+ bytesize, humansize.
* whereis [path ...]
@@ -427,6 +432,16 @@ subdirectories).
are in the annex, their backend is known and this option is not
necessary.
+* --format=value
+
+ Specifies a custom output format. The value is a format string,
+ in which '${var}' is expanded to the value of a variable. To right-justify
+ a variable with whitespace, use '${var;width}'; to left-justify
+ a variable, use '${var;-width}'; to escape unusual characters in a variable,
+ use '${escaped_var}'.
+
+ Also, '\\n' is a newline, '\\000' is a NUL, etc.
+
* -c name=value
Used to override git configuration settings. May be specified multiple times.
@@ -447,7 +462,16 @@ file contents are present at either of two repositories.
* --exclude=glob
Skips files matching the glob pattern. The glob is matched relative to
- the current directory. For example: --exclude='*.mp3' --exclude='subdir/*'
+ the current directory. For example:
+
+ --exclude='*.mp3' --exclude='subdir/*'
+
+* --include=glob
+
+ Skips files not matching the glob pattern. (Same as --not --exclude.)
+ For example, to include only mp3 and ogg files:
+
+ --include='*.mp3' --or --include='*.ogg'
* --in=repository
diff --git a/doc/tips/finding_duplicate_files.mdwn b/doc/tips/finding_duplicate_files.mdwn
new file mode 100644
index 000000000..94fc85400
--- /dev/null
+++ b/doc/tips/finding_duplicate_files.mdwn
@@ -0,0 +1,21 @@
+Maybe you had a lot of files scattered around on different drives, and you
+added them all into a single git-annex repository. Some of the files are
+surely duplicates of others.
+
+While git-annex stores the file contents efficiently, it would still
+help in cleaning up this mess if you could find, and perhaps remove,
+the duplicate files.
+
+Here's a command line that will show duplicate sets of files grouped together:
+
+ git annex find --include '*' --format='${file} ${escaped_key}\n' | \
+ sort -k2 | uniq --all-repeated=separate -f1 | \
+ sed 's/ [^ ]*$//'
+
+Here's a command line that will remove one file from each set of duplicates:
+
+ git annex find --include '*' --format='${file} ${escaped_key}\n' | \
+ sort -k2 | uniq --repeated -f1 | sed 's/ [^ ]*$//' | \
+ xargs -d '\n' git rm
+
+--[[Joey]]
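
The two pipelines in the tip above can be tried without a repository by feeding them hand-written `file key` lines. The sketch below is an illustration, not part of the commit: the function names and the sample keys are hypothetical, the sample input stands in for the output of `git annex find --include '*' --format='${file} ${escaped_key}\n'`, and the long `uniq` options assume GNU coreutils.

```shell
# Hypothetical helpers: each reads "file key" lines on stdin, standing in
# for the output of the git annex find command used in the tip.

# Print every file that shares its key with another file, with a blank
# line between sets of duplicates.
show_duplicate_sets() {
    sort -k2 | uniq --all-repeated=separate -f1 | sed 's/ [^ ]*$//'
}

# Print one file from each set of duplicates (the candidates for removal).
one_per_duplicate_set() {
    sort -k2 | uniq --repeated -f1 | sed 's/ [^ ]*$//'
}

# Made-up sample listing; K1, K2 and K3 are placeholder keys.
sample_listing() {
    printf '%s\n' 'a.jpg K1' 'b.jpg K2' 'c.jpg K1' 'd.jpg K3' 'e.jpg K3'
}

sample_listing | show_duplicate_sets
sample_listing | one_per_duplicate_set
```

Here `sort -k2` brings lines with the same key next to each other, `uniq -f1` skips the filename field when comparing lines, and `sed` strips the trailing key so only filenames remain. Because fields are split on whitespace, both helpers misbehave on filenames that contain spaces.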
diff --git a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates.mdwn b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates.mdwn
index eda17aea6..933653578 100644
--- a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates.mdwn
+++ b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates.mdwn
@@ -24,3 +24,5 @@ I want this because I have copies of various of mine (photos, in particular) sca
(Another way to do this would be to "git annex add" them all, and then use a "git annex remove-duplicates" that could prompt me about which files are duplicates of each other, and then I could pipe that command's output into xargs git rm.)
(As I write this, I realize it's possible to parse the destination of the symlink in a way that does this..)
+
+> [[done]]; see [[tips/finding_duplicate_files]] --[[Joey]]
diff --git a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_6_f24541ada1c86d755acba7e9fa7cff24._comment b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_6_f24541ada1c86d755acba7e9fa7cff24._comment
index 93d3d41f4..5d8ac8e61 100644
--- a/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_6_f24541ada1c86d755acba7e9fa7cff24._comment
+++ b/doc/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/comment_6_f24541ada1c86d755acba7e9fa7cff24._comment
@@ -8,7 +8,9 @@ My main concern with putting this in git-annex is that finding duplicates necess
So I would rather come at this from a different angle.. like providing a way to output a list of files and their associated keys, which the user can then use in their own shell pipelines to find duplicate keys:
- git annex find --include '*' --format=\"%f %k\n\" | sort foo --key 2 | uniq --all-repeated --skip-fields=1
+ git annex find --include '*' --format='${file} ${key}\n' | sort --key 2 | uniq --all-repeated --skip-fields=1
-(Making that properly handle filenames with spaces is left as an exercise for the reader..)
+Which is implemented now!
+
+(Making that pipeline properly handle filenames with spaces is left as an exercise for the reader..)
"""]]
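
One possible answer to that exercise, sketched under assumptions rather than taken from the commit: ask for the key first, so that everything after the first space is the filename. This assumes `${escaped_key}` never contains whitespace and that filenames contain no newlines; `group_by_key` is a hypothetical helper name, and the sample keys are placeholders.

```shell
# Hypothetical helper: reads "key filename" lines on stdin (key first) and
# prints the filenames that share a key, with a blank line between sets.
# Intended use, assuming a git-annex repository:
#   git annex find --include '*' --format='${escaped_key} ${file}\n' | group_by_key
group_by_key() {
    # Sort on the key field only, so lines with equal keys become adjacent.
    LC_ALL=C sort -k1,1 | awk '
    function emit_group(   i) {
        # Only sets with more than one file are duplicates.
        if (n > 1) {
            if (printed) print ""
            for (i = 1; i <= n; i++) print names[i]
            printed = 1
        }
    }
    {
        key = $1
        # Everything after the key and one separator is the filename,
        # spaces included.
        name = substr($0, length(key) + 2)
        if (key != prev) { emit_group(); n = 0; prev = key }
        names[++n] = name
    }
    END { emit_group() }'
}

# Made-up sample input with spaces in two of the filenames.
printf '%s\n' 'K1 a b.jpg' 'K2 c.jpg' 'K1 d.jpg' 'K3 e.jpg' 'K3 f g.jpg' | group_by_key
```

Putting the key first avoids having to guess where a filename ends, which is what breaks the `sort --key 2` and `uniq --skip-fields=1` approach when a filename contains spaces.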