From ffa6812d083f158e7afe6978ed8f06f2ff9ebdf6 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Tue, 16 Jun 2015 20:17:17 -0400 Subject: rewrite so it's understandable without knowing about the related command-line options --- doc/preferred_content.mdwn | 260 +++++++++++++++++++++++++++------------------ 1 file changed, 158 insertions(+), 102 deletions(-) diff --git a/doc/preferred_content.mdwn b/doc/preferred_content.mdwn index e285a6a7c..1dbc4b60b 100644 --- a/doc/preferred_content.mdwn +++ b/doc/preferred_content.mdwn @@ -39,8 +39,8 @@ files that `git annex get --auto` will want to get, and `git annex find will want to drop. The expressions are very similar to the matching options documented -on the [[git-annex]] man page. At the command line, you can use those -options in commands like this: +on the [[git-annex-matching-options]] man page. +At the command line, you can use those options in commands like this: git annex get --include='*.mp3' --and -'(' --not --largerthan=100mb -')' @@ -48,152 +48,208 @@ The equivalent preferred content expression looks like this: include=*.mp3 and (not largerthan=100mb) -So, just remove the dashes, basically. However, there are some differences -from the command line options to keep in mind: +So, just remove the dashes, basically. But, there are some differences +between the command line options and expressions, so see the documentation +below to get the full story. -### difference: file matching +## expressions -While --include and --exclude match files relative to the current -directory, preferred content expressions always match files relative to the -top of the git repository. +* `include=glob` and `exclude=glob` -For example, suppose you put files into `archive` directories -when you're done with them. Then you could configure your laptop to prefer -to not retain those files, like this: + Match files to include, or exclude. 
+ + While --include=glob and --exclude=glob match files relative to the current + directory, preferred content expressions always match files relative to the + top of the git repository. + + For example, suppose you put files into `archive` directories + when you're done with them. Then you could configure your laptop to prefer + to not retain those files, like this: exclude=*/archive/* -### difference: no "in=" +* `copies=number` -Preferred content expressions have no direct equivalent to `--in`. + Matches only files that git-annex believes to have the specified number + of copies, or more. Note that it does not check remotes to verify that + the copies still exist. -Often, it's best to add repositories to groups, and match against -the groups in a preferred content expression. So rather than -`--in=usbdrive`, put all the USB drives into a "transfer" group, -and use "copies=transfer:1" + To decide if content should be dropped, git-annex evaluates the preferred + content expression under the assumption that the content has *already* been + dropped. If the content would not be wanted then, the drop can be done. + So, for example, `copies=2` in a preferred content expression lets + content be dropped only when there are currently 3 copies of it, including + the repo it's being dropped from. This is different than running `git annex + drop --copies=2`, which will drop files that currently have 2 copies. -### difference: dropping +* `copies=trustlevel:number` -To decide if content should be dropped, git-annex evaluates the preferred -content expression under the assumption that the content has *already* been -dropped. If the content would not be wanted then, the drop can be done. -So, for example, `copies=2` in a preferred content expression lets -content be dropped only when there are currently 3 copies of it, including -the repo it's being dropped from. This is different than running `git annex -drop --copies=2`, which will drop files that currently have 2 copies. 
+ Matches only files that git-annex believes have the specified number of + copies, on remotes with the specified trust level. For example, + `copies=trusted:2` -### difference: "present" + To match any trust level at or higher than a given level, + use 'trustlevel+'. For example, `copies=semitrusted+:2` -There's a special "present" keyword you can use in a preferred content -expression. This means that content is wanted if it's present, -and not otherwise. This leaves it up to you to use git-annex manually -to move content around. You can use this to avoid preferred content -settings from affecting a subdirectory. For example: +* `copies=groupname:number` - auto/* or (include=ad-hoc/* and present) + Matches only files that git-annex believes have the specified number of + copies, on remotes in the specified group. For example, + `copies=archive:2` + + Preferred content expressions have no equivalent to the `--in` + option, but groups can accomplish similar things. You can add + repositories to groups, and match against the groups in a + preferred content expression. So rather than `--in=usbdrive`, + put all the USB drives into a "transfer" group, and use + `copies=transfer:1` + +* `lackingcopies=number` + + Matches only files that git-annex believes need the specified number or + more additional copies to be made in order to satisfy their numcopies + settings. + +* `approxlackingcopies=number` + + Like lackingcopies, but does not look at .gitattributes annex.numcopies + settings. This makes it significantly faster. + +* `inbackend=name` + + Matches only files whose content is stored using the specified key-value + backend. + +* `inallgroup=groupname` + + Matches only files that git-annex believes are present in all repositories + in the specified group. -Note that `not present` is a very bad thing to put in a preferred content -expression. It'll make it want to get content that's not present, and -drop content that is present! Don't go there.. 
+* `smallerthan=size` and `largerthan=size` -### difference: "inpreferreddir" + Matches only files whose content is smaller than, or larger than the + specified size. -There's a special "inpreferreddir" keyword you can use in a -preferred content expression of a special remote. This means that the -content is preferred if it's in a directory (located anywhere in the tree) -with a special name. + The size can be specified with any commonly used units, for example, + "0.5 gb" or "100 KiloBytes" -The name of the directory can be configured using -`git annex enableremote $remote preferreddir=$dirname` +* `metadata=field=glob` -(If no directory name is configured, it uses "public" by default.) + Matches only files that have a metadata field attached with a value that + matches the glob. The values of metadata fields are matched case + insensitively. -### difference: "standard" + To match a tag "done", use `metadata=tag=done` -git-annex comes with some built-in preferred content expressions, that -can be used with repositories that are in some [[standard_groups]]. + To match author metadata, use `metadata=author=* Smith` -When a repository is in exactly one such group, you can use the "standard" -keyword in its preferred content expression, to match whatever content -the group's expression matches. -(If a repository is put into multiple standard -groups, "standard" will match anything.. so don't do that!) +* `present` -Most often, the whole preferred content expression is simply "standard". -But, you can do more complicated things, for example: -"`standard or include=otherdir/*`" + Makes content be wanted if it's present, but not otherwise. -### difference: "groupwanted" + This leaves it up to you to use git-annex manually + to move content around. You can use this to avoid preferred content + settings from affecting a subdirectory. For example: -The "groupwanted" keyword can be used to refer to a preferred content -expression that is associated with a group. 
This is like the "standard" -keyword, but you can configure the preferred content expressions -using `git annex groupwanted`. + auto/* or (include=ad-hoc/* and present) + + Note that `not present` is a very bad thing to put in a preferred content + expression. It'll make it want to get content that's not present, and + drop content that is present! Don't go there.. + +* `inpreferreddir` + + Makes content be preferred if it's in a directory (located anywhere + in the tree) with a particular name. + + The name of the directory can be configured using + `git annex enableremote $remote preferreddir=$dirname` + + (If no directory name is configured, it uses "public" by default.) -Note that when writing a groupwanted preferred content expression, -you can use all of the keywords listed above, including "standard". -(But not "groupwanted".) +* `standard` -For example, to make a variant of the standard client preferred content -expression that does not want files in the "out" directory, you -could run: `git annex groupwanted client "standard and exclude=out/*"` + git-annex comes with some built-in preferred content expressions, that + can be used with repositories that are in some [[standard_groups]]. -Then repositories that are in the client group and have their preferred -content expression set to "groupwanted" will use that, while -other client repositories that have their preferred content expression -set to "standard" will use the standard expression. + When a repository is in exactly one such group, you can use the "standard" + keyword in its preferred content expression, to match whatever content + the group's expression matches. + (If a repository is put into multiple standard + groups, "standard" will match anything.. so don't do that!) -Or, you could make a new group, with your own custom preferred content -expression tuned for your needs, and every repository you put in this -group and make its preferred content be "groupwanted" will use it. 
+ Most often, the whole preferred content expression is simply "standard". + But, you can do more complicated things, for example: + `standard or include=otherdir/*` -For example, the archive group only wants to archive 1 copy of each file, -spread among every repository in the group. -Here's how to configure a group named redundantarchive, that instead -wants to contain 3 copies of each file: +* `groupwanted` + The "groupwanted" keyword can be used to refer to a preferred content + expression that is associated with a group. This is like the "standard" + keyword, but you can configure the preferred content expressions + using `git annex groupwanted`. + + Note that when writing a groupwanted preferred content expression, + you can use all of the keywords listed above, including "standard". + (But not "groupwanted".) + + For example, to make a variant of the standard client preferred content + expression that does not want files in the "out" directory, you + could run: `git annex groupwanted client "standard and exclude=out/*"` + + Then repositories that are in the client group and have their preferred + content expression set to "groupwanted" will use that, while + other client repositories that have their preferred content expression + set to "standard" will use the standard expression. + + Or, you could make a new group, with your own custom preferred content + expression tuned for your needs, and every repository you put in this + group and make its preferred content be "groupwanted" will use it. + + For example, the archive group only wants to archive 1 copy of each file, + spread among every repository in the group. 
+ Here's how to configure a group named redundantarchive, that instead + wants to contain 3 copies of each file: + git annex groupwanted redundantarchive "not (copies=redundantarchive:3)" for repo in foo bar baz; do git annex group $repo redundantarchive git annex wanted $repo groupwanted done -### difference: metadata matching - -This: +* `unused` - git annex get --metadata tag=done + Matches only keys that `git annex unused` has determined to be unused. -becomes + This is related to the --unused option. + However, putting `unused` in a preferred content expression + doesn't make git-annex consider those unused keys. So when git-annex is + only checking preferred content expressions against files in the + repository (which are obviously used), `unused` in a preferred + content expression won't match anything. - metadata=tag=done + So when is `unused` useful in a preferred content expression? -### difference: unused + 1. Using `git annex sync --content --all` will operate on all files, + including unused ones, and take `unused` in preferred content expressions + into account. + 2. The git-annex assistant periodically scans for unused files, and + moves them to some repository whose preferred content expression + matches "unused". (Or, if annex.expireunused is set, it may just delete + them.) -The --unused option makes git-annex operate on every key that `git annex -unused` has determined to be unused. The corresponding `unused` keyword -in a preferred content expression also matches those keys. +* `anything` -However, using `unused` in a preferred content expression -doesn't make git-annex consider those keys. So when git-annex is -only checking preferred content expressions against files in the -repository (which are obviously used), `unused` in a preferred -content expression won't match anything. + Matches any version of any file. -So when is `unused` useful in a preferred content expression? 
+* `not expression` -* The git-annex assistant periodically scans for unused files, and - moves them to some repository whose preferred content expression - matches "unused". (Or, if annex.expireunused is set, it may just delete - them.) -* Using `git annex sync --content --all` will operate on all files, - including unused ones, and take `unused` in preferred content expressions - into account. + Inverts what the expression matches. For example, `not include=archive/*` + is the same as `exclude=archive/*` -### difference: anything +* `and` / `or` / `( expression )` -The "anything" keyword can be used in a preferred content expression -to match any version of any file. + These can be used to build up more complicated expressions. ## upgrades -- cgit v1.2.3
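
The drop rule described under `copies=number` in the patch above is easy to misread, so here is a rough sketch of the arithmetic. It is not git-annex code: the function names are hypothetical, and it assumes the preferred content expression has the form `not copies=N` (the same shape as the `redundantarchive` example in the patch). It only models the stated idea that the expression is evaluated as if the local copy were already gone.

```python
# Illustrative model only; these helpers are hypothetical and do not
# exist in git-annex.

def copies_term_if_dropped(current_copies: int, n: int) -> bool:
    """Evaluate `copies=n` as if this repository had already dropped its copy."""
    return (current_copies - 1) >= n


def can_drop_with_preferred_content(current_copies: int, n: int) -> bool:
    """Assumed expression: `not copies=n`.

    The content is wanted when the expression matches; a drop is allowed
    only when the content would not be wanted after the drop.
    """
    wanted_after_drop = not copies_term_if_dropped(current_copies, n)
    return not wanted_after_drop


def cli_drop_matches(current_copies: int, n: int) -> bool:
    """Model of `git annex drop --copies=n`: matches on the current count."""
    return current_copies >= n


if __name__ == "__main__":
    for count in (1, 2, 3, 4):
        print(count,
              can_drop_with_preferred_content(count, 2),
              cli_drop_matches(count, 2))
```

Under these assumptions, a file with 2 copies is matched by the command-line `--copies=2` but is not droppable via the preferred content expression, which needs 3 current copies before the `copies=2` term holds in the post-drop state.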