author Joey Hess <joey@kitenet.net> 2014-01-20 17:34:58 -0400
committer Joey Hess <joey@kitenet.net> 2014-01-20 17:35:29 -0400
commit 049bea4659e108967977852b5ca0cc00b74a8831
tree 87b5cb9f48f4692eee93ce2565589441509857c8
parent be851d42f4a78fc05494e1204c5914a19b305893
Add and use numcopiesneeded preferred content expression.
* Add numcopiesneeded preferred content expression.
* Client, transfer, incremental backup, and archive repositories
  now want to get content that does not yet have enough copies.

This means the assistant will make copies of files that don't yet
meet the configured numcopies, even to places that would not normally
want the file.

For example, if numcopies is 4, and there are 2 client repos, 2 transfer
repos, and 2 removable backup drives, the file will be sent to both
transfer repos in order to make 4 copies. Once a removable drive gets a
copy of the file, it will be dropped from one transfer repo or the other
(but not both).

Another example: numcopies is 3 and there is a client that has a backup
removable drive and two small archive repos. Normally once one of the small
archives has a file, it will not be put into the other one. But, to satisfy
numcopies, the assistant will duplicate it into the other small archive
too, if the backup repo is not available to receive the file.

I notice that these examples are fairly unlikely setups .. the old behavior
was not too bad, but it's nice to finally have it really correct.

.. Almost. I have skipped checking the annex.numcopies .gitattributes
out of fear it will be too slow.

This commit was sponsored by Florian Schlegel.
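The arithmetic behind the first example in the message can be modelled as a small pure function. This is only a sketch that mirrors the shape of `limitNumCopiesNeeded` in the diff below: `UUID` is a hypothetical stand-in type, the repository names are made up, and the assumption that a repository considering a drop is passed in the "not present" set is mine, not something this commit states.

```haskell
import qualified Data.Set as S

-- Hypothetical stand-in for git-annex's repository identifier.
type UUID = String

-- | Pure model of the check: does a key still need at least `needed`
-- more copies? An unset global numcopies (Nothing) never matches.
numCopiesNeeded
    :: Maybe Int   -- ^ global numcopies setting, if any
    -> [UUID]      -- ^ non-untrusted repositories believed to hold the key
    -> S.Set UUID  -- ^ repositories to treat as not holding it
    -> Int         -- ^ the N in numcopiesneeded=N
    -> Bool
numCopiesNeeded Nothing _ _ _ = False
numCopiesNeeded (Just numcopies) locations notpresent needed =
    numcopies - length present >= needed
  where
    present = filter (`S.notMember` notpresent) locations

main :: IO ()
main = do
    -- numcopies is 4 and only the 2 client repos hold the file:
    -- more copies are needed, so the transfer repos want it.
    print (numCopiesNeeded (Just 4) ["client1", "client2"] S.empty 1)
    -- Both transfer repos now hold it too: 4 copies, nothing more needed.
    print (numCopiesNeeded (Just 4)
        ["client1", "client2", "transfer1", "transfer2"] S.empty 1)
    -- A removable drive got a copy. transfer1 asks whether the file would
    -- still be wanted if it dropped its copy: 4 others remain, so no.
    print (numCopiesNeeded (Just 4)
        ["client1", "client2", "transfer1", "transfer2", "drive1"]
        (S.fromList ["transfer1"]) 1)
    -- After transfer1 dropped it, transfer2 asks the same question:
    -- only 3 others remain, so transfer2 keeps its copy.
    print (numCopiesNeeded (Just 4)
        ["client1", "client2", "transfer2", "drive1"]
        (S.fromList ["transfer2"]) 1)
```

Run as written, this prints `True`, `False`, `False`, `True`, which matches the "dropped from one transfer repo or the other (but not both)" behavior described above.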
-rw-r--r-- Annex/FileMatcher.hs 1
-rw-r--r-- GitAnnex/Options.hs 2
-rw-r--r-- Limit.hs 27
-rw-r--r-- Types/StandardGroups.hs 4
-rw-r--r-- debian/changelog 3
-rw-r--r-- doc/git-annex.mdwn 9
-rw-r--r-- doc/preferred_content.mdwn 8
-rw-r--r-- doc/todo/preferred_content_numcopies_check.mdwn 7
8 files changed, 52 insertions, 9 deletions
diff --git a/Annex/FileMatcher.hs b/Annex/FileMatcher.hs
index 96cb8fd6f..6ec0bace9 100644
--- a/Annex/FileMatcher.hs
+++ b/Annex/FileMatcher.hs
@@ -70,6 +70,7 @@ parseToken checkpresent checkpreferreddir groupmap t
[ ("include", limitInclude)
, ("exclude", limitExclude)
, ("copies", limitCopies)
+ , ("numcopiesneeded", limitNumCopiesNeeded)
, ("inbackend", limitInBackend)
, ("largerthan", limitSize (>))
, ("smallerthan", limitSize (<))
diff --git a/GitAnnex/Options.hs b/GitAnnex/Options.hs
index fbb34470b..ad1e0c93b 100644
--- a/GitAnnex/Options.hs
+++ b/GitAnnex/Options.hs
@@ -41,6 +41,8 @@ options = Option.common ++
"match files present in a remote"
, Option ['C'] ["copies"] (ReqArg Limit.addCopies paramNumber)
"skip files with fewer copies"
+ , Option [] ["numcopiesneeded"] (ReqArg Limit.addNumCopiesNeeded paramNumber)
+ "match files that need more copies"
, Option ['B'] ["inbackend"] (ReqArg Limit.addInBackend paramName)
"match files using a key-value backend"
, Option [] ["inallgroup"] (ReqArg Limit.addInAllGroup paramGroup)
diff --git a/Limit.hs b/Limit.hs
index fa6fa1f41..c0d32c68e 100644
--- a/Limit.hs
+++ b/Limit.hs
@@ -1,6 +1,6 @@
{- user-specified limits on files to act on
-
- - Copyright 2011-2013 Joey Hess <joey@kitenet.net>
+ - Copyright 2011-2014 Joey Hess <joey@kitenet.net>
-
- Licensed under the GNU GPL version 3 or higher.
-}
@@ -23,6 +23,7 @@ import qualified Backend
import Annex.Content
import Annex.UUID
import Logs.Trust
+import Logs.NumCopies
import Types.TrustLevel
import Types.Key
import Types.Group
@@ -177,6 +178,30 @@ limitCopies want = case split ":" want of
| "+" `isSuffixOf` s = (>=) <$> readTrustLevel (beginning s)
| otherwise = (==) <$> readTrustLevel s
+{- Adds a limit to match files that need more copies made.
+ -
+ - Does not look at annex.numcopies .gitattributes, because that
+ - would require querying git check-attr every time a preferred content
+ - expression is checked, which would probably be quite slow.
+ -}
+addNumCopiesNeeded :: String -> Annex ()
+addNumCopiesNeeded = addLimit . limitNumCopiesNeeded
+
+limitNumCopiesNeeded :: MkLimit
+limitNumCopiesNeeded want = case readish want of
+    Just needed -> Right $ \notpresent -> checkKey $
+        handle needed notpresent
+    Nothing -> Left "bad value for numcopiesneeded"
+  where
+    handle needed notpresent key = do
+        gv <- getGlobalNumCopies
+        case gv of
+            Nothing -> return False
+            Just numcopies -> do
+                us <- filter (`S.notMember` notpresent)
+                    <$> (trustExclude UnTrusted =<< Remote.keyLocations key)
+                return $ numcopies - length us >= needed
+
{- Adds a limit to skip files not believed to be present in all
- repositories in the specified group. -}
addInAllGroup :: String -> Annex ()
diff --git a/Types/StandardGroups.hs b/Types/StandardGroups.hs
index 51788ec4e..c4c3ba9f3 100644
--- a/Types/StandardGroups.hs
+++ b/Types/StandardGroups.hs
@@ -93,6 +93,6 @@ notArchived :: String
notArchived = "not (copies=archive:1 or copies=smallarchive:1)"
{- Most repositories want any content that is only on untrusted
- - or dead repositories. -}
+ - or dead repositories, or that otherwise does not have enough copies. -}
lastResort :: String -> PreferredContentExpression
-lastResort s = "(" ++ s ++ ") or (not copies=semitrusted+:1)"
+lastResort s = "(" ++ s ++ ") or numcopiesneeded=1"
diff --git a/debian/changelog b/debian/changelog
index 9ecb4d2b6..923fb1692 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -14,6 +14,9 @@ git-annex (5.20140118) UNRELEASED; urgency=medium
command is used to set the global number of copies, any annex.numcopies
git configs will be ignored.
* assistant: Make the prefs page set the global numcopies.
+ * Add numcopiesneeded preferred content expression.
+ * Client, transfer, incremental backup, and archive repositories
+ now want to get content that does not yet have enough copies.
-- Joey Hess <joeyh@debian.org> Sat, 18 Jan 2014 11:54:17 -0400
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 4e7bd2395..8a948d303 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -1020,6 +1020,15 @@ file contents are present at either of two repositories.
copies, on remotes in the specified group. For example,
`--copies=archive:2`
+* `--numcopiesneeded=number`
+
+ Matches only files that git-annex believes need the specified number or
+ more additional copies to be made in order to satisfy their numcopies
+ setting, as configured by the global numcopies setting of the repository.
+
+ Note that for various reasons, including speed, this does not look
+ at the annex.numcopies .gitattributes settings of files.
+
* `--inbackend=name`
Matches only files whose content is stored using the specified key-value
diff --git a/doc/preferred_content.mdwn b/doc/preferred_content.mdwn
index 9c698c8ba..b18f46c33 100644
--- a/doc/preferred_content.mdwn
+++ b/doc/preferred_content.mdwn
@@ -113,7 +113,7 @@ any repository that can will back it up.)
All content is preferred, unless it's for a file in a "archive" directory,
which has reached an archive repository.
-`((exclude=*/archive/* and exclude=archive/*) or (not (copies=archive:1 or copies=smallarchive:1))) or (not copies=semitrusted+:1)`
+`((exclude=*/archive/* and exclude=archive/*) or (not (copies=archive:1 or copies=smallarchive:1))) or numcopiesneeded=1`
### transfer
@@ -147,20 +147,20 @@ All content is preferred.
Only prefers content that's not already backed up to another backup
or incremental backup repository.
-`(include=* and (not copies=backup:1) and (not copies=incrementalbackup:1)) or (not copies=semitrusted+:1)`
+`(include=* and (not copies=backup:1) and (not copies=incrementalbackup:1)) or numcopiesneeded=1`
### small archive
Only prefers content that's located in an "archive" directory, and
only if it's not already been archived somewhere else.
-`((include=*/archive/* or include=archive/*) and not (copies=archive:1 or copies=smallarchive:1)) or (not copies=semitrusted+:1)`
+`((include=*/archive/* or include=archive/*) and not (copies=archive:1 or copies=smallarchive:1)) or numcopiesneeded=1`
### full archive
All content is preferred, unless it's already been archived somewhere else.
-`(not (copies=archive:1 or copies=smallarchive:1)) or (not copies=semitrusted+:1)`
+`(not (copies=archive:1 or copies=smallarchive:1)) or numcopiesneeded=1`
Note that if you want to archive multiple copies (not a bad idea!),
you should instead configure all your archive repositories with a
diff --git a/doc/todo/preferred_content_numcopies_check.mdwn b/doc/todo/preferred_content_numcopies_check.mdwn
index 152afe08c..8aa736a04 100644
--- a/doc/todo/preferred_content_numcopies_check.mdwn
+++ b/doc/todo/preferred_content_numcopies_check.mdwn
@@ -54,9 +54,12 @@ Conclusion:
* Add "numcopiesneeded=N" preferred content expression using the git-annex
branch numcopies setting, overridden by any .gitattributes numcopies setting
for a particular file. It should ignore the other ways to specify
- numcopies.
+ numcopies, particularly git config annex.numcopies. **done**
* Make the repo groups that currently end with "or (not copies=semitrusted+:1)"
- to instead end with "or numcopiesneeded=1"
+ to instead end with "or numcopiesneeded=1" **done**
+* See if "numcopiesneeded=N" can check .gitattributes without getting
+ a lot slower. If not, perhaps add a "numcopiesneededaccurate=N" that
+ checks it.
## Stability analysis
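
To see how the one-line `lastResort` change in Types/StandardGroups.hs produces the expressions shown in doc/preferred_content.mdwn, the old and new definitions can be run side by side. This is a standalone sketch: `lastResortOld` and `lastResortNew` are names invented here for the before and after versions, `fullArchive` is a hypothetical name for the "full archive" inner expression, and `PreferredContentExpression` is assumed to be a plain `String` alias.

```haskell
-- Assumed to be a String alias, as the diff's use of (++) suggests.
type PreferredContentExpression = String

-- The definition before this commit (renamed here for comparison).
lastResortOld :: String -> PreferredContentExpression
lastResortOld s = "(" ++ s ++ ") or (not copies=semitrusted+:1)"

-- The definition after this commit (renamed here for comparison).
lastResortNew :: String -> PreferredContentExpression
lastResortNew s = "(" ++ s ++ ") or numcopiesneeded=1"

-- Hypothetical name for the "full archive" inner expression.
fullArchive :: String
fullArchive = "not (copies=archive:1 or copies=smallarchive:1)"

main :: IO ()
main = do
    -- Prints the old and new "full archive" preferred content expressions.
    putStrLn (lastResortOld fullArchive)
    putStrLn (lastResortNew fullArchive)
```

The two printed lines correspond to the removed and added "full archive" lines in the doc/preferred_content.mdwn hunk.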