author Joey Hess <joey@kitenet.net> 2014-01-20 14:28:33 -0400
committer Joey Hess <joey@kitenet.net> 2014-01-20 14:28:33 -0400
commit 5c4a88a575449cdce510540cd36b28699a907011
tree cb32161cf12d745f2f0e620d6032ff39ad36da8e
parent 2834e9f5813c320c9374a4d928c4ccf09a6ee33b
design for preferred content numcopies check
-rw-r--r-- doc/todo/Provide_a___34__git_annex_satisfy__95__num__95__copies__34___command.mdwn | 12
-rw-r--r-- doc/todo/preferred_content_numcopies_check.mdwn | 61
2 files changed, 66 insertions, 7 deletions
diff --git a/doc/todo/Provide_a___34__git_annex_satisfy__95__num__95__copies__34___command.mdwn b/doc/todo/Provide_a___34__git_annex_satisfy__95__num__95__copies__34___command.mdwn
index 877e9fdbf..cbd01181f 100644
--- a/doc/todo/Provide_a___34__git_annex_satisfy__95__num__95__copies__34___command.mdwn
+++ b/doc/todo/Provide_a___34__git_annex_satisfy__95__num__95__copies__34___command.mdwn
@@ -7,12 +7,10 @@ for i in `git remote`; do git copy -to $i --auto; done
The use case is this:
I have a very large repo (300.000 files) in three places. Now I want the fastest possible way to ensure, that every file exists in annex.numcopies. This should scan every file one time and then get it or copy it to other repos as needed. Right now, I make one "git annex get --auto" in every repo, which is is a waste of time, since most of the files never change anyway!
-> The closest we have to this is the (new) `git annex sync --content`.
-> It does effectivly just what the shown for loop does.
+> Now `git annex sync --content` does effectively just what the shown for
+> loop does. [[done]]
>
-> But, that actually satisfies preferred content settings, which default
-> to preferring every repo have a copy, and even if configured will
-> typically be more than numcopies.
->
-> Numcopies is more of a minimum lower bound (though not a hard bound).
+> The only difference is that copy --auto proactively downloads otherwise
+> unwanted files to satisfy numcopies, and sync --content does not.
+> We need a [[preferred_content_numcopies_check]] to solve that.
> --[[Joey]]
diff --git a/doc/todo/preferred_content_numcopies_check.mdwn b/doc/todo/preferred_content_numcopies_check.mdwn
new file mode 100644
index 000000000..956888cca
--- /dev/null
+++ b/doc/todo/preferred_content_numcopies_check.mdwn
@@ -0,0 +1,61 @@
+The assistant and git annex sync --content do not try to proactively
+download content that is not otherwise wanted in order to get numcopies
+satisfied. (Unlike get --auto, which does take numcopies into account.)
+
+Should these automated systems try to proactively satisfy numcopies? I
+don't feel they should. It could lead to surprising results. For example,
+a transfer repository, which is of limited size, could start being filled
+up with lots of content that all clients have, just because numcopies was
+set to a larger number than the total number of clients. Another example,
+a source repository on eg an Android phone, should never have content in it
+that was not created on that device.
+
+However, it would make sense for some specific
+types of repositories to proactively get content to satisfy numcopies.
+Currently some types of repositories use "or (not copies=semitrusted+:1)",
+to ensure that if the only copy of a file is on a dead repository, they
+will try to get that file before the repo goes away. This is done
+by client repositories, and backup, and archive. Probably the same set
+would make sense to proactively satisfy numcopies.
+
+So, a new type of preferred content expression is called for. Such as, for
+example, "numcopiesneeded=1". Which indicates that at least 1 more copy
+is needed to satisfy numcopies.
+
+(Note that it should only count semitrusted and higher trust
+level repos as satisfying numcopies.)
+
+But, preferred content expressions can only operate on info stored in the
+git repo, or they will fail to be stable. Ie, repo A needs to be able to
+calculate whether a file is preferred content by repo B and get the same
+result as when repo B calculates that.
+
+numcopies is currently configured in 3 places:
+
+* .git/config `annex.numcopies` (global, stored only locally)
+* .gitattributes `annex.numcopies` (per file, stored in git repo)
+* --numcopies (not relevant)
+
+So, need to add a global numcopies setting that is stored in the git repo.
+That could either be a file in the git-annex branch, or just
+`* annex.numcopies=2` in the toplevel .gitattributes. Note that the
+assistant needs to be able to query and set it, which I think argues
+against using .gitattributes for it. Also arguing against that is that the
+.git/config numcopies value applies even to objects with no file in the
+work tree, which gitattributes settings do not.
+
+Conclusion:
+
+* Add to the git-annex branch a numcopies file that holds the global
+  numcopies default if present.
+* Modify the assistant to use it when configuring numcopies.
+* To deprecate .git/config's annex.numcopies, only make it take effect
+  when there is no numcopies file in the git-annex branch.
+* Add "numcopiesneeded=N" preferred content expression using the git-annex
+  branch numcopies setting, overridden by any .gitattributes numcopies setting
+  for a particular file. It should ignore the other ways to specify
+  numcopies.
+* Change the repo groups that currently end with "or (not copies=semitrusted+:1)"
+  to instead end with "or (not numcopiesneeded=1)"
+
+--[[Joey]]
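
Reviewer note (not part of the commit): the proposed "numcopiesneeded=N" check can be sketched in a few lines. This is only an illustration of the rules stated in the design above, not git-annex code. All function names are hypothetical, the trust levels are passed in as plain strings, and the fallback default of 1 is an assumption about what applies when neither a .gitattributes value nor a git-annex branch numcopies file is present. Only inputs stored in the git repo are consulted, which is what keeps the result the same no matter which repo evaluates it.

```python
from typing import Optional, Sequence

# Trust levels that count toward numcopies: semitrusted and higher,
# per the note in the design. The string names are illustrative.
COUNTED_TRUST = {"semitrusted", "trusted"}


def stable_numcopies(attr_value: Optional[int],
                     branch_value: Optional[int],
                     default: int = 1) -> int:
    """Pick the numcopies value the expression should use.

    A per-file .gitattributes setting overrides the git-annex branch
    numcopies file. .git/config and --numcopies are deliberately
    ignored, since they are not stored in the git repo.
    The default of 1 is an assumption of this sketch.
    """
    if attr_value is not None:
        return attr_value
    if branch_value is not None:
        return branch_value
    return default


def numcopies_needed(required: int, trust_levels: Sequence[str]) -> int:
    """How many more copies are needed, given the trust level of each
    repository known to hold a copy."""
    have = sum(1 for level in trust_levels if level in COUNTED_TRUST)
    return max(0, required - have)


def matches_numcopiesneeded(n: int, required: int,
                            trust_levels: Sequence[str]) -> bool:
    """True when at least n more copies are needed to satisfy numcopies."""
    return numcopies_needed(required, trust_levels) >= n
```

For instance, with a branch numcopies of 2 and copies on one semitrusted and one untrusted repository, `matches_numcopiesneeded(1, stable_numcopies(None, 2), ["semitrusted", "untrusted"])` holds, because the untrusted copy is not counted.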