preferred content stability analysis

author: Joey Hess <joey@kitenet.net> 2014-01-22 15:55:44 -0400
committer: Joey Hess <joey@kitenet.net> 2014-01-22 15:55:44 -0400
commit: 9a5de318d15f0234080a6f0bd802fe073cf57334 (patch)
tree: df77d2f8474a4bc36b316d0ac28c5af886b9aed4
parent: cc366b8241cfc3e41252ecd2624332c15da03377 (diff)
2 files changed, 49 insertions, 2 deletions
diff --git a/doc/design/preferred_content.mdwn b/doc/design/preferred_content.mdwn
new file mode 100644
index 000000000..3972b8b58
--- /dev/null
+++ b/doc/design/preferred_content.mdwn
@@ -0,0 +1,21 @@
+[[preferred_content]] expressions have not had a design document until now.
+They form a small, non-Turing-complete DSL for expressing which objects a
+repository prefers to contain.
+
+One thing that needs to be written down though is the stability analysis
+that must be done of preferred content expressions. 
+
+It's important that when a set of repositories all look at one another's
+preferred content expressions, and copy/move/drop objects to satisfy them,
+they end up at a steady state. So, a given preferred content expression
+should ideally evaluate to the same answer for each key, from the
+perspective of each repository.
+
+The best way to ensure that is the case is to only use terms in preferred
+content expressions that rely on state that is shared between all
+repositories. So, state in the git-annex branch, or the master branch
+(assuming all repositories have master checked out).
+
+Since git is eventually consistent, there might be disagreements about
+which object belongs where, but once consistency is reached, things will
+settle down.
diff --git a/doc/todo/Limit_file_revision_history.mdwn b/doc/todo/Limit_file_revision_history.mdwn
index 593e93013..9cdfe5e9b 100644
--- a/doc/todo/Limit_file_revision_history.mdwn
+++ b/doc/todo/Limit_file_revision_history.mdwn
@@ -42,7 +42,8 @@ Finally, how to specify a feature request for git-annex?
 >   to hang on to unused content.
 >   Something like "unused=true" I suppose, because not having a parameter
 >   would complicate preferred content parsing, and I cannot think
->   of a useful parameter.
+>   of a useful parameter. (It cannot be a timestamp, because there's
+>   no way repos can agree on when a key became unused.)
 > * In order to quickly match that terminal, the Annex monad will need
 >   to keep a Set of unused Keys. This should only be loaded on demand.  
 >   NB: There is some potential for a great many unused Keys to cause
@@ -57,7 +58,7 @@ Finally, how to specify a feature request for git-annex?
 >   for most repos. Note that the assistant could also notice on the
 >   fly when files are removed and mark their keys as unused if that was
 >   the last associated file. (Only currently possible in direct mode.)
-> * It makes sense for the
+> * After scanning for unused files, it makes sense for the
 >   assistant to queue transfers of unused files to any remotes that
 >   do want them (eg, backup remotes). If the files can successfully be
 >   sent to a remote, that will lead to them being dropped locally as
@@ -70,6 +71,7 @@ Finally, how to specify a feature request for git-annex?
 >   time stamp of the object; we could use the mtime of the .map file,
 >   but that's direct mode only and may be replaced with a database
 >   later. Seems best to just keep a unused log file with timestamps.
+>   **done**
 > * After the assistant scans for unused files, if annex.expireunused
 >   is not set, and there is some significant quantity of unused files
 >   (eg, more than 1000, or more than 1 gb, or more than the amount of
@@ -87,3 +89,27 @@ Finally, how to specify a feature request for git-annex?
 > might be. For example, if a file is replicated to 2 clients, and one
 > client directly edits it, or deletes it, it loses the old version,
 > but the other client will still be storing that old version.
+> 
+> ## Stability analysis for unused= in preferred content expressions
+> 
+> This is tricky, because two repos that are otherwise entirely
+> in sync may have differing opinions about whether a key is unused,
+> depending on when each last scanned for unused keys.
+> 
+> So, this preferred content terminal is *not stable*.
+> It may be possible to write preferred content expressions
+> that constantly move such keys around without reaching a steady state.
+> 
+> Example:
+> 
+> A and B are clients directly connected, and both also connected
+> to BACKUP.
+> 
+> A deletes F. B syncs with A, and runs unused check; decides F
+> is unused. B sends F to BACKUP. B will then think A doesn't want F,
+> and will drop F from A. Next time A runs a full transfer scan, it will
+> *not* find F (because the file was deleted!). So it won't get F back from
+> BACKUP.
+> 
+> So, the fact that the full transfer scan does not look for unused
+> files seems to make this work out ok.
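The stability criterion described in the diff above (an expression should evaluate to the same answer for a key from the perspective of every repository) can be illustrated with a small model. This is only a sketch and not git-annex code: `RepoView`, `wants_present`, `wants_unused` and `is_stable` are hypothetical names, and each repository's knowledge is assumed to reduce to one set of shared state plus one set of locally scanned state.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class RepoView:
    """What one repository believes (hypothetical model, not git-annex's)."""
    name: str
    # Keys that still have a file in the shared branch: state every
    # repository agrees on once git has synced.
    shared_present: Set[str]
    # Keys this repository's last unused scan flagged: local state that
    # depends on when the scan last ran.
    local_unused: Set[str] = field(default_factory=set)


# A preferred content term, evaluated from one repository's point of view.
Term = Callable[[RepoView, str], bool]


def wants_present(view: RepoView, key: str) -> bool:
    """Term that relies only on shared state."""
    return key in view.shared_present


def wants_unused(view: RepoView, key: str) -> bool:
    """Term that relies on the local unused scan, like unused=true."""
    return key in view.local_unused


def is_stable(term: Term, views: List[RepoView], key: str) -> bool:
    """True when every repository evaluates the term to the same answer."""
    return len({term(view, key) for view in views}) == 1


# A deleted F and both repositories have synced, but only B has run
# its unused scan since then.
a = RepoView("A", shared_present=set())
b = RepoView("B", shared_present=set(), local_unused={"F"})

shared_term_stable = is_stable(wants_present, [a, b], "F")
unused_term_stable = is_stable(wants_unused, [a, b], "F")
```

In this model the term built on shared state gets the same answer from A and B, while the term built on the unused scan does not, which is the disagreement the analysis describes. The model says nothing about whether transfers eventually settle; that depends on which scans actually look for the key, as the A, B, BACKUP example shows.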