summaryrefslogtreecommitdiff
path: root/doc/todo/Limit_file_revision_history.mdwn
diff options
context:
space:
mode:
authorGravatar Joey Hess <joey@kitenet.net>2014-01-22 14:35:38 -0400
committerGravatar Joey Hess <joey@kitenet.net>2014-01-22 14:35:38 -0400
commitb09828d1b7bbd7f18a4186b47fbd70b66e682af4 (patch)
treea61607eeb63e0721442c793709172bd537deefb2 /doc/todo/Limit_file_revision_history.mdwn
parentd3f76d9d0f0811f71be39d09a9c1703eb5d4cb4c (diff)
promote forum post to feature request; add design
Diffstat (limited to 'doc/todo/Limit_file_revision_history.mdwn')
-rw-r--r--doc/todo/Limit_file_revision_history.mdwn89
1 files changed, 89 insertions, 0 deletions
diff --git a/doc/todo/Limit_file_revision_history.mdwn b/doc/todo/Limit_file_revision_history.mdwn
new file mode 100644
index 000000000..593e93013
--- /dev/null
+++ b/doc/todo/Limit_file_revision_history.mdwn
@@ -0,0 +1,89 @@
+Hi, I am assuming to use git-annex-assistant for two usecases, but I would like to ask about the options or planed roadmap for dropped/removed files from the repository.
+
+Usecases:
+
+1. sync working directory between laptop, home computer, work komputer
+2. archive functionality for my photograps
+
+Both usecases have one common factor. Some files might become obsolate and
+in long time frame nobody is interested to keep their revisions. Let's
+assume photographs. Usuall workflow I take is to import all photograps to
+filesystem, then assess (select) the good ones I want to keep and then
+process them what ever way.
+
+Problem with git-annex(-assistant) I have is that it start to revision all
+of the files at the time they are added to directory. This is welcome at
+first but might be an issue if you are used to put 80% of the size of your
+imported files to trash.
+
+I am aware of what git-annex is not. I have been reading documentation for
+"git-annex drop" and "unused" options including forums. I do understand
+that I am actually able to delete all revisions of the file if I will drop
+it, remove it and if I will run git annex unused 1..###. (on all synced
+repositories).
+
+I actually miss the option to have above process automated/replicated to the other synced repositories.
+
+I would formulate the 'use case' requirements for git-annex as:
+
+* command to drop an file including revisions from all annex repositories?
+ (for example like moving a file to /trash folder) that will schedulle
+ it's deletition)
+* option to keep like max. 10 last revisions of the file?
+* option to keep only previous revisions if younger than 6 months from now?
+
+Finally, how to specify a feature request for git-annex?
+
+> By moving it here ;-) --[[Joey]]
+
+> So, let's spec out a design.
+>
+> * Add preferred content terminal to configure whether a repository wants
+> to hang on to unused content.
+> Something like "unused=true" I suppose, because not having a parameter
+> would complicate preferred content parsing, and I cannot think
+> of a useful parameter.
+> * In order to quickly match that terminal, the Annex monad will need
+> to keep a Set of unused Keys. This should only be loaded on demand.
+> NB: There is some potential for a great many unused Keys to cause
+> memory usage to balloon.
+> * Client repositories will end their preferred content with
+> `and unused=false`. Transfer repositories too, because typically
+> only client repos connect to them, and so otherwise unused files
+> would build up there. Backup repos would want unused files. I
+> think that archive repos would too.
+> * Make the assistant check for unused files periodically. Exactly
+> how often may need to be tuned, but once per day seems reasonable
+> for most repos. Note that the assistant could also notice on the
+> fly when files are removed and mark their keys as unused if that was
+> the last associated file. (Only currently possible in direct mode.)
+> * It makes sense for the
+> assistant to queue transfers of unused files to any remotes that
+> do want them (eg, backup remotes). If the files can successfully be
+> sent to a remote, that will lead to them being dropped locally as
+> they're not wanted.
+> * Add a git config setting like annex.expireunused=7d. This causes
+> *deletion* of unused files after the specified time period if they are
+> not able to be moved to a repo that wants them.
+> (The default should be annex.expireunused=false.)
+> * How to detect how long a file has been unused? We can't look at the
+> time stamp of the object; we could use the mtime of the .map file,
+> that that's direct mode only and may be replaced with a database
+> later. Seems best to just keep a unused log file with timestamps.
+> * After the assistant scans for unused files, if annex.expireunused
+> is not set, and there is some significant quantity of unused files
+> (eg, more than 1000, or more than 1 gb, or more than the amount of
+> remaining free disk space),
+> it can pop up a webapp alert asking to configure it.
+>
+> This does not cover every use case that was requested.
+> But I don't see a cheap way to ensure it keeps eg the past 10 versions of
+> a file. I guess that if you care about that, you leave
+> annex.expireunused=false, and set up a backup repository where the unused
+> files will be moved to.
+>
+> Note that since the assistant uses direct mode by default, old versions
+> of modififed files are not guaranteed to be retained. But they very well
+> might be. For example, if a file is replicated to 2 clients, and one
+> client directly edits it, or deletes it, it loses the old version,
+> but the other client will still be storing that old version.