summaryrefslogtreecommitdiff
path: root/doc/todo/Limit_file_revision_history.mdwn
diff options
context:
space:
mode:
Diffstat (limited to 'doc/todo/Limit_file_revision_history.mdwn')
-rw-r--r--doc/todo/Limit_file_revision_history.mdwn117
1 files changed, 0 insertions, 117 deletions
diff --git a/doc/todo/Limit_file_revision_history.mdwn b/doc/todo/Limit_file_revision_history.mdwn
deleted file mode 100644
index 48b44dea2..000000000
--- a/doc/todo/Limit_file_revision_history.mdwn
+++ /dev/null
@@ -1,117 +0,0 @@
-Hi, I am assuming to use git-annex-assistant for two usecases, but I would like to ask about the options or planed roadmap for dropped/removed files from the repository.
-
-Usecases:
-
-1. sync working directory between laptop, home computer, work komputer
-2. archive functionality for my photograps
-
-Both usecases have one common factor. Some files might become obsolate and
-in long time frame nobody is interested to keep their revisions. Let's
-assume photographs. Usuall workflow I take is to import all photograps to
-filesystem, then assess (select) the good ones I want to keep and then
-process them what ever way.
-
-Problem with git-annex(-assistant) I have is that it start to revision all
-of the files at the time they are added to directory. This is welcome at
-first but might be an issue if you are used to put 80% of the size of your
-imported files to trash.
-
-I am aware of what git-annex is not. I have been reading documentation for
-"git-annex drop" and "unused" options including forums. I do understand
-that I am actually able to delete all revisions of the file if I will drop
-it, remove it and if I will run git annex unused 1..###. (on all synced
-repositories).
-
-I actually miss the option to have above process automated/replicated to the other synced repositories.
-
-I would formulate the 'use case' requirements for git-annex as:
-
-* command to drop an file including revisions from all annex repositories?
- (for example like moving a file to /trash folder) that will schedulle
- it's deletition)
-* option to keep like max. 10 last revisions of the file?
-* option to keep only previous revisions if younger than 6 months from now?
-
-Finally, how to specify a feature request for git-annex?
-
-> By moving it here ;-) --[[Joey]]
-
-> So, let's spec out a design.
->
-> * Add preferred content terminal to configure whether a repository wants
-> to hang on to unused content. Simply `unused`.
-> (It cannot include a timestamp, because there's
-> no way repos can agree on about when a key became unused.) **done**
-> * In order to quickly match that terminal, the Annex monad will need
-> to keep a Set of unused Keys. This should only be loaded on demand.
-> **done**
-> NB: There is some potential for a great many unused Keys to cause
-> memory usage to balloon.
-> * Client repositories will end their preferred content with
-> `and (not unused)`. Transfer repositories too, because typically
-> only client repos connect to them, and so otherwise unused files
-> would build up there. Backup repos would want unused files. I
-> think that archive repos would too. **done**
-> * Make the assistant check for unused files periodically. Exactly
-> how often may need to be tuned, but once per day seems reasonable
-> for most repos. Note that the assistant could also notice on the
-> fly when files are removed and mark their keys as unused if that was
-> the last associated file. (Only currently possible in direct mode.)
-> **done**
-> * After scanning for unused files, it makes sense for the
-> assistant to queue transfers of unused files to any remotes that
-> do want them (eg, backup remotes). If the files can successfully be
-> sent to a remote, that will lead to them being dropped locally as
-> they're not wanted.
-> * Add a git config setting like annex.expireunused=7d. This causes
-> *deletion* of unused files after the specified time period if they are
-> not able to be moved to a repo that wants them.
-> (The default should be annex.expireunused=false.)
-> * How to detect how long a file has been unused? We can't look at the
-> time stamp of the object; we could use the mtime of the .map file,
-> that that's direct mode only and may be replaced with a database
-> later. Seems best to just keep a unused log file with timestamps.
-> **done**
-> * After the assistant scans for unused files, if annex.expireunused
-> is not set, and there is some significant quantity of unused files
-> (eg, more than 1000, or more than 1 gb, or more than the amount of
-> remaining free disk space),
-> it can pop up a webapp alert asking to configure it. **done**
-> * Webapp interface to configure annex.expireunused. Reasonable values
-> are no expiring, or any number of days. **done**
->
-> [[done]] This does not cover every use case that was requested.
-> But I don't see a cheap way to ensure it keeps eg the past 10 versions of
-> a file. I guess that if you care about that, you leave
-> annex.expireunused=false, and set up a backup repository where the unused
-> files will be moved to.
->
-> Note that since the assistant uses direct mode by default, old versions
-> of modififed files are not guaranteed to be retained. But they very well
-> might be. For example, if a file is replicated to 2 clients, and one
-> client directly edits it, or deletes it, it loses the old version,
-> but the other client will still be storing that old version.
->
-> ## Stability analysis for unused in preferred content expressions
->
-> This is tricky, because two repos that are otherwise entirely
-> in sync may have differing opinons about whether a key is unused,
-> depending on when each last scanned for unused keys.
->
-> So, this preferred content terminal is *not stable*.
-> It may be possible to write preferred content expressions
-> that constantly moved such keys around without reaching a steady state.
->
-> Example:
->
-> A and B are clients directly connected, and both also connected
-> to BACKUP.
->
-> A deletes F. B syncs with A, and runs unused check; decides F
-> is unused. B sends F to BACKUP. B will then think A doesn't want F,
-> and will drop F from A. Next time A runs a full transfer scan, it will
-> *not* find F (because the file was deleted!). So it won't get F back from
-> BACKUP.
->
-> So, it looks like the fact that unused files are not going to be
-> looked for on the full transfer scan seems to make this work out ok.