author Joey Hess <joey@kitenet.net> 2014-02-11 04:15:33 -0400
committer Joey Hess <joey@kitenet.net> 2014-02-11 10:48:52 -0400
commit dbd2093acce3eaabaf678286e634bc7f37798d77
tree 3349f64d7d44d28f35016eb6dd4047dbcbf8b45b
parent f4dee5a48229d6e17c87b7346509bf37811367d5
interesting new design just gelled.. almost
Diffstat (limited to 'doc/design/metadata.mdwn')
-rw-r--r-- doc/design/metadata.mdwn 107
1 files changed, 107 insertions, 0 deletions
diff --git a/doc/design/metadata.mdwn b/doc/design/metadata.mdwn
new file mode 100644
index 000000000..8e409f7d6
--- /dev/null
+++ b/doc/design/metadata.mdwn
@@ -0,0 +1,107 @@
+[[!toc]]
+
+# metadata
+
+Attach an arbitrary set of metadata to a key.
+
+Metadata can be tags, but it can also be fields with values (e.g., date=xxx,
+conference=yyy).
+
+Store in git-annex branch, next to location log files.
+
+Storage needs to support union merging, including removing tags, and
+changing values.
+
+## automatically added metadata
+
+git annex add should automatically attach the current mtime of a file
+when adding it.
+
+Could also automatically attach permissions.
+
+A git hook could be run by git annex add to gather more metadata.
+
+Metadata is also added automatically when adding files to filter branches. See below.
+
+## derived metadata
+
+From the mtime, some additional
+metadata is derived, at least year=yyyy and probably also month, etc.
+
+Should be a general mechanism for this.
+
+# filtered branches
+
+`git annex filter year=2014 talk` should create a new branch
+filtered/talk/year=2014 containing only files tagged with that, and
+have git check it out. In this example, all files appear in top level
+directory of repo; no subdirs.
+
+`git annex fadd haskell` switches to branch
+filtered/haskell/talk/year=2014 with only the haskell talks.
+
+`git annex fadd year=2013 year=2012` switches to branch
+filtered/haskell/talk/year=2012,2013,2014. This has subdirectories 2012,
+2013 and 2014 with the matching talks.
+
+`git annex frm haskell` switches to
+filtered/talk/year=2012,2013,2014, which has all available talks in it.
+
+`git annex fadd conference=fosdem conference=icfp` switches to branch
+filtered/conference=fosdem,icfp/talk/year=2012,2013,2014. Now we need
+to either nest the subdirectories, or make fosdem-2014, icfp-2013, etc.
+May need an option to choose this. Note that user may prefer to have year
+first or conference first, so may need an option for that as well.
+
+Note that old filter branches can be deleted when switching to a new one.
+There is no need to retain them, unless the user has committed non
+git-annexed files to them, in which case, urk.
+
+These commands should probably refuse to do anything if run from within a
+subdir of the work tree that would get deleted by checking out the new
+filtered branch.
+
+# operations while on filter branch
+
+* If files are removed and git commit called, git-annex should remove the
+ relevant metadata from the files. **possibly** It's not clear that
+ removing a file should nuke all the metadata used to filter it into the
+ branch (especially if it's derived metadata like the year).
+ Also, this is not usable in direct mode because deleting the
+ file.. actually deletes it.
+* `git annex sync` should avoid pushing out the filter branch, but
+ it should check if there are changes to the metadata pulled in, and update
+ the branch to reflect them.
+* If `git annex add` adds a file, it gets all the metadata of the filter
+ branch it's added to. If it's in a relevant directory (like fosdem-2014),
+ it gets that metadata automatically recorded as well.
+
+# other uses for metadata
+
+Uses are not limited to filter branches.
+
+`git annex checkoutmeta year=2014 talk` in a subdir of master could create the
+same tree of files filter would. The user can then commit that if desired.
+Or, they could run additional commands like `git annex fadd` to refine the
+tree of files in the subdir.
+
+Other programs could query git-annex for the metadata of files in the work
+tree, and do whatever they want with it.
+
+# filenames
+
+The hard part of this is actually getting a useful filename to put in the
+filter branch, since git-annex only has a key which the user will not
+want to see.
+
+* Could use filename metadata for the key, recorded by git-annex add (which
+ may not correspond to filenames being used in regular git branches like
+ master for the key).
+* Could use the .map files to get a filename, but this is somewhat
+ arbitrary (.map can contain multiple filenames), and is only
+ currently supported in direct mode.
+
+# efficient metadata lookup
+
+Looking up metadata for filtering so far requires traversing all keys in
+the git-annex branch. This is slow. A fast cache is needed.
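
A minimal sketch of one shape the fast cache from the last section could take, assuming the metadata has already been parsed into an in-memory mapping. This is not git-annex code: `build_index`, `lookup`, the `tag` field name, and the sample keys are all hypothetical. The idea is to invert the per-key metadata into a map from (field, value) to keys, so that a filter only touches the entries it names instead of traversing every key.

```python
from collections import defaultdict


def build_index(metadata):
    """Invert {key: {field: {values}}} into {(field, value): {keys}}."""
    index = defaultdict(set)
    for key, fields in metadata.items():
        for field, values in fields.items():
            for value in values:
                index[(field, value)].add(key)
    return index


def lookup(index, tags=(), fields=None):
    """Return keys that carry every tag and, for each field, at least one
    of the listed values. An empty query returns no keys (an assumption)."""
    # Each group is a list of alternative terms; groups are ANDed together.
    groups = [[("tag", t)] for t in tags]
    groups += [[(f, v) for v in vs] for f, vs in (fields or {}).items()]
    result = None
    for group in groups:
        matched = set()
        for term in group:
            matched |= index.get(term, set())
        result = matched if result is None else result & matched
    return result if result is not None else set()


# Hypothetical sample data; real keys and storage would differ.
metadata = {
    "KEY-aaa": {"tag": {"talk", "haskell"}, "year": {"2014"}},
    "KEY-bbb": {"tag": {"talk"}, "year": {"2013"}},
    "KEY-ccc": {"tag": {"photo"}, "year": {"2014"}},
}
index = build_index(metadata)
talks_2014 = lookup(index, tags=["talk"], fields={"year": ["2014"]})
```

Values of the same field are treated as alternatives while tags must all match, mirroring the `year=2012,2013,2014` and haskell/talk examples in the filtered branches section. Keeping such an index current as the git-annex branch changes is not addressed here.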