Diffstat (limited to 'doc')
48 files changed, 566 insertions, 47 deletions
diff --git a/doc/bugs/Auto-repair_greatly_slows_down_the_machine.mdwn b/doc/bugs/Auto-repair_greatly_slows_down_the_machine.mdwn new file mode 100644 index 000000000..58d436898 --- /dev/null +++ b/doc/bugs/Auto-repair_greatly_slows_down_the_machine.mdwn @@ -0,0 +1,19 @@ +### Please describe the problem. + +The assistant regulary ends up trying to perform repair (I don't know why, it happens fairly often, once a week or so). When it does so, it ends up creating a huge (2.4G) .git/objects directory, and a git prune-packed process uses so much I/O the machine really slows down. + +### What steps will reproduce the problem? + +I don't have any reliable way to reproduce it. The repository ends up being attempted to be repaired around once a week. This week the repair (and the slowdown) also happened on a second computer. + +### What version of git-annex are you using? On what operating system? + +git-annex version: 5.20140221-gbdfc8e1 (using the standalone 64bit builds) + +This is on an up-to-date Arch Linux. It also happened on Fedora 20. + +### Please provide any additional information below. + +The daemon.log is fairly long, but not particulary interesting: [[https://ssl.zerodogg.org/~zerodogg/private/tmp/daemon.log-2014-02-25.1]] + +The «resource vanished (Broken pipe)» at the end is the result of me killing the prune-packed in order to be able to use the machine again. diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces.mdwn b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces.mdwn index 70a573ff4..c40a90feb 100644 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces.mdwn +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces.mdwn @@ -18,3 +18,5 @@ show these then running, git annex dropunused 1-3 --force reports ok for each drop operation but rerunning git annex unused --from cloud still shows these three files as unused. I am using git-annex on mac os x (current dmg) on a direct repo. 
I have similar problems dropping files on the current repo even though I drop unused they still show up as unused. + +> [[fixed|done]] --[[Joey]] diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment index fa41b59a7..fa41b59a7 100644 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment index 5f5694c00..5f5694c00 100644 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment index 86e3bd2c1..86e3bd2c1 100644 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment index 6459ee8d7..6459ee8d7 100644 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment diff --git 
a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment index 4ad4d6f8b..4ad4d6f8b 100644 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment index fbd9ed55c..fbd9ed55c 100644 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment diff --git a/doc/bugs/Creating_a_box.com_repository_fails.mdwn b/doc/bugs/Creating_a_box.com_repository_fails.mdwn index 75d59c9bc..ecebd7a00 100644 --- a/doc/bugs/Creating_a_box.com_repository_fails.mdwn +++ b/doc/bugs/Creating_a_box.com_repository_fails.mdwn @@ -35,3 +35,7 @@ ubuntu 13.10 (saucy), i686 > Seems that [DAV-0.6 is badly broken](http://bugs.debian.org/737902). > I have adjusted the cabal file to refuse to build with that broken > version. +> +>> Update: Had to work around additional breakage in DAV-0.6. It's +>> fully tested and working now, although not yet uploaded to Debian +>> unstable. 
[[done]] --[[Joey]] diff --git a/doc/bugs/Creating_a_box.com_repository_fails/comment_7_73f71386f8eafbb65f4cc9769021710f._comment b/doc/bugs/Creating_a_box.com_repository_fails/comment_7_73f71386f8eafbb65f4cc9769021710f._comment new file mode 100644 index 000000000..016371346 --- /dev/null +++ b/doc/bugs/Creating_a_box.com_repository_fails/comment_7_73f71386f8eafbb65f4cc9769021710f._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawk9nck8WX8-ADF3Fdh5vFo4Qrw1I_bJcR8" + nickname="Jon Ander" + subject="comment 7" + date="2014-02-24T13:20:27Z" + content=""" +This is what I get in the log in version 5.20140221 in Debian Sid: + + 100% 46.5KB/s 0sInternalIOException <socket: 28>: hPutBuf: illegal operation (handle is closed) + InternalIOException <socket: 25>: hPutBuf: illegal operation (handle is closed) + +It seams that the file is being uploaded (folders are being created in box.com) but it crashes when reaching 100% +"""]] diff --git a/doc/bugs/Mac_OS_git_version_still_too_old_for_.gitignore__63__/comment_3_f199ac6ae2448949ef0779177cf0ef58._comment b/doc/bugs/Mac_OS_git_version_still_too_old_for_.gitignore__63__/comment_3_f199ac6ae2448949ef0779177cf0ef58._comment new file mode 100644 index 000000000..591d4e80f --- /dev/null +++ b/doc/bugs/Mac_OS_git_version_still_too_old_for_.gitignore__63__/comment_3_f199ac6ae2448949ef0779177cf0ef58._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawmZgZuUhZlHpd_AbbcixY0QQiutb2I7GWY" + nickname="Jimmy" + subject="comment 3" + date="2014-02-21T22:05:06Z" + content=""" +And yep, it's fixed in 5.20140221-g1a47f5f. Thanks guys! 
+"""]] diff --git a/doc/bugs/git_annex_sync_--content_not_syncing_all_objects/comment_3_d7349af488008e3ca6557e0c1fbfc5b6._comment b/doc/bugs/git_annex_sync_--content_not_syncing_all_objects/comment_3_d7349af488008e3ca6557e0c1fbfc5b6._comment new file mode 100644 index 000000000..34c2c4c16 --- /dev/null +++ b/doc/bugs/git_annex_sync_--content_not_syncing_all_objects/comment_3_d7349af488008e3ca6557e0c1fbfc5b6._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="stp" + ip="84.56.21.11" + subject="Ídea" + date="2014-02-23T14:25:22Z" + content=""" +I thought about the implementation need for git annex sync --content --all. If preferred content expressions would work it would be needed. Everything else. could be done via a split usage. +Run \"git annex sync --content\" to satisfy the preferred content expressions on the working tree and the numcopies on the working tree and then loop through all backup/archive repositories with \"git annex get --auto\" this should at least prevent archives from getting objects numcopies is already satisfying and sync the objects not yet satisfied right? +"""]] diff --git a/doc/bugs/pages_of_packfile_errors.mdwn b/doc/bugs/pages_of_packfile_errors.mdwn new file mode 100644 index 000000000..9d60dd2aa --- /dev/null +++ b/doc/bugs/pages_of_packfile_errors.mdwn @@ -0,0 +1,30 @@ +### Please describe the problem. + +A repair that runs for ages. 
In the log file, pages and pages and pages of: + +error: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack does not match index +warning: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack cannot be accessed +error: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack does not match index +warning: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack cannot be accessed +error: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack does not match index +warning: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack cannot be accessed +error: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack does not match index +warning: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack cannot be accessed + +### What steps will reproduce the problem? + +Running git-annex, plugging in my external drive + +### What version of git-annex are you using? On what operating system? + +Auto-updated latest, I thought, but the about page says: Version: 5.20131230-g9a495e6 + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. 
+"""]] diff --git a/doc/bugs/pages_of_packfile_errors/comment_1_eb2989112b38bb27ce8f691dd5d318e5._comment b/doc/bugs/pages_of_packfile_errors/comment_1_eb2989112b38bb27ce8f691dd5d318e5._comment new file mode 100644 index 000000000..d74470ffd --- /dev/null +++ b/doc/bugs/pages_of_packfile_errors/comment_1_eb2989112b38bb27ce8f691dd5d318e5._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-24T18:32:31Z" + content=""" +Well, you seem to have a corrupt git repository on your removable drive. git-annex seems to be in the process of repairing it, which can take some time. + +I don't see a bug here, from what you've described so far.. +"""]] diff --git a/doc/bugs/pages_of_packfile_errors/comment_2_69fba53035ebea213ae1c11be5326690._comment b/doc/bugs/pages_of_packfile_errors/comment_2_69fba53035ebea213ae1c11be5326690._comment new file mode 100644 index 000000000..facae6496 --- /dev/null +++ b/doc/bugs/pages_of_packfile_errors/comment_2_69fba53035ebea213ae1c11be5326690._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawkQafKy7hNSEolLs6TvbgUnkklTctUY9LI" + nickname="Zellyn" + subject="sounds good" + date="2014-02-24T19:39:12Z" + content=""" +Is it normal for the same error to repeat thousands of times like that in the log? 
+"""]] diff --git a/doc/bugs/pages_of_packfile_errors/comment_3_73b9f574e8ce36d5e0d0f6c6a89006b7._comment b/doc/bugs/pages_of_packfile_errors/comment_3_73b9f574e8ce36d5e0d0f6c6a89006b7._comment new file mode 100644 index 000000000..f0e6bce0b --- /dev/null +++ b/doc/bugs/pages_of_packfile_errors/comment_3_73b9f574e8ce36d5e0d0f6c6a89006b7._comment @@ -0,0 +1,39 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 3" + date="2014-02-24T23:39:46Z" + content=""" +Well, if there's a bug here, it might be that this particular problem has caused the repair process to loop repeatedly trying to unpack a pack file. +I don't see how that could happen, looking at the code it will try to unpack each pack file only once. + +If you run `git annex repair --debug`, you can see the git commands it runs, and so see if it's somehow looping. When I do this with some corrupt pack files (actually, I swapped one pack file for another one), I see, for example: + +<pre> +[2014-02-24 19:11:42 JEST] feed: git [\"--git-dir=/home/joey/tmp/git/.git\",\"--work-tree=/home/joey/tmp/git\",\"unpack-objects\",\"-r\"] +error: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack claims to have 862 objects while index indicates 1431 objects +warning: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack cannot be accessed +error: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack claims to have 862 objects while index indicates 1431 objects +warning: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack cannot be accessed +error: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack claims to have 862 objects while index indicates 1431 objects +... 
+</pre> + +Which shows that git-annex only ran `git unpack-objects -r` once, and yet it printed out the same error repeatedly. + +One possibility is a problem using `-r`, which makes it keep going on errors. Which seemed like a good idea at the time to unpack as much as possible from a damaged file. It might be that `git unpack-objects` is itself getting stuck in some kind of loop with the -r. + +In my case, it did not get stuck; it eventually quit and it moved on to the next pack file, after 900-some repitions of the error message: + +<pre> +[2014-02-24 19:16:47 JEST] feed: git [\"--git-dir=/home/joey/tmp/git/.git\",\"--work-tree=/home/joey/tmp/git\",\"unpack-objects\",\"-r\"] +error: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack claims to have 862 objects while index indicates 1431 objects +warning: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack cannot be accessed +</pre> + +Intesting that it's again complaining about the same pack file, despite having moved from one pack file on to the next one. I think what's going on here is while unpacking pack files A..Y (which may all be fine), it's checking pack file Z, which is corrupt, to see if the objects exist in it, and complaining each time. + +So, I can improve this a lot by moving *all* the pack files out of the way before trying to unpack any of them. In my test case, that completely eliminated the errors, and probably also sped it up a bit. + +If I were you, I'd either try stopping your running git-annex and run `git annex repair --debug` and analize the log like I did above, or get the next daily build which has that change, and see if it helps in your case. +"""]] diff --git a/doc/design/metadata.mdwn b/doc/design/metadata.mdwn index db0d51c5c..7d1ff4bfa 100644 --- a/doc/design/metadata.mdwn +++ b/doc/design/metadata.mdwn @@ -29,7 +29,7 @@ directories nest. relevant metadata from the files. 
TODO: It's not clear that removing a file should nuke all the metadata used to filter it into the - branch (especially if it's derived metadata like the year). + branch Currently, only metadata used for visible subdirs is added and removed this way. Also, this is not usable in direct mode because deleting the @@ -56,21 +56,9 @@ For example, by examining MP3 metadata. Also auto add metadata when adding files to view branches. See below. -## derived metadata +## directory hierarchy metadata -This is probably not stored anywhere. It's computed on demand by a pure -function from the other metadata. -(Should be a general mechanism for this. (It probably generalizes to -sql queries if we want to go that far.)) - -### data metadata - -TODO From the ctime, some additional -metadata is derived, at least year=yyyy and probably also month, etc. - -### directory hierarchy metadata - -TODO From the original filename used in the master branch, when +From the original filename used in the master branch, when constructing a view, generate fields. For example foo/bar/baz.mp3 would get /=foo, foo/=bar, foo/bar/=baz, and .=mp3. @@ -82,11 +70,10 @@ This allows using whatever directory hierarchy exists to inform the view, without locking the view into using it. Complication: When refining a view, it only looks at the filenames in -the view, so it would need to map from +the view, so it has to map from those filenames to derive the same metadata, unless there is persistent storage. Luckily, the filenames used in the views currently include the -subdirs (although not quite in a parseable format, would need some small -changes). +subdirs. # other uses for metadata @@ -185,14 +172,15 @@ So, possible approaches: * Git has a complex set of rules for what is legal in a ref name. View branch names will need to filter out any illegal stuff. **done** +* Metadata should be copied to the new key when adding a modified version + of a file. 
**done** + * Filesystems that are not case sensative (including case preserving OSX) will cause problems if view branches try to use different cases for - 2 directories representing the value of some metadata. But, users - probably want at least case-preserving metadata values. + 2 directories representing a metadata field. - Solution might be to compare metadata case-insensitively, and - pick one representation consistently, so if, for example an author - field uses mixed case, it will be used in the view branch. + Solution might be to compare fields names case-insensitively, and + pick one representation consistently. Alternatively, it could escape `A` to `_A` when such a filesystem is detected and avoid collisions that way (double `_` to escape it). @@ -207,3 +195,7 @@ So, possible approaches: * What happens if git annex add or the assistant add a new file while on a view? If the file is not also added to the master branch, it will be lost when exiting the view. TODO + +* The filename mangling can result in a filename in a view + that is too long for its containing filesystem. Should detect and do + something reasonable to avoid. TODO diff --git a/doc/design/metadata/comment_1_22ed80bd8eabaa836e9dfc2432531f04._comment b/doc/design/metadata/comment_1_22ed80bd8eabaa836e9dfc2432531f04._comment new file mode 100644 index 000000000..493db4339 --- /dev/null +++ b/doc/design/metadata/comment_1_22ed80bd8eabaa836e9dfc2432531f04._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawm3vKzS4eOWYpKMoYXqMIjNsIg_nYF-loU" + nickname="Konubinix" + subject="Already existing metadata implementation " + date="2014-02-22T21:45:25Z" + content=""" +Hi, + +I love the idea behing storing metadata. + +I suggest to exchange ideas (and maybe code) with projects already implementing metadata systems. + +I have tried several implementations and particularly noticed tmsu (http://tmsu.org/). 
This tool stores tags into a sqlite database and uses also a SHA-256 fingerprint of the file to be aware of file moves. It provides a fuse view of the tags with the ability to change tags by moving files (like in the git annex metadata view). + +Paul Ruane is particularly responsive on the mailing list and he already supports git annexed files (with SHAE-256 fingerprint) (see the end of the thread https://groups.google.com/forum/#!topic/tmsu/A5EGpnCcJ2w). + +Even if you cannot reuse the project, they are interresting ideas that might be worth looking at like the implications of tags: a file tagged \"film\" being automatically tagged \"video\". + +Tagsistant (http://www.tagsistant.net/) may also be a good source of inspirations. I just don't like the fact that it uses a backstore of tagged files. + +Thanks for reading. +"""]] diff --git a/doc/design/metadata/comment_2_03ae28acedbe1fa45c366b30b58fcf48._comment b/doc/design/metadata/comment_2_03ae28acedbe1fa45c366b30b58fcf48._comment new file mode 100644 index 000000000..c222f75d3 --- /dev/null +++ b/doc/design/metadata/comment_2_03ae28acedbe1fa45c366b30b58fcf48._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" + nickname="Jimmy" + subject="comment 2" + date="2014-02-25T09:51:17Z" + content=""" +Some additional ideas for metadata... + +Instead of having a simplistic scheme like 'field=value' it might be advantageous to consider a scheme like 'attribute=XXX, value=YYY, unit=ZZZ' that way you could do intesting things with the metadata like adding counters to things, and allow for doing interesting queries like give me all 'things' tagged with a unit of \"audio_file\", this assumes one had trawled through an entire annex and then tagged all files based on type with the unix file tool or something like that. 
+ +The above idea is already in use in irods and its a really nice and powerful way to let users add meta-data and to build up more interesting use cases and tools. + +btw, I plan on taking a look at seeing if I can map some of the meta that we have in work into this new git-annex feature to see how well/bad it works. Either way this feature looks cool! +1!!! +"""]] diff --git a/doc/design/metadata/comment_3_ee850df7d3fa4c56194f13a6e3890a30._comment b/doc/design/metadata/comment_3_ee850df7d3fa4c56194f13a6e3890a30._comment new file mode 100644 index 000000000..f77cd8611 --- /dev/null +++ b/doc/design/metadata/comment_3_ee850df7d3fa4c56194f13a6e3890a30._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" + nickname="Jimmy" + subject="comment 3" + date="2014-02-25T09:57:09Z" + content=""" +actually in your mp3 example you could have .... + +ATTRIBUTE=sample_rate, VALUE=22100, UNIT=Hertz + +another example use case is to always be consistent with the AVU order then you could stick in ntriples from RDF to do other cool things by looking up various linked data sources -- see http://www.w3.org/2001/sw/RDFCore/ntriples/ and http://www.freebase.com/, actually this would be quite cool if git-annex examined the mp3's id3 tag, the created an ntriple styled entry can be automatically parsed with the web-based annex gui and automatically pull in additional meta-data from the likes of freebase. I guess the list of ideas can just only get bigger with this potential metadata capability. 
+"""]] diff --git a/doc/design/roadmap.mdwn b/doc/design/roadmap.mdwn index e6ad21fee..0f0df4496 100644 --- a/doc/design/roadmap.mdwn +++ b/doc/design/roadmap.mdwn @@ -6,10 +6,10 @@ Now in the * Month 1 [[!traillink assistant/encrypted_git_remotes]] * Month 2 [[!traillink assistant/disaster_recovery]] -* Month 3 user-driven features and polishing [[todo/direct_mode_guard]] [[assistant/upgrading]] -* Month 4 [[Windows_webapp|assistant/Windows]], Linux arm, [[todo/support_for_writing_external_special_remotes]] +* Month 3 user-driven features and polishing [[!traillink todo/direct_mode_guard]] [[!traillink assistant/upgrading]] +* Month 4 [[!traillink assistant/windows text="Windows webapp"]], Linux arm, [[!traillink todo/support_for_writing_external_special_remotes]] * Month 5 user-driven features and polishing -* **Month 6 get Windows out of beta, [[metadata and views|design/metadata]]** +* **Month 6 get Windows out of beta, [[!traillink design/metadata text="metadata and views"]]** * Month 7 user-driven features and polishing * Month 8 [[!traillink assistant/telehash]] * Month 9 [[!traillink assistant/gpgkeys]] [[!traillink assistant/sshpassword]] diff --git a/doc/devblog/day_-4__forgetting/comment_7_a865216046aa91a47d0d2b2f0668ea89._comment b/doc/devblog/day_-4__forgetting/comment_7_a865216046aa91a47d0d2b2f0668ea89._comment new file mode 100644 index 000000000..dc142edc0 --- /dev/null +++ b/doc/devblog/day_-4__forgetting/comment_7_a865216046aa91a47d0d2b2f0668ea89._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="stp" + ip="84.56.21.11" + subject="New findings" + date="2014-02-24T12:28:03Z" + content=""" +Another thing I found, which was annoying is that I have objects in my annex not tracked anywhere it seems. +\"git annex fsck --all\" complains about not having access to the object. \"git log --stat -S '$key'\" doesn't have any record. \"git annex fsck\" has no issues and \"git annex unused\" comes up empty too. 
+I'm not sure where these objects still reside or why how to remove this annoying failure. + +So not only should \"git annex forget $key\" remove references from within all branches, but should also clean up the aforementioned loose objects, which are neither unused, nor available, nor referenced. +"""]] diff --git a/doc/devblog/day_120__more_metadata.mdwn b/doc/devblog/day_120__more_metadata.mdwn new file mode 100644 index 000000000..daff68e37 --- /dev/null +++ b/doc/devblog/day_120__more_metadata.mdwn @@ -0,0 +1,17 @@ +When generating a view, there's now a way to reuse part of the directory +hierarchy of the parent branch. For example, `git annex view tag=* podcasts/=*` +makes a view where the first level is the tags, and the second level is +whatever `podcasts/*` directories the files were in. + +Also, year and month metadata can be automatically recorded when +adding files to the annex. I made this only be done when annex.genmetadata +is turned on, to avoid polluting repositories that don't want to use metadata. + +It would be nice if there was a way to add a hook script that's run +when files are added, to collect their metadata. I am not sure yet if +I am going to add that to git-annex though. It's already possible to do via +the regular git `post-commit` hook. Just make it look at the commit to see +what files were added, and then run `git annex metadata` to set their +metadata appropriately. It would be good to at least have an example of +such a script to eg, extract EXIF or ID3 metadata. Perhaps someone can +contribute one? diff --git a/doc/devblog/day_121__special_remote_maintenance.mdwn b/doc/devblog/day_121__special_remote_maintenance.mdwn new file mode 100644 index 000000000..551704885 --- /dev/null +++ b/doc/devblog/day_121__special_remote_maintenance.mdwn @@ -0,0 +1,23 @@ +Turns out that in the last release I broke making box.com, Amazon S3 and +Glacier remotes from the webapp. Fixed that. 
+ +Also, dealt with changes in the haskell DAV library that broke support for +box.com, and worked around an exception handling bug in the library. + +I think I should try to enhance the test suite so it can run live tests +on special remotes, which would at least have caught the some of these +recent problems... + +---- + +Since metadata is tied to a particular key, editing an annexed file, +which causes the key to change, made the metadata seem to get lost. + +I've now fixed this; it copies the metadata from the old version to the new +one. (Taking care to copy the log file identically, so git can reuse its +blob.) + +That meant that `git annex add` has to check every file it adds to see if +there's an old version. Happily, that check is fairly fast; I benchmarked my +laptop running 2500 such checks a second. So it's not going to slow things +down appreciably. diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo.mdwn b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo.mdwn new file mode 100644 index 000000000..c94172535 --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo.mdwn @@ -0,0 +1 @@ +Is it possible to convert a regular git annex repo (git clone then git annex init in the folder), to an rsync remote. I have an annex with alot of remotes which makes the sync operation take a really long time. I would like to convert some of those remotes to rsync. This particular repo has a TB of data so I would like to avoid dropping content from the remote than re download everything. 
diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_1_e6065f9c44c85030c7628e2cfa0fd0fa._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_1_e6065f9c44c85030c7628e2cfa0fd0fa._comment new file mode 100644 index 000000000..46ba67c30 --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_1_e6065f9c44c85030c7628e2cfa0fd0fa._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-23T18:57:42Z" + content=""" +This is doable. It works best if the remote repo is a bare git repository, because then the filenames line up 100% with the filenames used in a rsync special remote. If the git repo is not bare, the rsync special remote will first try the paths it expects, and only then fall back to the right paths, so a little extra work done. (If this became a big problem, it would not be infesable to move the files around with a script.) + +Anyway, if it's a bare repo, then repo.git/annex/objects is where you want to point the rsync special remote at. With a non-bare repo, repo/.git/annex/objects/ is the location. I'd recommend moving the objects directory out to a new location, and pointing the rsyncurl at that. This way, there's no possibility of git-annex thinking one files accessed 2 ways is 2 copies. + +Of course, you can't use encryption for the rsync special remote. 
+"""]] diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_2_76bfb11396dc20a5105376b22e7e773b._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_2_76bfb11396dc20a5105376b22e7e773b._comment new file mode 100644 index 000000000..8ed4f6508 --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_2_76bfb11396dc20a5105376b22e7e773b._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 2" + date="2014-02-23T19:07:59Z" + content=""" +However, if the only problem is that pushing and pulling with a git repository makes `git annex sync` take too long, another option is setting `git config remote.$foo.annex-sync false`. You can still then use git-annex commands to get and push data to the remote, and can even `git annex sync $foo` from time to time, but it won't slow down the normal `git annex sync`. + +However, this also prevents the assistant from uploading new files to the remote automatically. +"""]] diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_3_b34d6ae0718ab0ff6bc1d7b8f2470b9b._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_3_b34d6ae0718ab0ff6bc1d7b8f2470b9b._comment new file mode 100644 index 000000000..a8040cd3f --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_3_b34d6ae0718ab0ff6bc1d7b8f2470b9b._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" + nickname="Hamza" + subject="comment 3" + date="2014-02-23T19:39:28Z" + content=""" +Thanks for the reply, just to make sure I got you right, + +It is indeed a non bare git repo. 
So I will move the folder repo/.git/annex/objects/ to repo/ + +then run, + +git annex initremote myrsync type=rsync rsyncurl=ssh.example.com:~/repo + +and enable the remote on other annexes (disks are connected to an ssh server there is no encryption setup right now so I do not mind not having it.). And everything should be setup correctly. +"""]] diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_4_8f5e323b29745591f9f2f0f867353f69._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_4_8f5e323b29745591f9f2f0f867353f69._comment new file mode 100644 index 000000000..c8305aaa9 --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_4_8f5e323b29745591f9f2f0f867353f69._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" + nickname="Hamza" + subject="comment 4" + date="2014-02-23T19:45:09Z" + content=""" +and as a follow up do I have rename the repos or can I reuse the same names for the repos? +"""]] diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_5_9824c953694770afa0611ff7276737bf._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_5_9824c953694770afa0611ff7276737bf._comment new file mode 100644 index 000000000..c6052232b --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_5_9824c953694770afa0611ff7276737bf._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 5" + date="2014-02-24T19:07:33Z" + content=""" +That looks all-right, although initremote will ask you to tell it what encryption to use, and you'll need to specify `encryption=none` + +One thing I forgot to mention is that the UUID of the new rsync repository won't be the same, so git-annex won't know about the files in there. This can be fixed by `git annex fsck --fast --from myrsync`. 
Which doesn't re-download all the files, but you still may want to run it on a repository close to or on the server for speed. + +You can re-use the name you're currently using for the git remote for the new rsync special remote if you like. +"""]] diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_6_5899741cb7f83e1b22c5ee3509c5ff21._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_6_5899741cb7f83e1b22c5ee3509c5ff21._comment new file mode 100644 index 000000000..8b7c3b52e --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_6_5899741cb7f83e1b22c5ee3509c5ff21._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" + nickname="Hamza" + subject="comment 6" + date="2014-02-25T09:34:16Z" + content=""" +assuming the remote I am converting is called some-repo should mark it as dead before converting and reinitting as rsync some-repo again? +"""]] diff --git a/doc/forum/Find_files_that_lack_a_certain_field_in_metadata.mdwn b/doc/forum/Find_files_that_lack_a_certain_field_in_metadata.mdwn new file mode 100644 index 000000000..b0fe9ddaa --- /dev/null +++ b/doc/forum/Find_files_that_lack_a_certain_field_in_metadata.mdwn @@ -0,0 +1,5 @@ +Is there any way to find all files that do not have a certain field assigned in metadata. E.g. I want to find all files that do not have an author field set and + + git-annex find --not --metadata "author=*" + +doesn't give any results. 
diff --git a/doc/forum/Find_files_that_lack_a_certain_field_in_metadata/comment_1_476e52563ccd3ad1b43e3a2da4dfaa82._comment b/doc/forum/Find_files_that_lack_a_certain_field_in_metadata/comment_1_476e52563ccd3ad1b43e3a2da4dfaa82._comment new file mode 100644 index 000000000..fbfb6ebb6 --- /dev/null +++ b/doc/forum/Find_files_that_lack_a_certain_field_in_metadata/comment_1_476e52563ccd3ad1b43e3a2da4dfaa82._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-21T22:36:51Z" + content=""" +--metadata does not support globs, so your example is asking for all files that don't have an author field with a literal \"*\" value. When I try that command, it lists all files ... as expected. + +It seems that adding glob support to it would get to the result you want, and makes sense to parallel git annex view. Change made in git! +"""]] diff --git a/doc/forum/Too_big_to_fsck.mdwn b/doc/forum/Too_big_to_fsck.mdwn new file mode 100644 index 000000000..975674b5c --- /dev/null +++ b/doc/forum/Too_big_to_fsck.mdwn @@ -0,0 +1,20 @@ +Hi, + +My Webapp isn't working: + + $ git-annex webapp error: refs/gcrypt/gitception+ does not point to a valid object! + error: refs/remotes/Beta/git-annex does not point to a valid object! + error: refs/remotes/Beta/master does not point to a valid object! + fatal: unable to read tree 656e7db5be172f01c0b6994d01f1a08d1273af12 + +So I tried to repair it: + + $ git-annex repair Running git fsck ... + Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it. + +So I tried to follow your advice here and increase the stack: + + $ git-annex +RTS -K35000000 -RTS fsck + git-annex: Most RTS options are disabled. Link with -rtsopts to enable them. + +I wasn't sure what to do next, so any help would be appreciated. 
diff --git a/doc/forum/Too_big_to_fsck/comment_1_490b8bfe95b01a23408ecb5d63dcd40b._comment b/doc/forum/Too_big_to_fsck/comment_1_490b8bfe95b01a23408ecb5d63dcd40b._comment new file mode 100644 index 000000000..73d218bac --- /dev/null +++ b/doc/forum/Too_big_to_fsck/comment_1_490b8bfe95b01a23408ecb5d63dcd40b._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-23T18:51:45Z" + content=""" +I suspect that git fsck is outputting so many lines about problems that it's taking more memory than it's limited to using to hold them all. + +Can you paste the output of: git fsck --no-dangling --no-reflogs +"""]] diff --git a/doc/forum/Too_big_to_fsck/comment_2_2666c135dd3378cf6301aa4957049fbd._comment b/doc/forum/Too_big_to_fsck/comment_2_2666c135dd3378cf6301aa4957049fbd._comment new file mode 100644 index 000000000..f2028c91f --- /dev/null +++ b/doc/forum/Too_big_to_fsck/comment_2_2666c135dd3378cf6301aa4957049fbd._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 2" + date="2014-02-23T19:09:30Z" + content=""" +Erm, that output is liable to be big, I only care how many lines and characters of output there are! 
+ + git fsck --no-dangling --no-reflogs |wc +"""]] diff --git a/doc/forum/Too_big_to_fsck/comment_3_dfb169c441215b671f8c971184de3e16._comment b/doc/forum/Too_big_to_fsck/comment_3_dfb169c441215b671f8c971184de3e16._comment new file mode 100644 index 000000000..96b1ac9cd --- /dev/null +++ b/doc/forum/Too_big_to_fsck/comment_3_dfb169c441215b671f8c971184de3e16._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 3" + date="2014-02-23T19:12:10Z" + content=""" +Also, you can build git-annex from source with the RTS options enabled by running `cabal install git-annex --ghc-options=-rtsopts` + +(or just build git-repair which has the repository repair parts of git-annex) +"""]] diff --git a/doc/forum/performance_and_multiple_replication_problems/comment_3_ad7cb4c510e2ab26959ea7cb40a43fef._comment b/doc/forum/performance_and_multiple_replication_problems/comment_3_ad7cb4c510e2ab26959ea7cb40a43fef._comment new file mode 100644 index 000000000..6e4e1b1c6 --- /dev/null +++ b/doc/forum/performance_and_multiple_replication_problems/comment_3_ad7cb4c510e2ab26959ea7cb40a43fef._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawnNqLKszWk9EoD4CDCqNXJRIklKFBCN1Ao" + nickname="maurizio" + subject="the startup check is not a small issue" + date="2014-02-25T11:37:15Z" + content=""" +I would like to add that this startup check has probably been a blocker for my use case for a long long time. I tried to use git-annex to synchronize a huge number of files, most of them never changing. My plan was to have a few tens of GB of data which more or less never change in an archive directory and then add from time to time new data (by batches of a few hundreds of files, each of them not necessarily very large) to the annex. Once this new data has been processed or otherwise become less immediately useful, it would be shifted to the archive. 
It would have been very useful to have such a setup, because the amount of data is too large to be replicated everywhere, especially on a laptop. After finding this post I finally understand that the seemingly never ending \"performing startup scan\" that I observed are probably not due to the assistant somehow hanging, contrary to what I thought. It seems it is just normal operation. The problem is that this normal operation makes it unusable for the use case I was considering, since it does not make much sense to have git-annex scanning about 10^6 files or links on every boot of a laptop. On my workstation this \"startup scan\" has now been running for close to one hour now and is not finished yet, this is not thinkable on laptop boot. + +Maybe an analysis of how well git-annex operation scales with number of files should be part of the documentation, since \"large files\" is not the only issue when trying to sync different computers. One finds references to \"very large number of files\" about annex.queuesize, but \"very large\" has no clear meaning. One also finds a reference to \"1 million files\" being a bit of a git limitation on comments of a bug report <https://git-annex.branchable.com/bugs/Stress_test/>. + +Orders of magnitude of the number of files that git-annex is supposed to be able to handle would be very useful. + + +"""]] diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index 0912e0b2a..441da7b98 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -715,20 +715,29 @@ subdirectories). git annex metadata annexscreencast.ogv -t video -t screencast -s author+=Alice -* `view [field=value ...] [tag ...]` +* `view [tag ...] [field=value ...] [location/=value]` Uses metadata to build a view branch of the files in the current branch, and checks out the view branch. Only files in the current branch whose metadata matches all the specified field values and tags will be shown in the view. 
+ + Once within a view, you can make additional directories, and + copy or move files into them. When you commit, the metadata will + be updated to correspond to your changes. Multiple values for a metadata field can be specified, either by using a glob (`field="*"`) or by listing each wanted value. The resulting view will put files in subdirectories according to the value of their fields. - Once within a view, you can make additional directories, and - copy or move files into them. When you commit, the metadata will - be updated to correspond to your changes. + There are fields corresponding to the path to the file. So a file + "foo/bar/baz/file" has fields "/=foo", "foo/=bar", and "foo/bar/=baz". + These location fields can be used the same as other metadata to construct + the view. + + For example, `/=podcasts` will only include files from the podcasts + directory in the view, while `podcasts/=*` will preserve the + subdirectories of the podcasts directory in the view. * `vpop [N]` @@ -737,12 +746,12 @@ subdirectories). The optional number tells how many views to pop. -* `vfilter [field=value ...] [tag ...]` +* `vfilter [tag ...] [field=value ...] [location/=value]` Filters the current view to only the files that have the - specified values and tags. + specified field values, tags, and locations. -* `vadd [field=glob ...]` +* `vadd [field=glob ...] [location/=glob]` Changes the current view, adding an additional level of directories to categorize the files. @@ -942,7 +951,7 @@ subdirectories). Rather than the normal output, generate JSON. This is intended to be parsed by programs that use git-annex. Each line of output is a JSON object. Note that JSON output is only usable with some git-annex commands, - like info, find, and whereis. + like info, find, whereis, and metadata. * `--debug` @@ -1133,10 +1142,11 @@ file contents are present at either of two repositories. 
The size can be specified with any commonly used units, for example, "0.5 gb" or "100 KiloBytes" -* `--metadata field=value` +* `--metadata field=glob` - Matches only files that have a metadata field attached with the specified - value. + Matches only files that have a metadata field attached with a value that + matches the glob. The values of metadata fields are matched case + insensitively. * `--want-get` @@ -1269,6 +1279,12 @@ Here are all the supported configuration settings. Note that setting numcopies to 0 is very unsafe. +* `annex.genmetadata` + + Set this to `true` to make git-annex automatically generate some metadata + when adding files to the repository. In particular, it stores + year and month metadata, from the file's modification date. + * `annex.queuesize` git-annex builds a queue of git commands, in order to combine similar diff --git a/doc/install/fromscratch.mdwn b/doc/install/fromscratch.mdwn index 2c8bf4b71..6cc2d90c6 100644 --- a/doc/install/fromscratch.mdwn +++ b/doc/install/fromscratch.mdwn @@ -5,6 +5,7 @@ quite a lot. * [The Haskell Platform](http://haskell.org/platform/) (GHC 7.4 or newer) * [mtl](http://hackage.haskell.org.package/mtl) (2.1.1 or newer) * [MissingH](http://github.com/jgoerzen/missingh/wiki) + * [data-default](http://hackage.haskell.org/package/data-default) * [utf8-string](http://hackage.haskell.org/package/utf8-string) * [SHA](http://hackage.haskell.org/package/SHA) * [cryptohash](http://hackage.haskell.org/package/cryptohash) (optional but recommended) diff --git a/doc/metadata.mdwn b/doc/metadata.mdwn new file mode 100644 index 000000000..d3c3b748e --- /dev/null +++ b/doc/metadata.mdwn @@ -0,0 +1,41 @@ +git-annex allows you to store arbitrary metadata about files stored in the +git-annex repository. The metadata is stored in the `git-annex` branch, and +so is automatically kept in sync with the rest of git-annex's state, such +as [[location_tracking]] information. 
+ +Some of the things you can do with metadata include: + +* Using `git annex metadata file` to show all + the metadata associated with a file. +* [[tips/metadata_driven_views]] +* Limiting the files git-annex commands act on to those with + or without particular metadata. + For example `git annex find --metadata tag=foo --or --metadata tag=bar` +* Using it in [[preferred_content]] expressions. + For example "tag=important or not author=me" + +Each file (actually the underlying key) can have any number of metadata +fields, which each can have any number of values. For example, to tag +files, the `tag` field is typically used, with values set to each tag that +applies to the file. + +The field names are limited to alphanumerics (and `[_-.]`). The metadata +values can contain absolutely anything you like -- but you're recommended +to keep it simple and reasonably short. + +Here are some recommended metadata fields to use: + +* `tag` - With each tag being a different value. +* `year`, `month` - When this particular version of the file came into + being. + +To make git-annex automatically set the year and month when adding files, +run `git config annex.genmetadata true` + +git-annex's metadata can be updated in a distributed fashion. For example, +two users, each with their own clone of a repository, can set and unset +metadata at the same time, even for the same field of the same file. +When they push their changes, `git annex merge` will combine their +metadata changes in a consistent and (probably) intuitive way. + +See [[the metadata design page|design/metadata]] for more details. diff --git a/doc/tips/metadata_driven_views.mdwn b/doc/tips/metadata_driven_views.mdwn index 7b46ca974..17ebc6869 100644 --- a/doc/tips/metadata_driven_views.mdwn +++ b/doc/tips/metadata_driven_views.mdwn @@ -1,5 +1,5 @@ git-annex now has support for storing -[[arbitrary metadata|design/metadata]] about annexed files. 
For example, this can be +[[arbitrary metadata|metadata]] about annexed files. For example, this can be used to tag files, to record the author of a file, etc. The metadata is synced around between repositories with the other information git-annex keeps track of. @@ -14,6 +14,12 @@ refine or reorder a view. Let's get started by setting some tags on files. No views yet, just some metadata: +[[!template id=note text=""" +To avoid needing to manually tag files with the year (and month), +run `annex.genmetadata true`, and git-annex will do it for you +when adding files. +"""]] + # git annex metadata --tag todo work/2014/* # git annex metadata --untag todo work/2014/done/* # git annex metadata --tag urgent work/2014/presentation_for_tomorrow.odt @@ -24,8 +30,8 @@ metadata: # git annex metadata --tag done videos/old # git annex metadata --tag new videos/lotsofcats.ogv # git annex metadata --tag sound podcasts - # git annex metadata --tag done podcasts/old - # git annex metadata --tag new podcasts/recent + # git annex metadata --tag done podcasts/*/old + # git annex metadata --tag new podcasts/*/recent So, you had a bunch of different kinds of files sorted into a directory structure. But that didn't really reflect how you approach the files. @@ -39,6 +45,12 @@ Ok, metadata is in place, but how to use it? Time to change views! Switched to branch 'views/_' ok +[[!template id=note text=""" +Notice that a single file may appear in multiple directories +depending on its tags. For example, `lotsofcats.ogv` is in +both `new/` and `video/`. +"""]] + This searched for all files with any tag, and created a new git branch that sorts the files according to their tags. @@ -51,10 +63,6 @@ that sorts the files according to their tags. video sound -Notice that a single file may appear in multiple directories -depending on its tags. For example, `lotsofcats.ogv` is in -both `new/` and `video/`. - Ah, but you're at work now, and don't want to be distracted by cat videos. 
Time to filter the view: @@ -81,9 +89,11 @@ all the way out of all views, you'll be back on the regular git branch you originally started from. You can also use `git checkout` to switch between views and other branches. -Beyond simple tags, you can add whatever kinds of metadata you like, and -use that metadata in more elaborate views. For example, let's add a year -field. +## fields + +Beyond simple tags and directories, you can add whatever kinds of metadata +you like, and use that metadata in more elaborate views. For example, let's +add a year field. # git checkout master # git annex metadata --set year=2014 work/2014 @@ -118,4 +128,25 @@ Oh, did you want it the other way around? Easy! |-- 2014 `-- 2013 +## location fields + +Let's switch to a view containing only new podcasts. And since the +podcasts are organized into one subdirectory per show, let's +include those subdirectories in the view. + + # git checkout master + # git annex view tag=new podcasts/=* + # tree -d + This_Developers_Life + Escape_Pod + GitMinutes + The_Haskell_Cast + StarShipSofa + +That's an example of using part of the directory layout of the original +branch to inform the view. Every file gets fields automatically set up +corresponding to the directory it's in. So a file"foo/bar/baz/file" has +fields "/=foo", "foo/=bar", and "foo/bar/=baz". These location fields +can be used the same as other metadata to construct the view. + This has probably only scratched the surface of what you can do with views. diff --git a/doc/todo/Views_Demo.mdwn b/doc/todo/Views_Demo.mdwn new file mode 100644 index 000000000..2587642e3 --- /dev/null +++ b/doc/todo/Views_Demo.mdwn @@ -0,0 +1,13 @@ +Joey, + +I've been thinking about leveraging git-annex for a workgroup document repository and I have just watched your views demo. 
The timing of the demo is great because I need to deploy a document repository with per-document metadata and your views concept seems like a great mechanism for associating metadata to documents and for displaying that metadata. + +While I don't expect to use your views concept for my workgroup repostory, a later iteration might do. + +The metadata in my use case begins with all the weird metadata seen on a book's copyright page. In addition, per-document provenance, like how one found the document and (if we're lucky) a URL where the latest version of the document may be found. Metadata values may be simple strings or may be markdown text. + +So, are you considering a metadata syntax that can support complex metadata? One example is multiple authors. Another issue is complex metadata values, like key=abstract and value="markdown text...". + +FWIW, + +Bob diff --git a/doc/todo/Views_Demo/comment_1_d7c83a0e9a83e4a05aa74a34a7e1cf19._comment b/doc/todo/Views_Demo/comment_1_d7c83a0e9a83e4a05aa74a34a7e1cf19._comment new file mode 100644 index 000000000..4c9b05635 --- /dev/null +++ b/doc/todo/Views_Demo/comment_1_d7c83a0e9a83e4a05aa74a34a7e1cf19._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-24T18:17:04Z" + content=""" +All that should work fine. All metadata fields are multivalued, and the value can be any arbitrary data. +"""]] diff --git a/doc/todo/ctrl_c_handling.mdwn b/doc/todo/ctrl_c_handling.mdwn new file mode 100644 index 000000000..7101d578f --- /dev/null +++ b/doc/todo/ctrl_c_handling.mdwn @@ -0,0 +1,5 @@ +Sometimes I start off a large file transfer to a new remote (a la "git-annex copy . --to glacier"). + +I believe all of the special remotes transfer the files one at a time, which is good, and provides a sensible place to interrupt a copy/move operation. 
+ +Wish: When I press ctrl+c in the terminal, git-annex will catch that and finish it's current transfer and then exit cleanly (ie: no odd backtraces in the special remote code). For the case where the file currently being transfered also needs to be killed (ie: it's a big .iso) then subsequent ctrl+c's can do that. diff --git a/doc/todo/ctrl_c_handling/comment_1_3addbe33817db5de836c014287b14c07._comment b/doc/todo/ctrl_c_handling/comment_1_3addbe33817db5de836c014287b14c07._comment new file mode 100644 index 000000000..16139c78d --- /dev/null +++ b/doc/todo/ctrl_c_handling/comment_1_3addbe33817db5de836c014287b14c07._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-21T21:36:14Z" + content=""" +This really depends on the remote, some can resume where they were interrupted, such as rsync, and some cannot, such as glacier (and, er, encrypted rsync). +"""]] diff --git a/doc/todo/ctrl_c_handling/comment_2_cc2776dc4805421180edcdf96a89fcaa._comment b/doc/todo/ctrl_c_handling/comment_2_cc2776dc4805421180edcdf96a89fcaa._comment new file mode 100644 index 000000000..827b99afa --- /dev/null +++ b/doc/todo/ctrl_c_handling/comment_2_cc2776dc4805421180edcdf96a89fcaa._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://grossmeier.net/" + nickname="greg" + subject="very remote specific" + date="2014-02-21T22:11:16Z" + content=""" +Yeah, this is very remote specific and probably means adding the functionality there as well (eg: in the glacier.py code, not only in git-annex haskell). 
Maybe I should file bugs there accordingly :) +"""]] diff --git a/doc/todo/ctrl_c_handling/comment_3_8d7d357368987f5d5d59b4d8d99a0e06._comment b/doc/todo/ctrl_c_handling/comment_3_8d7d357368987f5d5d59b4d8d99a0e06._comment new file mode 100644 index 000000000..ed7e4d3b6 --- /dev/null +++ b/doc/todo/ctrl_c_handling/comment_3_8d7d357368987f5d5d59b4d8d99a0e06._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 3" + date="2014-02-21T22:34:14Z" + content=""" +Hmm, I forget if it's possible for git-annex to mask SIGINT when it runs glacier or rsync, so that the child process does not receive it, but the parent git-annex does. +"""]]