From 9d2610f22bc8529f91de758ebc68935272cee46e Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 21 Feb 2014 13:43:20 -0400 Subject: move bug from forum and close --- .../Can_not_Drop_Unused_Files_With_Spaces.mdwn | 22 ++++++++++++++++++++++ ...ent_1_b909ed9f474601587b2adad7ad4f674d._comment | 8 ++++++++ ...ent_2_b2735a6e03db3f77a87a0f7d87347685._comment | 16 ++++++++++++++++ ...ent_3_dd82a0cd698b0688ff08f0462af0275f._comment | 8 ++++++++ ...ent_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment | 8 ++++++++ ...ent_5_106c271d5174342055910bf57c0a34c5._comment | 8 ++++++++ ...ent_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment | 8 ++++++++ .../Can_not_Drop_Unused_Files_With_Spaces.mdwn | 20 -------------------- ...ent_1_b909ed9f474601587b2adad7ad4f674d._comment | 8 -------- ...ent_2_b2735a6e03db3f77a87a0f7d87347685._comment | 16 ---------------- ...ent_3_dd82a0cd698b0688ff08f0462af0275f._comment | 8 -------- ...ent_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment | 8 -------- ...ent_5_106c271d5174342055910bf57c0a34c5._comment | 8 -------- ...ent_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment | 8 -------- 14 files changed, 78 insertions(+), 76 deletions(-) create mode 100644 doc/bugs/Can_not_Drop_Unused_Files_With_Spaces.mdwn create mode 100644 doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment create mode 100644 doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment create mode 100644 doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment create mode 100644 doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment create mode 100644 doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment create mode 100644 doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment delete mode 100644 doc/forum/Can_not_Drop_Unused_Files_With_Spaces.mdwn delete mode 100644 doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment delete mode 100644 doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment delete mode 100644 doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment delete mode 100644 doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment delete mode 100644 doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment delete mode 100644 doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment diff --git a/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces.mdwn b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces.mdwn new file mode 100644 index 000000000..c40a90feb --- /dev/null +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces.mdwn @@ -0,0 +1,22 @@ +I have a repository at rsync.net, even though following files are shown as unused I can not drop them. + +Running unused, + + git annex unused --from cloud + unused cloud (checking for unused data...) (checking annex/direct/master...) + Some annexed data on cloud is not used by any files: + NUMBER KEY + 1 SHA256E-s4189547--43aef42540e7f50fc454ab3a2ce4aa28a13b57cccff725359cea0470eb88704b. Bir.mp3 + 2 SHA256E-s853765--c0964d3af493d78b7b8393a2aefdd8c290390a03c8cb5cccdcac4647c0fc52a0. 1.jpg + 3 SHA256E-s8706267--e34988b70048a512ad0f431a2a91fa7dd553f96c2bd6caca0bcef928bdfafb93. 3.mp3 + (To see where data was previously used, try: git log --stat -S'KEY') + + To remove unwanted data: git-annex dropunused --from cloud NUMBER + +show these then running, + + git annex dropunused 1-3 --force + +reports ok for each drop operation but rerunning git annex unused --from cloud still shows these three files as unused. I am using git-annex on mac os x (current dmg) on a direct repo. I have similar problems dropping files on the current repo even though I drop unused they still show up as unused. + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment new file mode 100644 index 000000000..fa41b59a7 --- /dev/null +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.163" + subject="doubt this has anything to do with spaces" + date="2014-02-08T17:44:45Z" + content=""" +If you want to drop the files from the remote, you need to also pass the --from option to dropunused. Otherwise, it defaults to dropping any of the unused files that are present in the local repository. +"""]] diff --git a/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment new file mode 100644 index 000000000..5f5694c00 --- /dev/null +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" + nickname="Hamza" + subject="comment 2" + date="2014-02-08T18:30:47Z" + content=""" +I tried with/without --from still does not drop the files. + + git annex dropunused --from cloud 1-3 --force + dropunused 1 (from cloud...) (gpg) ok + dropunused 2 (from cloud...) ok + dropunused 3 (from cloud...) ok + (Recording state in git...) + +still running unused shows the files as unused. +"""]] diff --git a/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment new file mode 100644 index 000000000..86e3bd2c1 --- /dev/null +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.163" + subject="comment 3" + date="2014-02-08T19:17:46Z" + content=""" +Ok, you're right and this *does* involve spaces. Some bug in the unused log parser. +"""]] diff --git a/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment new file mode 100644 index 000000000..6459ee8d7 --- /dev/null +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.163" + subject="comment 4" + date="2014-02-08T19:28:18Z" + content=""" +Fixed in git. +"""]] diff --git a/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment new file mode 100644 index 000000000..4ad4d6f8b --- /dev/null +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" + nickname="Hamza" + subject="comment 5" + date="2014-02-08T19:46:49Z" + content=""" +Are the files dropped on the rsync repo? Or are they gonna be dropped when fix propagates to dmg build? +"""]] diff --git a/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment new file mode 100644 index 000000000..fbd9ed55c --- /dev/null +++ b/doc/bugs/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.163" + subject="comment 6" + date="2014-02-08T19:49:11Z" + content=""" +You will need to upgrade git-annex to the next autobuild or an upcoming release to get the fix. +"""]] diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces.mdwn b/doc/forum/Can_not_Drop_Unused_Files_With_Spaces.mdwn deleted file mode 100644 index 70a573ff4..000000000 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces.mdwn +++ /dev/null @@ -1,20 +0,0 @@ -I have a repository at rsync.net, even though following files are shown as unused I can not drop them. - -Running unused, - - git annex unused --from cloud - unused cloud (checking for unused data...) (checking annex/direct/master...) - Some annexed data on cloud is not used by any files: - NUMBER KEY - 1 SHA256E-s4189547--43aef42540e7f50fc454ab3a2ce4aa28a13b57cccff725359cea0470eb88704b. Bir.mp3 - 2 SHA256E-s853765--c0964d3af493d78b7b8393a2aefdd8c290390a03c8cb5cccdcac4647c0fc52a0. 1.jpg - 3 SHA256E-s8706267--e34988b70048a512ad0f431a2a91fa7dd553f96c2bd6caca0bcef928bdfafb93. 3.mp3 - (To see where data was previously used, try: git log --stat -S'KEY') - - To remove unwanted data: git-annex dropunused --from cloud NUMBER - -show these then running, - - git annex dropunused 1-3 --force - -reports ok for each drop operation but rerunning git annex unused --from cloud still shows these three files as unused. I am using git-annex on mac os x (current dmg) on a direct repo. I have similar problems dropping files on the current repo even though I drop unused they still show up as unused. diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment b/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment deleted file mode 100644 index fa41b59a7..000000000 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_1_b909ed9f474601587b2adad7ad4f674d._comment +++ /dev/null @@ -1,8 +0,0 @@ -[[!comment format=mdwn - username="http://joeyh.name/" - ip="209.250.56.163" - subject="doubt this has anything to do with spaces" - date="2014-02-08T17:44:45Z" - content=""" -If you want to drop the files from the remote, you need to also pass the --from option to dropunused. Otherwise, it defaults to dropping any of the unused files that are present in the local repository. -"""]] diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment b/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment deleted file mode 100644 index 5f5694c00..000000000 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_2_b2735a6e03db3f77a87a0f7d87347685._comment +++ /dev/null @@ -1,16 +0,0 @@ -[[!comment format=mdwn - username="https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" - nickname="Hamza" - subject="comment 2" - date="2014-02-08T18:30:47Z" - content=""" -I tried with/without --from still does not drop the files. - - git annex dropunused --from cloud 1-3 --force - dropunused 1 (from cloud...) (gpg) ok - dropunused 2 (from cloud...) ok - dropunused 3 (from cloud...) ok - (Recording state in git...) - -still running unused shows the files as unused. -"""]] diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment b/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment deleted file mode 100644 index 86e3bd2c1..000000000 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_3_dd82a0cd698b0688ff08f0462af0275f._comment +++ /dev/null @@ -1,8 +0,0 @@ -[[!comment format=mdwn - username="http://joeyh.name/" - ip="209.250.56.163" - subject="comment 3" - date="2014-02-08T19:17:46Z" - content=""" -Ok, you're right and this *does* involve spaces. Some bug in the unused log parser. -"""]] diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment b/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment deleted file mode 100644 index 6459ee8d7..000000000 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_4_bbebb1d0dc5fbc1f6a0bb75b47bd4986._comment +++ /dev/null @@ -1,8 +0,0 @@ -[[!comment format=mdwn - username="http://joeyh.name/" - ip="209.250.56.163" - subject="comment 4" - date="2014-02-08T19:28:18Z" - content=""" -Fixed in git. -"""]] diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment b/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment deleted file mode 100644 index 4ad4d6f8b..000000000 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_5_106c271d5174342055910bf57c0a34c5._comment +++ /dev/null @@ -1,8 +0,0 @@ -[[!comment format=mdwn - username="https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" - nickname="Hamza" - subject="comment 5" - date="2014-02-08T19:46:49Z" - content=""" -Are the files dropped on the rsync repo? Or are they gonna be dropped when fix propagates to dmg build? -"""]] diff --git a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment b/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment deleted file mode 100644 index fbd9ed55c..000000000 --- a/doc/forum/Can_not_Drop_Unused_Files_With_Spaces/comment_6_3a2d3cc3e018beaf2eb44b86ce7e1a7f._comment +++ /dev/null @@ -1,8 +0,0 @@ -[[!comment format=mdwn - username="http://joeyh.name/" - ip="209.250.56.163" - subject="comment 6" - date="2014-02-08T19:49:11Z" - content=""" -You will need to upgrade git-annex to the next autobuild or an upcoming release to get the fix. -"""]] -- cgit v1.2.3 From 7d82f8010614df75d24cbcedf4bd54f6b7fc1289 Mon Sep 17 00:00:00 2001 From: Greg Grossmeier Date: Fri, 21 Feb 2014 13:28:17 -0800 Subject: Added todo for handling ctrl c in transfers --- doc/todo/ctrl_c_handling.mdwn | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 doc/todo/ctrl_c_handling.mdwn diff --git a/doc/todo/ctrl_c_handling.mdwn b/doc/todo/ctrl_c_handling.mdwn new file mode 100644 index 000000000..ddce8eca0 --- /dev/null +++ b/doc/todo/ctrl_c_handling.mdwn @@ -0,0 +1,5 @@ +Sometimes I start off a large file transfer to a new remote (a la "git-annex copy . --to glacier"). + +I believe all of the special remotes transfer the files one at a time, which is good, and provides a sensible place to interrupt a copy/move operation. + +Wish: When I press ctrl+c in the terminal, git-annex will catch that and finish it's current transfer and then exit. For the case where the file currently being transfered also needs to be killed (ie: it's a big .iso) then subsequent ctrl+c's can do that. -- cgit v1.2.3 From d31bf5cc03ab2f6d0328929f219ebc3cbaf2a7ec Mon Sep 17 00:00:00 2001 From: Greg Grossmeier Date: Fri, 21 Feb 2014 13:30:11 -0800 Subject: clarification --- doc/todo/ctrl_c_handling.mdwn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/todo/ctrl_c_handling.mdwn b/doc/todo/ctrl_c_handling.mdwn index ddce8eca0..7101d578f 100644 --- a/doc/todo/ctrl_c_handling.mdwn +++ b/doc/todo/ctrl_c_handling.mdwn @@ -2,4 +2,4 @@ Sometimes I start off a large file transfer to a new remote (a la "git-annex cop I believe all of the special remotes transfer the files one at a time, which is good, and provides a sensible place to interrupt a copy/move operation. -Wish: When I press ctrl+c in the terminal, git-annex will catch that and finish it's current transfer and then exit. For the case where the file currently being transfered also needs to be killed (ie: it's a big .iso) then subsequent ctrl+c's can do that. +Wish: When I press ctrl+c in the terminal, git-annex will catch that and finish it's current transfer and then exit cleanly (ie: no odd backtraces in the special remote code). For the case where the file currently being transfered also needs to be killed (ie: it's a big .iso) then subsequent ctrl+c's can do that. -- cgit v1.2.3 From 1f5fa851948bb47f3346d444f9b1099fdc43d83c Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Fri, 21 Feb 2014 21:36:14 +0000 Subject: Added a comment --- .../comment_1_3addbe33817db5de836c014287b14c07._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/todo/ctrl_c_handling/comment_1_3addbe33817db5de836c014287b14c07._comment diff --git a/doc/todo/ctrl_c_handling/comment_1_3addbe33817db5de836c014287b14c07._comment b/doc/todo/ctrl_c_handling/comment_1_3addbe33817db5de836c014287b14c07._comment new file mode 100644 index 000000000..16139c78d --- /dev/null +++ b/doc/todo/ctrl_c_handling/comment_1_3addbe33817db5de836c014287b14c07._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-21T21:36:14Z" + content=""" +This really depends on the remote, some can resume where they were interrupted, such as rsync, and some cannot, such as glacier (and, er, encrypted rsync). +"""]] -- cgit v1.2.3 From f4546620ebbb58fd91fa85fb9212ce493d8ba2b0 Mon Sep 17 00:00:00 2001 From: Dennis Date: Fri, 21 Feb 2014 21:51:55 +0000 Subject: --- doc/forum/Find_files_that_lack_a_certain_field_in_metadata.mdwn | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 doc/forum/Find_files_that_lack_a_certain_field_in_metadata.mdwn diff --git a/doc/forum/Find_files_that_lack_a_certain_field_in_metadata.mdwn b/doc/forum/Find_files_that_lack_a_certain_field_in_metadata.mdwn new file mode 100644 index 000000000..b0fe9ddaa --- /dev/null +++ b/doc/forum/Find_files_that_lack_a_certain_field_in_metadata.mdwn @@ -0,0 +1,5 @@ +Is there any way to find all files that do not have a certain field assigned in metadata. E.g. I want to find all files that do not have an author field set and + + git-annex find --not --metadata "author=*" + +doesn't give any results. -- cgit v1.2.3 From 1b0310840506d033aa53ac0d36406ec4fbe5a789 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawmZgZuUhZlHpd_AbbcixY0QQiutb2I7GWY" Date: Fri, 21 Feb 2014 22:05:06 +0000 Subject: Added a comment --- .../comment_3_f199ac6ae2448949ef0779177cf0ef58._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/bugs/Mac_OS_git_version_still_too_old_for_.gitignore__63__/comment_3_f199ac6ae2448949ef0779177cf0ef58._comment diff --git a/doc/bugs/Mac_OS_git_version_still_too_old_for_.gitignore__63__/comment_3_f199ac6ae2448949ef0779177cf0ef58._comment b/doc/bugs/Mac_OS_git_version_still_too_old_for_.gitignore__63__/comment_3_f199ac6ae2448949ef0779177cf0ef58._comment new file mode 100644 index 000000000..591d4e80f --- /dev/null +++ b/doc/bugs/Mac_OS_git_version_still_too_old_for_.gitignore__63__/comment_3_f199ac6ae2448949ef0779177cf0ef58._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawmZgZuUhZlHpd_AbbcixY0QQiutb2I7GWY" + nickname="Jimmy" + subject="comment 3" + date="2014-02-21T22:05:06Z" + content=""" +And yep, it's fixed in 5.20140221-g1a47f5f. Thanks guys! +"""]] -- cgit v1.2.3 From 37efb733070169238e453ce979904e5d440b7099 Mon Sep 17 00:00:00 2001 From: "http://grossmeier.net/" Date: Fri, 21 Feb 2014 22:11:16 +0000 Subject: Added a comment: very remote specific --- .../comment_2_cc2776dc4805421180edcdf96a89fcaa._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/todo/ctrl_c_handling/comment_2_cc2776dc4805421180edcdf96a89fcaa._comment diff --git a/doc/todo/ctrl_c_handling/comment_2_cc2776dc4805421180edcdf96a89fcaa._comment b/doc/todo/ctrl_c_handling/comment_2_cc2776dc4805421180edcdf96a89fcaa._comment new file mode 100644 index 000000000..827b99afa --- /dev/null +++ b/doc/todo/ctrl_c_handling/comment_2_cc2776dc4805421180edcdf96a89fcaa._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://grossmeier.net/" + nickname="greg" + subject="very remote specific" + date="2014-02-21T22:11:16Z" + content=""" +Yeah, this is very remote specific and probably means adding the functionality there as well (eg: in the glacier.py code, not only in git-annex haskell). Maybe I should file bugs there accordingly :) +"""]] -- cgit v1.2.3 From 00fdde0fec260af4fa861efa5e769f75a7bdea86 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Fri, 21 Feb 2014 22:34:14 +0000 Subject: Added a comment --- .../comment_3_8d7d357368987f5d5d59b4d8d99a0e06._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/todo/ctrl_c_handling/comment_3_8d7d357368987f5d5d59b4d8d99a0e06._comment diff --git a/doc/todo/ctrl_c_handling/comment_3_8d7d357368987f5d5d59b4d8d99a0e06._comment b/doc/todo/ctrl_c_handling/comment_3_8d7d357368987f5d5d59b4d8d99a0e06._comment new file mode 100644 index 000000000..ed7e4d3b6 --- /dev/null +++ b/doc/todo/ctrl_c_handling/comment_3_8d7d357368987f5d5d59b4d8d99a0e06._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 3" + date="2014-02-21T22:34:14Z" + content=""" +Hmm, I forget if it's possible for git-annex to mask SIGINT when it runs glacier or rsync, so that the child process does not receive it, but the parent git-annex does. +"""]] -- cgit v1.2.3 From a8e8573a721d28ec3168d051d9a04edcbd279800 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 21 Feb 2014 18:34:34 -0400 Subject: --metadata field=value can now use globs to match, and matches case insensatively, the same as git annex view field=value does. Also refactored glob code into its own module. --- Annex/View.hs | 41 +++++---------------------------------- Limit.hs | 47 +++++++++++++++----------------------------- Utility/Glob.hs | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ debian/changelog | 2 ++ doc/git-annex.mdwn | 6 +++--- 5 files changed, 83 insertions(+), 70 deletions(-) create mode 100644 Utility/Glob.hs diff --git a/Annex/View.hs b/Annex/View.hs index 78b4da589..fe32fdaa3 100644 --- a/Annex/View.hs +++ b/Annex/View.hs @@ -5,8 +5,6 @@ - Licensed under the GNU GPL version 3 or higher. -} -{-# LANGUAGE CPP #-} - module Annex.View where import Common.Annex @@ -28,22 +26,15 @@ import Annex.Link import Annex.CatFile import Logs.MetaData import Logs.View +import Utility.Glob import Utility.FileMode import Types.Command import Config import CmdLine.Action import qualified Data.Set as S -import System.Path.WildMatch import "mtl" Control.Monad.Writer -#ifdef WITH_TDFA -import Text.Regex.TDFA -import Text.Regex.TDFA.String -#else -import Text.Regex -#endif - {- Each visible ViewFilter in a view results in another level of - subdirectory nesting. When a file matches multiple ways, it will appear - in multiple subdirectories. This means there is a bit of an exponential @@ -127,11 +118,11 @@ combineViewFilter old@(FilterValues olds) (FilterValues news) combineViewFilter (FilterValues _) newglob@(FilterGlob _) = (newglob, Widening) combineViewFilter (FilterGlob oldglob) new@(FilterValues s) - | all (matchGlob (compileGlob oldglob) . fromMetaValue) (S.toList s) = (new, Narrowing) + | all (matchGlob (compileGlob oldglob CaseInsensative) . fromMetaValue) (S.toList s) = (new, Narrowing) | otherwise = (new, Widening) combineViewFilter (FilterGlob old) newglob@(FilterGlob new) | old == new = (newglob, Unchanged) - | matchGlob (compileGlob old) new = (newglob, Narrowing) + | matchGlob (compileGlob old CaseInsensative) new = (newglob, Narrowing) | otherwise = (newglob, Widening) {- Converts a filepath used in a reference branch to the @@ -205,31 +196,9 @@ viewComponentMatcher viewcomponent = \metadata -> matcher = case viewFilter viewcomponent of FilterValues s -> \values -> S.intersection s values FilterGlob glob -> - let regex = compileGlob glob + let cglob = compileGlob glob CaseInsensative in \values -> - S.filter (matchGlob regex . fromMetaValue) values - -compileGlob :: String -> Regex -compileGlob glob = -#ifdef WITH_TDFA - case compile (defaultCompOpt {caseSensitive = False}) defaultExecOpt regex of - Right r -> r - Left _ -> error $ "failed to compile regex: " ++ regex -#else - mkRegexWithOpts regex False True -#endif - where - regex = '^':wildToRegex glob - -matchGlob :: Regex -> String -> Bool -matchGlob regex val = -#ifdef WITH_TDFA - case execute regex val of - Right (Just _) -> True - _ -> False -#else - isJust $ matchRegex regex val -#endif + S.filter (matchGlob cglob . fromMetaValue) values toViewPath :: MetaValue -> FilePath toViewPath = concatMap escapeslash . fromMetaValue diff --git a/Limit.hs b/Limit.hs index bee92889d..62c5456fe 100644 --- a/Limit.hs +++ b/Limit.hs @@ -5,8 +5,6 @@ - Licensed under the GNU GPL version 3 or higher. -} -{-# LANGUAGE CPP #-} - module Limit where import Common.Annex @@ -29,18 +27,13 @@ import Logs.Group import Logs.Unused import Logs.Location import Git.Types (RefDate(..)) +import Utility.Glob import Utility.HumanTime import Utility.DataUnits import Data.Time.Clock.POSIX import qualified Data.Set as S import qualified Data.Map as M -import System.Path.WildMatch - -#ifdef WITH_TDFA -import Text.Regex.TDFA -import Text.Regex.TDFA.String -#endif {- Checks if there are user-specified limits. -} limited :: Annex Bool @@ -82,33 +75,21 @@ addInclude :: String -> Annex () addInclude = addLimit . limitInclude limitInclude :: MkLimit -limitInclude glob = Right $ const $ return . matchglob glob +limitInclude glob = Right $ const $ return . matchGlobFile glob {- Add a limit to skip files that match the glob. -} addExclude :: String -> Annex () addExclude = addLimit . limitExclude limitExclude :: MkLimit -limitExclude glob = Right $ const $ return . not . matchglob glob - -{- Could just use wildCheckCase, but this way the regex is only compiled - - once. Also, we use regex-TDFA when available, because it's less buggy - - in its support of non-unicode characters. -} -matchglob :: String -> MatchInfo -> Bool -matchglob glob (MatchingFile fi) = -#ifdef WITH_TDFA - case cregex of - Right r -> case execute r (matchFile fi) of - Right (Just _) -> True - _ -> False - Left _ -> error $ "failed to compile regex: " ++ regex - where - cregex = compile defaultCompOpt defaultExecOpt regex - regex = '^':wildToRegex glob -#else - wildCheckCase glob (matchFile fi) -#endif -matchglob _ (MatchingKey _) = False +limitExclude glob = Right $ const $ return . not . matchGlobFile glob + +matchGlobFile :: String -> (MatchInfo -> Bool) +matchGlobFile glob = go + where + cglob = compileGlob glob CaseSensative -- memoized + go (MatchingKey _) = False + go (MatchingFile fi) = matchGlob cglob (matchFile fi) {- Adds a limit to skip files not believed to be present - in a specfied repository. Optionally on a prior date. -} @@ -270,9 +251,13 @@ addMetaData = addLimit . limitMetaData limitMetaData :: MkLimit limitMetaData s = case parseMetaData s of Left e -> Left e - Right (f, v) -> Right $ const $ checkKey (check f v) + Right (f, v) -> + let cglob = compileGlob (fromMetaValue v) CaseInsensative + in Right $ const $ checkKey (check f cglob) where - check f v k = S.member v . metaDataValues f <$> getCurrentMetaData k + check f cglob k = not . S.null + . S.filter (matchGlob cglob . fromMetaValue) + . metaDataValues f <$> getCurrentMetaData k addTimeLimit :: String -> Annex () addTimeLimit s = do diff --git a/Utility/Glob.hs b/Utility/Glob.hs new file mode 100644 index 000000000..5291af452 --- /dev/null +++ b/Utility/Glob.hs @@ -0,0 +1,57 @@ +{- file globbing + - + - This uses TDFA when available, with a fallback to regex-compat. + - TDFA is less buggy in its support for non-unicode characters. + - + - Copyright 2014 Joey Hess + - + - Licensed under the GNU GPL version 3 or higher. + -} + +{-# LANGUAGE CPP #-} + +module Utility.Glob ( + Glob, + GlobCase(..), + compileGlob, + matchGlob +) where + +import System.Path.WildMatch + +#ifdef WITH_TDFA +import Text.Regex.TDFA +import Text.Regex.TDFA.String +#else +import Text.Regex +#endif + +newtype Glob = Glob Regex + +data GlobCase = CaseSensative | CaseInsensative + +{- Compiles a glob to a regex, that can be repeatedly used. -} +compileGlob :: String -> GlobCase -> Glob +compileGlob glob globcase = Glob $ +#ifdef WITH_TDFA + case compile (defaultCompOpt {caseSensitive = casesentitive}) defaultExecOpt regex of + Right r -> r + Left _ -> error $ "failed to compile regex: " ++ regex +#else + mkRegexWithOpts regex casesentitive True +#endif + where + regex = '^':wildToRegex glob + casesentitive = case globcase of + CaseSensative -> True + CaseInsensative -> False + +matchGlob :: Glob -> String -> Bool +matchGlob (Glob regex) val = +#ifdef WITH_TDFA + case execute regex val of + Right (Just _) -> True + _ -> False +#else + isJust $ matchRegex regex val +#endif diff --git a/debian/changelog b/debian/changelog index d39e4fbd1..37193891a 100644 --- a/debian/changelog +++ b/debian/changelog @@ -2,6 +2,8 @@ git-annex (5.20140222) UNRELEASED; urgency=medium * Fix handling of rsync remote urls containing a username, including rsync.net. + * --metadata field=value can now use globs to match, and matches + case insensatively, the same as git annex view field=value does. -- Joey Hess Fri, 21 Feb 2014 13:03:04 -0400 diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index 0912e0b2a..97f2b2918 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1133,10 +1133,10 @@ file contents are present at either of two repositories. The size can be specified with any commonly used units, for example, "0.5 gb" or "100 KiloBytes" -* `--metadata field=value` +* `--metadata field=glob` - Matches only files that have a metadata field attached with the specified - value. + Matches only files that have a metadata field attached with a value that + matches the glob. * `--want-get` -- cgit v1.2.3 From 067aaa4b4ef6a56e67caad1933812083ac09ce16 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 21 Feb 2014 18:36:36 -0400 Subject: note case insensative matching --- doc/git-annex.mdwn | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index 97f2b2918..f3bbe0502 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1136,7 +1136,8 @@ file contents are present at either of two repositories. * `--metadata field=glob` Matches only files that have a metadata field attached with a value that - matches the glob. + matches the glob. The values of metadata fields are matched case + insensitively. * `--want-get` -- cgit v1.2.3 From 6e6ab515b3485d381e01b009721badeb93919448 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Fri, 21 Feb 2014 22:36:51 +0000 Subject: Added a comment --- .../comment_1_476e52563ccd3ad1b43e3a2da4dfaa82._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/forum/Find_files_that_lack_a_certain_field_in_metadata/comment_1_476e52563ccd3ad1b43e3a2da4dfaa82._comment diff --git a/doc/forum/Find_files_that_lack_a_certain_field_in_metadata/comment_1_476e52563ccd3ad1b43e3a2da4dfaa82._comment b/doc/forum/Find_files_that_lack_a_certain_field_in_metadata/comment_1_476e52563ccd3ad1b43e3a2da4dfaa82._comment new file mode 100644 index 000000000..fbfb6ebb6 --- /dev/null +++ b/doc/forum/Find_files_that_lack_a_certain_field_in_metadata/comment_1_476e52563ccd3ad1b43e3a2da4dfaa82._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-21T22:36:51Z" + content=""" +--metadata does not support globs, so your example is asking for all files that don't have an author field with a literal \"*\" value. When I try that command, it lists all files ... as expected. + +It seems that adding glob support to it would get to the result you want, and makes sense to parallel git annex view. Change made in git! +"""]] -- cgit v1.2.3 From eb3aa609e32a91aa102cc4225d6a6225c3fcec09 Mon Sep 17 00:00:00 2001 From: "https://id.koumbit.net/anarcat" Date: Sat, 22 Feb 2014 05:19:52 +0000 Subject: add missing traillink, so that we don't jump from disaster recovery to telehash.. --- doc/design/roadmap.mdwn | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/design/roadmap.mdwn b/doc/design/roadmap.mdwn index e6ad21fee..0f0df4496 100644 --- a/doc/design/roadmap.mdwn +++ b/doc/design/roadmap.mdwn @@ -6,10 +6,10 @@ Now in the * Month 1 [[!traillink assistant/encrypted_git_remotes]] * Month 2 [[!traillink assistant/disaster_recovery]] -* Month 3 user-driven features and polishing [[todo/direct_mode_guard]] [[assistant/upgrading]] -* Month 4 [[Windows_webapp|assistant/Windows]], Linux arm, [[todo/support_for_writing_external_special_remotes]] +* Month 3 user-driven features and polishing [[!traillink todo/direct_mode_guard]] [[!traillink assistant/upgrading]] +* Month 4 [[!traillink assistant/windows text="Windows webapp"]], Linux arm, [[!traillink todo/support_for_writing_external_special_remotes]] * Month 5 user-driven features and polishing -* **Month 6 get Windows out of beta, [[metadata and views|design/metadata]]** +* **Month 6 get Windows out of beta, [[!traillink design/metadata text="metadata and views"]]** * Month 7 user-driven features and polishing * Month 8 [[!traillink assistant/telehash]] * Month 9 [[!traillink assistant/gpgkeys]] [[!traillink assistant/sshpassword]] -- cgit v1.2.3 From 8e609f7a026795a106a819916e7d6756544b528e Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sat, 22 Feb 2014 13:35:50 -0400 Subject: factor out new module --- Annex/View.hs | 64 +++++++++++++-------------------------------- Annex/View/ViewedFile.hs | 67 ++++++++++++++++++++++++++++++++++++++++++++++++ Command/PreCommit.hs | 5 ++-- Types/View.hs | 4 --- 4 files changed, 88 insertions(+), 52 deletions(-) create mode 100644 Annex/View/ViewedFile.hs diff --git a/Annex/View.hs b/Annex/View.hs index fe32fdaa3..0985e930f 100644 --- a/Annex/View.hs +++ b/Annex/View.hs @@ -8,6 +8,7 @@ module Annex.View where import Common.Annex +import Annex.View.ViewedFile import Types.View import Types.MetaData import qualified Git @@ -125,35 +126,6 @@ combineViewFilter (FilterGlob old) newglob@(FilterGlob new) | matchGlob (compileGlob old CaseInsensative) new = (newglob, Narrowing) | otherwise = (newglob, Widening) -{- Converts a filepath used in a reference branch to the - - filename that will be used in the view. - - - - No two filepaths from the same branch should yeild the same result, - - so all directory structure needs to be included in the output file - - in some way. However, the branch's directory structure is not relevant - - in the view. - - - - So, from dir/subdir/file.foo, generate file_{dir;subdir}.foo - - - - (To avoid collisions with a filename that already contains {foo}, - - that is doubled to {{foo}}.) - -} -fileViewFromReference :: MkFileView -fileViewFromReference f = concat - [ double base - , if null dirs then "" else "_{" ++ double (intercalate ";" dirs) ++ "}" - , double $ concat extensions - ] - where - (path, basefile) = splitFileName f - dirs = filter (/= ".") $ map dropTrailingPathSeparator (splitPath path) - (base, extensions) = splitShortExtensions basefile - - double = replace "{" "{{" . replace "}" "}}" - -fileViewReuse :: MkFileView -fileViewReuse = takeFileName - {- Generates views for a file from a branch, based on its metadata - and the filename used in the branch. - @@ -167,10 +139,10 @@ fileViewReuse = takeFileName - evaluate this function with the view parameter and reuse - the result. The globs in the view will then be compiled and memoized. -} -fileViews :: View -> MkFileView -> FilePath -> MetaData -> [FileView] -fileViews view = +viewedFiles :: View -> MkViewedFile -> FilePath -> MetaData -> [ViewedFile] +viewedFiles view = let matchers = map viewComponentMatcher (viewComponents view) - in \mkfileview file metadata -> + in \mkviewedfile file metadata -> let matches = map (\m -> m metadata) matchers in if any isNothing matches then [] @@ -178,8 +150,8 @@ fileViews view = let paths = pathProduct $ map (map toViewPath) (visible matches) in if null paths - then [mkfileview file] - else map ( mkfileview file) paths + then [mkviewedfile file] + else map ( mkviewedfile file) paths where visible = map (fromJust . snd) . filter (viewVisible . fst) . @@ -237,9 +209,9 @@ pathProduct (l:ls) = foldl combinel l ls where combinel xs ys = [combine x y | x <- xs, y <- ys] -{- Extracts the metadata from a fileview, based on the view that was used +{- Extracts the metadata from a ViewedFile, based on the view that was used - to construct it. -} -fromView :: View -> FileView -> MetaData +fromView :: View -> ViewedFile -> MetaData fromView view f = foldr (uncurry updateMetaData) newMetaData (zip fields values) where visible = filter viewVisible (viewComponents view) @@ -248,12 +220,12 @@ fromView view f = foldr (uncurry updateMetaData) newMetaData (zip fields values) values = map fromViewPath paths {- Constructing a view that will match arbitrary metadata, and applying - - it to a file yields a set of FileViews which all contain the same + - it to a file yields a set of ViewedFile which all contain the same - MetaFields that were present in the input metadata - (excluding fields that are not visible). -} prop_view_roundtrips :: FilePath -> MetaData -> Bool -> Bool prop_view_roundtrips f metadata visible = null f || viewTooLarge view || - all hasfields (fileViews view fileViewFromReference f metadata) + all hasfields (viewedFiles view viewedFileFromReference f metadata) where view = View (Git.Ref "master") $ map (\(mf, mv) -> ViewComponent mf (FilterValues $ S.filter (not . null . fromMetaValue) mv) visible) @@ -265,7 +237,7 @@ prop_view_roundtrips f metadata visible = null f || viewTooLarge view || - branch for the view. -} applyView :: View -> Annex Git.Branch -applyView view = applyView' fileViewFromReference view +applyView view = applyView' viewedFileFromReference view {- Generates a new branch for a View, which must be a more narrow - version of the View originally used to generate the currently @@ -273,18 +245,18 @@ applyView view = applyView' fileViewFromReference view - in view, not any others. -} narrowView :: View -> Annex Git.Branch -narrowView = applyView' fileViewReuse +narrowView = applyView' viewedFileReuse {- Go through each file in the currently checked out branch. - If the file is not annexed, skip it, unless it's a dotfile in the top. - - Look up the metadata of annexed files, and generate any FileViews, + - Look up the metadata of annexed files, and generate any ViewedFiles, - and stage them. - - Currently only works in indirect mode. Must be run from top of - repository. -} -applyView' :: MkFileView -> View -> Annex Git.Branch -applyView' mkfileview view = do +applyView' :: MkViewedFile -> View -> Annex Git.Branch +applyView' mkviewedfile view = do top <- fromRepo Git.repoPath (l, clean) <- inRepo $ Git.LsFiles.inRepo [top] liftIO . nukeFile =<< fromRepo gitAnnexViewIndex @@ -298,10 +270,10 @@ applyView' mkfileview view = do void $ stopUpdateIndex uh void clean where - genfileviews = fileViews view mkfileview -- enables memoization + genviewedfiles = viewedFiles view mkviewedfile -- enables memoization go uh hasher f (Just (k, _)) = do metadata <- getCurrentMetaData k - forM_ (genfileviews f metadata) $ \fv -> do + forM_ (genviewedfiles f metadata) $ \fv -> do stagesymlink uh hasher fv =<< inRepo (gitAnnexLink fv k) go uh hasher f Nothing | "." `isPrefixOf` f = do @@ -350,7 +322,7 @@ updateView view ref oldref = genViewBranch view $ do - Note that removes must be handled before adds. This is so - that moving a file from x/foo/ to x/bar/ adds back the metadata for x. -} -withViewChanges :: (FileView -> Key -> CommandStart) -> (FileView -> Key -> CommandStart) -> Annex () +withViewChanges :: (ViewedFile -> Key -> CommandStart) -> (ViewedFile -> Key -> CommandStart) -> Annex () withViewChanges addmeta removemeta = do makeabs <- flip fromTopFilePath <$> gitRepo (diffs, cleanup) <- inRepo $ DiffTree.diffIndex Git.Ref.headRef diff --git a/Annex/View/ViewedFile.hs b/Annex/View/ViewedFile.hs new file mode 100644 index 000000000..b773e728d --- /dev/null +++ b/Annex/View/ViewedFile.hs @@ -0,0 +1,67 @@ +{- filenames (not paths) used in views + - + - Copyright 2014 Joey Hess + - + - Licensed under the GNU GPL version 3 or higher. + -} + +module Annex.View.ViewedFile where + +import Common.Annex +import Types.View +import Types.MetaData +import qualified Git +import qualified Git.DiffTree as DiffTree +import qualified Git.Branch +import qualified Git.LsFiles +import qualified Git.Ref +import Git.UpdateIndex +import Git.Sha +import Git.HashObject +import Git.Types +import Git.FilePath +import qualified Backend +import Annex.Index +import Annex.Link +import Annex.CatFile +import Logs.MetaData +import Logs.View +import Utility.Glob +import Utility.FileMode +import Types.Command +import Config +import CmdLine.Action + +type FileName = String +type ViewedFile = FileName + +type MkViewedFile = FilePath -> ViewedFile + +{- Converts a filepath used in a reference branch to the + - filename that will be used in the view. + - + - No two filepaths from the same branch should yeild the same result, + - so all directory structure needs to be included in the output file + - in some way. However, the branch's directory structure is not replicated + - in the view. + - + - So, from dir/subdir/file.foo, generate file_{dir;subdir}.foo + - + - (To avoid collisions with a filename that already contains {foo}, + - that is doubled to {{foo}}.) + -} +viewedFileFromReference :: MkViewedFile +viewedFileFromReference f = concat + [ double base + , if null dirs then "" else "_{" ++ double (intercalate ";" dirs) ++ "}" + , double $ concat extensions + ] + where + (path, basefile) = splitFileName f + dirs = filter (/= ".") $ map dropTrailingPathSeparator (splitPath path) + (base, extensions) = splitShortExtensions basefile + + double = replace "{" "{{" . replace "}" "}}" + +viewedFileReuse :: MkViewedFile +viewedFileReuse = takeFileName diff --git a/Command/PreCommit.hs b/Command/PreCommit.hs index 4b90b5c2e..07d958de1 100644 --- a/Command/PreCommit.hs +++ b/Command/PreCommit.hs @@ -14,6 +14,7 @@ import qualified Command.Add import qualified Command.Fix import Annex.Direct import Annex.View +import Annex.View.ViewedFile import Logs.View import Logs.MetaData import Types.View @@ -52,12 +53,12 @@ startIndirect f = next $ do startDirect :: [String] -> CommandStart startDirect _ = next $ next $ preCommitDirect -addViewMetaData :: View -> FileView -> Key -> CommandStart +addViewMetaData :: View -> ViewedFile -> Key -> CommandStart addViewMetaData v f k = do showStart "metadata" f next $ next $ changeMetaData k $ fromView v f -removeViewMetaData :: View -> FileView -> Key -> CommandStart +removeViewMetaData :: View -> ViewedFile -> Key -> CommandStart removeViewMetaData v f k = do showStart "metadata" f next $ next $ changeMetaData k $ unsetMetaData $ fromView v f diff --git a/Types/View.hs b/Types/View.hs index 04b002879..618193cf9 100644 --- a/Types/View.hs +++ b/Types/View.hs @@ -35,10 +35,6 @@ data ViewComponent = ViewComponent instance Arbitrary ViewComponent where arbitrary = ViewComponent <$> arbitrary <*> arbitrary <*> arbitrary -{- Only files with metadata matching the view are displayed. -} -type FileView = FilePath -type MkFileView = FilePath -> FileView - data ViewFilter = FilterValues (S.Set MetaValue) | FilterGlob String -- cgit v1.2.3 From 0430ac2a21a7e1fefb959d2b370e39252b27da01 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sat, 22 Feb 2014 14:54:53 -0400 Subject: change directory encoding in ViewedFile such that the original directory can be extracted from it --- Annex/View/ViewedFile.hs | 53 +++++++++++++++++++++++++++++++++++++----------- Test.hs | 2 ++ 2 files changed, 43 insertions(+), 12 deletions(-) diff --git a/Annex/View/ViewedFile.hs b/Annex/View/ViewedFile.hs index b773e728d..9f6a3748b 100644 --- a/Annex/View/ViewedFile.hs +++ b/Annex/View/ViewedFile.hs @@ -5,7 +5,14 @@ - Licensed under the GNU GPL version 3 or higher. -} -module Annex.View.ViewedFile where +module Annex.View.ViewedFile ( + ViewedFile, + MkViewedFile, + viewedFileFromReference, + viewedFileReuse, + dirFromViewedFile, + prop_viewedFile_roundtrips, +) where import Common.Annex import Types.View @@ -41,27 +48,49 @@ type MkViewedFile = FilePath -> ViewedFile - filename that will be used in the view. - - No two filepaths from the same branch should yeild the same result, - - so all directory structure needs to be included in the output file - - in some way. However, the branch's directory structure is not replicated - - in the view. + - so all directory structure needs to be included in the output filename + - in some way. - - - So, from dir/subdir/file.foo, generate file_{dir;subdir}.foo - - - - (To avoid collisions with a filename that already contains {foo}, - - that is doubled to {{foo}}.) + - So, from dir/subdir/file.foo, generate file_%dir%subdir%.foo -} viewedFileFromReference :: MkViewedFile viewedFileFromReference f = concat - [ double base - , if null dirs then "" else "_{" ++ double (intercalate ";" dirs) ++ "}" - , double $ concat extensions + [ escape base + , if null dirs then "" else "_%" ++ intercalate "%" (map escape dirs) ++ "%" + , escape $ concat extensions ] where (path, basefile) = splitFileName f dirs = filter (/= ".") $ map dropTrailingPathSeparator (splitPath path) (base, extensions) = splitShortExtensions basefile - double = replace "{" "{{" . replace "}" "}}" + {- To avoid collisions with filenames or directories that contain + - '%', and to allow the original directories to be extracted + - from the ViewedFile, '%' is escaped to '\%' (and '\' to '\\'). + -} + escape :: String -> String + escape = replace "%" "\\%" . replace "\\" "\\\\" viewedFileReuse :: MkViewedFile viewedFileReuse = takeFileName + +{- Extracts from a ViewedFile the directory where the file is located on + - in the reference branch. -} +dirFromViewedFile :: ViewedFile -> FilePath +dirFromViewedFile = joinPath . drop 1 . sep [] "" + where + sep l _ [] = reverse l + sep l curr (c:cs) + | c == '%' = sep (reverse curr:l) "" cs + | c == '\\' = case cs of + (c':cs') -> sep l (c':curr) cs' + [] -> sep l curr cs + | otherwise = sep l (c:curr) cs + +prop_viewedFile_roundtrips :: FilePath -> Bool +prop_viewedFile_roundtrips f + | isAbsolute f = True -- Only relative paths are encoded. + | any (isPathSeparator) (end f) = True -- Filenames wanted, not directories. + | otherwise = dir == dirFromViewedFile (viewedFileFromReference f) + where + dir = joinPath $ beginning $ splitDirectories f diff --git a/Test.hs b/Test.hs index 624636ed5..a93d9e4c9 100644 --- a/Test.hs +++ b/Test.hs @@ -55,6 +55,7 @@ import qualified Crypto import qualified Annex.Init import qualified Annex.CatFile import qualified Annex.View +import qualified Annex.View.ViewedFile import qualified Logs.View import qualified Utility.Path import qualified Utility.FileMode @@ -151,6 +152,7 @@ properties = localOption (QuickCheckTests 1000) $ testGroup "QuickCheck" , testProperty "prop_metadata_serialize" Types.MetaData.prop_metadata_serialize , testProperty "prop_branchView_legal" Logs.View.prop_branchView_legal , testProperty "prop_view_roundtrips" Annex.View.prop_view_roundtrips + , testProperty "prop_viewedFile_rountrips" Annex.View.ViewedFile.prop_viewedFile_roundtrips ] {- These tests set up the test environment, but also test some basic parts -- cgit v1.2.3 From ad42a4de9c5fa08456de01b5e41350fff4499189 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sat, 22 Feb 2014 14:58:05 -0400 Subject: prune imports --- Annex/View/ViewedFile.hs | 23 ----------------------- 1 file changed, 23 deletions(-) diff --git a/Annex/View/ViewedFile.hs b/Annex/View/ViewedFile.hs index 9f6a3748b..61be3cffc 100644 --- a/Annex/View/ViewedFile.hs +++ b/Annex/View/ViewedFile.hs @@ -15,29 +15,6 @@ module Annex.View.ViewedFile ( ) where import Common.Annex -import Types.View -import Types.MetaData -import qualified Git -import qualified Git.DiffTree as DiffTree -import qualified Git.Branch -import qualified Git.LsFiles -import qualified Git.Ref -import Git.UpdateIndex -import Git.Sha -import Git.HashObject -import Git.Types -import Git.FilePath -import qualified Backend -import Annex.Index -import Annex.Link -import Annex.CatFile -import Logs.MetaData -import Logs.View -import Utility.Glob -import Utility.FileMode -import Types.Command -import Config -import CmdLine.Action type FileName = String type ViewedFile = FileName -- cgit v1.2.3 From 5463fc4d8df52be0c0232f1c44956b2227ad193c Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sat, 22 Feb 2014 16:09:00 -0400 Subject: views: add automatically constructed file location metadata MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When constructing views, metadata is available about the location of the file in the view's reference branch. Allows incorporating parts of the directory hierarchy in a view. For example `git annex view tag=* podcasts/=*` makes a view in the form tag/showname. Performance impact: I benchmarked git annex view tag=* in the conference proceedings repo to take 6.459s before this change, and 6.544s after. FWIW, I considered making the syntax for this be podcasts/*, which might be easier for the user to learn. However, I think it's not as good: * The user has to then juggle two different syntaxes, and podcasts/* will be expanded by the shell so they also need to quote it, while podcasts/=* is unlikely to be expanded by the shell. * It would allow for things like podcasts/*/* and *.mp3 which do not map well into views. This commit was sponsored by Aurélien Pinceaux. --- Annex/View.hs | 34 +++++++++++++++++++++++++++++----- Annex/View/ViewedFile.hs | 2 ++ Command/VAdd.hs | 6 +++--- Command/View.hs | 2 +- debian/changelog | 5 +++++ doc/design/metadata.mdwn | 9 +++++---- doc/git-annex.mdwn | 23 ++++++++++++++++------- doc/tips/metadata_driven_views.mdwn | 33 ++++++++++++++++++++++++++++----- 8 files changed, 89 insertions(+), 25 deletions(-) diff --git a/Annex/View.hs b/Annex/View.hs index 0985e930f..ece5d099e 100644 --- a/Annex/View.hs +++ b/Annex/View.hs @@ -34,6 +34,7 @@ import Config import CmdLine.Action import qualified Data.Set as S +import qualified Data.Map as M import "mtl" Control.Monad.Writer {- Each visible ViewFilter in a view results in another level of @@ -233,11 +234,32 @@ prop_view_roundtrips f metadata visible = null f || viewTooLarge view || visiblefields = sort (map viewField $ filter viewVisible (viewComponents view)) hasfields fv = sort (map fst (fromMetaData (fromView view fv))) == visiblefields +{- A directory foo/bar/baz/ is turned into metadata fields + - /=foo, foo/=bar, foo/bar/=baz. + - + - Note that this may generate MetaFields that legalField rejects. + - This is necessary to have a 1:1 mapping between directory names and + - fields. So this MetaData cannot safely be serialized. -} +getDirMetaData :: FilePath -> MetaData +getDirMetaData d = MetaData $ M.fromList $ zip fields values + where + dirs = splitDirectories d + fields = map (MetaField . addTrailingPathSeparator . joinPath) + (inits dirs) + values = map (S.singleton . toMetaValue . fromMaybe "" . headMaybe) + (tails dirs) + +getWorkTreeMetaData :: FilePath -> MetaData +getWorkTreeMetaData = getDirMetaData . dropFileName + +getViewedFileMetaData :: FilePath -> MetaData +getViewedFileMetaData = getDirMetaData . dirFromViewedFile . takeFileName + {- Applies a view to the currently checked out branch, generating a new - branch for the view. -} applyView :: View -> Annex Git.Branch -applyView view = applyView' viewedFileFromReference view +applyView view = applyView' viewedFileFromReference getWorkTreeMetaData view {- Generates a new branch for a View, which must be a more narrow - version of the View originally used to generate the currently @@ -245,7 +267,7 @@ applyView view = applyView' viewedFileFromReference view - in view, not any others. -} narrowView :: View -> Annex Git.Branch -narrowView = applyView' viewedFileReuse +narrowView = applyView' viewedFileReuse getViewedFileMetaData {- Go through each file in the currently checked out branch. - If the file is not annexed, skip it, unless it's a dotfile in the top. @@ -255,8 +277,8 @@ narrowView = applyView' viewedFileReuse - Currently only works in indirect mode. Must be run from top of - repository. -} -applyView' :: MkViewedFile -> View -> Annex Git.Branch -applyView' mkviewedfile view = do +applyView' :: MkViewedFile -> (FilePath -> MetaData) -> View -> Annex Git.Branch +applyView' mkviewedfile getfilemetadata view = do top <- fromRepo Git.repoPath (l, clean) <- inRepo $ Git.LsFiles.inRepo [top] liftIO . nukeFile =<< fromRepo gitAnnexViewIndex @@ -273,7 +295,9 @@ applyView' mkviewedfile view = do genviewedfiles = viewedFiles view mkviewedfile -- enables memoization go uh hasher f (Just (k, _)) = do metadata <- getCurrentMetaData k - forM_ (genviewedfiles f metadata) $ \fv -> do + let dirmetadata = getfilemetadata f + let metadata' = unionMetaData dirmetadata metadata + forM_ (genviewedfiles f metadata') $ \fv -> do stagesymlink uh hasher fv =<< inRepo (gitAnnexLink fv k) go uh hasher f Nothing | "." `isPrefixOf` f = do diff --git a/Annex/View/ViewedFile.hs b/Annex/View/ViewedFile.hs index 61be3cffc..5733d4c94 100644 --- a/Annex/View/ViewedFile.hs +++ b/Annex/View/ViewedFile.hs @@ -48,6 +48,8 @@ viewedFileFromReference f = concat escape :: String -> String escape = replace "%" "\\%" . replace "\\" "\\\\" +{- For use when operating already within a view, so whatever filepath + - is present in the work tree is already a ViewedFile. -} viewedFileReuse :: MkViewedFile viewedFileReuse = takeFileName diff --git a/Command/VAdd.hs b/Command/VAdd.hs index 3dc1fd4cf..6b53aa7ea 100644 --- a/Command/VAdd.hs +++ b/Command/VAdd.hs @@ -10,11 +10,11 @@ module Command.VAdd where import Common.Annex import Command import Annex.View -import Command.View (paramView, parseViewParam, checkoutViewBranch) +import Command.View (parseViewParam, checkoutViewBranch) def :: [Command] -def = [notBareRepo $ notDirect $ - command "vadd" paramView seek SectionMetaData "add subdirs to current view"] +def = [notBareRepo $ notDirect $ command "vadd" (paramRepeating "FIELD=GLOB") + seek SectionMetaData "add subdirs to current view"] seek :: CommandSeek seek = withWords start diff --git a/Command/View.hs b/Command/View.hs index 17e136f7b..ea63d39fd 100644 --- a/Command/View.hs +++ b/Command/View.hs @@ -43,7 +43,7 @@ perform view = do next $ checkoutViewBranch view applyView paramView :: String -paramView = paramPair (paramRepeating "FIELD=VALUE") (paramRepeating "TAG") +paramView = paramPair (paramRepeating "TAG") (paramRepeating "FIELD=VALUE") parseViewParam :: String -> (MetaField, String) parseViewParam s = case separate (== '=') s of diff --git a/debian/changelog b/debian/changelog index 37193891a..4d5589ed0 100644 --- a/debian/changelog +++ b/debian/changelog @@ -4,6 +4,11 @@ git-annex (5.20140222) UNRELEASED; urgency=medium including rsync.net. * --metadata field=value can now use globs to match, and matches case insensatively, the same as git annex view field=value does. + * When constructing views, metadata is available about the location of the + file in the view's reference branch. Allows incorporating parts of the + directory hierarchy in a view. + For example `git annex view tag=* podcasts/=*` makes a view in the form + tag/showname. -- Joey Hess Fri, 21 Feb 2014 13:03:04 -0400 diff --git a/doc/design/metadata.mdwn b/doc/design/metadata.mdwn index db0d51c5c..c700e3674 100644 --- a/doc/design/metadata.mdwn +++ b/doc/design/metadata.mdwn @@ -70,7 +70,7 @@ metadata is derived, at least year=yyyy and probably also month, etc. ### directory hierarchy metadata -TODO From the original filename used in the master branch, when +From the original filename used in the master branch, when constructing a view, generate fields. For example foo/bar/baz.mp3 would get /=foo, foo/=bar, foo/bar/=baz, and .=mp3. @@ -82,11 +82,12 @@ This allows using whatever directory hierarchy exists to inform the view, without locking the view into using it. Complication: When refining a view, it only looks at the filenames in -the view, so it would need to map from +the view, so it has to map from those filenames to derive the same metadata, unless there is persistent storage. Luckily, the filenames used in the views currently include the -subdirs (although not quite in a parseable format, would need some small -changes). +subdirs. + +**done**! # other uses for metadata diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index f3bbe0502..3c233f378 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -715,20 +715,29 @@ subdirectories). git annex metadata annexscreencast.ogv -t video -t screencast -s author+=Alice -* `view [field=value ...] [tag ...]` +* `view [tag ...] [field=value ...] [location/=value]` Uses metadata to build a view branch of the files in the current branch, and checks out the view branch. Only files in the current branch whose metadata matches all the specified field values and tags will be shown in the view. + + Once within a view, you can make additional directories, and + copy or move files into them. When you commit, the metadata will + be updated to correspond to your changes. Multiple values for a metadata field can be specified, either by using a glob (`field="*"`) or by listing each wanted value. The resulting view will put files in subdirectories according to the value of their fields. - Once within a view, you can make additional directories, and - copy or move files into them. When you commit, the metadata will - be updated to correspond to your changes. + There are fields corresponding to the path to the file. So a file + "foo/bar/baz/file" has fields "/=foo", "foo/=bar", and "foo/bar/=baz". + These location fields can be used the same as other metadata to construct + the view. + + For example, `/=podcasts` will only include files from the podcasts + directory in the view, while `podcasts/=*` will preserve the + subdirectories of the podcasts directory in the view. * `vpop [N]` @@ -737,12 +746,12 @@ subdirectories). The optional number tells how many views to pop. -* `vfilter [field=value ...] [tag ...]` +* `vfilter [tag ...] [field=value ...] [location/=value]` Filters the current view to only the files that have the - specified values and tags. + specified field values, tags, and locations. -* `vadd [field=glob ...]` +* `vadd [field=glob ...] [location/=glob]` Changes the current view, adding an additional level of directories to categorize the files. diff --git a/doc/tips/metadata_driven_views.mdwn b/doc/tips/metadata_driven_views.mdwn index 7b46ca974..e24bf29ae 100644 --- a/doc/tips/metadata_driven_views.mdwn +++ b/doc/tips/metadata_driven_views.mdwn @@ -24,8 +24,8 @@ metadata: # git annex metadata --tag done videos/old # git annex metadata --tag new videos/lotsofcats.ogv # git annex metadata --tag sound podcasts - # git annex metadata --tag done podcasts/old - # git annex metadata --tag new podcasts/recent + # git annex metadata --tag done podcasts/*/old + # git annex metadata --tag new podcasts/*/recent So, you had a bunch of different kinds of files sorted into a directory structure. But that didn't really reflect how you approach the files. @@ -81,9 +81,11 @@ all the way out of all views, you'll be back on the regular git branch you originally started from. You can also use `git checkout` to switch between views and other branches. -Beyond simple tags, you can add whatever kinds of metadata you like, and -use that metadata in more elaborate views. For example, let's add a year -field. +## fields + +Beyond simple tags and directories, you can add whatever kinds of metadata +you like, and use that metadata in more elaborate views. For example, let's +add a year field. # git checkout master # git annex metadata --set year=2014 work/2014 @@ -118,4 +120,25 @@ Oh, did you want it the other way around? Easy! |-- 2014 `-- 2013 +## location fields + +Let's switch to a view containing only new podcasts. And since the +podcasts are organized into one subdirectory per show, let's +include those subdirectories in the view. + + # git checkout master + # git annex view tag=new podcasts/=* + # tree -d + This_Developers_Life + Escape_Pod + GitMinutes + The_Haskell_Cast + StarShipSofa + +That's an example of using part of the directory layout of the original +branch to inform the view. Every file gets fields automatically set up +corresponding to the directory it's in. So a file"foo/bar/baz/file" has +fields "/=foo", "foo/=bar", and "foo/bar/=baz". These location fields +can be used the same as other metadata to construct the view. + This has probably only scratched the surface of what you can do with views. -- cgit v1.2.3 From a4746a14ce077b18f962b91f415335b0042120bd Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sat, 22 Feb 2014 16:32:39 -0400 Subject: rather ugly workaround for directories with spaces as MetaFields --- Command/View.hs | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/Command/View.hs b/Command/View.hs index ea63d39fd..f123e3812 100644 --- a/Command/View.hs +++ b/Command/View.hs @@ -45,10 +45,17 @@ perform view = do paramView :: String paramView = paramPair (paramRepeating "TAG") (paramRepeating "FIELD=VALUE") +{- Parse field=value + - + - Note that the field may not be a legal metadata field name, + - but it's let through anywa (using MetaField rather than mkMetaField). + - This is useful when matching on directory names with spaces, + - which are not legal MetaFields. + -} parseViewParam :: String -> (MetaField, String) parseViewParam s = case separate (== '=') s of (tag, []) -> (tagMetaField, tag) - (field, wanted) -> either error (\f -> (f, wanted)) (mkMetaField field) + (field, wanted) -> (MetaField field, wanted) mkView :: [String] -> Annex View mkView params = do -- cgit v1.2.3 From 71bd7a787166087a1a0bb8372027dd8cf94d0a2e Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sat, 22 Feb 2014 16:37:13 -0400 Subject: etoolong --- doc/design/metadata.mdwn | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/doc/design/metadata.mdwn b/doc/design/metadata.mdwn index c700e3674..a648f5fe6 100644 --- a/doc/design/metadata.mdwn +++ b/doc/design/metadata.mdwn @@ -208,3 +208,7 @@ So, possible approaches: * What happens if git annex add or the assistant add a new file while on a view? If the file is not also added to the master branch, it will be lost when exiting the view. TODO + +* The filename mangling can result in a filename in a view + that is too long for its containing filesystem. Should detect and do + something reasonable to avoid. TODO -- cgit v1.2.3 From b16ceaf7bd1256aee31434b7fde69014d6181bd7 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawm3vKzS4eOWYpKMoYXqMIjNsIg_nYF-loU" Date: Sat, 22 Feb 2014 21:45:26 +0000 Subject: Added a comment: Already existing metadata implementation --- ...ent_1_22ed80bd8eabaa836e9dfc2432531f04._comment | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 doc/design/metadata/comment_1_22ed80bd8eabaa836e9dfc2432531f04._comment diff --git a/doc/design/metadata/comment_1_22ed80bd8eabaa836e9dfc2432531f04._comment b/doc/design/metadata/comment_1_22ed80bd8eabaa836e9dfc2432531f04._comment new file mode 100644 index 000000000..493db4339 --- /dev/null +++ b/doc/design/metadata/comment_1_22ed80bd8eabaa836e9dfc2432531f04._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawm3vKzS4eOWYpKMoYXqMIjNsIg_nYF-loU" + nickname="Konubinix" + subject="Already existing metadata implementation " + date="2014-02-22T21:45:25Z" + content=""" +Hi, + +I love the idea behing storing metadata. + +I suggest to exchange ideas (and maybe code) with projects already implementing metadata systems. + +I have tried several implementations and particularly noticed tmsu (http://tmsu.org/). This tool stores tags into a sqlite database and uses also a SHA-256 fingerprint of the file to be aware of file moves. It provides a fuse view of the tags with the ability to change tags by moving files (like in the git annex metadata view). + +Paul Ruane is particularly responsive on the mailing list and he already supports git annexed files (with SHAE-256 fingerprint) (see the end of the thread https://groups.google.com/forum/#!topic/tmsu/A5EGpnCcJ2w). + +Even if you cannot reuse the project, they are interresting ideas that might be worth looking at like the implications of tags: a file tagged \"film\" being automatically tagged \"video\". + +Tagsistant (http://www.tagsistant.net/) may also be a good source of inspirations. I just don't like the fact that it uses a backstore of tagged files. + +Thanks for reading. +"""]] -- cgit v1.2.3 From e0ae2f6bba897367ae23bfb1a6584ebd4843452e Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sat, 22 Feb 2014 18:16:28 -0400 Subject: exclude derived metadata when extracting metadata from a viewed file --- Annex/View.hs | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/Annex/View.hs b/Annex/View.hs index ece5d099e..69d064753 100644 --- a/Annex/View.hs +++ b/Annex/View.hs @@ -211,14 +211,19 @@ pathProduct (l:ls) = foldl combinel l ls combinel xs ys = [combine x y | x <- xs, y <- ys] {- Extracts the metadata from a ViewedFile, based on the view that was used - - to construct it. -} + - to construct it. + - + - Derived metadata is excluded. + -} fromView :: View -> ViewedFile -> MetaData -fromView view f = foldr (uncurry updateMetaData) newMetaData (zip fields values) +fromView view f = MetaData $ + M.fromList (zip fields values) `M.difference` derived where visible = filter viewVisible (viewComponents view) fields = map viewField visible - paths = splitDirectories $ dropFileName f - values = map fromViewPath paths + paths = splitDirectories (dropFileName f) + values = map (S.singleton . fromViewPath) paths + MetaData derived = getViewedFileMetaData f {- Constructing a view that will match arbitrary metadata, and applying - it to a file yields a set of ViewedFile which all contain the same -- cgit v1.2.3 From 724edf1f153fa53496cfe5cb65420ed7ec9dcf21 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawnFjuvfPpi1kf6l54bxfFUm0Aw_Gf_IO0o" Date: Sat, 22 Feb 2014 23:39:43 +0000 Subject: --- doc/forum/Too_big_to_fsck.mdwn | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 doc/forum/Too_big_to_fsck.mdwn diff --git a/doc/forum/Too_big_to_fsck.mdwn b/doc/forum/Too_big_to_fsck.mdwn new file mode 100644 index 000000000..975674b5c --- /dev/null +++ b/doc/forum/Too_big_to_fsck.mdwn @@ -0,0 +1,20 @@ +Hi, + +My Webapp isn't working: + + $ git-annex webapp error: refs/gcrypt/gitception+ does not point to a valid object! + error: refs/remotes/Beta/git-annex does not point to a valid object! + error: refs/remotes/Beta/master does not point to a valid object! + fatal: unable to read tree 656e7db5be172f01c0b6994d01f1a08d1273af12 + +So I tried to repair it: + + $ git-annex repair Running git fsck ... + Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it. + +So I tried to follow your advice here and increase the stack: + + $ git-annex +RTS -K35000000 -RTS fsck + git-annex: Most RTS options are disabled. Link with -rtsopts to enable them. + +I wasn't sure what to do next, so any help would be appreciated. -- cgit v1.2.3 From e88883479ae49f3436c952632a573ad2925688d8 Mon Sep 17 00:00:00 2001 From: "https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" Date: Sun, 23 Feb 2014 03:17:39 +0000 Subject: --- doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo.mdwn | 1 + 1 file changed, 1 insertion(+) create mode 100644 doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo.mdwn diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo.mdwn b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo.mdwn new file mode 100644 index 000000000..c94172535 --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo.mdwn @@ -0,0 +1 @@ +Is it possible to convert a regular git annex repo (git clone then git annex init in the folder), to an rsync remote. I have an annex with alot of remotes which makes the sync operation take a really long time. I would like to convert some of those remotes to rsync. This particular repo has a TB of data so I would like to avoid dropping content from the remote than re download everything. -- cgit v1.2.3 From 704fa2fa789b453d354f2f375b0b42481c8990f6 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 23 Feb 2014 00:08:29 -0400 Subject: annex.genmetadata can be set to make git-annex automatically set metadata (year and month) when adding files --- Annex/MetaData.hs | 51 +++++++++++++++++++++++++++++++++++++ Annex/View.hs | 3 +-- Command/Add.hs | 25 +++++++++++------- Command/MetaData.hs | 3 ++- Command/View.hs | 1 + Logs/MetaData.hs | 4 +-- Types/GitConfig.hs | 2 ++ Types/MetaData.hs | 20 ++++++--------- debian/changelog | 2 ++ doc/design/metadata.mdwn | 18 ++----------- doc/git-annex.mdwn | 6 +++++ doc/metadata.mdwn | 41 +++++++++++++++++++++++++++++ doc/tips/metadata_driven_views.mdwn | 2 +- 13 files changed, 135 insertions(+), 43 deletions(-) create mode 100644 Annex/MetaData.hs create mode 100644 doc/metadata.mdwn diff --git a/Annex/MetaData.hs b/Annex/MetaData.hs new file mode 100644 index 000000000..ef235b51f --- /dev/null +++ b/Annex/MetaData.hs @@ -0,0 +1,51 @@ +{- git-annex metadata + - + - Copyright 2014 Joey Hess + - + - Licensed under the GNU GPL version 3 or higher. + -} + +module Annex.MetaData where + +import Common.Annex +import qualified Annex +import Types.MetaData +import Logs.MetaData + +import qualified Data.Set as S +import qualified Data.Map as M +import Data.Time.Calendar +import Data.Time.Clock +import Data.Time.Clock.POSIX + +tagMetaField :: MetaField +tagMetaField = MetaField "tag" + +yearMetaField :: MetaField +yearMetaField = MetaField "year" + +monthMetaField :: MetaField +monthMetaField = MetaField "month" + +{- Generates metadata for a file that has just been ingested into the + - annex. Passed the FileStatus of the content file. + - + - Does not overwrite any existing metadata values for the key. + -} +genMetaData :: Key -> FileStatus -> Annex () +genMetaData key status = whenM (annexGenMetaData <$> Annex.getGitConfig) $ do + metadata <- getCurrentMetaData key + let metadata' = genMetaData' status metadata + unless (metadata' == emptyMetaData) $ + addMetaData key metadata' + +genMetaData' :: FileStatus -> MetaData -> MetaData +genMetaData' status old = MetaData $ M.fromList $ filter isnew + [ (yearMetaField, S.singleton $ toMetaValue $ show y) + , (monthMetaField, S.singleton $ toMetaValue $ show m) + ] + where + isnew (f, _) = S.null (currentMetaDataValues f old) + (y, m, _d) = toGregorian $ utctDay $ + posixSecondsToUTCTime $ realToFrac $ + modificationTime status diff --git a/Annex/View.hs b/Annex/View.hs index 69d064753..9d1a763e2 100644 --- a/Annex/View.hs +++ b/Annex/View.hs @@ -300,8 +300,7 @@ applyView' mkviewedfile getfilemetadata view = do genviewedfiles = viewedFiles view mkviewedfile -- enables memoization go uh hasher f (Just (k, _)) = do metadata <- getCurrentMetaData k - let dirmetadata = getfilemetadata f - let metadata' = unionMetaData dirmetadata metadata + let metadata' = getfilemetadata f `unionMetaData` metadata forM_ (genviewedfiles f metadata') $ \fv -> do stagesymlink uh hasher fv =<< inRepo (gitAnnexLink fv k) go uh hasher f Nothing diff --git a/Command/Add.hs b/Command/Add.hs index d1dcb6025..0906ae531 100644 --- a/Command/Add.hs +++ b/Command/Add.hs @@ -19,6 +19,7 @@ import Annex.Content import Annex.Content.Direct import Annex.Perms import Annex.Link +import Annex.MetaData import qualified Annex import qualified Annex.Queue #ifdef WITH_CLIBS @@ -145,26 +146,32 @@ ingest Nothing = return (Nothing, Nothing) ingest (Just source) = do backend <- chooseBackend $ keyFilename source k <- genKey source backend - cache <- liftIO $ genInodeCache $ contentLocation source - case (cache, inodeCache source) of - (_, Nothing) -> go k cache - (Just newc, Just c) | compareStrong c newc -> go k cache + ms <- liftIO $ catchMaybeIO $ getFileStatus $ contentLocation source + let mcache = toInodeCache =<< ms + case (mcache, inodeCache source) of + (_, Nothing) -> go k mcache ms + (Just newc, Just c) | compareStrong c newc -> go k mcache ms _ -> failure "changed while it was being added" where - go k cache = ifM isDirect ( godirect k cache , goindirect k cache ) + go k mcache ms = ifM isDirect + ( godirect k mcache ms + , goindirect k mcache ms + ) - goindirect (Just (key, _)) mcache = do + goindirect (Just (key, _)) mcache ms = do catchAnnex (moveAnnex key $ contentLocation source) (undo (keyFilename source) key) + maybe noop (genMetaData key) ms liftIO $ nukeFile $ keyFilename source return $ (Just key, mcache) - goindirect Nothing _ = failure "failed to generate a key" + goindirect _ _ _ = failure "failed to generate a key" - godirect (Just (key, _)) (Just cache) = do + godirect (Just (key, _)) (Just cache) ms = do addInodeCache key cache + maybe noop (genMetaData key) ms finishIngestDirect key source return $ (Just key, Just cache) - godirect _ _ = failure "failed to generate a key" + godirect _ _ _ = failure "failed to generate a key" failure msg = do warning $ keyFilename source ++ " " ++ msg diff --git a/Command/MetaData.hs b/Command/MetaData.hs index 6112dd095..651cb4944 100644 --- a/Command/MetaData.hs +++ b/Command/MetaData.hs @@ -10,6 +10,7 @@ module Command.MetaData where import Common.Annex import qualified Annex import Command +import Annex.MetaData import Logs.MetaData import Types.MetaData @@ -55,7 +56,7 @@ perform :: POSIXTime -> [ModMeta] -> Key -> CommandPerform perform _ [] k = next $ cleanup k perform now ms k = do oldm <- getCurrentMetaData k - let m = foldl' unionMetaData newMetaData $ map (modMeta oldm) ms + let m = foldl' unionMetaData emptyMetaData $ map (modMeta oldm) ms addMetaData' k m now next $ cleanup k diff --git a/Command/View.hs b/Command/View.hs index f123e3812..e5182e852 100644 --- a/Command/View.hs +++ b/Command/View.hs @@ -14,6 +14,7 @@ import qualified Git.Command import qualified Git.Ref import qualified Git.Branch import Types.MetaData +import Annex.MetaData import Types.View import Annex.View import Logs.View diff --git a/Logs/MetaData.hs b/Logs/MetaData.hs index 77c1b56a5..63314bcef 100644 --- a/Logs/MetaData.hs +++ b/Logs/MetaData.hs @@ -55,7 +55,7 @@ getMetaData = readLog . metaDataLogFile getCurrentMetaData :: Key -> Annex MetaData getCurrentMetaData = currentMetaData . collect <$$> getMetaData where - collect = foldl' unionMetaData newMetaData . map value . S.toAscList + collect = foldl' unionMetaData emptyMetaData . map value . S.toAscList {- Adds in some metadata, which can override existing values, or unset - them, but otherwise leaves any existing metadata as-is. -} @@ -129,7 +129,7 @@ simplifyLog s = case sl of go c _ [] = c go c newer (l:ls) - | unique == newMetaData = go c newer ls + | unique == emptyMetaData = go c newer ls | otherwise = go (l { value = unique } : c) (unionMetaData unique newer) ls where diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs index ab3dbd7b9..65984a108 100644 --- a/Types/GitConfig.hs +++ b/Types/GitConfig.hs @@ -49,6 +49,7 @@ data GitConfig = GitConfig , annexAutoUpgrade :: AutoUpgrade , annexExpireUnused :: Maybe (Maybe Duration) , annexSecureEraseCommand :: Maybe String + , annexGenMetaData :: Bool , coreSymlinks :: Bool , gcryptId :: Maybe String } @@ -81,6 +82,7 @@ extractGitConfig r = GitConfig , annexExpireUnused = maybe Nothing Just . parseDuration <$> getmaybe (annex "expireunused") , annexSecureEraseCommand = getmaybe (annex "secure-erase-command") + , annexGenMetaData = getbool (annex "genmetadata") False , coreSymlinks = getbool "core.symlinks" True , gcryptId = getmaybe "core.gcrypt-id" } diff --git a/Types/MetaData.hs b/Types/MetaData.hs index 617c122a6..b941cb59b 100644 --- a/Types/MetaData.hs +++ b/Types/MetaData.hs @@ -17,7 +17,6 @@ module Types.MetaData ( MetaSerializable, toMetaField, mkMetaField, - tagMetaField, fromMetaField, toMetaValue, mkMetaValue, @@ -25,7 +24,7 @@ module Types.MetaData ( unsetMetaData, fromMetaValue, fromMetaData, - newMetaData, + emptyMetaData, updateMetaData, unionMetaData, differenceMetaData, @@ -81,7 +80,7 @@ instance MetaSerializable MetaData where serialize (MetaData m) = unwords $ concatMap go $ M.toList m where go (f, vs) = serialize f : map serialize (S.toList vs) - deserialize = Just . getfield newMetaData . words + deserialize = Just . getfield emptyMetaData . words where getfield m [] = m getfield m (w:ws) = maybe m (getvalues m ws) (deserialize w) @@ -152,8 +151,8 @@ fromMetaValue (MetaValue _ f) = f fromMetaData :: MetaData -> [(MetaField, S.Set MetaValue)] fromMetaData (MetaData m) = M.toList m -newMetaData :: MetaData -newMetaData = MetaData M.empty +emptyMetaData :: MetaData +emptyMetaData = MetaData M.empty {- Can be used to set a value, or to unset it, depending on whether - the MetaValue has CurrentlySet or not. -} @@ -202,10 +201,10 @@ data ModMeta - Note that the new MetaData does not include all the - values set in the input metadata. It only contains changed values. -} modMeta :: MetaData -> ModMeta -> MetaData -modMeta _ (AddMeta f v) = updateMetaData f v newMetaData -modMeta _ (DelMeta f oldv) = updateMetaData f (unsetMetaValue oldv) newMetaData +modMeta _ (AddMeta f v) = updateMetaData f v emptyMetaData +modMeta _ (DelMeta f oldv) = updateMetaData f (unsetMetaValue oldv) emptyMetaData modMeta m (SetMeta f v) = updateMetaData f v $ - foldr (updateMetaData f) newMetaData $ + foldr (updateMetaData f) emptyMetaData $ map unsetMetaValue $ S.toList $ currentMetaDataValues f m {- Parses field=value, field+=value, field-=value -} @@ -233,9 +232,6 @@ mkMetaField f = maybe (Left $ badField f) Right (toMetaField f) badField :: String -> String badField f = "Illegal metadata field name, \"" ++ f ++ "\"" -tagMetaField :: MetaField -tagMetaField = MetaField "tag" - {- Avoid putting too many fields in the map; extremely large maps make - the seriaization test slow due to the sheer amount of data. - It's unlikely that more than 100 fields of metadata will be used. -} @@ -254,7 +250,7 @@ prop_metadata_sane :: MetaData -> MetaField -> MetaValue -> Bool prop_metadata_sane m f v = and [ S.member v $ metaDataValues f m' , not (isSet v) || S.member v (currentMetaDataValues f m') - , differenceMetaData m' newMetaData == m' + , differenceMetaData m' emptyMetaData == m' ] where m' = updateMetaData f v m diff --git a/debian/changelog b/debian/changelog index 4d5589ed0..edb203402 100644 --- a/debian/changelog +++ b/debian/changelog @@ -9,6 +9,8 @@ git-annex (5.20140222) UNRELEASED; urgency=medium directory hierarchy in a view. For example `git annex view tag=* podcasts/=*` makes a view in the form tag/showname. + * annex.genmetadata can be set to make git-annex automatically set + metadata (year and month) when adding files. -- Joey Hess Fri, 21 Feb 2014 13:03:04 -0400 diff --git a/doc/design/metadata.mdwn b/doc/design/metadata.mdwn index a648f5fe6..992ae1ca7 100644 --- a/doc/design/metadata.mdwn +++ b/doc/design/metadata.mdwn @@ -29,7 +29,7 @@ directories nest. relevant metadata from the files. TODO: It's not clear that removing a file should nuke all the metadata used to filter it into the - branch (especially if it's derived metadata like the year). + branch Currently, only metadata used for visible subdirs is added and removed this way. Also, this is not usable in direct mode because deleting the @@ -56,19 +56,7 @@ For example, by examining MP3 metadata. Also auto add metadata when adding files to view branches. See below. -## derived metadata - -This is probably not stored anywhere. It's computed on demand by a pure -function from the other metadata. -(Should be a general mechanism for this. (It probably generalizes to -sql queries if we want to go that far.)) - -### data metadata - -TODO From the ctime, some additional -metadata is derived, at least year=yyyy and probably also month, etc. - -### directory hierarchy metadata +## directory hierarchy metadata From the original filename used in the master branch, when constructing a view, generate fields. For example foo/bar/baz.mp3 @@ -87,8 +75,6 @@ those filenames to derive the same metadata, unless there is persistent storage. Luckily, the filenames used in the views currently include the subdirs. -**done**! - # other uses for metadata Uses are not limited to view branches. diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index 3c233f378..ab158a952 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1279,6 +1279,12 @@ Here are all the supported configuration settings. Note that setting numcopies to 0 is very unsafe. +* `annex.genmetadata` + + Set this to `true` to make git-annex automatically generate some metadata + when adding files to the repository. In particular, it stores + year and month metadata, from the file's modification date. + * `annex.queuesize` git-annex builds a queue of git commands, in order to combine similar diff --git a/doc/metadata.mdwn b/doc/metadata.mdwn new file mode 100644 index 000000000..414a91053 --- /dev/null +++ b/doc/metadata.mdwn @@ -0,0 +1,41 @@ +git-annex allows you to store arbitrary metadata about files stored in the +git-annex repository. The metadata is stored in the `git-annex` branch, and +so is automatically kept in sync with the rest of git-annex's state, such +as [[location_tracking]] information. + +Some of the things you can do with metadata include: + +* Using `git annex metadata file` to show all + the metadata associated with a file. +* [[tips/metadata_driven_views]] +* Limiting the files git-annex commands act on to those with + or without particular metadata. + For example `git annex find --metadata tag=foo --or --metadata tag=bar` +* Using it in [[preferred_content]] expressions. + For example "tag=important or not author=me" + +Each file (actually the underlying key) can have any number of metadata +fields, which each can have any number of values. For example, to tag +files, the `tag` field is typically used, with values set to each tag that +applies to the file. + +The field names are freeform (but cannot include spaces). The metadata +values can contain absolutely anything you like -- but you're recommended +to keep it simple and reasonably short. + +Here are some recommended metadata fields to use: + +* `tag` - With each tag being a different value. +* `year`, `month` - When this particular version of the file came into + being. + +To make git-annex automatically set the year and month when adding files, +run `git config annex.genmetadata true` + +git-annex's metadata can be updated in a distributed fashion. For example, +two users, each with their own clone of a repository, can set and unset +metadata at the same time, even for the same field of the same file. +When they push their changes, `git annex merge` will combine their +metadata changes in a consistent and (probably) intuitive way. + +See [[the metadata design page|design/metadata]] for more details. diff --git a/doc/tips/metadata_driven_views.mdwn b/doc/tips/metadata_driven_views.mdwn index e24bf29ae..0af05bd3f 100644 --- a/doc/tips/metadata_driven_views.mdwn +++ b/doc/tips/metadata_driven_views.mdwn @@ -1,5 +1,5 @@ git-annex now has support for storing -[[arbitrary metadata|design/metadata]] about annexed files. For example, this can be +[[arbitrary metadata|metadata]] about annexed files. For example, this can be used to tag files, to record the author of a file, etc. The metadata is synced around between repositories with the other information git-annex keeps track of. -- cgit v1.2.3 From e480c4a059d5bcaf295aed4b63d99ffc8d8101a3 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 23 Feb 2014 00:13:19 -0400 Subject: add note --- doc/tips/metadata_driven_views.mdwn | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/doc/tips/metadata_driven_views.mdwn b/doc/tips/metadata_driven_views.mdwn index 0af05bd3f..3a31fc8e6 100644 --- a/doc/tips/metadata_driven_views.mdwn +++ b/doc/tips/metadata_driven_views.mdwn @@ -14,6 +14,12 @@ refine or reorder a view. Let's get started by setting some tags on files. No views yet, just some metadata: +[[!template id=note text=""" +To avoid needing to manually tag files with the year (and month), +run `annex.genmetadata true`, and git-annex will do it for you +when adding files. +"""]] + # git annex metadata --tag todo work/2014/* # git annex metadata --untag todo work/2014/done/* # git annex metadata --tag urgent work/2014/presentation_for_tomorrow.odt -- cgit v1.2.3 From e99bf1bdae9e4ffc5abe0c293713ed7660687a88 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 23 Feb 2014 00:25:35 -0400 Subject: more todos --- doc/design/metadata.mdwn | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/doc/design/metadata.mdwn b/doc/design/metadata.mdwn index 992ae1ca7..719d46323 100644 --- a/doc/design/metadata.mdwn +++ b/doc/design/metadata.mdwn @@ -198,3 +198,7 @@ So, possible approaches: * The filename mangling can result in a filename in a view that is too long for its containing filesystem. Should detect and do something reasonable to avoid. TODO + +* When a file is edited and the new version committed, the new key + does not inherit the metadata of the old key. Which may be right + sometimes, but more generally violates least surprise. -- cgit v1.2.3 From 531e99258f149dbd4a7ef049f1a1c89aecb61ea6 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 23 Feb 2014 00:25:46 -0400 Subject: devblog --- doc/devblog/day_120__more_metadata.mdwn | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 doc/devblog/day_120__more_metadata.mdwn diff --git a/doc/devblog/day_120__more_metadata.mdwn b/doc/devblog/day_120__more_metadata.mdwn new file mode 100644 index 000000000..daff68e37 --- /dev/null +++ b/doc/devblog/day_120__more_metadata.mdwn @@ -0,0 +1,17 @@ +When generating a view, there's now a way to reuse part of the directory +hierarchy of the parent branch. For example, `git annex view tag=* podcasts/=*` +makes a view where the first level is the tags, and the second level is +whatever `podcasts/*` directories the files were in. + +Also, year and month metadata can be automatically recorded when +adding files to the annex. I made this only be done when annex.genmetadata +is turned on, to avoid polluting repositories that don't want to use metadata. + +It would be nice if there was a way to add a hook script that's run +when files are added, to collect their metadata. I am not sure yet if +I am going to add that to git-annex though. It's already possible to do via +the regular git `post-commit` hook. Just make it look at the commit to see +what files were added, and then run `git annex metadata` to set their +metadata appropriately. It would be good to at least have an example of +such a script to eg, extract EXIF or ID3 metadata. Perhaps someone can +contribute one? -- cgit v1.2.3 From 82b230213dc35f17ed26dbc5074cb8bc14444ba1 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 23 Feb 2014 00:27:18 -0400 Subject: layout --- doc/tips/metadata_driven_views.mdwn | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/doc/tips/metadata_driven_views.mdwn b/doc/tips/metadata_driven_views.mdwn index 3a31fc8e6..17ebc6869 100644 --- a/doc/tips/metadata_driven_views.mdwn +++ b/doc/tips/metadata_driven_views.mdwn @@ -45,6 +45,12 @@ Ok, metadata is in place, but how to use it? Time to change views! Switched to branch 'views/_' ok +[[!template id=note text=""" +Notice that a single file may appear in multiple directories +depending on its tags. For example, `lotsofcats.ogv` is in +both `new/` and `video/`. +"""]] + This searched for all files with any tag, and created a new git branch that sorts the files according to their tags. @@ -57,10 +63,6 @@ that sorts the files according to their tags. video sound -Notice that a single file may appear in multiple directories -depending on its tags. For example, `lotsofcats.ogv` is in -both `new/` and `video/`. - Ah, but you're at work now, and don't want to be distracted by cat videos. Time to filter the view: -- cgit v1.2.3 From 79d7d7833ff54a7694a78e5f84d78651266fa735 Mon Sep 17 00:00:00 2001 From: stp Date: Sun, 23 Feb 2014 14:25:23 +0000 Subject: Added a comment: Ídea MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../comment_3_d7349af488008e3ca6557e0c1fbfc5b6._comment | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 doc/bugs/git_annex_sync_--content_not_syncing_all_objects/comment_3_d7349af488008e3ca6557e0c1fbfc5b6._comment diff --git a/doc/bugs/git_annex_sync_--content_not_syncing_all_objects/comment_3_d7349af488008e3ca6557e0c1fbfc5b6._comment b/doc/bugs/git_annex_sync_--content_not_syncing_all_objects/comment_3_d7349af488008e3ca6557e0c1fbfc5b6._comment new file mode 100644 index 000000000..34c2c4c16 --- /dev/null +++ b/doc/bugs/git_annex_sync_--content_not_syncing_all_objects/comment_3_d7349af488008e3ca6557e0c1fbfc5b6._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="stp" + ip="84.56.21.11" + subject="Ídea" + date="2014-02-23T14:25:22Z" + content=""" +I thought about the implementation need for git annex sync --content --all. If preferred content expressions would work it would be needed. Everything else. could be done via a split usage. +Run \"git annex sync --content\" to satisfy the preferred content expressions on the working tree and the numcopies on the working tree and then loop through all backup/archive repositories with \"git annex get --auto\" this should at least prevent archives from getting objects numcopies is already satisfying and sync the objects not yet satisfied right? +"""]] -- cgit v1.2.3 From 84884101e3c19b0db4c8c5975fc47575dc0782fb Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 23 Feb 2014 13:34:59 -0400 Subject: metadata: Field names limited to alphanumerics and a few whitelisted punctuation characters to avoid issues with views, etc. --- Types/MetaData.hs | 24 +++++++++++++++++------- debian/changelog | 2 ++ doc/metadata.mdwn | 2 +- 3 files changed, 20 insertions(+), 8 deletions(-) diff --git a/Types/MetaData.hs b/Types/MetaData.hs index b941cb59b..7c4028a2d 100644 --- a/Types/MetaData.hs +++ b/Types/MetaData.hs @@ -115,19 +115,29 @@ instance MetaSerializable CurrentlySet where deserialize "-" = Just (CurrentlySet False) deserialize _ = Nothing -{- Fields cannot be empty, contain whitespace, or start with "+-" as - - that would break the serialization. -} toMetaField :: String -> Maybe MetaField toMetaField f | legalField f = Just $ MetaField f | otherwise = Nothing +{- Fields cannot be empty, contain whitespace, or start with "+-" as + - that would break the serialization. + - + - Additionally, fields should not contain any form of path separator, as + - that would break views. + - + - So, require they have an alphanumeric first letter, with the remainder + - being either alphanumeric or a small set of shitelisted common punctuation. + -} legalField :: String -> Bool -legalField f - | null f = False - | any isSpace f = False - | any (`isPrefixOf` f) ["+", "-"] = False - | otherwise = True +legalField [] = False +legalField (c1:cs) + | not (isAlphaNum c1) = False + | otherwise = all legalchars cs + where + legalchars c + | isAlphaNum c = True + | otherwise = c `elem` "_-." toMetaValue :: String -> MetaValue toMetaValue = MetaValue (CurrentlySet True) diff --git a/debian/changelog b/debian/changelog index edb203402..c336fc481 100644 --- a/debian/changelog +++ b/debian/changelog @@ -11,6 +11,8 @@ git-annex (5.20140222) UNRELEASED; urgency=medium tag/showname. * annex.genmetadata can be set to make git-annex automatically set metadata (year and month) when adding files. + * metadata: Field names limited to alphanumerics and a few whitelisted + punctuation characters to avoid issues with views, etc. -- Joey Hess Fri, 21 Feb 2014 13:03:04 -0400 diff --git a/doc/metadata.mdwn b/doc/metadata.mdwn index 414a91053..d3c3b748e 100644 --- a/doc/metadata.mdwn +++ b/doc/metadata.mdwn @@ -19,7 +19,7 @@ fields, which each can have any number of values. For example, to tag files, the `tag` field is typically used, with values set to each tag that applies to the file. -The field names are freeform (but cannot include spaces). The metadata +The field names are limited to alphanumerics (and `[_-.]`). The metadata values can contain absolutely anything you like -- but you're recommended to keep it simple and reasonably short. -- cgit v1.2.3 From d811d5697259aaf0fe68f424edf33321ef2b0d5e Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 23 Feb 2014 13:58:16 -0400 Subject: metadata: Support --json --- Command/MetaData.hs | 10 ++++++---- debian/changelog | 1 + doc/git-annex.mdwn | 2 +- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/Command/MetaData.hs b/Command/MetaData.hs index 651cb4944..55d67c6b7 100644 --- a/Command/MetaData.hs +++ b/Command/MetaData.hs @@ -18,7 +18,7 @@ import qualified Data.Set as S import Data.Time.Clock.POSIX def :: [Command] -def = [withOptions [setOption, tagOption, untagOption] $ +def = [withOptions [setOption, tagOption, untagOption, jsonOption] $ command "metadata" paramPaths seek SectionMetaData "sets metadata of a file"] @@ -62,8 +62,10 @@ perform now ms k = do cleanup :: Key -> CommandCleanup cleanup k = do - m <- getCurrentMetaData k - showLongNote $ unlines $ concatMap showmeta $ fromMetaData $ currentMetaData m + l <- map unwrapmeta . fromMetaData <$> getCurrentMetaData k + maybeShowJSON l + showLongNote $ unlines $ concatMap showmeta l return True where - showmeta (f, vs) = map (\v -> fromMetaField f ++ "=" ++ fromMetaValue v) $ S.toList vs + unwrapmeta (f, v) = (fromMetaField f, map fromMetaValue (S.toList v)) + showmeta (f, vs) = map ((f ++ "=") ++) vs diff --git a/debian/changelog b/debian/changelog index c336fc481..543504c12 100644 --- a/debian/changelog +++ b/debian/changelog @@ -13,6 +13,7 @@ git-annex (5.20140222) UNRELEASED; urgency=medium metadata (year and month) when adding files. * metadata: Field names limited to alphanumerics and a few whitelisted punctuation characters to avoid issues with views, etc. + * metadata: Support --json -- Joey Hess Fri, 21 Feb 2014 13:03:04 -0400 diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index ab158a952..441da7b98 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -951,7 +951,7 @@ subdirectories). Rather than the normal output, generate JSON. This is intended to be parsed by programs that use git-annex. Each line of output is a JSON object. Note that JSON output is only usable with some git-annex commands, - like info, find, and whereis. + like info, find, whereis, and metadata. * `--debug` -- cgit v1.2.3 From 205c05fee0408bd5b704ab77ebd36b67abe7ac5f Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sun, 23 Feb 2014 18:51:45 +0000 Subject: Added a comment --- .../comment_1_490b8bfe95b01a23408ecb5d63dcd40b._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/forum/Too_big_to_fsck/comment_1_490b8bfe95b01a23408ecb5d63dcd40b._comment diff --git a/doc/forum/Too_big_to_fsck/comment_1_490b8bfe95b01a23408ecb5d63dcd40b._comment b/doc/forum/Too_big_to_fsck/comment_1_490b8bfe95b01a23408ecb5d63dcd40b._comment new file mode 100644 index 000000000..73d218bac --- /dev/null +++ b/doc/forum/Too_big_to_fsck/comment_1_490b8bfe95b01a23408ecb5d63dcd40b._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-23T18:51:45Z" + content=""" +I suspect that git fsck is outputting so many lines about problems that it's taking more memory than it's limited to using to hold them all. + +Can you paste the output of: git fsck --no-dangling --no-reflogs +"""]] -- cgit v1.2.3 From 9f645369ab00dbcdb9e1178d6a924f6aa6fc38a3 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sun, 23 Feb 2014 18:57:43 +0000 Subject: Added a comment --- .../comment_1_e6065f9c44c85030c7628e2cfa0fd0fa._comment | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_1_e6065f9c44c85030c7628e2cfa0fd0fa._comment diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_1_e6065f9c44c85030c7628e2cfa0fd0fa._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_1_e6065f9c44c85030c7628e2cfa0fd0fa._comment new file mode 100644 index 000000000..46ba67c30 --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_1_e6065f9c44c85030c7628e2cfa0fd0fa._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-23T18:57:42Z" + content=""" +This is doable. It works best if the remote repo is a bare git repository, because then the filenames line up 100% with the filenames used in a rsync special remote. If the git repo is not bare, the rsync special remote will first try the paths it expects, and only then fall back to the right paths, so a little extra work done. (If this became a big problem, it would not be infesable to move the files around with a script.) + +Anyway, if it's a bare repo, then repo.git/annex/objects is where you want to point the rsync special remote at. With a non-bare repo, repo/.git/annex/objects/ is the location. I'd recommend moving the objects directory out to a new location, and pointing the rsyncurl at that. This way, there's no possibility of git-annex thinking one files accessed 2 ways is 2 copies. + +Of course, you can't use encryption for the rsync special remote. +"""]] -- cgit v1.2.3 From 733c6daeb4c432c8633a6c368ed5650e46690013 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sun, 23 Feb 2014 19:07:59 +0000 Subject: Added a comment --- .../comment_2_76bfb11396dc20a5105376b22e7e773b._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_2_76bfb11396dc20a5105376b22e7e773b._comment diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_2_76bfb11396dc20a5105376b22e7e773b._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_2_76bfb11396dc20a5105376b22e7e773b._comment new file mode 100644 index 000000000..8ed4f6508 --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_2_76bfb11396dc20a5105376b22e7e773b._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 2" + date="2014-02-23T19:07:59Z" + content=""" +However, if the only problem is that pushing and pulling with a git repository makes `git annex sync` take too long, another option is setting `git config remote.$foo.annex-sync false`. You can still then use git-annex commands to get and push data to the remote, and can even `git annex sync $foo` from time to time, but it won't slow down the normal `git annex sync`. + +However, this also prevents the assistant from uploading new files to the remote automatically. +"""]] -- cgit v1.2.3 From dc08f90356ea9273a59d364352a8dfa500ba0445 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawk6I980MaZ8Xm0giW9VYpms1_vG13xurdw" Date: Sun, 23 Feb 2014 19:08:39 +0000 Subject: --- doc/todo/Views_Demo.mdwn | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 doc/todo/Views_Demo.mdwn diff --git a/doc/todo/Views_Demo.mdwn b/doc/todo/Views_Demo.mdwn new file mode 100644 index 000000000..2587642e3 --- /dev/null +++ b/doc/todo/Views_Demo.mdwn @@ -0,0 +1,13 @@ +Joey, + +I've been thinking about leveraging git-annex for a workgroup document repository and I have just watched your views demo. The timing of the demo is great because I need to deploy a document repository with per-document metadata and your views concept seems like a great mechanism for associating metadata to documents and for displaying that metadata. + +While I don't expect to use your views concept for my workgroup repostory, a later iteration might do. + +The metadata in my use case begins with all the weird metadata seen on a book's copyright page. In addition, per-document provenance, like how one found the document and (if we're lucky) a URL where the latest version of the document may be found. Metadata values may be simple strings or may be markdown text. + +So, are you considering a metadata syntax that can support complex metadata? One example is multiple authors. Another issue is complex metadata values, like key=abstract and value="markdown text...". + +FWIW, + +Bob -- cgit v1.2.3 From 32e6d6fd68839ec27eb45eb1ace17194aa762653 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sun, 23 Feb 2014 19:09:30 +0000 Subject: Added a comment --- .../comment_2_2666c135dd3378cf6301aa4957049fbd._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/forum/Too_big_to_fsck/comment_2_2666c135dd3378cf6301aa4957049fbd._comment diff --git a/doc/forum/Too_big_to_fsck/comment_2_2666c135dd3378cf6301aa4957049fbd._comment b/doc/forum/Too_big_to_fsck/comment_2_2666c135dd3378cf6301aa4957049fbd._comment new file mode 100644 index 000000000..f2028c91f --- /dev/null +++ b/doc/forum/Too_big_to_fsck/comment_2_2666c135dd3378cf6301aa4957049fbd._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 2" + date="2014-02-23T19:09:30Z" + content=""" +Erm, that output is liable to be big, I only care how many lines and characters of output there are! + + git fsck --no-dangling --no-reflogs |wc +"""]] -- cgit v1.2.3 From 0cf98e1df7e77a1fd6cd5cbd23d25f4677e1653e Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sun, 23 Feb 2014 19:12:10 +0000 Subject: Added a comment --- .../comment_3_dfb169c441215b671f8c971184de3e16._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/forum/Too_big_to_fsck/comment_3_dfb169c441215b671f8c971184de3e16._comment diff --git a/doc/forum/Too_big_to_fsck/comment_3_dfb169c441215b671f8c971184de3e16._comment b/doc/forum/Too_big_to_fsck/comment_3_dfb169c441215b671f8c971184de3e16._comment new file mode 100644 index 000000000..96b1ac9cd --- /dev/null +++ b/doc/forum/Too_big_to_fsck/comment_3_dfb169c441215b671f8c971184de3e16._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 3" + date="2014-02-23T19:12:10Z" + content=""" +Also, you can build git-annex from source with the RTS options enabled by running `cabal install git-annex --ghc-options=-rtsopts` + +(or just build git-repair which has the repository repair parts of git-annex) +"""]] -- cgit v1.2.3 From abb0b0930efd65a0939092fb899d6f4358abb76b Mon Sep 17 00:00:00 2001 From: "https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" Date: Sun, 23 Feb 2014 19:39:28 +0000 Subject: Added a comment --- .../comment_3_b34d6ae0718ab0ff6bc1d7b8f2470b9b._comment | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_3_b34d6ae0718ab0ff6bc1d7b8f2470b9b._comment diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_3_b34d6ae0718ab0ff6bc1d7b8f2470b9b._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_3_b34d6ae0718ab0ff6bc1d7b8f2470b9b._comment new file mode 100644 index 000000000..a8040cd3f --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_3_b34d6ae0718ab0ff6bc1d7b8f2470b9b._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" + nickname="Hamza" + subject="comment 3" + date="2014-02-23T19:39:28Z" + content=""" +Thanks for the reply, just to make sure I got you right, + +It is indeed a non bare git repo. So I will move the folder repo/.git/annex/objects/ to repo/ + +then run, + +git annex initremote myrsync type=rsync rsyncurl=ssh.example.com:~/repo + +and enable the remote on other annexes (disks are connected to an ssh server there is no encryption setup right now so I do not mind not having it.). And everything should be setup correctly. +"""]] -- cgit v1.2.3 From b8075a4f8c48ff9fc8b228ea8d707214be238afa Mon Sep 17 00:00:00 2001 From: "https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" Date: Sun, 23 Feb 2014 19:45:09 +0000 Subject: Added a comment --- .../comment_4_8f5e323b29745591f9f2f0f867353f69._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_4_8f5e323b29745591f9f2f0f867353f69._comment diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_4_8f5e323b29745591f9f2f0f867353f69._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_4_8f5e323b29745591f9f2f0f867353f69._comment new file mode 100644 index 000000000..c8305aaa9 --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_4_8f5e323b29745591f9f2f0f867353f69._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" + nickname="Hamza" + subject="comment 4" + date="2014-02-23T19:45:09Z" + content=""" +and as a follow up do I have rename the repos or can I reuse the same names for the repos? +"""]] -- cgit v1.2.3 From 35221c8ffb7609a9901f6151a20e27fef988cb77 Mon Sep 17 00:00:00 2001 From: stp Date: Mon, 24 Feb 2014 12:28:08 +0000 Subject: Added a comment: New findings --- .../comment_7_a865216046aa91a47d0d2b2f0668ea89._comment | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 doc/devblog/day_-4__forgetting/comment_7_a865216046aa91a47d0d2b2f0668ea89._comment diff --git a/doc/devblog/day_-4__forgetting/comment_7_a865216046aa91a47d0d2b2f0668ea89._comment b/doc/devblog/day_-4__forgetting/comment_7_a865216046aa91a47d0d2b2f0668ea89._comment new file mode 100644 index 000000000..dc142edc0 --- /dev/null +++ b/doc/devblog/day_-4__forgetting/comment_7_a865216046aa91a47d0d2b2f0668ea89._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="stp" + ip="84.56.21.11" + subject="New findings" + date="2014-02-24T12:28:03Z" + content=""" +Another thing I found, which was annoying is that I have objects in my annex not tracked anywhere it seems. +\"git annex fsck --all\" complains about not having access to the object. \"git log --stat -S '$key'\" doesn't have any record. \"git annex fsck\" has no issues and \"git annex unused\" comes up empty too. +I'm not sure where these objects still reside or why how to remove this annoying failure. + +So not only should \"git annex forget $key\" remove references from within all branches, but should also clean up the aforementioned loose objects, which are neither unused, nor available, nor referenced. +"""]] -- cgit v1.2.3 From 77cc657dcaae697da5712894cc7e893ed28dc1a9 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawk9nck8WX8-ADF3Fdh5vFo4Qrw1I_bJcR8" Date: Mon, 24 Feb 2014 13:20:28 +0000 Subject: Added a comment --- .../comment_7_73f71386f8eafbb65f4cc9769021710f._comment | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 doc/bugs/Creating_a_box.com_repository_fails/comment_7_73f71386f8eafbb65f4cc9769021710f._comment diff --git a/doc/bugs/Creating_a_box.com_repository_fails/comment_7_73f71386f8eafbb65f4cc9769021710f._comment b/doc/bugs/Creating_a_box.com_repository_fails/comment_7_73f71386f8eafbb65f4cc9769021710f._comment new file mode 100644 index 000000000..016371346 --- /dev/null +++ b/doc/bugs/Creating_a_box.com_repository_fails/comment_7_73f71386f8eafbb65f4cc9769021710f._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawk9nck8WX8-ADF3Fdh5vFo4Qrw1I_bJcR8" + nickname="Jon Ander" + subject="comment 7" + date="2014-02-24T13:20:27Z" + content=""" +This is what I get in the log in version 5.20140221 in Debian Sid: + + 100% 46.5KB/s 0sInternalIOException : hPutBuf: illegal operation (handle is closed) + InternalIOException : hPutBuf: illegal operation (handle is closed) + +It seams that the file is being uploaded (folders are being created in box.com) but it crashes when reaching 100% +"""]] -- cgit v1.2.3 From 2df5bcd533e5fd979f98d573643751e8f8b848a3 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawkQafKy7hNSEolLs6TvbgUnkklTctUY9LI" Date: Mon, 24 Feb 2014 17:54:45 +0000 Subject: --- doc/bugs/pages_of_packfile_errors.mdwn | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) create mode 100644 doc/bugs/pages_of_packfile_errors.mdwn diff --git a/doc/bugs/pages_of_packfile_errors.mdwn b/doc/bugs/pages_of_packfile_errors.mdwn new file mode 100644 index 000000000..9d60dd2aa --- /dev/null +++ b/doc/bugs/pages_of_packfile_errors.mdwn @@ -0,0 +1,30 @@ +### Please describe the problem. + +A repair that runs for ages. In the log file, pages and pages and pages of: + +error: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack does not match index +warning: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack cannot be accessed +error: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack does not match index +warning: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack cannot be accessed +error: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack does not match index +warning: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack cannot be accessed +error: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack does not match index +warning: packfile /Volumes/BandZbackup2/annex/.git/objects/pack/pack-f0ae2f5cc83f11eab406518b9f06a344acf9c93c.pack cannot be accessed + +### What steps will reproduce the problem? + +Running git-annex, plugging in my external drive + +### What version of git-annex are you using? On what operating system? + +Auto-updated latest, I thought, but the about page says: Version: 5.20131230-g9a495e6 + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] -- cgit v1.2.3 From d56f245b212d6a9ec88ad9d69dee4f0cc22daa5d Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Mon, 24 Feb 2014 18:17:04 +0000 Subject: Added a comment --- .../comment_1_d7c83a0e9a83e4a05aa74a34a7e1cf19._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/todo/Views_Demo/comment_1_d7c83a0e9a83e4a05aa74a34a7e1cf19._comment diff --git a/doc/todo/Views_Demo/comment_1_d7c83a0e9a83e4a05aa74a34a7e1cf19._comment b/doc/todo/Views_Demo/comment_1_d7c83a0e9a83e4a05aa74a34a7e1cf19._comment new file mode 100644 index 000000000..4c9b05635 --- /dev/null +++ b/doc/todo/Views_Demo/comment_1_d7c83a0e9a83e4a05aa74a34a7e1cf19._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-24T18:17:04Z" + content=""" +All that should work fine. All metadata fields are multivalued, and the value can be any arbitrary data. +"""]] -- cgit v1.2.3 From 3972a601b63c3343125dbb1be0c42d11c3a08346 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Mon, 24 Feb 2014 18:32:31 +0000 Subject: Added a comment --- .../comment_1_eb2989112b38bb27ce8f691dd5d318e5._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/bugs/pages_of_packfile_errors/comment_1_eb2989112b38bb27ce8f691dd5d318e5._comment diff --git a/doc/bugs/pages_of_packfile_errors/comment_1_eb2989112b38bb27ce8f691dd5d318e5._comment b/doc/bugs/pages_of_packfile_errors/comment_1_eb2989112b38bb27ce8f691dd5d318e5._comment new file mode 100644 index 000000000..d74470ffd --- /dev/null +++ b/doc/bugs/pages_of_packfile_errors/comment_1_eb2989112b38bb27ce8f691dd5d318e5._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 1" + date="2014-02-24T18:32:31Z" + content=""" +Well, you seem to have a corrupt git repository on your removable drive. git-annex seems to be in the process of repairing it, which can take some time. + +I don't see a bug here, from what you've described so far.. +"""]] -- cgit v1.2.3 From 820750d37839e72f449d364224d23f7035d23e2c Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Mon, 24 Feb 2014 14:41:33 -0400 Subject: Preserve metadata when staging a new version of an annexed file. Performance impact: When adding a large tree of new files, this needs to do some git cat-file queries to check if any of the files already existed and might need a metadata copy. I tried a benchmark in a copy of my sound repository (so there was already a significant git tree to check against. Adding 10000 small files, with a cold cache: before: 1m48.539s after: 1m52.791s So, impact is 0.0004 seconds per file added. Which seems acceptable, so did not add some kind of configuration to enable/disable this. This commit was sponsored by Lisa Feilen. --- Annex/CatFile.hs | 3 +-- Annex/MetaData.hs | 30 ++++++++++++++++++++---------- Command/Add.hs | 4 ++-- Logs/MetaData.hs | 19 ++++++++++++++++++- debian/changelog | 1 + 5 files changed, 42 insertions(+), 15 deletions(-) diff --git a/Annex/CatFile.hs b/Annex/CatFile.hs index 54a4d1099..87d179a62 100644 --- a/Annex/CatFile.hs +++ b/Annex/CatFile.hs @@ -87,8 +87,7 @@ catKey' modeguaranteed ref mode | modeguaranteed = catObject ref | otherwise = L.take 8192 <$> catObject ref -{- Looks up the file mode corresponding to the Ref using the running - - cat-file. +{- Looks up the key corresponding to the Ref using the running cat-file. - - Currently this always has to look in HEAD, because cat-file --batch - does not offer a way to specify that we want to look up a tree object diff --git a/Annex/MetaData.hs b/Annex/MetaData.hs index ef235b51f..b7850a868 100644 --- a/Annex/MetaData.hs +++ b/Annex/MetaData.hs @@ -11,6 +11,7 @@ import Common.Annex import qualified Annex import Types.MetaData import Logs.MetaData +import Annex.CatFile import qualified Data.Set as S import qualified Data.Map as M @@ -27,18 +28,27 @@ yearMetaField = MetaField "year" monthMetaField :: MetaField monthMetaField = MetaField "month" -{- Generates metadata for a file that has just been ingested into the - - annex. Passed the FileStatus of the content file. +{- Adds metadata for a file that has just been ingested into the + - annex, but has not yet been committed to git. - - - Does not overwrite any existing metadata values for the key. + - When the file has been modified, the metadata is copied over + - from the old key to the new key. Note that it looks at the old key as + - committed to HEAD -- the new key may or may not have already been staged + - in th annex. + - + - Also, can generate new metadata, if configured to do so. -} -genMetaData :: Key -> FileStatus -> Annex () -genMetaData key status = whenM (annexGenMetaData <$> Annex.getGitConfig) $ do - metadata <- getCurrentMetaData key - let metadata' = genMetaData' status metadata - unless (metadata' == emptyMetaData) $ - addMetaData key metadata' - +genMetaData :: Key -> FilePath -> FileStatus -> Annex () +genMetaData key file status = do + maybe noop (flip copyMetaData key) =<< catKeyFileHEAD file + whenM (annexGenMetaData <$> Annex.getGitConfig) $ do + metadata <- getCurrentMetaData key + let metadata' = genMetaData' status metadata + unless (metadata' == emptyMetaData) $ + addMetaData key metadata' + +{- Generates metadata from the FileStatus. + - Does not overwrite any existing metadata values. -} genMetaData' :: FileStatus -> MetaData -> MetaData genMetaData' status old = MetaData $ M.fromList $ filter isnew [ (yearMetaField, S.singleton $ toMetaValue $ show y) diff --git a/Command/Add.hs b/Command/Add.hs index 0906ae531..662ce4242 100644 --- a/Command/Add.hs +++ b/Command/Add.hs @@ -161,14 +161,14 @@ ingest (Just source) = do goindirect (Just (key, _)) mcache ms = do catchAnnex (moveAnnex key $ contentLocation source) (undo (keyFilename source) key) - maybe noop (genMetaData key) ms + maybe noop (genMetaData key (keyFilename source)) ms liftIO $ nukeFile $ keyFilename source return $ (Just key, mcache) goindirect _ _ _ = failure "failed to generate a key" godirect (Just (key, _)) (Just cache) ms = do addInodeCache key cache - maybe noop (genMetaData key) ms + maybe noop (genMetaData key (keyFilename source)) ms finishIngestDirect key source return $ (Just key, Just cache) godirect _ _ _ = failure "failed to generate a key" diff --git a/Logs/MetaData.hs b/Logs/MetaData.hs index 63314bcef..6702c3733 100644 --- a/Logs/MetaData.hs +++ b/Logs/MetaData.hs @@ -28,10 +28,10 @@ module Logs.MetaData ( getCurrentMetaData, - getMetaData, addMetaData, addMetaData', currentMetaData, + copyMetaData, ) where import Common.Annex @@ -135,3 +135,20 @@ simplifyLog s = case sl of where older = value l unique = older `differenceMetaData` newer + +{- Copies the metadata from the old key to the new key. + - + - The exact content of the metadata file is copied, so that the timestamps + - remain the same, and because this is more space-efficient in the git + - repository. + - + - Any metadata already attached to the new key is not preserved. + -} +copyMetaData :: Key -> Key -> Annex () +copyMetaData oldkey newkey + | oldkey == newkey = noop + | otherwise = do + l <- getMetaData oldkey + unless (S.null l) $ + Annex.Branch.change (metaDataLogFile newkey) $ + const $ showLog l diff --git a/debian/changelog b/debian/changelog index 543504c12..8c157aeb2 100644 --- a/debian/changelog +++ b/debian/changelog @@ -11,6 +11,7 @@ git-annex (5.20140222) UNRELEASED; urgency=medium tag/showname. * annex.genmetadata can be set to make git-annex automatically set metadata (year and month) when adding files. + * Preserve metadata when staging a new version of an annexed file. * metadata: Field names limited to alphanumerics and a few whitelisted punctuation characters to avoid issues with views, etc. * metadata: Support --json -- cgit v1.2.3 From 263204eafaadaeb65ab61eaa0a0f79d999a35bad Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Mon, 24 Feb 2014 14:45:19 -0400 Subject: update --- doc/design/metadata.mdwn | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/doc/design/metadata.mdwn b/doc/design/metadata.mdwn index 719d46323..7d1ff4bfa 100644 --- a/doc/design/metadata.mdwn +++ b/doc/design/metadata.mdwn @@ -172,14 +172,15 @@ So, possible approaches: * Git has a complex set of rules for what is legal in a ref name. View branch names will need to filter out any illegal stuff. **done** +* Metadata should be copied to the new key when adding a modified version + of a file. **done** + * Filesystems that are not case sensative (including case preserving OSX) will cause problems if view branches try to use different cases for - 2 directories representing the value of some metadata. But, users - probably want at least case-preserving metadata values. + 2 directories representing a metadata field. - Solution might be to compare metadata case-insensitively, and - pick one representation consistently, so if, for example an author - field uses mixed case, it will be used in the view branch. + Solution might be to compare fields names case-insensitively, and + pick one representation consistently. Alternatively, it could escape `A` to `_A` when such a filesystem is detected and avoid collisions that way (double `_` to escape it). @@ -198,7 +199,3 @@ So, possible approaches: * The filename mangling can result in a filename in a view that is too long for its containing filesystem. Should detect and do something reasonable to avoid. TODO - -* When a file is edited and the new version committed, the new key - does not inherit the metadata of the old key. Which may be right - sometimes, but more generally violates least surprise. -- cgit v1.2.3 From 58b98f169c3a87fcb8974bf2094c3123fe4778d4 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Mon, 24 Feb 2014 19:07:33 +0000 Subject: Added a comment --- .../comment_5_9824c953694770afa0611ff7276737bf._comment | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_5_9824c953694770afa0611ff7276737bf._comment diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_5_9824c953694770afa0611ff7276737bf._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_5_9824c953694770afa0611ff7276737bf._comment new file mode 100644 index 000000000..c6052232b --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_5_9824c953694770afa0611ff7276737bf._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 5" + date="2014-02-24T19:07:33Z" + content=""" +That looks all-right, although initremote will ask you to tell it what encryption to use, and you'll need to specify `encryption=none` + +One thing I forgot to mention is that the UUID of the new rsync repository won't be the same, so git-annex won't know about the files in there. This can be fixed by `git annex fsck --fast --from myrsync`. Which doesn't re-download all the files, but you still may want to run it on a repository close to or on the server for speed. + +You can re-use the name you're currently using for the git remote for the new rsync special remote if you like. +"""]] -- cgit v1.2.3 From 278ac4da12315a90aededea02993c68924bb7a76 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Mon, 24 Feb 2014 15:14:44 -0400 Subject: webapp: Fix creation of box.com, S3, and Glacier repositories, broken in 5.20140221. --- Remote/Glacier.hs | 10 +++++----- Remote/S3.hs | 10 +++++----- Remote/WebDAV.hs | 4 ++-- debian/changelog | 2 ++ 4 files changed, 14 insertions(+), 12 deletions(-) diff --git a/Remote/Glacier.hs b/Remote/Glacier.hs index 84557851b..33719926c 100644 --- a/Remote/Glacier.hs +++ b/Remote/Glacier.hs @@ -73,16 +73,16 @@ gen r u c gc = new <$> remoteCost gc veryExpensiveRemoteCost glacierSetup :: Maybe UUID -> Maybe CredPair -> RemoteConfig -> Annex (RemoteConfig, UUID) glacierSetup mu mcreds c = do u <- maybe (liftIO genUUID) return mu - glacierSetup' (isJust mu) u mcreds c -glacierSetup' :: Bool -> UUID -> Maybe CredPair -> RemoteConfig -> Annex (RemoteConfig, UUID) -glacierSetup' enabling u mcreds c = do + c' <- setRemoteCredPair c (AWS.creds u) mcreds + glacierSetup' (isJust mu) u c' +glacierSetup' :: Bool -> UUID -> RemoteConfig -> Annex (RemoteConfig, UUID) +glacierSetup' enabling u c = do c' <- encryptionSetup c let fullconfig = c' `M.union` defaults unless enabling $ genVault fullconfig u gitConfigSpecialRemote u fullconfig "glacier" "true" - c'' <- setRemoteCredPair fullconfig (AWS.creds u) mcreds - return (c'', u) + return (c', u) where remotename = fromJust (M.lookup "name" c) defvault = remotename ++ "-" ++ fromUUID u diff --git a/Remote/S3.hs b/Remote/S3.hs index b217892e7..c1a99abcd 100644 --- a/Remote/S3.hs +++ b/Remote/S3.hs @@ -76,9 +76,10 @@ gen r u c gc = new <$> remoteCost gc expensiveRemoteCost s3Setup :: Maybe UUID -> Maybe CredPair -> RemoteConfig -> Annex (RemoteConfig, UUID) s3Setup mu mcreds c = do u <- maybe (liftIO genUUID) return mu - s3Setup' u mcreds c -s3Setup' :: UUID -> Maybe CredPair -> RemoteConfig -> Annex (RemoteConfig, UUID) -s3Setup' u mcreds c = if isIA c then archiveorg else defaulthost + c' <- setRemoteCredPair c (AWS.creds u) mcreds + s3Setup' u c' +s3Setup' :: UUID -> RemoteConfig -> Annex (RemoteConfig, UUID) +s3Setup' u c = if isIA c then archiveorg else defaulthost where remotename = fromJust (M.lookup "name" c) defbucket = remotename ++ "-" ++ fromUUID u @@ -92,8 +93,7 @@ s3Setup' u mcreds c = if isIA c then archiveorg else defaulthost use fullconfig = do gitConfigSpecialRemote u fullconfig "s3" "true" - c' <- setRemoteCredPair fullconfig (AWS.creds u) mcreds - return (c', u) + return (fullconfig, u) defaulthost = do c' <- encryptionSetup c diff --git a/Remote/WebDAV.hs b/Remote/WebDAV.hs index 6ce83470b..4714f10dd 100644 --- a/Remote/WebDAV.hs +++ b/Remote/WebDAV.hs @@ -82,10 +82,10 @@ webdavSetup mu mcreds c = do let url = fromMaybe (error "Specify url=") $ M.lookup "url" c c' <- encryptionSetup c - creds <- getCreds c' u + creds <- maybe (getCreds c' u) (return . Just) mcreds testDav url creds gitConfigSpecialRemote u c' "webdav" "true" - c'' <- setRemoteCredPair c' (davCreds u) mcreds + c'' <- setRemoteCredPair c' (davCreds u) creds return (c'', u) store :: Remote -> Key -> AssociatedFile -> MeterUpdate -> Annex Bool diff --git a/debian/changelog b/debian/changelog index 8c157aeb2..681747ff9 100644 --- a/debian/changelog +++ b/debian/changelog @@ -15,6 +15,8 @@ git-annex (5.20140222) UNRELEASED; urgency=medium * metadata: Field names limited to alphanumerics and a few whitelisted punctuation characters to avoid issues with views, etc. * metadata: Support --json + * webapp: Fix creation of box.com and Amazon S3 and Glacier + repositories, broken in 5.20140221. -- Joey Hess Fri, 21 Feb 2014 13:03:04 -0400 -- cgit v1.2.3 From 778860aab07fbf68cd9cde072b7cb1eb614b61d3 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawkQafKy7hNSEolLs6TvbgUnkklTctUY9LI" Date: Mon, 24 Feb 2014 19:39:12 +0000 Subject: Added a comment: sounds good --- .../comment_2_69fba53035ebea213ae1c11be5326690._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/bugs/pages_of_packfile_errors/comment_2_69fba53035ebea213ae1c11be5326690._comment diff --git a/doc/bugs/pages_of_packfile_errors/comment_2_69fba53035ebea213ae1c11be5326690._comment b/doc/bugs/pages_of_packfile_errors/comment_2_69fba53035ebea213ae1c11be5326690._comment new file mode 100644 index 000000000..facae6496 --- /dev/null +++ b/doc/bugs/pages_of_packfile_errors/comment_2_69fba53035ebea213ae1c11be5326690._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawkQafKy7hNSEolLs6TvbgUnkklTctUY9LI" + nickname="Zellyn" + subject="sounds good" + date="2014-02-24T19:39:12Z" + content=""" +Is it normal for the same error to repeat thousands of times like that in the log? +"""]] -- cgit v1.2.3 From 1587a10c8a2ce9a573fcf1c286649404f365506b Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Mon, 24 Feb 2014 18:21:51 -0400 Subject: webdav: When built with DAV 0.6.0, use the new DAV monad to avoid locking files, which is not needed by git-annex's use of webdav, and does not work on Box.com. --- Remote/WebDAV.hs | 204 ++++++++++++++-------- debian/changelog | 3 + doc/bugs/Creating_a_box.com_repository_fails.mdwn | 4 + git-annex.cabal | 2 +- 4 files changed, 136 insertions(+), 77 deletions(-) diff --git a/Remote/WebDAV.hs b/Remote/WebDAV.hs index 4714f10dd..de1b721c9 100644 --- a/Remote/WebDAV.hs +++ b/Remote/WebDAV.hs @@ -16,6 +16,7 @@ import qualified Data.ByteString.Lazy.UTF8 as L8 import qualified Data.ByteString.Lazy as L import Network.URI (normalizePathSegments) import qualified Control.Exception as E +import qualified Control.Exception.Lifted as EL import Network.HTTP.Conduit (HttpException(..)) import Network.HTTP.Types import System.IO.Error @@ -105,7 +106,7 @@ storeEncrypted r (cipher, enck) k p = metered (Just p) k $ \meterupdate -> storeHelper :: Remote -> Key -> DavUrl -> DavUser -> DavPass -> L.ByteString -> IO Bool storeHelper r k baseurl user pass b = catchBoolIO $ do - davMkdir tmpurl user pass + mkdirRecursiveDAV tmpurl user pass storeChunks k tmpurl keyurl chunksize storer recorder finalizer where tmpurl = tmpLocation baseurl k @@ -114,11 +115,10 @@ storeHelper r k baseurl user pass b = catchBoolIO $ do storer urls = storeChunked chunksize urls storehttp b recorder url s = storehttp url (L8.fromString s) finalizer srcurl desturl = do - void $ catchMaybeHttp (deleteContent desturl user pass) - davMkdir (urlParent desturl) user pass - moveContent srcurl (B8.fromString desturl) user pass - storehttp url v = putContent url user pass - (contentType, v) + void $ tryNonAsync (deleteDAV desturl user pass) + mkdirRecursiveDAV (urlParent desturl) user pass + moveDAV srcurl desturl user pass + storehttp url = putDAV url user pass retrieveCheap :: Remote -> Key -> FilePath -> Annex Bool retrieveCheap _ _ _ = return False @@ -128,7 +128,7 @@ retrieve r k _f d p = metered (Just p) k $ \meterupdate -> davAction r False $ \(baseurl, user, pass) -> liftIO $ catchBoolIO $ withStoredFiles r k baseurl user pass onerr $ \urls -> do meteredWriteFileChunks meterupdate d urls $ \url -> do - mb <- davGetUrlContent url user pass + mb <- getDAV url user pass case mb of Nothing -> throwIO "download failed" Just b -> return b @@ -148,7 +148,7 @@ retrieveEncrypted r (cipher, enck) k d p = metered (Just p) k $ \meterupdate -> feeder _ _ [] _ = noop feeder user pass (url:urls) h = do - mb <- davGetUrlContent url user pass + mb <- getDAV url user pass case mb of Nothing -> throwIO "download failed" Just b -> do @@ -160,7 +160,7 @@ remove r k = davAction r False $ \(baseurl, user, pass) -> liftIO $ do -- Delete the key's whole directory, including any chunked -- files, etc, in a single action. let url = davLocation baseurl k - isJust <$> catchMaybeHttp (deleteContent url user pass) + isJust . eitherToMaybe <$> tryNonAsync (deleteDAV url user pass) checkPresent :: Remote -> Key -> Annex (Either String Bool) checkPresent r k = davAction r noconn go @@ -173,7 +173,7 @@ checkPresent r k = davAction r noconn go where check [] = return $ Right True check (url:urls) = do - v <- davUrlExists url user pass + v <- existsDAV url user pass if v == Right True then check urls else return v @@ -182,7 +182,7 @@ checkPresent r k = davAction r noconn go - or if there's a problem accessing it, - or perhaps this was an intermittent error. -} onerr url = do - v <- davUrlExists url user pass + v <- existsDAV url user pass return $ if v == Right True then Left $ "failed to read " ++ url else v @@ -199,11 +199,11 @@ withStoredFiles withStoredFiles r k baseurl user pass onerr a | isJust $ chunkSize $ config r = do let chunkcount = keyurl ++ chunkCount - v <- davGetUrlContent chunkcount user pass + v <- getDAV chunkcount user pass case v of Just s -> a $ listChunks keyurl $ L8.toString s Nothing -> do - chunks <- probeChunks keyurl $ \u -> (== Right True) <$> davUrlExists u user pass + chunks <- probeChunks keyurl $ \u -> (== Right True) <$> existsDAV u user pass if null chunks then onerr chunkcount else a chunks @@ -244,33 +244,12 @@ tmpLocation baseurl k = addTrailingPathSeparator $ davUrl :: DavUrl -> FilePath -> DavUrl davUrl baseurl file = baseurl file -davUrlExists :: DavUrl -> DavUser -> DavPass -> IO (Either String Bool) -davUrlExists url user pass = decode <$> catchHttp get - where - decode (Right _) = Right True -#if ! MIN_VERSION_http_conduit(1,9,0) - decode (Left (Left (StatusCodeException status _))) -#else - decode (Left (Left (StatusCodeException status _ _))) -#endif - | statusCode status == statusCode notFound404 = Right False - decode (Left e) = Left $ showEitherException e -#if ! MIN_VERSION_DAV(0,4,0) - get = getProps url user pass -#else - get = getProps url user pass Nothing -#endif - -davGetUrlContent :: DavUrl -> DavUser -> DavPass -> IO (Maybe L.ByteString) -davGetUrlContent url user pass = fmap (snd . snd) <$> - catchMaybeHttp (getPropsAndContent url user pass) - {- Creates a directory in WebDAV, if not already present; also creating - any missing parent directories. -} -davMkdir :: DavUrl -> DavUser -> DavPass -> IO () -davMkdir url user pass = go url +mkdirRecursiveDAV :: DavUrl -> DavUser -> DavPass -> IO () +mkdirRecursiveDAV url user pass = go url where - make u = makeCollection u user pass + make u = mkdirDAV u user pass go u = do r <- E.try (make u) :: IO (Either E.SomeException Bool) @@ -287,35 +266,6 @@ davMkdir url user pass = go url - to use this directory will fail. -} Left _ -> return () -{- Catches HTTP and IO exceptions. -} -catchMaybeHttp :: IO a -> IO (Maybe a) -catchMaybeHttp a = (Just <$> a) `E.catches` - [ E.Handler $ \(_e :: HttpException) -> return Nothing - , E.Handler $ \(_e :: E.IOException) -> return Nothing - ] - -{- Catches HTTP and IO exceptions -} -catchHttp :: IO a -> IO (Either EitherException a) -catchHttp a = (Right <$> a) `E.catches` - [ E.Handler $ \(e :: HttpException) -> return $ Left $ Left e - , E.Handler $ \(e :: E.IOException) -> return $ Left $ Right e - ] - -type EitherException = Either HttpException E.IOException - -showEitherException :: EitherException -> String -#if ! MIN_VERSION_http_conduit(1,9,0) -showEitherException (Left (StatusCodeException status _)) = -#else -showEitherException (Left (StatusCodeException status _ _)) = -#endif - show $ statusMessage status -showEitherException (Left httpexception) = show httpexception -showEitherException (Right ioexception) = show ioexception - -throwIO :: String -> IO a -throwIO msg = ioError $ mkIOError userErrorType msg Nothing Nothing - urlParent :: DavUrl -> DavUrl urlParent url = dropTrailingPathSeparator $ normalizePathSegments (dropTrailingPathSeparator url ++ "/..") @@ -326,25 +276,20 @@ urlParent url = dropTrailingPathSeparator $ testDav :: String -> Maybe CredPair -> Annex () testDav baseurl (Just (u, p)) = do showSideAction "testing WebDAV server" - test "make directory" $ davMkdir baseurl user pass - test "write file" $ putContent testurl user pass - (contentType, L.empty) - test "delete file" $ deleteContent testurl user pass + test "make directory" $ mkdirRecursiveDAV baseurl user pass + test "write file" $ putDAV testurl user pass L.empty + test "delete file" $ deleteDAV testurl user pass where test desc a = liftIO $ - either (\e -> throwIO $ "WebDAV failed to " ++ desc ++ ": " ++ showEitherException e) + either (\e -> throwIO $ "WebDAV failed to " ++ desc ++ ": " ++ show e) (const noop) - =<< catchHttp a + =<< tryNonAsync a user = toDavUser u pass = toDavPass p testurl = davUrl baseurl "git-annex-test" testDav _ Nothing = error "Need to configure webdav username and password." -{- Content-Type to use for files uploaded to WebDAV. -} -contentType :: Maybe B8.ByteString -contentType = Just $ B8.fromString "application/octet-stream" - getCreds :: RemoteConfig -> UUID -> Annex (Maybe CredPair) getCreds c u = getRemoteCredPairFor "webdav" c (davCreds u) @@ -354,3 +299,110 @@ davCreds u = CredPairStorage , credPairEnvironment = ("WEBDAV_USERNAME", "WEBDAV_PASSWORD") , credPairRemoteKey = Just "davcreds" } + +{- Content-Type to use for files uploaded to WebDAV. -} +contentType :: Maybe B8.ByteString +contentType = Just $ B8.fromString "application/octet-stream" + +throwIO :: String -> IO a +throwIO msg = ioError $ mkIOError userErrorType msg Nothing Nothing + +{--------------------------------------------------------------------- + - Low-level DAV operations, using the new DAV monad when available. + ---------------------------------------------------------------------} + +putDAV :: DavUrl -> DavUser -> DavPass -> L.ByteString -> IO () +putDAV url user pass b = +#if MIN_VERSION_DAV(0,6,0) + goDAV url user pass $ putContentM (contentType, b) +#else + putContent url user pass (contentType, b) +#endif + +getDAV :: DavUrl -> DavUser -> DavPass -> IO (Maybe L.ByteString) +getDAV url user pass = eitherToMaybe <$> tryNonAsync go + where +#if MIN_VERSION_DAV(0,6,0) + go = goDAV url user pass $ snd <$> getContentM +#else + go = snd . snd <$> getPropsAndContent url user pass +#endif + +deleteDAV :: DavUrl -> DavUser -> DavPass -> IO () +deleteDAV url user pass = +#if MIN_VERSION_DAV(0,6,0) + goDAV url user pass delContentM +#else + deleteContent url user pass +#endif + +moveDAV :: DavUrl -> DavUrl -> DavUser -> DavPass -> IO () +moveDAV url newurl user pass = +#if MIN_VERSION_DAV(0,6,0) + goDAV url user pass $ moveContentM newurl' +#else + moveContent url newurl' user pass +#endif + where + newurl' = B8.fromString newurl + +mkdirDAV :: DavUrl -> DavUser -> DavPass -> IO Bool +mkdirDAV url user pass = +#if MIN_VERSION_DAV(0,6,0) + goDAV url user pass mkCol +#else + makeCollection url user pass +#endif + +existsDAV :: DavUrl -> DavUser -> DavPass -> IO (Either String Bool) +existsDAV url user pass = either onerr id <$> tryNonAsync check + where +#if MIN_VERSION_DAV(0,6,0) + check = goDAV url user pass $ do + setDepth Nothing + EL.catchJust + (matchStatusCodeException notFound404) + (getPropsM >> ispresent True) + (const $ ispresent False) +#else + check = E.catchJust + (matchStatusCodeException notFound404) +#if ! MIN_VERSION_DAV(0,4,0) + (getProps url user pass >> ispresent True) +#else + (getProps url user pass Nothing >> ispresent True) +#endif + (const $ ispresent False) +#endif + ispresent = return . Right + {- This is a horrible hack, it seems that the type of the + - HttpException gets screwed up with DAV 0.6.x, and so + - I'm reduced to string matching. :( -} + onerr e + | "StatusCodeException" `isInfixOf` show e + && "statusCode = 404" `isInfixOf` show e = Right False + | otherwise = Left (show e) + +matchStatusCodeException :: Status -> HttpException -> Maybe () +#if ! MIN_VERSION_http_conduit(1,9,0) +matchStatusCodeException want (StatusCodeException s _) +#else +matchStatusCodeException want (StatusCodeException s _ _) +#endif + | s == want = Just () + | otherwise = Nothing +matchStatusCodeException _ _ = Nothing + +#if MIN_VERSION_DAV(0,6,0) +goDAV :: DavUrl -> DavUser -> DavPass -> DAVT IO a -> IO a +goDAV url user pass a = choke $ evalDAVT url $ do + setCreds user pass + a + where + choke :: IO (Either String a) -> IO a + choke f = do + x <- f + case x of + Left e -> error e + Right r -> return r +#endif diff --git a/debian/changelog b/debian/changelog index 681747ff9..a9f2350b6 100644 --- a/debian/changelog +++ b/debian/changelog @@ -17,6 +17,9 @@ git-annex (5.20140222) UNRELEASED; urgency=medium * metadata: Support --json * webapp: Fix creation of box.com and Amazon S3 and Glacier repositories, broken in 5.20140221. + * webdav: When built with DAV 0.6.0, use the new DAV monad to avoid + locking files, which is not needed by git-annex's use of webdav, and + does not work on Box.com. -- Joey Hess Fri, 21 Feb 2014 13:03:04 -0400 diff --git a/doc/bugs/Creating_a_box.com_repository_fails.mdwn b/doc/bugs/Creating_a_box.com_repository_fails.mdwn index 75d59c9bc..ecebd7a00 100644 --- a/doc/bugs/Creating_a_box.com_repository_fails.mdwn +++ b/doc/bugs/Creating_a_box.com_repository_fails.mdwn @@ -35,3 +35,7 @@ ubuntu 13.10 (saucy), i686 > Seems that [DAV-0.6 is badly broken](http://bugs.debian.org/737902). > I have adjusted the cabal file to refuse to build with that broken > version. +> +>> Update: Had to work around additional breakage in DAV-0.6. It's +>> fully tested and working now, although not yet uploaded to Debian +>> unstable. [[done]] --[[Joey]] diff --git a/git-annex.cabal b/git-annex.cabal index 8e3f3f388..a2e082cdc 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -133,7 +133,7 @@ Executable git-annex if flag(WebDAV) Build-Depends: DAV ((>= 0.3 && < 0.6) || > 0.6), - http-conduit, xml-conduit, http-types + http-conduit, xml-conduit, http-types, lifted-base CPP-Options: -DWITH_WEBDAV if flag(Assistant) && ! os(solaris) -- cgit v1.2.3 From 6abebf61806d6fade04dc5c0b46da5d65dcd632d Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Mon, 24 Feb 2014 18:48:27 -0400 Subject: devblog --- .../day_121__special_remote_maintenance.mdwn | 23 ++++++++++++++++++++++ 1 file changed, 23 insertions(+) create mode 100644 doc/devblog/day_121__special_remote_maintenance.mdwn diff --git a/doc/devblog/day_121__special_remote_maintenance.mdwn b/doc/devblog/day_121__special_remote_maintenance.mdwn new file mode 100644 index 000000000..551704885 --- /dev/null +++ b/doc/devblog/day_121__special_remote_maintenance.mdwn @@ -0,0 +1,23 @@ +Turns out that in the last release I broke making box.com, Amazon S3 and +Glacier remotes from the webapp. Fixed that. + +Also, dealt with changes in the haskell DAV library that broke support for +box.com, and worked around an exception handling bug in the library. + +I think I should try to enhance the test suite so it can run live tests +on special remotes, which would at least have caught the some of these +recent problems... + +---- + +Since metadata is tied to a particular key, editing an annexed file, +which causes the key to change, made the metadata seem to get lost. + +I've now fixed this; it copies the metadata from the old version to the new +one. (Taking care to copy the log file identically, so git can reuse its +blob.) + +That meant that `git annex add` has to check every file it adds to see if +there's an old version. Happily, that check is fairly fast; I benchmarked my +laptop running 2500 such checks a second. So it's not going to slow things +down appreciably. -- cgit v1.2.3 From d2ff4e12d0ae0d32263e258ea7c2a283d1d92149 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Mon, 24 Feb 2014 19:36:58 -0400 Subject: repair: Optimise unpacking of pack files, and avoid repeated error messages about corrupt pack files. --- Git/Repair.hs | 41 ++++++++++++++++++++++++++--------------- debian/changelog | 2 ++ 2 files changed, 28 insertions(+), 15 deletions(-) diff --git a/Git/Repair.hs b/Git/Repair.hs index 96da5ffe7..cdd70329d 100644 --- a/Git/Repair.hs +++ b/Git/Repair.hs @@ -75,24 +75,35 @@ removeLoose r s = do return True else return False +{- Explodes all pack files, and deletes them. + - + - First moves all pack files to a temp dir, before unpacking them each in + - turn. + - + - This is because unpack-objects will not unpack a pack file if it's in the + - git repo. + - + - Also, this prevents unpack-objects from possibly looking at corrupt + - pack files to see if they contain an object, while unpacking a + - non-corrupt pack file. + -} explodePacks :: Repo -> IO Bool -explodePacks r = do - packs <- listPackFiles r - if null packs - then return False - else do - putStrLn "Unpacking all pack files." - mapM_ go packs - return True +explodePacks r = go =<< listPackFiles r where - go packfile = withTmpFileIn (localGitDir r) "pack" $ \tmp _ -> do - moveFile packfile tmp - nukeFile $ packIdxFile packfile - allowRead tmp - -- May fail, if pack file is corrupt. - void $ tryIO $ - pipeWrite [Param "unpack-objects", Param "-r"] r $ \h -> + go [] = return False + go packs = withTmpDir "packs" $ \tmpdir -> do + putStrLn "Unpacking all pack files." + forM_ packs $ \packfile -> do + moveFile packfile (tmpdir takeFileName packfile) + nukeFile $ packIdxFile packfile + forM_ packs $ \packfile -> do + let tmp = tmpdir takeFileName packfile + allowRead tmp + -- May fail, if pack file is corrupt. + void $ tryIO $ + pipeWrite [Param "unpack-objects", Param "-r"] r $ \h -> L.hPut h =<< L.readFile tmp + return True {- Try to retrieve a set of missing objects, from the remotes of a - repository. Returns any that could not be retreived. diff --git a/debian/changelog b/debian/changelog index a9f2350b6..7d0a186fd 100644 --- a/debian/changelog +++ b/debian/changelog @@ -20,6 +20,8 @@ git-annex (5.20140222) UNRELEASED; urgency=medium * webdav: When built with DAV 0.6.0, use the new DAV monad to avoid locking files, which is not needed by git-annex's use of webdav, and does not work on Box.com. + * repair: Optimise unpacking of pack files, and avoid repeated error + messages about corrupt pack files. -- Joey Hess Fri, 21 Feb 2014 13:03:04 -0400 -- cgit v1.2.3 From 2ff89781cb33f80df85191b25e014afa00fe6b3b Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Mon, 24 Feb 2014 23:39:47 +0000 Subject: Added a comment --- ...ent_3_73b9f574e8ce36d5e0d0f6c6a89006b7._comment | 39 ++++++++++++++++++++++ 1 file changed, 39 insertions(+) create mode 100644 doc/bugs/pages_of_packfile_errors/comment_3_73b9f574e8ce36d5e0d0f6c6a89006b7._comment diff --git a/doc/bugs/pages_of_packfile_errors/comment_3_73b9f574e8ce36d5e0d0f6c6a89006b7._comment b/doc/bugs/pages_of_packfile_errors/comment_3_73b9f574e8ce36d5e0d0f6c6a89006b7._comment new file mode 100644 index 000000000..f0e6bce0b --- /dev/null +++ b/doc/bugs/pages_of_packfile_errors/comment_3_73b9f574e8ce36d5e0d0f6c6a89006b7._comment @@ -0,0 +1,39 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="209.250.56.172" + subject="comment 3" + date="2014-02-24T23:39:46Z" + content=""" +Well, if there's a bug here, it might be that this particular problem has caused the repair process to loop repeatedly trying to unpack a pack file. +I don't see how that could happen, looking at the code it will try to unpack each pack file only once. + +If you run `git annex repair --debug`, you can see the git commands it runs, and so see if it's somehow looping. When I do this with some corrupt pack files (actually, I swapped one pack file for another one), I see, for example: + +
+[2014-02-24 19:11:42 JEST] feed: git [\"--git-dir=/home/joey/tmp/git/.git\",\"--work-tree=/home/joey/tmp/git\",\"unpack-objects\",\"-r\"]
+error: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack claims to have 862 objects while index indicates 1431 objects
+warning: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack cannot be accessed
+error: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack claims to have 862 objects while index indicates 1431 objects
+warning: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack cannot be accessed
+error: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack claims to have 862 objects while index indicates 1431 objects
+...
+
+ +Which shows that git-annex only ran `git unpack-objects -r` once, and yet it printed out the same error repeatedly. + +One possibility is a problem using `-r`, which makes it keep going on errors. Which seemed like a good idea at the time to unpack as much as possible from a damaged file. It might be that `git unpack-objects` is itself getting stuck in some kind of loop with the -r. + +In my case, it did not get stuck; it eventually quit and it moved on to the next pack file, after 900-some repitions of the error message: + +
+[2014-02-24 19:16:47 JEST] feed: git [\"--git-dir=/home/joey/tmp/git/.git\",\"--work-tree=/home/joey/tmp/git\",\"unpack-objects\",\"-r\"]
+error: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack claims to have 862 objects while index indicates 1431 objects
+warning: packfile /home/joey/tmp/git/.git/objects/pack/pack-857c07e35d98e8f063fdae6846d1f6f7453e1312.pack cannot be accessed
+
+ +Intesting that it's again complaining about the same pack file, despite having moved from one pack file on to the next one. I think what's going on here is while unpacking pack files A..Y (which may all be fine), it's checking pack file Z, which is corrupt, to see if the objects exist in it, and complaining each time. + +So, I can improve this a lot by moving *all* the pack files out of the way before trying to unpack any of them. In my test case, that completely eliminated the errors, and probably also sped it up a bit. + +If I were you, I'd either try stopping your running git-annex and run `git annex repair --debug` and analize the log like I did above, or get the next daily build which has that change, and see if it helps in your case. +"""]] -- cgit v1.2.3 From 18f3294f19a45c14a292a78036e03c7275b8b515 Mon Sep 17 00:00:00 2001 From: EskildHustvedt Date: Tue, 25 Feb 2014 00:09:47 +0000 Subject: --- .../Auto-repair_greatly_slows_down_the_machine.mdwn | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 doc/bugs/Auto-repair_greatly_slows_down_the_machine.mdwn diff --git a/doc/bugs/Auto-repair_greatly_slows_down_the_machine.mdwn b/doc/bugs/Auto-repair_greatly_slows_down_the_machine.mdwn new file mode 100644 index 000000000..58d436898 --- /dev/null +++ b/doc/bugs/Auto-repair_greatly_slows_down_the_machine.mdwn @@ -0,0 +1,19 @@ +### Please describe the problem. + +The assistant regulary ends up trying to perform repair (I don't know why, it happens fairly often, once a week or so). When it does so, it ends up creating a huge (2.4G) .git/objects directory, and a git prune-packed process uses so much I/O the machine really slows down. + +### What steps will reproduce the problem? + +I don't have any reliable way to reproduce it. The repository ends up being attempted to be repaired around once a week. This week the repair (and the slowdown) also happened on a second computer. + +### What version of git-annex are you using? On what operating system? + +git-annex version: 5.20140221-gbdfc8e1 (using the standalone 64bit builds) + +This is on an up-to-date Arch Linux. It also happened on Fedora 20. + +### Please provide any additional information below. + +The daemon.log is fairly long, but not particulary interesting: [[https://ssl.zerodogg.org/~zerodogg/private/tmp/daemon.log-2014-02-25.1]] + +The «resource vanished (Broken pipe)» at the end is the result of me killing the prune-packed in order to be able to use the machine again. -- cgit v1.2.3 From b6fb2e3a3384d2b9a92002aaf26c0c0d535ee4d3 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Mon, 24 Feb 2014 21:29:37 -0400 Subject: Make annex.web-options be used in several places that call curl. --- Annex/Content.hs | 5 ++--- Command/AddUrl.hs | 12 ++++++------ Config.hs | 17 ++++++++++------- Remote/Git.hs | 16 ++++++++-------- Remote/Web.hs | 4 ++-- Utility/Url.hs | 16 ++++++++-------- debian/changelog | 1 + 7 files changed, 37 insertions(+), 34 deletions(-) diff --git a/Annex/Content.hs b/Annex/Content.hs index bffef19f4..60edb4975 100644 --- a/Annex/Content.hs +++ b/Annex/Content.hs @@ -515,9 +515,8 @@ downloadUrl :: [Url.URLString] -> FilePath -> Annex Bool downloadUrl urls file = go =<< annexWebDownloadCommand <$> Annex.getGitConfig where go Nothing = do - opts <- map Param . annexWebOptions <$> Annex.getGitConfig - headers <- getHttpHeaders - anyM (\u -> Url.withUserAgent $ Url.download u headers opts file) urls + (headers, options) <- getHttpHeadersOptions + anyM (\u -> Url.withUserAgent $ Url.download u headers options file) urls go (Just basecmd) = liftIO $ anyM (downloadcmd basecmd) urls downloadcmd basecmd url = boolSystem "sh" [Param "-c", Param $ gencmd url basecmd] diff --git a/Command/AddUrl.hs b/Command/AddUrl.hs index 82b04f07b..da4da414f 100644 --- a/Command/AddUrl.hs +++ b/Command/AddUrl.hs @@ -134,8 +134,8 @@ perform relaxed url file = ifAnnexed file addurl geturl setUrlPresent key url next $ return True | otherwise = do - headers <- getHttpHeaders - (exists, samesize) <- Url.withUserAgent $ Url.check url headers $ keySize key + (headers, options) <- getHttpHeadersOptions + (exists, samesize) <- Url.withUserAgent $ Url.check url headers options (keySize key) if exists && samesize then do setUrlPresent key url @@ -192,8 +192,8 @@ download url file = do -} addSizeUrlKey :: URLString -> Key -> Annex Key addSizeUrlKey url key = do - headers <- getHttpHeaders - size <- snd <$> Url.withUserAgent (Url.exists url headers) + (headers, options) <- getHttpHeadersOptions + size <- snd <$> Url.withUserAgent (Url.exists url headers options) return $ key { keySize = size } cleanup :: URLString -> FilePath -> Key -> Maybe FilePath -> Annex Bool @@ -212,10 +212,10 @@ cleanup url file key mtmp = do nodownload :: Bool -> URLString -> FilePath -> Annex Bool nodownload relaxed url file = do - headers <- getHttpHeaders + (headers, options) <- getHttpHeadersOptions (exists, size) <- if relaxed then pure (True, Nothing) - else Url.withUserAgent $ Url.exists url headers + else Url.withUserAgent $ Url.exists url headers options if exists then do key <- Backend.URL.fromUrl url size diff --git a/Config.hs b/Config.hs index 376a3a488..1510f7a74 100644 --- a/Config.hs +++ b/Config.hs @@ -80,10 +80,13 @@ setCrippledFileSystem b = do setConfig (annexConfig "crippledfilesystem") (Git.Config.boolConfig b) Annex.changeGitConfig $ \c -> c { annexCrippledFileSystem = b } -{- Gets the http headers to use. -} -getHttpHeaders :: Annex [String] -getHttpHeaders = do - v <- annexHttpHeadersCommand <$> Annex.getGitConfig - case v of - Just cmd -> lines <$> liftIO (readProcess "sh" ["-c", cmd]) - Nothing -> annexHttpHeaders <$> Annex.getGitConfig +{- Gets the http headers to use, and any configured command-line options. -} +getHttpHeadersOptions :: Annex ([String], [CommandParam]) +getHttpHeadersOptions = (,) <$> headers <*> options + where + headers = do + v <- annexHttpHeadersCommand <$> Annex.getGitConfig + case v of + Just cmd -> lines <$> liftIO (readProcess "sh" ["-c", cmd]) + Nothing -> annexHttpHeaders <$> Annex.getGitConfig + options = map Param . annexWebOptions <$> Annex.getGitConfig diff --git a/Remote/Git.hs b/Remote/Git.hs index d714cfec5..f3aa2b7f1 100644 --- a/Remote/Git.hs +++ b/Remote/Git.hs @@ -158,9 +158,7 @@ tryGitConfigRead r | haveconfig r' -> return r' | otherwise -> configlist_failed Left _ -> configlist_failed - | Git.repoIsHttp r = do - headers <- getHttpHeaders - store $ geturlconfig headers + | Git.repoIsHttp r = store geturlconfig | Git.GCrypt.isEncrypted r = handlegcrypt =<< getConfigMaybe (remoteConfig r "uuid") | Git.repoIsUrl r = return r | otherwise = store $ safely $ onLocal r $ do @@ -185,11 +183,12 @@ tryGitConfigRead r return $ Right r' Left l -> return $ Left l - geturlconfig headers = do + geturlconfig = do + (headers, options) <- getHttpHeadersOptions ua <- Url.getUserAgent v <- liftIO $ withTmpFile "git-annex.tmp" $ \tmpfile h -> do hClose h - ifM (Url.downloadQuiet (Git.repoLocation r ++ "/config") headers [] tmpfile ua) + ifM (Url.downloadQuiet (Git.repoLocation r ++ "/config") headers options tmpfile ua) ( pipedconfig "git" [Param "config", Param "--null", Param "--list", Param "--file", File tmpfile] , return $ Left undefined ) @@ -255,14 +254,15 @@ tryGitConfigRead r -} inAnnex :: Remote -> Key -> Annex (Either String Bool) inAnnex rmt key - | Git.repoIsHttp r = checkhttp =<< getHttpHeaders + | Git.repoIsHttp r = checkhttp | Git.repoIsUrl r = checkremote | otherwise = checklocal where r = repo rmt - checkhttp headers = do + checkhttp = do showChecking r - ifM (anyM (\u -> Url.withUserAgent $ Url.checkBoth u headers (keySize key)) (keyUrls rmt key)) + (headers, options) <- getHttpHeadersOptions + ifM (anyM (\u -> Url.withUserAgent $ Url.checkBoth u headers options (keySize key)) (keyUrls rmt key)) ( return $ Right True , return $ Left "not found" ) diff --git a/Remote/Web.hs b/Remote/Web.hs index 2863d9d5e..d41b12b6a 100644 --- a/Remote/Web.hs +++ b/Remote/Web.hs @@ -117,9 +117,9 @@ checkKey' key us = firsthit us (Right False) $ \u -> do return $ Left "quvi support needed for this url" #endif DefaultDownloader -> do - headers <- getHttpHeaders + (headers, options) <- getHttpHeadersOptions Url.withUserAgent $ catchMsgIO . - Url.checkBoth u' headers (keySize key) + Url.checkBoth u' headers options (keySize key) where firsthit [] miss _ = return miss firsthit (u:rest) _ a = do diff --git a/Utility/Url.hs b/Utility/Url.hs index 2cbab77c8..49f25c371 100644 --- a/Utility/Url.hs +++ b/Utility/Url.hs @@ -34,12 +34,12 @@ type UserAgent = String {- Checks that an url exists and could be successfully downloaded, - also checking that its size, if available, matches a specified size. -} -checkBoth :: URLString -> Headers -> Maybe Integer -> Maybe UserAgent -> IO Bool -checkBoth url headers expected_size ua = do - v <- check url headers expected_size ua +checkBoth :: URLString -> Headers -> [CommandParam] -> Maybe Integer -> Maybe UserAgent -> IO Bool +checkBoth url headers options expected_size ua = do + v <- check url headers options expected_size ua return (fst v && snd v) -check :: URLString -> Headers -> Maybe Integer -> Maybe UserAgent -> IO (Bool, Bool) -check url headers expected_size = handle <$$> exists url headers +check :: URLString -> Headers -> [CommandParam] -> Maybe Integer -> Maybe UserAgent -> IO (Bool, Bool) +check url headers options expected_size = handle <$$> exists url headers options where handle (False, _) = (False, False) handle (True, Nothing) = (True, True) @@ -55,8 +55,8 @@ check url headers expected_size = handle <$$> exists url headers - Uses curl otherwise, when available, since curl handles https better - than does Haskell's Network.Browser. -} -exists :: URLString -> Headers -> Maybe UserAgent -> IO (Bool, Maybe Integer) -exists url headers ua = case parseURIRelaxed url of +exists :: URLString -> Headers -> [CommandParam] -> Maybe UserAgent -> IO (Bool, Maybe Integer) +exists url headers options ua = case parseURIRelaxed url of Just u | uriScheme u == "file:" -> do s <- catchMaybeIO $ getFileStatus (unEscapeString $ uriPath u) @@ -83,7 +83,7 @@ exists url headers ua = case parseURIRelaxed url of , Param "--head" , Param "-L", Param url , Param "-w", Param "%{http_code}" - ] ++ concatMap (\h -> [Param "-H", Param h]) headers + ] ++ concatMap (\h -> [Param "-H", Param h]) headers ++ options extractsize s = case lastMaybe $ filter ("Content-Length:" `isPrefixOf`) (lines s) of Just l -> case lastMaybe $ words l of diff --git a/debian/changelog b/debian/changelog index 7d0a186fd..1e260a424 100644 --- a/debian/changelog +++ b/debian/changelog @@ -22,6 +22,7 @@ git-annex (5.20140222) UNRELEASED; urgency=medium does not work on Box.com. * repair: Optimise unpacking of pack files, and avoid repeated error messages about corrupt pack files. + * Make annex.web-options be used in several places that call curl. -- Joey Hess Fri, 21 Feb 2014 13:03:04 -0400 -- cgit v1.2.3 From 9f9f1decca4a06d81ce97b64ef1a06fda3b8efad Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Mon, 24 Feb 2014 22:00:25 -0400 Subject: add UrlOptions sum type --- Annex/Content.hs | 5 ++-- Annex/Url.hs | 25 ++++++++++++++---- Assistant/Threads/Upgrader.hs | 4 +-- Command/AddUrl.hs | 9 +++---- Command/ImportFeed.hs | 4 +-- Config.hs | 11 -------- Remote/Git.hs | 8 +++--- Remote/Web.hs | 6 ++--- Utility/Url.hs | 59 ++++++++++++++++++++++++++----------------- debian/control | 1 + doc/install/fromscratch.mdwn | 1 + git-annex.cabal | 3 ++- 12 files changed, 74 insertions(+), 62 deletions(-) diff --git a/Annex/Content.hs b/Annex/Content.hs index 60edb4975..45e8e9d47 100644 --- a/Annex/Content.hs +++ b/Annex/Content.hs @@ -514,9 +514,8 @@ saveState nocommit = doSideAction $ do downloadUrl :: [Url.URLString] -> FilePath -> Annex Bool downloadUrl urls file = go =<< annexWebDownloadCommand <$> Annex.getGitConfig where - go Nothing = do - (headers, options) <- getHttpHeadersOptions - anyM (\u -> Url.withUserAgent $ Url.download u headers options file) urls + go Nothing = Url.withUrlOptions $ \uo -> + anyM (\u -> Url.download u file uo) urls go (Just basecmd) = liftIO $ anyM (downloadcmd basecmd) urls downloadcmd basecmd url = boolSystem "sh" [Param "-c", Param $ gencmd url basecmd] diff --git a/Annex/Url.hs b/Annex/Url.hs index 0401ffe07..397a7910b 100644 --- a/Annex/Url.hs +++ b/Annex/Url.hs @@ -1,13 +1,15 @@ -{- Url downloading, with git-annex user agent. +{- Url downloading, with git-annex user agent and configured http + - headers and wget/curl options. - - - Copyright 2013 Joey Hess + - Copyright 2013-2014 Joey Hess - - Licensed under the GNU GPL version 3 or higher. -} module Annex.Url ( module U, - withUserAgent, + withUrlOptions, + getUrlOptions, getUserAgent, ) where @@ -23,5 +25,18 @@ getUserAgent :: Annex (Maybe U.UserAgent) getUserAgent = Annex.getState $ Just . fromMaybe defaultUserAgent . Annex.useragent -withUserAgent :: (Maybe U.UserAgent -> IO a) -> Annex a -withUserAgent a = liftIO . a =<< getUserAgent +getUrlOptions :: Annex U.UrlOptions +getUrlOptions = U.UrlOptions + <$> getUserAgent + <*> headers + <*> options + where + headers = do + v <- annexHttpHeadersCommand <$> Annex.getGitConfig + case v of + Just cmd -> lines <$> liftIO (readProcess "sh" ["-c", cmd]) + Nothing -> annexHttpHeaders <$> Annex.getGitConfig + options = map Param . annexWebOptions <$> Annex.getGitConfig + +withUrlOptions :: (U.UrlOptions -> IO a) -> Annex a +withUrlOptions a = liftIO . a =<< getUrlOptions diff --git a/Assistant/Threads/Upgrader.hs b/Assistant/Threads/Upgrader.hs index f0c47e844..60aeec70b 100644 --- a/Assistant/Threads/Upgrader.hs +++ b/Assistant/Threads/Upgrader.hs @@ -89,10 +89,10 @@ canUpgrade urgency urlrenderer d = ifM autoUpgradeEnabled getDistributionInfo :: Assistant (Maybe GitAnnexDistribution) getDistributionInfo = do - ua <- liftAnnex Url.getUserAgent + uo <- liftAnnex Url.getUrlOptions liftIO $ withTmpFile "git-annex.tmp" $ \tmpfile h -> do hClose h - ifM (Url.downloadQuiet distributionInfoUrl [] [] tmpfile ua) + ifM (Url.downloadQuiet distributionInfoUrl tmpfile uo) ( readish <$> readFileStrict tmpfile , return Nothing ) diff --git a/Command/AddUrl.hs b/Command/AddUrl.hs index da4da414f..f45303416 100644 --- a/Command/AddUrl.hs +++ b/Command/AddUrl.hs @@ -134,8 +134,7 @@ perform relaxed url file = ifAnnexed file addurl geturl setUrlPresent key url next $ return True | otherwise = do - (headers, options) <- getHttpHeadersOptions - (exists, samesize) <- Url.withUserAgent $ Url.check url headers options (keySize key) + (exists, samesize) <- Url.withUrlOptions $ Url.check url (keySize key) if exists && samesize then do setUrlPresent key url @@ -192,8 +191,7 @@ download url file = do -} addSizeUrlKey :: URLString -> Key -> Annex Key addSizeUrlKey url key = do - (headers, options) <- getHttpHeadersOptions - size <- snd <$> Url.withUserAgent (Url.exists url headers options) + size <- snd <$> Url.withUrlOptions (Url.exists url) return $ key { keySize = size } cleanup :: URLString -> FilePath -> Key -> Maybe FilePath -> Annex Bool @@ -212,10 +210,9 @@ cleanup url file key mtmp = do nodownload :: Bool -> URLString -> FilePath -> Annex Bool nodownload relaxed url file = do - (headers, options) <- getHttpHeadersOptions (exists, size) <- if relaxed then pure (True, Nothing) - else Url.withUserAgent $ Url.exists url headers options + else Url.withUrlOptions (Url.exists url) if exists then do key <- Backend.URL.fromUrl url size diff --git a/Command/ImportFeed.hs b/Command/ImportFeed.hs index dfa89b344..005d42d20 100644 --- a/Command/ImportFeed.hs +++ b/Command/ImportFeed.hs @@ -121,10 +121,10 @@ findDownloads u = go =<< downloadFeed u downloadFeed :: URLString -> Annex (Maybe Feed) downloadFeed url = do showOutput - ua <- Url.getUserAgent + uo <- Url.getUrlOptions liftIO $ withTmpFile "feed" $ \f h -> do fileEncoding h - ifM (Url.download url [] [] f ua) + ifM (Url.download url f uo) ( parseFeedString <$> hGetContentsStrict h , return Nothing ) diff --git a/Config.hs b/Config.hs index 1510f7a74..10d4fd190 100644 --- a/Config.hs +++ b/Config.hs @@ -79,14 +79,3 @@ setCrippledFileSystem :: Bool -> Annex () setCrippledFileSystem b = do setConfig (annexConfig "crippledfilesystem") (Git.Config.boolConfig b) Annex.changeGitConfig $ \c -> c { annexCrippledFileSystem = b } - -{- Gets the http headers to use, and any configured command-line options. -} -getHttpHeadersOptions :: Annex ([String], [CommandParam]) -getHttpHeadersOptions = (,) <$> headers <*> options - where - headers = do - v <- annexHttpHeadersCommand <$> Annex.getGitConfig - case v of - Just cmd -> lines <$> liftIO (readProcess "sh" ["-c", cmd]) - Nothing -> annexHttpHeaders <$> Annex.getGitConfig - options = map Param . annexWebOptions <$> Annex.getGitConfig diff --git a/Remote/Git.hs b/Remote/Git.hs index f3aa2b7f1..d7385ef31 100644 --- a/Remote/Git.hs +++ b/Remote/Git.hs @@ -184,11 +184,10 @@ tryGitConfigRead r Left l -> return $ Left l geturlconfig = do - (headers, options) <- getHttpHeadersOptions - ua <- Url.getUserAgent + uo <- Url.getUrlOptions v <- liftIO $ withTmpFile "git-annex.tmp" $ \tmpfile h -> do hClose h - ifM (Url.downloadQuiet (Git.repoLocation r ++ "/config") headers options tmpfile ua) + ifM (Url.downloadQuiet (Git.repoLocation r ++ "/config") tmpfile uo) ( pipedconfig "git" [Param "config", Param "--null", Param "--list", Param "--file", File tmpfile] , return $ Left undefined ) @@ -261,8 +260,7 @@ inAnnex rmt key r = repo rmt checkhttp = do showChecking r - (headers, options) <- getHttpHeadersOptions - ifM (anyM (\u -> Url.withUserAgent $ Url.checkBoth u headers options (keySize key)) (keyUrls rmt key)) + ifM (Url.withUrlOptions $ \uo -> anyM (\u -> Url.checkBoth u (keySize key) uo) (keyUrls rmt key)) ( return $ Right True , return $ Left "not found" ) diff --git a/Remote/Web.hs b/Remote/Web.hs index d41b12b6a..ddd1fc1cc 100644 --- a/Remote/Web.hs +++ b/Remote/Web.hs @@ -14,7 +14,6 @@ import Types.Remote import qualified Git import qualified Git.Construct import Annex.Content -import Config import Config.Cost import Logs.Web import Types.Key @@ -117,9 +116,8 @@ checkKey' key us = firsthit us (Right False) $ \u -> do return $ Left "quvi support needed for this url" #endif DefaultDownloader -> do - (headers, options) <- getHttpHeadersOptions - Url.withUserAgent $ catchMsgIO . - Url.checkBoth u' headers options (keySize key) + Url.withUrlOptions $ catchMsgIO . + Url.checkBoth u' (keySize key) where firsthit [] miss _ = return miss firsthit (u:rest) _ a = do diff --git a/Utility/Url.hs b/Utility/Url.hs index 49f25c371..3ab14ebe4 100644 --- a/Utility/Url.hs +++ b/Utility/Url.hs @@ -10,6 +10,7 @@ module Utility.Url ( URLString, UserAgent, + UrlOptions(..), check, checkBoth, exists, @@ -23,6 +24,7 @@ import Network.URI import qualified Network.Browser as Browser import Network.HTTP import Data.Either +import Data.Default import qualified Build.SysConfig @@ -32,14 +34,24 @@ type Headers = [String] type UserAgent = String +data UrlOptions = UrlOptions + { userAgent :: Maybe UserAgent + , reqHeaders :: Headers + , reqParams :: [CommandParam] + } + +instance Default UrlOptions + where + def = UrlOptions Nothing [] [] + {- Checks that an url exists and could be successfully downloaded, - also checking that its size, if available, matches a specified size. -} -checkBoth :: URLString -> Headers -> [CommandParam] -> Maybe Integer -> Maybe UserAgent -> IO Bool -checkBoth url headers options expected_size ua = do - v <- check url headers options expected_size ua +checkBoth :: URLString -> Maybe Integer -> UrlOptions -> IO Bool +checkBoth url expected_size uo = do + v <- check url expected_size uo return (fst v && snd v) -check :: URLString -> Headers -> [CommandParam] -> Maybe Integer -> Maybe UserAgent -> IO (Bool, Bool) -check url headers options expected_size = handle <$$> exists url headers options +check :: URLString -> Maybe Integer -> UrlOptions -> IO (Bool, Bool) +check url expected_size = handle <$$> exists url where handle (False, _) = (False, False) handle (True, Nothing) = (True, True) @@ -55,8 +67,8 @@ check url headers options expected_size = handle <$$> exists url headers options - Uses curl otherwise, when available, since curl handles https better - than does Haskell's Network.Browser. -} -exists :: URLString -> Headers -> [CommandParam] -> Maybe UserAgent -> IO (Bool, Maybe Integer) -exists url headers options ua = case parseURIRelaxed url of +exists :: URLString -> UrlOptions -> IO (Bool, Maybe Integer) +exists url uo = case parseURIRelaxed url of Just u | uriScheme u == "file:" -> do s <- catchMaybeIO $ getFileStatus (unEscapeString $ uriPath u) @@ -70,7 +82,7 @@ exists url headers options ua = case parseURIRelaxed url of Just ('2':_:_) -> return (True, extractsize output) _ -> dne else do - r <- request u headers HEAD ua + r <- request u HEAD uo case rspCode r of (2,_,_) -> return (True, size r) _ -> return (False, Nothing) @@ -78,12 +90,12 @@ exists url headers options ua = case parseURIRelaxed url of where dne = return (False, Nothing) - curlparams = addUserAgent ua $ + curlparams = addUserAgent uo $ [ Param "-s" , Param "--head" , Param "-L", Param url , Param "-w", Param "%{http_code}" - ] ++ concatMap (\h -> [Param "-H", Param h]) headers ++ options + ] ++ concatMap (\h -> [Param "-H", Param h]) (reqHeaders uo) ++ (reqParams uo) extractsize s = case lastMaybe $ filter ("Content-Length:" `isPrefixOf`) (lines s) of Just l -> case lastMaybe $ words l of @@ -94,9 +106,10 @@ exists url headers options ua = case parseURIRelaxed url of size = liftM Prelude.read . lookupHeader HdrContentLength . rspHeaders -- works for both wget and curl commands -addUserAgent :: Maybe UserAgent -> [CommandParam] -> [CommandParam] -addUserAgent Nothing ps = ps -addUserAgent (Just ua) ps = ps ++ [Param "--user-agent", Param ua] +addUserAgent :: UrlOptions -> [CommandParam] -> [CommandParam] +addUserAgent uo ps = case userAgent uo of + Nothing -> ps + Just ua -> ps ++ [Param "--user-agent", Param ua] {- Used to download large files, such as the contents of keys. - @@ -105,15 +118,15 @@ addUserAgent (Just ua) ps = ps ++ [Param "--user-agent", Param ua] - would not be appropriate to test at configure time and build support - for only one in. -} -download :: URLString -> Headers -> [CommandParam] -> FilePath -> Maybe UserAgent -> IO Bool +download :: URLString -> FilePath -> UrlOptions -> IO Bool download = download' False {- No output, even on error. -} -downloadQuiet :: URLString -> Headers -> [CommandParam] -> FilePath -> Maybe UserAgent -> IO Bool +downloadQuiet :: URLString -> FilePath -> UrlOptions -> IO Bool downloadQuiet = download' True -download' :: Bool -> URLString -> Headers -> [CommandParam] -> FilePath -> Maybe UserAgent -> IO Bool -download' quiet url headers options file ua = +download' :: Bool -> URLString -> FilePath -> UrlOptions -> IO Bool +download' quiet url file uo = case parseURIRelaxed url of Just u | uriScheme u == "file:" -> do @@ -124,7 +137,7 @@ download' quiet url headers options file ua = | otherwise -> ifM (inPath "wget") (wget , curl) _ -> return False where - headerparams = map (\h -> Param $ "--header=" ++ h) headers + headerparams = map (\h -> Param $ "--header=" ++ h) (reqHeaders uo) wget = go "wget" $ headerparams ++ quietopt "-q" ++ wgetparams {- Regular wget needs --clobber to continue downloading an existing - file. On Android, busybox wget is used, which does not @@ -142,7 +155,7 @@ download' quiet url headers options file ua = curl = go "curl" $ headerparams ++ quietopt "-s" ++ [Params "-f -L -C - -# -o"] go cmd opts = boolSystem cmd $ - addUserAgent ua $ options++opts++[File file, File url] + addUserAgent uo $ reqParams uo++opts++[File file, File url] quietopt s | quiet = [Param s] | otherwise = [] @@ -157,14 +170,14 @@ download' quiet url headers options file ua = - Unfortunately, does not handle https, so should only be used - when curl is not available. -} -request :: URI -> Headers -> RequestMethod -> Maybe UserAgent -> IO (Response String) -request url headers requesttype ua = go 5 url +request :: URI -> RequestMethod -> UrlOptions -> IO (Response String) +request url requesttype uo = go 5 url where go :: Int -> URI -> IO (Response String) go 0 _ = error "Too many redirects " go n u = do rsp <- Browser.browse $ do - maybe noop Browser.setUserAgent ua + maybe noop Browser.setUserAgent (userAgent uo) Browser.setErrHandler ignore Browser.setOutHandler ignore Browser.setAllowRedirects False @@ -174,7 +187,7 @@ request url headers requesttype ua = go 5 url (3,0,x) | x /= 5 -> redir (n - 1) u rsp _ -> return rsp addheaders req = setHeaders req (rqHeaders req ++ userheaders) - userheaders = rights $ map parseHeader headers + userheaders = rights $ map parseHeader (reqHeaders uo) ignore = const noop redir n u rsp = case retrieveHeaders HdrLocation rsp of [] -> return rsp diff --git a/debian/control b/debian/control index 110d160cb..067c2ab67 100644 --- a/debian/control +++ b/debian/control @@ -6,6 +6,7 @@ Build-Depends: ghc (>= 7.4), libghc-mtl-dev (>= 2.1.1), libghc-missingh-dev, + libghc-data-default-dev, libghc-hslogger-dev, libghc-pcre-light-dev, libghc-sha-dev, diff --git a/doc/install/fromscratch.mdwn b/doc/install/fromscratch.mdwn index 2c8bf4b71..6cc2d90c6 100644 --- a/doc/install/fromscratch.mdwn +++ b/doc/install/fromscratch.mdwn @@ -5,6 +5,7 @@ quite a lot. * [The Haskell Platform](http://haskell.org/platform/) (GHC 7.4 or newer) * [mtl](http://hackage.haskell.org.package/mtl) (2.1.1 or newer) * [MissingH](http://github.com/jgoerzen/missingh/wiki) + * [data-default](http://hackage.haskell.org/package/data-default) * [utf8-string](http://hackage.haskell.org/package/utf8-string) * [SHA](http://hackage.haskell.org/package/SHA) * [cryptohash](http://hackage.haskell.org/package/cryptohash) (optional but recommended) diff --git a/git-annex.cabal b/git-annex.cabal index a2e082cdc..7ba1c7d78 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -93,7 +93,8 @@ Executable git-annex extensible-exceptions, dataenc, SHA, process, json, base (>= 4.5 && < 4.9), monad-control, MonadCatchIO-transformers, IfElse, text, QuickCheck >= 2.1, bloomfilter, edit-distance, process, - SafeSemaphore, uuid, random, dlist, unix-compat, async, stm (>= 2.3) + SafeSemaphore, uuid, random, dlist, unix-compat, async, stm (>= 2.3), + data-default CC-Options: -Wall GHC-Options: -Wall Extensions: PackageImports -- cgit v1.2.3 From 97cf760f1dd70a1bf44672316df7c674f40585b7 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Tue, 25 Feb 2014 01:55:01 -0400 Subject: fix #740010 properly --- Remote/WebDAV.hs | 21 +++++++++------------ git-annex.cabal | 2 +- 2 files changed, 10 insertions(+), 13 deletions(-) diff --git a/Remote/WebDAV.hs b/Remote/WebDAV.hs index de1b721c9..8ac9c2c79 100644 --- a/Remote/WebDAV.hs +++ b/Remote/WebDAV.hs @@ -17,7 +17,11 @@ import qualified Data.ByteString.Lazy as L import Network.URI (normalizePathSegments) import qualified Control.Exception as E import qualified Control.Exception.Lifted as EL +#if MIN_VERSION_DAV(0,6,0) +import Network.HTTP.Client (HttpException(..)) +#else import Network.HTTP.Conduit (HttpException(..)) +#endif import Network.HTTP.Types import System.IO.Error @@ -355,8 +359,9 @@ mkdirDAV url user pass = #endif existsDAV :: DavUrl -> DavUser -> DavPass -> IO (Either String Bool) -existsDAV url user pass = either onerr id <$> tryNonAsync check +existsDAV url user pass = either (Left . show) id <$> tryNonAsync check where + ispresent = return . Right #if MIN_VERSION_DAV(0,6,0) check = goDAV url user pass $ do setDepth Nothing @@ -374,20 +379,12 @@ existsDAV url user pass = either onerr id <$> tryNonAsync check #endif (const $ ispresent False) #endif - ispresent = return . Right - {- This is a horrible hack, it seems that the type of the - - HttpException gets screwed up with DAV 0.6.x, and so - - I'm reduced to string matching. :( -} - onerr e - | "StatusCodeException" `isInfixOf` show e - && "statusCode = 404" `isInfixOf` show e = Right False - | otherwise = Left (show e) matchStatusCodeException :: Status -> HttpException -> Maybe () -#if ! MIN_VERSION_http_conduit(1,9,0) -matchStatusCodeException want (StatusCodeException s _) -#else +#if MIN_VERSION_DAV(0,6,0) matchStatusCodeException want (StatusCodeException s _ _) +#else +matchStatusCodeException want (StatusCodeException s _) #endif | s == want = Just () | otherwise = Nothing diff --git a/git-annex.cabal b/git-annex.cabal index 7ba1c7d78..d7bf6cad6 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -134,7 +134,7 @@ Executable git-annex if flag(WebDAV) Build-Depends: DAV ((>= 0.3 && < 0.6) || > 0.6), - http-conduit, xml-conduit, http-types, lifted-base + http-client, http-conduit, http-types, lifted-base CPP-Options: -DWITH_WEBDAV if flag(Assistant) && ! os(solaris) -- cgit v1.2.3 From 5bc9ae17ec6e354bb12f49d6d9845b27ce9259ec Mon Sep 17 00:00:00 2001 From: "https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" Date: Tue, 25 Feb 2014 09:34:17 +0000 Subject: Added a comment --- .../comment_6_5899741cb7f83e1b22c5ee3509c5ff21._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_6_5899741cb7f83e1b22c5ee3509c5ff21._comment diff --git a/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_6_5899741cb7f83e1b22c5ee3509c5ff21._comment b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_6_5899741cb7f83e1b22c5ee3509c5ff21._comment new file mode 100644 index 000000000..8b7c3b52e --- /dev/null +++ b/doc/forum/Convert_regular_git-annex_repo_to_a_rsync_repo/comment_6_5899741cb7f83e1b22c5ee3509c5ff21._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/a/FHnTlSBo1eCGJRwueeKeB6.RCaPbGMPr5jxx8A--#ce0d8" + nickname="Hamza" + subject="comment 6" + date="2014-02-25T09:34:16Z" + content=""" +assuming the remote I am converting is called some-repo should mark it as dead before converting and reinitting as rsync some-repo again? +"""]] -- cgit v1.2.3 From 5262abce37c9310d64d70f75965ea76adaed9196 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" Date: Tue, 25 Feb 2014 09:51:17 +0000 Subject: Added a comment --- .../comment_2_03ae28acedbe1fa45c366b30b58fcf48._comment | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 doc/design/metadata/comment_2_03ae28acedbe1fa45c366b30b58fcf48._comment diff --git a/doc/design/metadata/comment_2_03ae28acedbe1fa45c366b30b58fcf48._comment b/doc/design/metadata/comment_2_03ae28acedbe1fa45c366b30b58fcf48._comment new file mode 100644 index 000000000..c222f75d3 --- /dev/null +++ b/doc/design/metadata/comment_2_03ae28acedbe1fa45c366b30b58fcf48._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" + nickname="Jimmy" + subject="comment 2" + date="2014-02-25T09:51:17Z" + content=""" +Some additional ideas for metadata... + +Instead of having a simplistic scheme like 'field=value' it might be advantageous to consider a scheme like 'attribute=XXX, value=YYY, unit=ZZZ' that way you could do intesting things with the metadata like adding counters to things, and allow for doing interesting queries like give me all 'things' tagged with a unit of \"audio_file\", this assumes one had trawled through an entire annex and then tagged all files based on type with the unix file tool or something like that. + +The above idea is already in use in irods and its a really nice and powerful way to let users add meta-data and to build up more interesting use cases and tools. + +btw, I plan on taking a look at seeing if I can map some of the meta that we have in work into this new git-annex feature to see how well/bad it works. Either way this feature looks cool! +1!!! +"""]] -- cgit v1.2.3 From ecffc15ff50493c6d143f57e057ef96804cccec9 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" Date: Tue, 25 Feb 2014 09:57:09 +0000 Subject: Added a comment --- .../comment_3_ee850df7d3fa4c56194f13a6e3890a30._comment | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 doc/design/metadata/comment_3_ee850df7d3fa4c56194f13a6e3890a30._comment diff --git a/doc/design/metadata/comment_3_ee850df7d3fa4c56194f13a6e3890a30._comment b/doc/design/metadata/comment_3_ee850df7d3fa4c56194f13a6e3890a30._comment new file mode 100644 index 000000000..f77cd8611 --- /dev/null +++ b/doc/design/metadata/comment_3_ee850df7d3fa4c56194f13a6e3890a30._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" + nickname="Jimmy" + subject="comment 3" + date="2014-02-25T09:57:09Z" + content=""" +actually in your mp3 example you could have .... + +ATTRIBUTE=sample_rate, VALUE=22100, UNIT=Hertz + +another example use case is to always be consistent with the AVU order then you could stick in ntriples from RDF to do other cool things by looking up various linked data sources -- see http://www.w3.org/2001/sw/RDFCore/ntriples/ and http://www.freebase.com/, actually this would be quite cool if git-annex examined the mp3's id3 tag, the created an ntriple styled entry can be automatically parsed with the web-based annex gui and automatically pull in additional meta-data from the likes of freebase. I guess the list of ideas can just only get bigger with this potential metadata capability. +"""]] -- cgit v1.2.3 From 6cc6cf01d1644df543d3e264d3f9ddb44e64424b Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawnNqLKszWk9EoD4CDCqNXJRIklKFBCN1Ao" Date: Tue, 25 Feb 2014 11:37:15 +0000 Subject: Added a comment: the startup check is not a small issue --- .../comment_3_ad7cb4c510e2ab26959ea7cb40a43fef._comment | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 doc/forum/performance_and_multiple_replication_problems/comment_3_ad7cb4c510e2ab26959ea7cb40a43fef._comment diff --git a/doc/forum/performance_and_multiple_replication_problems/comment_3_ad7cb4c510e2ab26959ea7cb40a43fef._comment b/doc/forum/performance_and_multiple_replication_problems/comment_3_ad7cb4c510e2ab26959ea7cb40a43fef._comment new file mode 100644 index 000000000..6e4e1b1c6 --- /dev/null +++ b/doc/forum/performance_and_multiple_replication_problems/comment_3_ad7cb4c510e2ab26959ea7cb40a43fef._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawnNqLKszWk9EoD4CDCqNXJRIklKFBCN1Ao" + nickname="maurizio" + subject="the startup check is not a small issue" + date="2014-02-25T11:37:15Z" + content=""" +I would like to add that this startup check has probably been a blocker for my use case for a long long time. I tried to use git-annex to synchronize a huge number of files, most of them never changing. My plan was to have a few tens of GB of data which more or less never change in an archive directory and then add from time to time new data (by batches of a few hundreds of files, each of them not necessarily very large) to the annex. Once this new data has been processed or otherwise become less immediately useful, it would be shifted to the archive. It would have been very useful to have such a setup, because the amount of data is too large to be replicated everywhere, especially on a laptop. After finding this post I finally understand that the seemingly never ending \"performing startup scan\" that I observed are probably not due to the assistant somehow hanging, contrary to what I thought. It seems it is just normal operation. The problem is that this normal operation makes it unusable for the use case I was considering, since it does not make much sense to have git-annex scanning about 10^6 files or links on every boot of a laptop. On my workstation this \"startup scan\" has now been running for close to one hour now and is not finished yet, this is not thinkable on laptop boot. + +Maybe an analysis of how well git-annex operation scales with number of files should be part of the documentation, since \"large files\" is not the only issue when trying to sync different computers. One finds references to \"very large number of files\" about annex.queuesize, but \"very large\" has no clear meaning. One also finds a reference to \"1 million files\" being a bit of a git limitation on comments of a bug report . + +Orders of magnitude of the number of files that git-annex is supposed to be able to handle would be very useful. + + +"""]] -- cgit v1.2.3