author Joey Hess <joeyh@joeyh.name> 2016-01-05 12:01:59 -0400
committer Joey Hess <joeyh@joeyh.name> 2016-01-05 12:01:59 -0400
commit 20bcc4b2815bffff65d8a776f990572066ea7f94
tree f16e2ef351e336f42fad7b3c1374d097246eaa32
parent 0ec3c369ff6d3d3a2364094980d481c0da71c325
parent 060de0cb495ca72c41dd68e3786a3415a947ed65
Merge branch 'master' of ssh://git-annex.branchable.com
-rw-r--r-- doc/bugs/--json_issues.mdwn 44
-rw-r--r-- doc/bugs/acl_not_honoured_in_rsync_remote/comment_3_f93177593a2d90627672647fd5f065c9._comment 7
-rw-r--r-- doc/forum/How_does_git-annex_handle_rsyncing_between_different_OSes_with_regards_to_UTF-8__63__/comment_2_7393a4b2e94f9d36c3c9ca977a8f67b6._comment 8
-rw-r--r-- doc/forum/Massive_drop_in_performance_with_--jobs_option.mdwn 9
-rw-r--r-- doc/forum/Massive_drop_in_performance_with_--jobs_option/comment_2_c9fea9db7116e7ee82a09c23f22b54e9._comment 30
-rw-r--r-- doc/forum/__34__git_annex_get__34___on_windows_fails_with_rsync_error.mdwn 20
-rw-r--r-- doc/forum/basic_usage_questions.mdwn 5
-rw-r--r-- doc/git-annex-importfeed/comment_3_bce2b233e4d42fc87a2e17d51e2c2606._comment 11
-rw-r--r-- doc/publicrepos.mdwn 5
-rw-r--r-- doc/tips/Repositories_with_large_number_of_files/comment_3_992f2a85ce0cdcef2f97ff978560fdb8._comment 19
-rw-r--r-- doc/tips/Repositories_with_large_number_of_files/comment_4_8ff3aa032fb778ff69276984152578b0._comment 11
-rw-r--r-- doc/todo/--batch_for_info.mdwn 3
12 files changed, 172 insertions, 0 deletions
diff --git a/doc/bugs/--json_issues.mdwn b/doc/bugs/--json_issues.mdwn
new file mode 100644
index 000000000..b1cad44da
--- /dev/null
+++ b/doc/bugs/--json_issues.mdwn
@@ -0,0 +1,44 @@
+### Please describe the problem.
+Duplicate "note" properties in the JSON output of the whereis command. The JSON output is "[strictly speaking](http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object)" valid; however, the duplicate property makes it difficult to use python-json, and possibly other implementations (I just checked node and chromium and they also keep the value of the last property after deserialization).
+
+I noticed the problem being mentioned [here](https://git-annex.branchable.com/forum/git_annex_whereis_--json_output_with_two_variables_with_same_name/), where the user desires to parse an entry in the machine parsable json output that is apparently not meant to be parsed by a machine, which makes a person wonder why it's there anyway. Even so, obtaining the actual url in the web remote for an annexed object relies on one of the "note" properties in the json output. Using [whereis](
+http://git-annex.branchable.com/tips/using_the_web_as_a_special_remote/#comment-7878bde74289b42500e4fac3a122a535) to get the url(s) for a file is the recommended method. A json implementation that sets the first property, then ignores remaining duplicates will only parse the "2 copies" note, and ignore the url.
+
+If the "note" properties are meant to be comments, it might be a good idea to find another property for the url(s). Please note that I haven't looked at multiple urls for an object, so I'm not sure that only the last listed url will appear in parsed json object.
+
+### What steps will reproduce the problem?
+
+git-annex whereis --json
+
+### What version of git-annex are you using? On what operating system?
+
+git-annex version: 5.20151208-1 (sid chroot)
+
+### Please provide any additional information below.
+
+[[!format sh """
+# If you can, paste a complete transcript of the problem occurring here.
+# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
+umeboshi@bard:~/tmp$ git init testweb
+Initialized empty Git repository in /freespace/home/umeboshi/tmp/testweb/.git/
+umeboshi@bard:~/tmp$ cd testweb/
+umeboshi@bard:~/tmp/testweb$ git-annex init
+init ok
+(recording state in git...)
+umeboshi@bard:~/tmp/testweb$ git-annex addurl http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
+addurl www.ecma_international.org_publications_files_ECMA_ST_ECMA_404.pdf (downloading http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf ...)
+/freespace/home/ume 100%[===================>] 1.08M 917KB/s in 1.2s
+ok
+(recording state in git...)
+umeboshi@bard:~/tmp/testweb$ git-annex whereis www.ecma_international.org_publications_files_ECMA_ST_ECMA_404.pdf --json
+{"command":"whereis","file":"www.ecma_international.org_publications_files_ECMA_ST_ECMA_404.pdf","note":"2 copies","whereis":[{"uuid":"00000000-0000-0000-0000-000000000001","description":"web","here":false},{"uuid":"996522e8-a433-42ff-85f2-48e456fdb120","description":"umeboshi@bard:~/tmp/testweb","here":true}],"note":"\t00000000-0000-0000-0000-000000000001 -- web\n \t996522e8-a433-42ff-85f2-48e456fdb120 -- umeboshi@bard:~/tmp/testweb [here]\n","untrusted":[],"note":"web: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf\n","success":true}
+
+umeboshi@bard:~/tmp/testweb$
+
+
+# End of transcript or log.
+"""]]
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+
+Lol! Positive note! I spend a couple of days making specially crafted rss files for importfeed, so I can have appropriate filenames, then look back and see that you are working on '--batch --with-files' options to addurl. Have a Happy New Year! :)
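The duplicate "note" keys described in `doc/bugs/--json_issues.mdwn` can still be read from Python, because `json.loads` accepts an `object_pairs_hook` that sees every key/value pair before duplicates are collapsed. The following is only a sketch: `collect_notes` is a hypothetical helper, the sample string is a trimmed copy of the transcript output, and the `web: <url>` note layout is assumed from that single transcript (the report itself says multiple urls were not tested).

```python
import json

def collect_notes(pairs):
    """Build a dict from (key, value) pairs, keeping every "note" value in a list."""
    obj = {}
    for key, value in pairs:
        if key == "note":
            # Repeated "note" keys are appended instead of overwritten.
            obj.setdefault("note", []).append(value)
        else:
            obj[key] = value
    return obj

# Trimmed copy of the whereis --json output from the transcript.
sample = r'{"command":"whereis","note":"2 copies","whereis":[{"uuid":"00000000-0000-0000-0000-000000000001","description":"web","here":false}],"untrusted":[],"note":"web: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf\n","success":true}'

record = json.loads(sample, object_pairs_hook=collect_notes)

# Pull the url(s) out of any note line that starts with "web: " (assumed layout).
urls = []
for note in record["note"]:
    for line in note.splitlines():
        if line.startswith("web: "):
            urls.append(line[len("web: "):])
```

With the default hook, only the last "note" would survive; here `record["note"]` keeps both the "2 copies" note and the url note.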
diff --git a/doc/bugs/acl_not_honoured_in_rsync_remote/comment_3_f93177593a2d90627672647fd5f065c9._comment b/doc/bugs/acl_not_honoured_in_rsync_remote/comment_3_f93177593a2d90627672647fd5f065c9._comment
new file mode 100644
index 000000000..6b20bbc3a
--- /dev/null
+++ b/doc/bugs/acl_not_honoured_in_rsync_remote/comment_3_f93177593a2d90627672647fd5f065c9._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="eigengrau"
+ subject="comment 3"
+ date="2016-01-03T07:52:36Z"
+ content="""
+I recently experimented with shared repos a bit and came across this rsync issue. I’ve found that instead of setting `rsync-options` to `--chmod=755`, you can also set it to `--chmod=ugo=rX`. This will cause the executable bit to be set only when the source file is already executable.
+"""]]
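The `--chmod=ugo=rX` suggestion in the comment above depends on the conditional execute flag `X`. A minimal Python sketch of that rule follows, assuming rsync applies the usual chmod(1) symbolic-mode semantics; `apply_ugo_rX` is a hypothetical helper for illustration, not part of rsync or git-annex.

```python
import stat

def apply_ugo_rX(mode, is_dir=False):
    """Return the permission bits that a chmod-style 'ugo=rX' would produce."""
    read_all = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH  # 0o444
    exec_all = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH  # 0o111
    # 'X' grants execute only to directories and to files that
    # already have at least one execute bit set.
    wants_exec = is_dir or bool(mode & exec_all)
    # '=' replaces the existing bits, so write permission is dropped.
    return read_all | (exec_all if wants_exec else 0)

plain_file = apply_ugo_rX(0o644)             # regular file, not executable
script_file = apply_ugo_rX(0o700)            # file that is already executable
directory = apply_ugo_rX(0o600, is_dir=True)
```

Under these assumptions a plain file ends up readable but not executable, while scripts and directories also get the execute bit, which matches the behaviour the comment describes.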
diff --git a/doc/forum/How_does_git-annex_handle_rsyncing_between_different_OSes_with_regards_to_UTF-8__63__/comment_2_7393a4b2e94f9d36c3c9ca977a8f67b6._comment b/doc/forum/How_does_git-annex_handle_rsyncing_between_different_OSes_with_regards_to_UTF-8__63__/comment_2_7393a4b2e94f9d36c3c9ca977a8f67b6._comment
new file mode 100644
index 000000000..eb12e5153
--- /dev/null
+++ b/doc/forum/How_does_git-annex_handle_rsyncing_between_different_OSes_with_regards_to_UTF-8__63__/comment_2_7393a4b2e94f9d36c3c9ca977a8f67b6._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="grawity@2ea26be48562f66fcb9b66307da72b1e2e37453f"
+ nickname="grawity"
+ subject="comment 2"
+ date="2016-01-05T11:09:07Z"
+ content="""
+Git does have a `core.precomposeUnicode` option for converting NFD to NFC; would that work?
+"""]]
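For readers unfamiliar with the NFD/NFC distinction behind `core.precomposeUnicode`, here is a small Python sketch of the two normalization forms using the standard `unicodedata` module. It only illustrates the normalization itself; it does not show what git or git-annex do internally, and the sample filename is made up.

```python
import unicodedata

# "café" with a decomposed final letter: 'e' followed by a combining acute accent (NFD).
decomposed = "cafe\u0301.txt"

# NFC folds the base letter and the combining mark into one precomposed code point.
precomposed = unicodedata.normalize("NFC", decomposed)

# The two strings render identically but compare as different byte sequences.
same_bytes = decomposed.encode("utf-8") == precomposed.encode("utf-8")
```

Because the encoded bytes differ, a filename stored in one form on one OS will not match the other form on another OS unless something normalizes it along the way.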
diff --git a/doc/forum/Massive_drop_in_performance_with_--jobs_option.mdwn b/doc/forum/Massive_drop_in_performance_with_--jobs_option.mdwn
new file mode 100644
index 000000000..bccb0d9dc
--- /dev/null
+++ b/doc/forum/Massive_drop_in_performance_with_--jobs_option.mdwn
@@ -0,0 +1,9 @@
+Old version: 5.20150916-1
+New version: 5.20151208-1
+
+
+I have noticed that with the addition of the progress display, the overall transfer performance dropped very noticeably between these two versions. I am very tempted to keep the old version around to just handle the transfers.
+
+Also, I decided to build a newer version from git, using debuild. I noticed that the libghc-persistent dependencies have no trailing commas. I added the commas, built the package and had two errors in one of the tests. I removed these dependencies from the control file and I am currently performing a rebuild (they were not present in the 12/08 version).
+
+Build complete. Tests passed. Parallel transfers seem to be back up to speed. Thanks!
diff --git a/doc/forum/Massive_drop_in_performance_with_--jobs_option/comment_2_c9fea9db7116e7ee82a09c23f22b54e9._comment b/doc/forum/Massive_drop_in_performance_with_--jobs_option/comment_2_c9fea9db7116e7ee82a09c23f22b54e9._comment
new file mode 100644
index 000000000..cfc5f2684
--- /dev/null
+++ b/doc/forum/Massive_drop_in_performance_with_--jobs_option/comment_2_c9fea9db7116e7ee82a09c23f22b54e9._comment
@@ -0,0 +1,30 @@
+[[!comment format=mdwn
+ username="umeboshi"
+ subject="comment 2"
+ date="2016-01-02T23:13:15Z"
+ content="""
+I have been getting more failures than I used to when using -J with get and copy commands.
+
+The dirannex remote is a local directory remote.
+
+```
+get 74/024842a88350573fa70e33f6dea9d274.jpg (from web...) ok
+get 74/02c67a1850ee54848d0203c48a613d74.jpg (from dirannex...) ok
+get 75/024b4a9e700c51aeb9a51d7566a19c75.jpg (transfer already in progress, or unable to take transfer lock)
+ Unable to access these remotes: web
+
+ Try making some of these repositories available:
+ 00000000-0000-0000-0000-000000000001 -- web
+failed
+```
+
+```
+git-annex copy -J5 --to dirannex
+<snip>
+(transfer already in progress, or unable to take transfer lock) failed
+git-annex: copy: 5 failed
+```
+
+
+
+"""]]
diff --git a/doc/forum/__34__git_annex_get__34___on_windows_fails_with_rsync_error.mdwn b/doc/forum/__34__git_annex_get__34___on_windows_fails_with_rsync_error.mdwn
new file mode 100644
index 000000000..5fb7492ba
--- /dev/null
+++ b/doc/forum/__34__git_annex_get__34___on_windows_fails_with_rsync_error.mdwn
@@ -0,0 +1,20 @@
+I tried cloning an annex repo between two drives "c:" and "d:". The part with "git clone" itself works, but when I try to execute "git annex get", rsync reports an error about a missing path starting with "/cygdrive".
+
+ Sameer@DESKTOP-6CJGO0T MINGW32 /d/a (annex/direct/master)
+ $ git annex get --not --in here
+ get world.txt (from origin...)
+ rsync: change_dir "/cygdrive/c/scratch/a" failed: No such file or directory (2)
+ rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.1]
+
+ rsync failed -- run git annex again to resume file transfer
+
+ Unable to access these remotes: origin
+
+ Try making some of these repositories available:
+ cda3a2c6-7ddc-4164-ad6d-fbb2720b24d7 -- DESKTOP-6CJGO0T:C:\scratch\a [origin]
+ failed
+ git-annex: get: 1 failed
+
+I am running git annex that was installed in the same directory as a 32-bit git version 2.6.4 for windows (mingw32).
+
+The question is, my windows drives are actually visible as "/c" and "/a" ... then why is rsync searching for "/cygdrive/c" etc? This is clearly not a Cygwin installation.
diff --git a/doc/forum/basic_usage_questions.mdwn b/doc/forum/basic_usage_questions.mdwn
new file mode 100644
index 000000000..cd6a5505b
--- /dev/null
+++ b/doc/forum/basic_usage_questions.mdwn
@@ -0,0 +1,5 @@
+Seeking some clarification:
+
+ 1. I'm using direct mode. When I run `git annex sync --content` on repo A, all files get copied to repo B, but they remain hidden under `.git/annex/objects`. Is there a way to _automatically_ put them in repo B's working tree, without having to go to repo B and run `git annex sync` there as well? _(I'm sure I saw that happen earlier, but not anymore?)_
+
+ 1. I have two PCs and a portable HD. There are git-annex repos on PC_1 and USB_HD, with each other listed under `git remote`. Now I want to set up git-annex on PC_2. Is it okay to use the same repo path (~/Videos/) on both PCs? I'm concerned that it would confuse the USB_HD repo greatly, as it would end up having two "remotes" with identical paths.
diff --git a/doc/git-annex-importfeed/comment_3_bce2b233e4d42fc87a2e17d51e2c2606._comment b/doc/git-annex-importfeed/comment_3_bce2b233e4d42fc87a2e17d51e2c2606._comment
new file mode 100644
index 000000000..25f0ed793
--- /dev/null
+++ b/doc/git-annex-importfeed/comment_3_bce2b233e4d42fc87a2e17d51e2c2606._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="umeboshi"
+ subject="comment 3"
+ date="2016-01-02T21:02:51Z"
+ content="""
+I just thought it was surprising behavior, especially when ```git-annex addurl --file 123-45.ext $url``` preserves the dash in the filename, yet the rss I constructed to add multiple urls, with files, didn't do this.
+
+The new ```--batch --with-files``` options to addurl will eliminate the need to create specially crafted rss files.
+
+Thanks! :)
+"""]]
diff --git a/doc/publicrepos.mdwn b/doc/publicrepos.mdwn
index e54e0b99a..c3adf6739 100644
--- a/doc/publicrepos.mdwn
+++ b/doc/publicrepos.mdwn
@@ -33,5 +33,10 @@ the public repositories that you can clone to try out git-annex.
git-annex repository using [the CDN](http://media.ccc.de/) of the German [CCC](http://www.ccc.de/).
Contains a lot of talks (mostly in German) held on events from the CCC as well as other stuff.
+* [ifarchive](https://gitlab.com/umeboshi2/ifarchive.git)
+ A slightly outdated mirror of http://ifarchive.org. Scripts should probably be written
+ to update the archive regularly.
+
+
This is a wiki -- add your own public repository to the list!
See [[tips/centralized_git_repository_tutorial]].
diff --git a/doc/tips/Repositories_with_large_number_of_files/comment_3_992f2a85ce0cdcef2f97ff978560fdb8._comment b/doc/tips/Repositories_with_large_number_of_files/comment_3_992f2a85ce0cdcef2f97ff978560fdb8._comment
new file mode 100644
index 000000000..514c829fe
--- /dev/null
+++ b/doc/tips/Repositories_with_large_number_of_files/comment_3_992f2a85ce0cdcef2f97ff978560fdb8._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="umeboshi"
+ subject="comment 3"
+ date="2016-01-04T21:26:05Z"
+ content="""
+I have been playing with tracking a large number of url's for about one month now. Having already been disappointed by how git performs when there is a very large number of files in the annex, I tested making multiple annexes. I did find that splitting the url's into multiple annexes increased performance, but at the cost of extra housekeeping, duplicated url's, and more work needed to keep track of the url's. Part of the duplication and tracking problem was mitigated by using a dumb remote, such as rsync or directory, where a very large number of objects can be stored. The dumb remotes perform very well; however, each annex needed to be synced regularly with the dumb remote.
+
+I found the dumb remote to be great for multiple annexes. I have noticed that a person can create a new annex and extract a tarball of symlinks into the repo, then ```git commit``` the links. Subsequently, executing ```git-annex fsck --from dummy``` would set up the tracking info, which was pretty useful.
+
+However, I found that by the time I got to over fifty annexes, the overall performance was far worse than just storing the url's and file paths in a postgresql database. In fact, the url's are already being stored and accessed from such a database, but I had the desire to access the url's from multiple machines, which is a bit more difficult with a centralized database.
+
+After reading the tips and pages discussing splitting the files into multiple directories, and changing the index version, I decided to try a single annex to hold the url's. Over the new year's weekend, I decided to write a script that generates rss files to use with importfeed to add url's to this annex. I have noticed that when using ```git commit``` the load average of the host was in the mid twenties and persisted for hours until I had to kill the processes to be able to use the machine again (**I would really like to know if there is a git config setting that would keep the load down, so the machine can be used during a commit**). I gave up on ```git-annex sync``` this morning, since it was taking longer than I was able to sit in the coffee shop and wait for it (~3 hrs).
+
+I came back to the office, and started ```git gc``` which has been running for ~1hr.
+
+When making the larger annex, I decided to use the hexadecimal value of uuid5(url) for each filename, and use the two high nybbles and the two low nybbles for a two-stage directory structure, with respect for the advice from [CandyAngel](http://git-annex.branchable.com/forum/Handling_a_large_number_of_files/#comment-48ac38361b131b18125f8c43eb6ad577). When my url's are organized in this manner, I still need access to the database to perform the appropriate ```git-annex get``` which impairs the use of this strategy, but I'm still testing and playing around. I suspended adding url's to this annex until I get at least one sync performed.
+
+The url annex itself is not very big, and I am guessing the average file size to be close to 500K. The large number of url's seems to be a problem I have yet to solve. I wanted to experiment with this to further the idea of the public git-annex repositories that seem to be a useful idea, even though the utility of this idea is very limited at the moment.
+"""]]
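The uuid5-based naming scheme from comment 3 can be sketched in Python as below. This is a guess at the layout, not the commenter's actual script: `url_to_path` is a hypothetical helper, `uuid.NAMESPACE_URL` is an assumed namespace, and the directory order (high nybbles first, then low nybbles) is also an assumption.

```python
import uuid
from pathlib import PurePosixPath

def url_to_path(url, suffix=""):
    """Map a url to <first two hex digits>/<last two hex digits>/<uuid5 hex><suffix>."""
    # uuid5 is deterministic: the same url always yields the same 32 hex digits.
    digest = uuid.uuid5(uuid.NAMESPACE_URL, url).hex
    # Assumed order: two high nybbles, then two low nybbles, then the full name.
    return PurePosixPath(digest[:2], digest[-2:], digest + suffix)

path = url_to_path(
    "http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf",
    suffix=".pdf",
)
```

Since the path is derived only from the url, any machine can recompute it without a database lookup, provided it already knows the url; going from a path back to a url still needs a stored mapping, which is the limitation the comment mentions.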
diff --git a/doc/tips/Repositories_with_large_number_of_files/comment_4_8ff3aa032fb778ff69276984152578b0._comment b/doc/tips/Repositories_with_large_number_of_files/comment_4_8ff3aa032fb778ff69276984152578b0._comment
new file mode 100644
index 000000000..78b79ef08
--- /dev/null
+++ b/doc/tips/Repositories_with_large_number_of_files/comment_4_8ff3aa032fb778ff69276984152578b0._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="CandyAngel"
+ subject="comment 4"
+ date="2016-01-05T11:41:13Z"
+ content="""
+@umeboshi: Odd that you report your machine freezes during commits.. I find the exact opposite.. waiting for a long time with no load at all.
+
+My current setup for my sorting annex (where I import all the files off old HDDs) is to have the HDD plugged into my home server (Atom 330 @ 1.6Ghz) and import the files on a cloned (empty) annex. Doing so for 1.1M files (latest HDD) is a long wait, because 80% of the time is waiting for something to happen (but there being no load on the machine). Once that is done, the HDD is transferred to my desktop, where the annex is \"joined\" to the others and files are sorted in a dedicated VM[1], where commit times are reasonable.
+
+[1] Fully virtualising my desktop is possibly the best thing I've ever done, in terms of setup. Locking up any VM affects none of the others (which is handy, as I discovered an issue that causes X to almost hardlock whenever libvo is used..).
+"""]]
diff --git a/doc/todo/--batch_for_info.mdwn b/doc/todo/--batch_for_info.mdwn
new file mode 100644
index 000000000..007bc6d49
--- /dev/null
+++ b/doc/todo/--batch_for_info.mdwn
@@ -0,0 +1,3 @@
+I guess that, as with other commands that take separate files/keys as arguments, having --batch for the info command would be of benefit.
+
+[[!meta author=yoh]]