author | Joey Hess <joeyh@joeyh.name> | 2015-01-21 13:11:17 -0400
---|---|---
committer | Joey Hess <joeyh@joeyh.name> | 2015-01-21 13:11:17 -0400
commit | 58c4115c463ee98588df66774a22a3f82f9a6273 |
tree | 22d884e69ad2ecc6d98d81573d42384b6e641a76 |
parent | 2f2635528d8e428e0fafaf7877a4572a397cdf52 |
parent | 3a2b585656ba53fc90ebdf9e4ef1a1f912c8bdda |
Merge branch 'master' of ssh://git-annex.branchable.com
4 files changed, 129 insertions, 0 deletions
diff --git a/doc/bugs/How_to_use_a_DRA_bucket_in_Google_cloud_storage__63__.mdwn b/doc/bugs/How_to_use_a_DRA_bucket_in_Google_cloud_storage__63__.mdwn
new file mode 100644
index 000000000..03c80b045
--- /dev/null
+++ b/doc/bugs/How_to_use_a_DRA_bucket_in_Google_cloud_storage__63__.mdwn
@@ -0,0 +1,23 @@
+### Please describe the problem.
+
+Git annex's special S3 remote doesn't seem to work with DRA buckets in Google cloud storage.
+
+### What steps will reproduce the problem?
+
+I created a DRA-style bucket in Google cloud storage:
+
+    gsutil mb gs://gitannex-dra
+
+Then followed [this hint](https://gist.github.com/jterrace/4576324) to
+set up use of GCS. Except that it didn't work:
+
+    git annex initremote gcs type=S3 encryption=none host=storage.googleapis.com port=80 bucket=gitannex-dra
+    initremote gcs (checking bucket...) git-annex: Invalid argument.
+
+### What version of git-annex are you using? On what operating system?
+
+Wheezy, git-annex version: 5.20141024~bpo70+1
+
+### Please provide any additional information below.
+
+There didn't seem to be any extra logs and `--debug` didn't seem to add anything useful.
diff --git a/doc/bugs/git_annex_direct_-__62___rename:_does_not_exist/comment_4_3c7a7a0983d3a75a04395141aaf16dbb._comment b/doc/bugs/git_annex_direct_-__62___rename:_does_not_exist/comment_4_3c7a7a0983d3a75a04395141aaf16dbb._comment
new file mode 100644
index 000000000..fb1c5641d
--- /dev/null
+++ b/doc/bugs/git_annex_direct_-__62___rename:_does_not_exist/comment_4_3c7a7a0983d3a75a04395141aaf16dbb._comment
@@ -0,0 +1,33 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawnwNDA50ZupMvOgpgDqzDRyu5B-mYlVwa4"
+ nickname="Andreas"
+ subject="comment 4"
+ date="2015-01-21T07:30:51Z"
+ content="""
+This is what I see:
+
+    ➜ ~ mkdir test
+    ➜ ~ cd test
+    ➜ test git init
+    Initialized empty Git repository in /home/deas/test/.git/
+    ➜ test git:(master)
+    ➜ test git:(master) git annex init
+    init ok
+    (Recording state in git...)
+    ➜ test git:(master) touch foobar.txt
+    ➜ test git:(master) ✗ git annex add
+    add foobar.txt ok
+    (Recording state in git...)
+    ➜ test git:(master) ✗ git annex direct
+    commit
+    [master (root-commit) a6e3d83] commit before switching to direct mode
+     1 file changed, 1 insertion(+)
+     create mode 120000 foobar.txt
+    ok
+    direct foobar.txt
+    /home/deas/test/.git/annex/misctmp/tmp6895: rename: does not exist (No such file or directory)
+
+      leaving this file as-is; correct this problem and run git annex fsck on it
+    direct ok
+    ➜ test git:(annex/direct/master)
+"""]]
diff --git a/doc/bugs/huge_multiple_copies_of___39__.nfs__42____39___and___39__.panfs__42____39___being_created.mdwn b/doc/bugs/huge_multiple_copies_of___39__.nfs__42____39___and___39__.panfs__42____39___being_created.mdwn
new file mode 100644
index 000000000..76ae3646f
--- /dev/null
+++ b/doc/bugs/huge_multiple_copies_of___39__.nfs__42____39___and___39__.panfs__42____39___being_created.mdwn
@@ -0,0 +1,30 @@
+### Please describe the problem.
+
+I have 2 indirect mode repos, both on network filesystems, that I have only used for adding
+data on one end, then syncing via `git annex sync` and `git annex get`. The problem
+is that `.nfs` copies are being made for each git annex object data file, e.g.:
+
+`./.git/annex/objects/34/2x/SHA256E-s4112535690--c5f0e5a8af7bf17dd4a8ca192c8ddfb01fe6ec10908c80cffa5ac64c00e28443.vtk.gz/.nfs0000000006d0018600002147`
+
+Reading up on .nfs files, they are generated when "an open file is removed but is still being accessed".
+
+### What steps will reproduce the problem?
+Clone a git annex repo on a network file system, run
+`git annex sync`,
+`git annex drop`,
+`git annex get`
+
+### What version of git-annex are you using? On what operating system?
+* git-annex version: 5.20140818-g10bf03a
+* 2.6.34.9-69.fc13.x86_64 fedora 13
+* 2.6.32-279.22.1.el6.x86_64 centOS
+
+### Please provide any additional information below.
+
+[[!format sh """
+# If you can, paste a complete transcript of the problem occurring here.
+# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
+
+
+# End of transcript or log.
+"""]]
diff --git a/doc/forum/scalability_with_lots_of_files.mdwn b/doc/forum/scalability_with_lots_of_files.mdwn
new file mode 100644
index 000000000..3bbd877cf
--- /dev/null
+++ b/doc/forum/scalability_with_lots_of_files.mdwn
@@ -0,0 +1,43 @@
+What is git-annex's [[scalability]] with a large (10k+) number of files and a few (~10) repositories?
+
+I have had difficult times maintaining a music archive of around 20k files, spread around 17 repositories.
+
+`ncdu` tells me, of the actual files in the direct repository:
+
+<pre>
+$ ncdu --exclude .git
+ Total disk usage: 109,3GiB  Apparent size: 109,3GiB  Items: 23771
+</pre>
+
+Now looking at the git-annex metadata:
+
+<pre>
+$ time git clone -b git-annex /srv/mp3
+Cloning into 'mp3'...
+done.
+Checking out files: 100% (31207/31207), done.
+0.69user 1.72system 0:04.65elapsed 51%CPU (0avgtext+0avgdata 47732maxresident)k
+40inputs+489552outputs (1major+2906minor)pagefaults 0swaps
+$ git branch
+  annex/direct/master
+* git-annex
+  master
+$ wc -l uuid.log
+7 uuid.log
+$ find -type f | wc
+  31429   62214 3013920
+$ du -sh .
+361M	.
+$ du -sch * | tail -1
+243M	total
+</pre>
+
+So basically, it looks like the git-annex location tracking takes up around 243M, 361M if we include git's history of it (I assume). This means around 8KiB of storage per file, and 4KiB/file for history (git is doing a pretty good job here). (8KiB kind of makes sense here: one file for the tracking log (4KiB) and another directory to hold it (another 4KiB)...)
+
+Is that about right? Are there ways to compress that somehow? Could I at least drop the *history* of that from git without too much harm - that would already save 120MiB...
+
+That repository is around 18 months old.
+
+(It's interesting to notice the limitation of the "one file per record" storage format here: since git-annex has so many little files, and all of those take at least $blocksize (it seems like it's 4KB here), it takes up space pretty quickly. Another good point for git here: packing files together saves a *lot* of space! Could files be packed *before* being stored in the git-annex branch? or is that totally stupid. :)
+
+Thanks! --[[anarcat]]
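The forum post's per-file estimate can be checked with a little arithmetic. The sketch below takes the post's model: each tracked file costs one small location-log file plus one directory, each rounded up to a whole block. It compares that model with the figure measured in the transcript.

Several inputs are assumptions rather than facts from the post:

* The 4096-byte block size is the post's own guess.
* The 300-byte log-file size is a hypothetical placeholder.
* The 31207 checked-out files are used as the divisor.
* `on_disk` and `per_file_model` are illustrative helpers, not part of git-annex.

```python
import math

BLOCK_SIZE = 4096      # assumption: 4 KiB filesystem blocks, as the post guesses
LOG_FILE_SIZE = 300    # hypothetical size of one small location-log file, in bytes
FILES = 31207          # from "Checking out files: 100% (31207/31207)" in the transcript
MEASURED_MIB = 243     # from "du -sch * | tail -1" (du -h uses 1024-based units)


def on_disk(size: int, block: int = BLOCK_SIZE) -> int:
    """Space a file of `size` bytes occupies when allocated in whole blocks."""
    return math.ceil(size / block) * block


def per_file_model(log_size: int = LOG_FILE_SIZE) -> int:
    """The post's model: one log file plus one directory block per tracked file."""
    return on_disk(log_size) + BLOCK_SIZE


# Measured bytes per checked-out file, using integer division.
measured_per_file = MEASURED_MIB * 1024 * 1024 // FILES
# Total content size if the logs were packed with no block rounding.
apparent_mib = FILES * LOG_FILE_SIZE / (1024 * 1024)

print(f"measured:  {measured_per_file} bytes/file")
print(f"modelled:  {per_file_model()} bytes/file")
print(f"apparent content if packed: {apparent_mib:.1f} MiB")
```

With these inputs, the measured figure works out to about 8164 bytes per checked-out file. The two-block model predicts 8192. Under the assumed 300-byte log size, the same data would be under 10 MiB if packed without block rounding. That gap is the overhead the post attributes to the one-file-per-record layout.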