author | Joey Hess <joey@kitenet.net> | 2011-04-05 14:02:01 -0400 |
---|---|---|
committer | Joey Hess <joey@kitenet.net> | 2011-04-05 14:02:01 -0400 |
commit | 501e0ded5e6b87bb02d091db83c4e7086befcc39 (patch) | |
tree | a8604d430622b85976caec542947e78cc13dcaa9 /doc | |
parent | 094983a2bdbb7cbf2aea96addd3afe1ddedc6c69 (diff) | |
parent | 24e2c13387179d3ca1ed2dd50f5d3fbafbc8f32e (diff) |
Merge remote-tracking branch 'branchable/master'
Diffstat (limited to 'doc')
3 files changed, 55 insertions, 0 deletions
diff --git a/doc/forum/Problems_with_large_numbers_of_files.mdwn b/doc/forum/Problems_with_large_numbers_of_files.mdwn
new file mode 100644
index 000000000..1dbddd3e2
--- /dev/null
+++ b/doc/forum/Problems_with_large_numbers_of_files.mdwn
@@ -0,0 +1,8 @@
+I'm trying to use git-annex to archive scientific data. I'm often dealing with large numbers of files, sometimes 10k or more. When I try to git-annex add these files, I get this error:
+
+
+    Stack space overflow: current size 8388608 bytes.
+    Use `+RTS -Ksize' to increase it.
+
+
+This is with the latest version of git-annex and a current version of git on OS X 10.6.7. After this error occurs, I am unable to un-annex the files and I'm forced to recover from a backup.
diff --git a/doc/forum/Problems_with_large_numbers_of_files/comment_1_08791cb78b982087c2a07316fe3ed46c._comment b/doc/forum/Problems_with_large_numbers_of_files/comment_1_08791cb78b982087c2a07316fe3ed46c._comment
new file mode 100644
index 000000000..94043a700
--- /dev/null
+++ b/doc/forum/Problems_with_large_numbers_of_files/comment_1_08791cb78b982087c2a07316fe3ed46c._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus"
+ nickname="Jimmy"
+ subject="comment 1"
+ date="2011-04-05T07:27:46Z"
+ content="""
+Heh, cool, I was thinking about throwing about 28 million files at git-annex. Let me know how it goes. I suspect you have just run into a default-limits problem on OSX.
+
+You probably just need to up some system limits (you will need to read the error messages that first appear), then do something like
+
+<pre>
+# this is really for the run time, you can set these settings in /etc/sysctl.conf
+sudo sysctl -w kern.maxproc=2048
+sudo sysctl -w kern.maxprocperuid=1024
+
+# tell launchd about having higher limits (tee runs as root; a plain >> redirect would not)
+echo \"limit maxfiles 1024 unlimited\" | sudo tee -a /etc/launchd.conf
+echo \"limit maxproc 1024 2048\" | sudo tee -a /etc/launchd.conf
+</pre>
+
+There are other system limits, which you can check with \"ulimit -a\". Once you make the above changes, you will need to reboot for them to take effect. I am unsure whether the above will help, as it is just an example of what I did on 10.6.6 a few months ago to fix some forking issues. From the error you got, you will probably need to increase the stack size, or even make it unlimited if you feel lucky. The default stack size on OSX is 8192 KB; try making it, say, 10 times that size first and see what happens.
+"""]]
diff --git a/doc/forum/Problems_with_large_numbers_of_files/comment_2_0392a11219463e40c53bae73c8188b69._comment b/doc/forum/Problems_with_large_numbers_of_files/comment_2_0392a11219463e40c53bae73c8188b69._comment
new file mode 100644
index 000000000..8ea5531f4
--- /dev/null
+++ b/doc/forum/Problems_with_large_numbers_of_files/comment_2_0392a11219463e40c53bae73c8188b69._comment
@@ -0,0 +1,25 @@
+[[!comment format=mdwn
+ username="http://joey.kitenet.net/"
+ nickname="joey"
+ subject="comment 2"
+ date="2011-04-05T17:46:03Z"
+ content="""
+This message comes from ghc's runtime memory manager. Apparently your ghc defaults to limiting the stack to 8 mb (the 8388608 bytes in the error).
+Mine seems to limit it higher -- I have seen haskell programs successfully grow as large as 350 mb, although generally not intentionally. :)
+
+Here's how to adjust the limit at runtime; obviously you'd want a larger number:
+
+<pre>
+# git-annex +RTS -K100 -RTS find
+Stack space overflow: current size 100 bytes.
+Use `+RTS -Ksize -RTS' to increase it.
+</pre>
+
+I've tried to avoid git-annex using quantities of memory that scale with the number of files in the repo, and I think in general successfully -- I run it on 32 mb and 128 mb machines, FWIW. There are some tricky cases, and haskell makes it easy to accidentally write code that uses much more memory than would be expected.
+
+One well-known case is `git annex unused`, which *has* to build a structure of every annexed file. I have been considering using a bloom filter or something to avoid that.
+
+Another possible case is when running a command like `git annex add` and passing it a lot of files/directories. Some code tries to preserve the order of your input after passing it through `git ls-files` (which destroys ordering), and to do so it needs to buffer both the input and the result in RAM.
+
+It's possible to build git-annex with memory profiling and generate some quite helpful profiling data. Edit the Makefile and add this to GHCFLAGS: `-prof -auto-all -caf-all -fforce-recomp`. Then, when running git-annex, add the parameters `+RTS -p -RTS` and look for the git-annex.prof file.
+"""]]
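
Comment 2 mentions a Bloom filter as a possible way for `git annex unused` to avoid holding every annexed key in memory. The sketch below only illustrates that general idea and is not git-annex's code (git-annex is written in Haskell). The names `BloomFilter` and `definitely_unused`, the default sizes, and the key strings are hypothetical, chosen for this example. A Bloom filter uses a fixed amount of memory and never gives a false negative, so any key it reports as absent was certainly never referenced; a rare false positive can make it overlook a key that really is unused.

```python
import hashlib


class BloomFilter:
    """Fixed-size set sketch: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes; size is independent of key count.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions from one SHA-256 digest
        # using double hashing: (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def definitely_unused(keys_in_annex, keys_referenced,
                      num_bits=1 << 20, num_hashes=7):
    """Yield annexed keys that are certainly not among the referenced keys.

    Both arguments may be lazy iterables; only the filter's bit array
    (num_bits / 8 bytes) is held in memory.
    """
    seen = BloomFilter(num_bits, num_hashes)
    for key in keys_referenced:
        seen.add(key)
    for key in keys_in_annex:
        if key not in seen:
            yield key
```

With the hypothetical defaults above, the filter occupies 128 KiB no matter how many keys are streamed through it. The trade-off is that `definitely_unused` can miss some unused keys, so a real tool would have to size the filter for the expected number of keys or accept an incomplete report.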