author | CandyAngel <CandyAngel@web> | 2015-06-17 08:28:14 +0000 |
---|---|---|
committer | admin <admin@branchable.com> | 2015-06-17 08:28:14 +0000 |
commit | 5732c5c9da8c42ebf2ea69e25e2e6f0e5a3d5157 (patch) | |
tree | 8317faf62059c2ec6fff2372b3c47f5c1ef26ce5 /doc/tips | |
parent | 623991ff30b640155bf4c291131d30c652a58732 (diff) |
Initial writeup of tips for repos with large file count
Diffstat (limited to 'doc/tips')
-rw-r--r-- | doc/tips/Repositories_with_large_number_of_files.mdwn | 48 |
1 file changed, 48 insertions(+), 0 deletions(-)
diff --git a/doc/tips/Repositories_with_large_number_of_files.mdwn b/doc/tips/Repositories_with_large_number_of_files.mdwn
new file mode 100644
index 000000000..c1f219eee
--- /dev/null
+++ b/doc/tips/Repositories_with_large_number_of_files.mdwn
@@ -0,0 +1,48 @@
+Just as git does not scale well with large files, it can also become painful to work with when you have a large *number* of files. Below are things I have found to minimise the pain.
+
+# Using version 4 index files
+
+During operations which affect the index, git writes an entirely new index out to .git/index.lock and then renames it over .git/index. With a large number of files, this index file can be quite large and take several seconds to write every time you manipulate the index!
+
+This can be mitigated by switching the index to version 4, which uses path compression to reduce the file size:
+
+    git update-index --index-version 4
+
+*NOTE: The git documentation warns that this version may not be supported by other git implementations such as JGit and libgit2.*
+
+Personally, I saw a reduction from 516MB to 206MB (*40% of the original size*) and got a much more responsive git!
+
+It may also be worth doing the same to git-annex's index:
+
+    GIT_INDEX_FILE=.git/annex/index git update-index --index-version 4
+
+The gain here was smaller, though: 89MB down to 86MB (*about 97% of the original size*).
+
+# Packing
+
+I have automatic gc disabled so that I control when it runs:
+
+    git config gc.auto 0
+
+As a consequence, I end up with a lot of loose objects, which also slow git down. I use
+
+    git count-objects
+
+to see how many loose objects there are. When I reach a threshold (~25000), I pack the loose objects and clean things up:
+
+    git repack -d
+    git gc
+    git prune
+
+# File count per directory
+
+If it takes a long time to list the files in a directory, git (and git-annex) will naturally be affected by this bottleneck.
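One quick way to spot directories likely to trigger this bottleneck is to count files per directory. A rough sketch, assuming a POSIX shell with `find`, `sort` and `head` (the `.git` prune pattern is illustrative and may need adjusting for your layout):

```shell
# Print the ten directories containing the most files (non-recursive
# count per directory), largest first, skipping git's own metadata.
find . -name .git -prune -o -type d -print | while read -r d; do
    printf '%s\t%s\n' "$(find "$d" -maxdepth 1 -type f | wc -l)" "$d"
done | sort -rn | head
```

Directories showing counts far above the per-directory range discussed here are the natural candidates for splitting.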
+
+You can avoid this by keeping the number of files in a directory to between 5000 and 20000 (the sweet spot depends on the filesystem and its settings).
+
+[fpart](http://contribs.martymac.org/fpart/) can be a very useful tool for achieving this.
+
+## Topics discussing this sort of usage
+
+* [[forum/Handling_a_large_number_of_files]]
+* [[forum/__34__git_annex_sync__34___synced_after_8_hours]]
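The packing workflow above can be wrapped in a small script so that repacking only happens once the loose-object count crosses the threshold. A sketch, assuming a POSIX shell and that it is run from the repository root (the threshold value is the ~25000 figure from the text):

```shell
#!/bin/sh
# Repack and clean up only when there are more loose objects than the
# threshold; the first field of `git count-objects` is the loose count.
threshold=25000
loose=$(git count-objects | awk '{print $1}')
if [ "$loose" -gt "$threshold" ]; then
    git repack -d
    git gc
    git prune
fi
```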