It seems that git-annex copies every individual file in a separate transaction. This is quite costly for mass transfers: each file involves a separate rsync invocation and the creation of a new commit. Even with a meager thousand files or so in the annex, I have to wait fifteen minutes to copy the contents to another disk, simply because every individual file involves some disk thrashing. It also seems suspicious that the git-annex branch would gain thousands of commits of history from the simple procedure of copying everything to a new repository. Surely it would be better to copy everything first and then create a single commit that registers the changes to the files' availability?

(I'm also not quite clear on why rsync is being used when both repositories are local. It seems to be just overhead.)
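
To make the difference in commit counts concrete, here is a toy Python model of the two patterns. It is only a sketch and not git-annex code: `LocationLog`, `copy_per_file`, and `copy_batched` are hypothetical names made up for illustration, the destination name `otherdisk` is arbitrary, and the actual content transfer is left out.

```python
from dataclasses import dataclass, field


@dataclass
class LocationLog:
    """Toy stand-in for a branch that records which repository holds which file."""
    commits: list = field(default_factory=list)

    def commit(self, changes):
        # A single commit can record any number of availability changes.
        self.commits.append(dict(changes))


def copy_per_file(files, log, dest):
    """Pattern described in this report: one commit per transferred file."""
    for name in files:
        # (the content transfer itself would happen here)
        log.commit({name: dest})


def copy_batched(files, log, dest):
    """Suggested pattern: transfer everything, then record it all in one commit."""
    pending = {}
    for name in files:
        # (the content transfer itself would happen here)
        pending[name] = dest
    if pending:
        log.commit(pending)


files = [f"file{i}" for i in range(1000)]
per_file, batched = LocationLog(), LocationLog()
copy_per_file(files, per_file, "otherdisk")
copy_batched(files, batched, "otherdisk")
print(len(per_file.commits), len(batched.commits))  # 1000 1
```

With a thousand files, the per-file pattern leaves a thousand commits in the log, while the batched pattern leaves one commit covering all thousand availability changes.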