Diffstat (limited to 'doc/bugs')
-rw-r--r-- doc/bugs/copy_doesn__39__t_scale.mdwn | 29
1 files changed, 27 insertions, 2 deletions
diff --git a/doc/bugs/copy_doesn__39__t_scale.mdwn b/doc/bugs/copy_doesn__39__t_scale.mdwn
index 1a83ae548..79479a335 100644
--- a/doc/bugs/copy_doesn__39__t_scale.mdwn
+++ b/doc/bugs/copy_doesn__39__t_scale.mdwn
@@ -1,4 +1,29 @@
-It seems that git-annex copies every individual file in a separate transaction. This is quite costly for mass transfers: each file involves a separate rsync invocation and the creation of a new commit. Even with a meager thousand files or so in the annex, I have to wait for fifteen minutes to copy the contents to another disk, simply because every individual file involves some disk thrashing. Also, it seems suspicious that the git-annex branch would get a thousands commits of history from the simple procedure of copying everything to a new repository. Surely it would be better to first copy everything and then create only a single commit that registers the changes to the files' availability?
+It seems that git-annex copies every individual file in a separate
+transaction. This is quite costly for mass transfers: each file involves a
+separate rsync invocation and the creation of a new commit. Even with a
+meager thousand files or so in the annex, I have to wait for fifteen
+minutes to copy the contents to another disk, simply because every
+individual file involves some disk thrashing. Also, it seems suspicious
+that the git-annex branch would get a thousands commits of history from the
+simple procedure of copying everything to a new repository. Surely it would
+be better to first copy everything and then create only a single commit
+that registers the changes to the files' availability?
-(I'm also not quite clear on why rsync is being used when both repositories are local. It seems to be just overhead.)
+> git-annex is very careful to commit as infrequently as possible,
+> and the current version makes *1* commit after all the copies are
+> complete, even if it transferred a billion files. The only overhead
+> incurred for each file is writing a journal file.
+> You must have an old version.
+> --[[Joey]]
+(I'm also not quite clear on why rsync is being used when both repositories
+are local. It seems to be just overhead.)
+
+> Even when copying to another disk it's often on
+> some slow bus, and the file is by definition large. So it's
+> nice to support resumes of interrupted transfers of files.
+> Also because rsync has a handy progress display that is hard to get with cp.
+>
+> (However, if the copy is to another directory in the same disk, it does
+> use cp, and even supports really fast copies on COW filesystems.)
+> --[[Joey]]