summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorGravatar Joey Hess <joey@kitenet.net>2012-06-12 21:29:01 -0400
committerGravatar Joey Hess <joey@kitenet.net>2012-06-12 21:29:01 -0400
commitc1566757972e4774476a10bc6f865a88823e8938 (patch)
tree9ac9b90db3e25dacb548f38fff15e8c5cfa85418
parent74aa310ad6f68cc670135f12e1023a4f9dfaf3c4 (diff)
blog for the day
-rw-r--r--doc/design/assistant/blog/day_7__bugfixes.mdwn45
-rw-r--r--doc/design/assistant/blog/day_7__bugfixes/profile.pngbin0 -> 47098 bytes
-rw-r--r--doc/design/assistant/blog/day_7__bugfixes/profile2.pngbin0 -> 230937 bytes
3 files changed, 45 insertions, 0 deletions
diff --git a/doc/design/assistant/blog/day_7__bugfixes.mdwn b/doc/design/assistant/blog/day_7__bugfixes.mdwn
new file mode 100644
index 000000000..3704969e3
--- /dev/null
+++ b/doc/design/assistant/blog/day_7__bugfixes.mdwn
@@ -0,0 +1,45 @@
+Kickstarter is over. Yay!
+
+Today I worked on the bug where `git annex watch` turned regular files
+that were already checked into git into symlinks. So I made it check
+if a file is already in git before trying to add it to the annex.
+
+The tricky part was doing this check quickly. Unless I want to write my
+own git index parser (or use one from Hackage), this check requires running
+`git ls-files`, once per file to be added. That won't fly if a huge
+tree of files is being moved or unpacked into the watched directory.
+
+Instead, I made it only do the check during `git annex watch`'s initial
+scan of the tree. This should be ok, because once it's running, you
+won't be adding new files to git anyway, since it'll automatically annex
+new files. This is good enough for now, but there are at least two problems
+with it:
+
+* Someone might `git merge` in a branch that has some regular files,
+ and it would add the merged in files to the annex.
+* Once `git annex watch` is running, if you modify a file that was
+ checked into git as a regular file, the new version will be added
+ to the annex.
+
+I'll probably come back to this issue, and may well find myself directly
+querying git's index.
+
+---
+
+I've started work to fix the memory leak I see when running `git annex
+watch` in a large repository (40 thousand files). As always with a Haskell
+memory leak, I crack open [Real World Haskell's chapter on profiling](http://book.realworldhaskell.org/read/profiling-and-optimization.html).
+
+Eventually this yields a nice graph of the problem:
+
+[[!img profile.png alt="memory profile"]]
+
+So, looks like a few minor memory leaks, and one huge leak. Stared
+at this for a while and trying a few things, and got a much better result:
+
+[[!img profile2.png alt="memory profile"]]
+
+I may come back later and try to improve this further, but it's not bad memory
+usage. But, it's still rather slow to start up in such a large repository,
+and its initial scan is still doing too much work. I need to optimize
+more..
diff --git a/doc/design/assistant/blog/day_7__bugfixes/profile.png b/doc/design/assistant/blog/day_7__bugfixes/profile.png
new file mode 100644
index 000000000..702af1ca0
--- /dev/null
+++ b/doc/design/assistant/blog/day_7__bugfixes/profile.png
Binary files differ
diff --git a/doc/design/assistant/blog/day_7__bugfixes/profile2.png b/doc/design/assistant/blog/day_7__bugfixes/profile2.png
new file mode 100644
index 000000000..4d487d02a
--- /dev/null
+++ b/doc/design/assistant/blog/day_7__bugfixes/profile2.png
Binary files differ