author: Joey Hess <joeyh@joeyh.name> 2016-09-26 16:39:18 -0400
committer: Joey Hess <joeyh@joeyh.name> 2016-09-26 16:39:18 -0400
commit: f468c42146ff164936b8138271793323e893acf8
tree: b1c2747352f36f6dd72db0667ca8719f2c0eef3b /doc/todo
parent: 17041a4e16931991c1794048d2574310df4af235
more profiling
Diffstat (limited to 'doc/todo')
-rw-r--r-- doc/todo/make_copy_--fast__faster/comment_9_f4d802a28b79905da0cb24af6cb65b0a._comment | 42
1 files changed, 42 insertions, 0 deletions
diff --git a/doc/todo/make_copy_--fast__faster/comment_9_f4d802a28b79905da0cb24af6cb65b0a._comment b/doc/todo/make_copy_--fast__faster/comment_9_f4d802a28b79905da0cb24af6cb65b0a._comment
new file mode 100644
index 000000000..9692ad2d7
--- /dev/null
+++ b/doc/todo/make_copy_--fast__faster/comment_9_f4d802a28b79905da0cb24af6cb65b0a._comment
@@ -0,0 +1,42 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""more profiling"""
+ date="2016-09-26T19:59:43Z"
+ content="""
+Instead of profiling `git annex copy --to remote`, I profiled `git annex
+find --not --in web`, which needs to do the same kind of location log lookup.
+
+    total time  =       12.41 secs   (12413 ticks @ 1000 us, 1 processor)
+    total alloc = 8,645,057,104 bytes  (excludes profiling overheads)
+
+    COST CENTRE                MODULE               %time %alloc
+
+    adjustGitEnv               Git.Env               21.4   37.0
+    catchIO                    Utility.Exception     13.2    2.8
+    spanList                   Data.List.Utils       12.6   17.9
+    parsePOSIXTime             Logs.TimeStamp         6.1    5.0
+    catObjectDetails.receive   Git.CatFile            5.9    2.1
+    startswith                 Data.List.Utils        5.7    3.8
+    md5                        Data.Hash.MD5          5.1    7.9
+    join                       Data.List.Utils        2.4    6.0
+    readFileStrictAnyEncoding  Utility.Misc           2.2    0.5
+
+The adjustGitEnv overhead is a surprise! It seems it is getting called once
+per file, and allocating a new copy of the environment each time. Call stack:
+withIndex calls withIndexFile calls addGitEnv calls adjustGitEnv.
+Looks like simply making gitEnv be cached at startup would avoid most of
+the adjustGitEnv slowdown.
+
+(The catchIO overhead is a false reading; the detailed profile shows
+that all its time and allocations are inherited. getAnnexLinkTarget
+is running catchIO in the expensive case, so readSymbolicLink is
+the actual expensive bit.)
+
+The parsePOSIXTime cost comes from reading location logs. It's implemented
+using a generic Data.Time.Format.parseTime, which uses a format string
+"%s%Qs". A custom parser that splits into seconds and picoseconds
+and simply reads both numbers might be more efficient.
+
+catObjectDetails.receive is implemented using mostly String and could
+probably be sped up by being converted to use ByteString.
+"""]]
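
The custom timestamp parser suggested in the comment could look roughly like the following. This is only a sketch, not git-annex's code: the name `parseLogTime` is hypothetical, and it assumes the input has the shape implied by the `"%s%Qs"` format string, that is, whole seconds, an optional `.` with up to 12 fractional digits, and an optional trailing `s`. Negative timestamps are not handled.

```haskell
import Control.Monad (guard)
import Data.Char (digitToInt, isDigit)
import Data.Ratio ((%))
import Data.Time.Clock.POSIX (POSIXTime)

-- | Hypothetical hand-written parser for timestamps such as
-- "1474919983.5s" (assumed shape; not taken from git-annex).
-- It splits the input into seconds and picoseconds and reads both
-- as plain integers, avoiding the generic format-string machinery.
parseLogTime :: String -> Maybe POSIXTime
parseLogTime input = do
    let (whole, rest) = span isDigit input
    secs <- readDigits whole
    -- An optional fractional part follows a '.'.
    let (frac, trailer) = case rest of
            '.' : more -> span isDigit more
            _ -> ("", rest)
    -- Only an optional trailing "s" may remain.
    guard (trailer `elem` ["", "s"])
    -- Pad or truncate the fraction to exactly 12 digits (picoseconds).
    let picos = digitsToInteger (take 12 (frac ++ repeat '0'))
    return (fromRational (fromInteger secs + picos % 1000000000000))
  where
    digitsToInteger :: String -> Integer
    digitsToInteger = foldl (\n d -> n * 10 + toInteger (digitToInt d)) 0

    -- Reject an empty digit string so that "" and "abc" fail to parse.
    readDigits :: String -> Maybe Integer
    readDigits "" = Nothing
    readDigits ds = Just (digitsToInteger ds)
```

With these assumptions, `parseLogTime "1474919983.5s"` is equal to `Just 1474919983.5`, and input that does not start with a digit yields `Nothing`. Whether this actually beats the generic parser would need to be confirmed by profiling again.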