[[!comment format=mdwn
 username="joey"
 subject="""comment 10"""
 date="2016-09-29T18:33:33Z"
 content="""
* Optimised key2file and file2key. 18% scanning time speedup.
* Optimised adjustGitEnv. 50% git-annex branch query speedup.
* Optimised parsePOSIXTime. 10% git-annex branch query speedup.
* Tried making catObjectDetails.receive use ByteString for parsing, 
  but that did not seem to speed it up significantly.
  So its parsing is already fairly optimal; it's just that a
  lot of data passes through it when querying the git-annex
  branch.
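
For illustration, here is a minimal sketch of what ByteString-based parsing of a `git cat-file --batch` header line (`<sha> <type> <size>`) could look like. This is not the actual catObjectDetails.receive code; the `Header` type and `parseHeader` name are hypothetical, made up for this example.

```haskell
import qualified Data.ByteString.Char8 as B8

-- Hypothetical record for one "<sha> <type> <size>" header line.
data Header = Header
    { objSha  :: B8.ByteString
    , objType :: B8.ByteString
    , objSize :: Int
    } deriving (Show, Eq)

-- Parse a header line without converting it to String.
-- Anything that is not exactly three fields with a numeric size
-- (for example "<name> missing") yields Nothing.
parseHeader :: B8.ByteString -> Maybe Header
parseHeader line = case B8.words line of
    [sha, ty, sz] -> case B8.readInt sz of
        Just (n, rest) | B8.null rest -> Just (Header sha ty n)
        _ -> Nothing
    _ -> Nothing

main :: IO ()
main = mapM_ (print . parseHeader . B8.pack)
    [ "868b103646e752686efe323b51f21ba81e93b495 blob 4096"
    , "deadbeef missing"
    ]
```

Parsing like this only touches the short header line, so it would not be expected to change much when most of the time goes into moving object content around.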

After all that, profiling `git-annex find`:

	        Thu Sep 29 16:51 2016 Time and Allocation Profiling Report  (Final)
	
	           git-annex.1 +RTS -p -RTS find
	
	        total time  =        1.73 secs   (1730 ticks @ 1000 us, 1 processor)
	        total alloc = 1,812,406,632 bytes  (excludes profiling overheads)
	
	COST CENTRE            MODULE                  %time %alloc
	
	md5                    Data.Hash.MD5            28.0   37.9
	catchIO                Utility.Exception        10.2   12.5
	inAnnex'.checkindirect Annex.Content             9.9    3.7
	catches                Control.Monad.Catch       8.7    5.7
	readish                Utility.PartialPrelude    5.7    3.0
	isAnnexLink            Annex.Link                5.0    8.4
	keyFile                Annex.Locations           4.2    5.8
	spanList               Data.List.Utils           4.0    6.3
	startswith             Data.List.Utils           2.0    1.3

And `git-annex find --not --in web`:

	        Thu Sep 29 16:35 2016 Time and Allocation Profiling Report  (Final)
	
	           git-annex +RTS -p -RTS find --not --in web
	
	        total time  =        5.24 secs   (5238 ticks @ 1000 us, 1 processor)
	        total alloc = 3,293,314,472 bytes  (excludes profiling overheads)
	
	COST CENTRE               MODULE                      %time %alloc
	
	catObjectDetails.receive  Git.CatFile                  12.9    5.5
	md5                       Data.Hash.MD5                10.6   20.8
	readish                   Utility.PartialPrelude        7.3    8.2
	catchIO                   Utility.Exception             6.7    7.3
	spanList                  Data.List.Utils               4.1    7.4
	readFileStrictAnyEncoding Utility.Misc                  3.5    1.3
	catches                   Control.Monad.Catch           3.3    3.2

So, quite a large speedup overall!

This leaves md5 still unoptimised at 10-28% of CPU use. I looked at switching
it to cryptohash's implementation, but it would require quite a lot of
bit-banging math to pull the used values out of the ByteString containing
the md5sum.
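
To give an idea of the bit-banging involved, here is a rough sketch. It assumes the values needed are the four 32-bit words of the MD5 state, which the 16-byte digest stores in little-endian order. The `word32le` and `digestWords` names are hypothetical, and the digest is taken as a plain strict ByteString rather than any particular library's digest type.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- Read one little-endian 32-bit word starting at byte offset i.
-- The caller must make sure bytes i .. i+3 exist.
word32le :: B.ByteString -> Int -> Word32
word32le bs i =
    byte 0
        .|. (byte 1 `shiftL` 8)
        .|. (byte 2 `shiftL` 16)
        .|. (byte 3 `shiftL` 24)
  where
    byte :: Int -> Word32
    byte n = fromIntegral (B.index bs (i + n))

-- Split a 16-byte digest into its four words;
-- Nothing for input of any other length.
digestWords :: B.ByteString -> Maybe (Word32, Word32, Word32, Word32)
digestWords bs
    | B.length bs == 16 =
        Just (word32le bs 0, word32le bs 4, word32le bs 8, word32le bs 12)
    | otherwise = Nothing

main :: IO ()
main = print (digestWords (B.pack [0 .. 15]))
```

Whether the shifting and indexing cost less than the current md5 implementation would need to be checked with another profiling run.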
"""]]