[[!comment format=mdwn username="joey" subject="""profiling""" date="2016-09-26T19:20:36Z" content=""" Built git-annex with profiling, using `stack build --profile` (For reproduciblity, running git-annex in a clone of the git-annex repo https://github.com/RichiH/conference_proceedings with rev 2797a49023fc24aff6fcaec55421572e1eddcfa2 checked out. It has 9496 annexed objects.) Profiling `git-annex find +RTS -p`: total time = 3.53 secs (3530 ticks @ 1000 us, 1 processor) total alloc = 3,772,700,720 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc spanList Data.List.Utils 32.6 37.7 startswith Data.List.Utils 14.3 8.1 md5 Data.Hash.MD5 12.4 18.2 join Data.List.Utils 6.9 13.7 catchIO Utility.Exception 5.9 6.0 catches Control.Monad.Catch 5.0 2.8 inAnnex'.checkindirect Annex.Content 4.6 1.8 readish Utility.PartialPrelude 3.0 1.4 isAnnexLink Annex.Link 2.6 4.0 split Data.List.Utils 1.5 0.8 keyPath Annex.Locations 1.2 1.7 This is interesting! Fully 40% of CPU time and allocations are in list (really String) processing, and the details of the profiling report show that `spanList` and `startsWith` and `join` are all coming from calls to `replace` in `keyFile` and `fileKey`. Both functions nest several calls to replace, so perhaps that could be unwound into a single pass and/or a ByteString used to do it more efficiently. 12% of run time is spent calculating the md5 hashes for the hash directories for .git/annex/objects. Data.Hash.MD5 is from missingh, and it is probably a quite unoptimised version. Switching to the version if cryptonite would probably speed it up a lot. """]]