add database benchmark

The benchmark shows that database access is quite fast indeed! getAssociatedFiles takes essentially the same time with 1000 keys as with 10000. The one exception is getAssociatedKey, which gets roughly 5 to 6 times slower with 10 times as many keys.

Based on this benchmark, I don't think I need to worry about optimising for cases where all files are locked and the database is mostly empty. In those cases, database accesses will be misses, and according to this benchmark, each miss should add only about 50 microseconds to runtime (about 50 milliseconds per 1000 lookups).

(NB: There may be some overhead to getting the database opened and locking the handle that this benchmark doesn't see.)

joey@darkstar:~/src/git-annex>./git-annex benchmark
setting up database with 1000
setting up database with 10000

benchmarking keys database/getAssociatedFiles from 1000 (hit)
time                 62.77 μs   (62.70 μs .. 62.85 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 62.81 μs   (62.76 μs .. 62.88 μs)
std dev              201.6 ns   (157.5 ns .. 259.5 ns)

benchmarking keys database/getAssociatedFiles from 1000 (miss)
time                 50.02 μs   (49.97 μs .. 50.07 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 50.09 μs   (50.04 μs .. 50.17 μs)
std dev              206.7 ns   (133.8 ns .. 295.3 ns)

benchmarking keys database/getAssociatedKey from 1000 (hit)
time                 211.2 μs   (210.5 μs .. 212.3 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 211.0 μs   (210.7 μs .. 212.0 μs)
std dev              1.685 μs   (334.4 ns .. 3.517 μs)

benchmarking keys database/getAssociatedKey from 1000 (miss)
time                 173.5 μs   (172.7 μs .. 174.2 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 173.7 μs   (173.0 μs .. 175.5 μs)
std dev              3.833 μs   (1.858 μs .. 6.617 μs)
variance introduced by outliers: 16% (moderately inflated)

benchmarking keys database/getAssociatedFiles from 10000 (hit)
time                 64.01 μs   (63.84 μs .. 64.18 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 64.85 μs   (64.34 μs .. 66.02 μs)
std dev              2.433 μs   (547.6 ns .. 4.652 μs)
variance introduced by outliers: 40% (moderately inflated)

benchmarking keys database/getAssociatedFiles from 10000 (miss)
time                 50.33 μs   (50.28 μs .. 50.39 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 50.32 μs   (50.26 μs .. 50.38 μs)
std dev              202.7 ns   (167.6 ns .. 252.0 ns)

benchmarking keys database/getAssociatedKey from 10000 (hit)
time                 1.142 ms   (1.139 ms .. 1.146 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.142 ms   (1.140 ms .. 1.144 ms)
std dev              7.142 μs   (4.994 μs .. 10.98 μs)

benchmarking keys database/getAssociatedKey from 10000 (miss)
time                 1.094 ms   (1.092 ms .. 1.096 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.095 ms   (1.095 ms .. 1.097 ms)
std dev              4.277 μs   (2.591 μs .. 7.228 μs)

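
One plausible reading of the getAssociatedKey numbers is that a reverse lookup (file to key) has no index to use and so has to scan every row. The following is a minimal Python sketch of that effect using the standard sqlite3 module. It is not git-annex's code or schema: the table `associated`, its columns, the index name `associated_file_idx`, and all function names are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical reverse lookup: find the key associated with a file.
REVERSE_LOOKUP = "SELECT key FROM associated WHERE file = ?"


def build_db(n):
    """Create an in-memory database holding n (key, file) pairs.

    The UNIQUE constraint gives SQLite an index that starts with `key`,
    which helps key-to-file lookups but not file-to-key lookups.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE associated ("
        " key TEXT NOT NULL, file TEXT NOT NULL, UNIQUE (key, file))"
    )
    conn.executemany(
        "INSERT INTO associated VALUES (?, ?)",
        ((f"key{i}", f"file{i}") for i in range(n)),
    )
    conn.commit()
    return conn


def query_plan(conn):
    """Return SQLite's description of how it would run the reverse lookup."""
    row = conn.execute(
        "EXPLAIN QUERY PLAN " + REVERSE_LOOKUP, ("file0",)
    ).fetchone()
    return row[-1]  # the human-readable detail is the last column


def add_file_index(conn):
    """Add an index on `file` so the reverse lookup can seek instead of scan."""
    conn.execute("CREATE INDEX associated_file_idx ON associated (file)")


def lookup_key(conn, file):
    """Run the reverse lookup; return the key, or None on a miss."""
    row = conn.execute(REVERSE_LOOKUP, (file,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_db(10000)
    print("without index:", query_plan(conn))
    add_file_index(conn)
    print("with index:   ", query_plan(conn))
```

Before the index is added, the plan reports a scan, whose cost grows with the number of rows. Afterwards it reports a search using the new index. Whether the same applies to the real Keys database would need checking against its actual schema.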
author: Joey Hess <joeyh@joeyh.name> 2016-01-12 13:01:44 -0400
committer: Joey Hess <joeyh@joeyh.name> 2016-01-12 13:07:03 -0400
commit: 17cf39db4fb4985ad1230417f537dadce8272d38 (patch)
tree: 89e5588428255072f88307fc3f324ffc84db7299 /doc
parent: fcdf8b0475b39d5132e74978479cb541c276ccfe (diff)
2 files changed, 5 insertions, 8 deletions
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 299428d1e..329fb8932 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -672,6 +672,11 @@ subdirectories).
   
   See [[git-annex-fuzztest]](1) for details.
 
+* `benchmark`
+
+  This runs git-annex's built-in benchmarks, if it was built with
+  benchmarking support.
+
 # COMMON OPTIONS
 
 These common options are accepted by all git-annex commands, and
diff --git a/doc/todo/smudge.mdwn b/doc/todo/smudge.mdwn
index 6498863e4..5f3d521bf 100644
--- a/doc/todo/smudge.mdwn
+++ b/doc/todo/smudge.mdwn
@@ -32,14 +32,6 @@ git-annex should use smudge/clean filters.
   when pushing changes committed in such a repo. Ideally, should avoid
   committing implicit unlocks, or should prevent such commits leaking out
   in pushes.
-* Optimisation: See if the database schema can be improved to speed things
-  up. Are there enough indexes? getAssociatedKey in particular does a
-  reverse lookup and might benefit from an index.
-* Optimisation: Reads from the Keys database avoid doing anything if the
-  database doesn't exist. This makes v5 repos, or v6 with all locked files
-  faster. However, if a v6 repo unlocks and then re-locks a file, its
-  database will exist, and so this optimisation will no longer apply.
-  Could try to detect when the database is empty, and remove it or avoid reads.
 
 * Eventually (but not yet), make v6 the default for new repositories.
   Note that the assistant forces repos into direct mode; that will need to