3 files changed, 95 insertions, 15 deletions
diff --git a/doc/bugs/incremental_fsck_should_not_use_sticky_bit/comment_8_9f5acbc79b631a93d7cdf4ae37c07cab._comment b/doc/bugs/incremental_fsck_should_not_use_sticky_bit/comment_8_9f5acbc79b631a93d7cdf4ae37c07cab._comment
new file mode 100644
index 000000000..a63381ed0
--- /dev/null
+++ b/doc/bugs/incremental_fsck_should_not_use_sticky_bit/comment_8_9f5acbc79b631a93d7cdf4ae37c07cab._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""vache"""
+ date="2015-02-15T16:35:12Z"
+ content="""
+<https://github.com/dmbarbour/haskell-vcache>
+
+It uses LMDB, which is a C library, and its PVar is a variable named by a
+bytestring, so it's essentially a key/value store where the values can be
+arbitrary Haskell data types. Since git-annex already has Keys, and most
+of the need for the database is to look up some cached value for a Key,
+this seems like a pretty good fit!
+
+Unfortunately, "A VCache file may be opened by only one process at a time,
+and only once within said process."
+
+But, git-annex needs concurrent read-only access, as many git-annex processes
+can run concurrently.
+"""]]
diff --git a/doc/design/caching_database.mdwn b/doc/design/caching_database.mdwn
index a6face00f..8678f8c3d 100644
--- a/doc/design/caching_database.mdwn
+++ b/doc/design/caching_database.mdwn
@@ -25,11 +25,11 @@ Store in the database the Ref of the branch that was used to construct it.
 
 ## implementation plan
 
-1. Implement for metadata, on a branch, with sqlite.
+1. Store incremental fsck info in db, on a branch, with sqlite.
 2. Make sure that builds on all platforms.
-3. Add associated file mappings support. This is needed to fully
+3. Implement for metadata, on a branch, with sqlite.
+4. Add associated file mappings support. This is needed to fully
    use the caching database to construct views.
-4. Store incremental fsck info in db.
-5. Replace .map files with 3. for direct mode.
+5. Replace .map files with 4. for direct mode.
 
 ## sqlite or not?
 
@@ -39,19 +39,20 @@ SQL.
 And even if that's hidden by a layer like persistent, it's still going
 to involve some technical debt (eg, database migrations). It would be
 great if there were some haskell thing like acid-state
-that I could use instead. But, acid-sate needs to load the whole
+that I could use instead. But, acid-state needs to load the whole
 DB into memory.
 
 In the comments of [[bugs/incremental_fsck_should_not_use_sticky_bit]]
 I examined several other haskell database-like things, and found them
 all wanting, except for
-possibly TCache.
+possibly TCache. (And TCache is backed by persistent/sqlite anyway.)
 
-TODO: This seems promising; investigate it:
-<https://awelonblue.wordpress.com/2014/12/19/vcache-an-acid-state-killer/>
-It uses LMDB, which is a C library, and its PVar is a variable named by a
-bytestring, so it's essentially a key/value store where the values can be
-arbitrary Haskell data types. Since git-annex already has Keys, and most
-of the need for the database is to look up some cached value for a Key,
-this seems like a pretty good fit!
+## one db or multiple?
+
+Using a single database will use less space. Eg, each Key will only need to
+appear in it once, with proper normalization.
+
+OTOH, it's more complicated, and harder to recover from problems.
+
+Currently leaning toward one database per purpose.
 
 ## case study: persistent with sqlite
 
@@ -128,15 +129,19 @@ With this, 1000 queries takes 0.406s.
 
 Note that persistent is probably not actually doing a join at the SQL
 level, so this could be sped up using eg, esqueleto.
 
-Update2: Using esqueleto to do a join got this down to 0.250s.
+Update2: Using esqueleto to do a join got this down to 0.109s.
+See `database` branch for code.
 
-Code: <http://lpaste.net/101141> <http://lpaste.net/101142>
+Update3: Converting to a single un-normalized table for AssociatedFiles
+avoids the join, and brought lookup time down to 0.087s. Of course, when
+a key has multiple associated files, this will use more disk space, due
+to not normalizing the key.
 
 Compare the above with 1000 calls to `associatedFiles`, which is
 approximately as fast as just opening and reading 1000 files, so will
 take well under 0.05s with a **hot** cache.
 
-So, we're looking at nearly an order of magnitude slowdown using sqlite and
+So, we're looking at maybe a 50% slowdown using sqlite and
 persistent for associated files. OTOH, the normalized schema should perform
 better when adding an associated file to a key that already has many.
diff --git a/doc/devblog/day_253__sqlite_for_incremental_fsck.mdwn b/doc/devblog/day_253__sqlite_for_incremental_fsck.mdwn
new file mode 100644
index 000000000..e29f2ebb1
--- /dev/null
+++ b/doc/devblog/day_253__sqlite_for_incremental_fsck.mdwn
@@ -0,0 +1,56 @@
+Yesterday I did a little more investigation of key/value stores.
+I'd love a pure haskell key/value store that didn't buffer everything in
+memory, and that allowed concurrent readers, and was ACID, and production
+quality. But so far, I have not found anything that meets all those
+criteria. It seems that sqlite is the best choice for now.
+
+Started working on the `database` branch today. The plan is to use
+sqlite for incremental fsck first, and if that works well, do the rest
+of what's planned in [[design/caching_database]].
+
+At least for now, I'm going to use a dedicated database file for each
+different thing. (This may not be as space-efficient due to lacking
+normalization, but it keeps things simple.)
+
+So, .git/annex/fsck.db will be used by incremental fsck, and it has
+a super simple Persistent database schema:
+
+[[!format haskell """
+Fscked
+  key SKey
+  UniqueKey key
+"""]]
+
+It was pretty easy to implement this and make incremental fsck use it. The
+hard part is making it both fast and robust.
+
+At first, I was doing everything inside a single `runSqlite` action,
+including creating the table. But, it turns out that runs as a single
+transaction, and if it was interrupted, this left the database in a
+state where it exists, but has no tables. Hard to recover from.
+
+So, I separated out creating the database, and made that be done in its
+own transaction, fully atomically. Now `fsck --incremental` could be
+ctrl-c'd and resumed with `fsck --more`, but it would lose the transaction
+and so not remember anything had been checked.
+
+To fix that, I tried making a separate transaction per file fscked. That
+worked, and it resumes nicely where it left off, but all those transactions
+made it much slower.
+
+To fix the speed, I made it commit just one transaction per minute. This
+seems like an ok balance. Having fsck re-do one minute's work when restarting
+an interrupted incremental fsck is perfectly reasonable, and now the speed,
+using the sqlite database, is nearly as fast as the old sticky bit hack was.
+(Specifically, 6m7s old vs 6m27s new, fscking 37000 files from cold cache
+in --fast mode.)
+
+There is still a problem with multiple concurrent `fsck --more` runs
+failing. Probably a concurrent writer problem? And, some porting will be
+required to get sqlite and persistent working on Windows and Android.
+So the branch isn't ready to merge yet, but it seems promising.
+
+In retrospect, while incremental fsck has the simplest database schema, it
+might be one of the harder things listed in [[design/caching_database]],
+just because it involves so many writes to the database. The other use
+cases are more read heavy.
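The failure mode described in the devblog, where table creation gets swept into one big interrupted transaction and leaves a database file with no tables, and its fix, committing schema creation separately up front, can be sketched language-neutrally. git-annex's actual code is Haskell with persistent; this stand-in uses stdlib sqlite3, and the table and function names are illustrative, not git-annex's:

```python
import sqlite3

def open_fsck_db(path):
    """Open the incremental-fsck database, committing schema creation
    immediately and in its own transaction. An interrupted run can then
    never leave a database file that exists but contains no tables."""
    conn = sqlite3.connect(path)
    with conn:  # this transaction commits as soon as the block exits
        conn.execute("CREATE TABLE IF NOT EXISTS fscked (key TEXT PRIMARY KEY)")
    return conn

conn = open_fsck_db(":memory:")
with conn:
    conn.execute("INSERT OR IGNORE INTO fscked VALUES (?)", ("SHA256E-s42--deadbeef",))
print(conn.execute("SELECT count(*) FROM fscked").fetchone()[0])  # prints 1
```

`INSERT OR IGNORE` against the primary key also gives the set-like semantics of the `UniqueKey` constraint in the persistent schema: re-fscking a key is a no-op.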
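The transaction-batching compromise from the devblog (one commit per minute rather than per file) amounts to checkpointing on a clock: an interruption loses at most one interval's work. A stdlib sqlite3 sketch of that idea, with the clock and interval injectable so the batching is visible; the names are hypothetical, not git-annex's:

```python
import sqlite3, time

def record_fscked(conn, keys, interval=60.0, clock=time.monotonic):
    """Mark each key as fscked, committing at most once per `interval`
    seconds. Restarting an interrupted run re-does at most one interval."""
    last_commit = clock()
    for key in keys:
        conn.execute("INSERT OR IGNORE INTO fscked VALUES (?)", (key,))
        if clock() - last_commit >= interval:
            conn.commit()
            last_commit = clock()
    conn.commit()  # flush the final partial batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fscked (key TEXT PRIMARY KEY)")
record_fscked(conn, [f"key-{i}" for i in range(1000)])
print(conn.execute("SELECT count(*) FROM fscked").fetchone()[0])  # prints 1000
```

Setting `interval=0` degenerates to the slow transaction-per-file mode the post describes; a very large interval degenerates to the fragile single-transaction mode.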
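The Update2/Update3 trade-off in the case study, a normalized schema that needs a join versus a flat table that repeats the key text per associated file, can be seen in miniature with stdlib sqlite3. These schemas are illustrative stand-ins for the persistent/esqueleto models, not git-annex's actual ones:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: each key stored once; lookups join through keyid.
CREATE TABLE keys  (id INTEGER PRIMARY KEY, key TEXT UNIQUE);
CREATE TABLE files (keyid INTEGER REFERENCES keys(id), file TEXT);
-- Un-normalized: one row per (key, file); the key text is repeated,
-- costing disk space but avoiding the join on lookup.
CREATE TABLE associated (key TEXT, file TEXT);
CREATE INDEX associated_key ON associated (key);
""")
conn.execute("INSERT INTO keys VALUES (1, 'K1')")
conn.executemany("INSERT INTO files VALUES (1, ?)", [("a.txt",), ("b.txt",)])
conn.executemany("INSERT INTO associated VALUES ('K1', ?)", [("a.txt",), ("b.txt",)])

joined = sorted(f for (f,) in conn.execute(
    "SELECT file FROM files JOIN keys ON files.keyid = keys.id WHERE keys.key = 'K1'"))
flat = sorted(f for (f,) in conn.execute(
    "SELECT file FROM associated WHERE key = 'K1'"))
print(joined == flat)  # prints True
```

Both lookups return the same associated files; the flat table answers with a single index scan, which is the shape of the 0.109s-to-0.087s improvement, while the normalized one stays cheaper to update when a key already has many files.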