git-annex-gpl - git-annex without the AGPL

	Commit message	Author	Age
*	fix bug that prevented db being written to disk in SingleWriter mode	Joey Hess	2017-09-18
	The bug occurred when closeDb was not called, and garbage collection of the DbHandle didn't give the workerThread time to shut down. Fixed by exiting the runSqlite action when a commit is made. (MultiWriter mode already forked off a runSqlite action, so avoided the problem.) This commit was sponsored by Brock Spratlen on Patreon.
*	fix consistency bug reading from export database	Joey Hess	2017-09-06
	The export database has writes made to it and then expects to read back the same data immediately. But, the way that Database.Handle does writes, in order to support multiple writers, makes that not work, due to caching issues. This resulted in export re-uploading files it had already successfully renamed into place.
	Fixed by allowing databases to be opened in MultiWriter or SingleWriter mode. The export database only needs to support a single writer; it does not make sense for multiple exports to run at the same time to the same special remote.
	All other databases still use MultiWriter mode. And by inspection, nothing else in git-annex seems to be relying on being able to immediately query for changes that were just written to the database.
	This commit was supported by the NSF-funded DataLad project.
*	Work around sqlite's incorrect handling of umask when creating databases.	Joey Hess	2017-02-13
	Refactored some common code into initDb. This only deals with the problem when creating new databases. If a repo got bad permissions into it, it's up to the user to deal with it. This commit was sponsored by Ole-Morten Duesund on Patreon.
*	work around ghc segfault	Joey Hess	2016-12-30
	hSetEncoding of a closed handle segfaults. https://ghc.haskell.org/trac/ghc/ticket/7161
	3b9d9a267b7c9247d36d9b622e1b836724ca5fb0 introduced the crash. In particular, stdin may get closed (e.g., by getContents) and then trying to set its encoding will crash. We didn't need to adjust stdin's encoding anyway, but only stderr, to work around https://github.com/yesodweb/persistent/issues/474
	Thanks to Mesar Hameed for assistance related to reproducing this bug.
*	Always use filesystem encoding for all file and handle reads and writes.	Joey Hess	2016-12-24
	This is a big scary change. I have convinced myself it should be safe. I hope!
*	optimise read and write for Keys database (untested)	Joey Hess	2015-12-23
	Writes are optimised by queueing up multiple writes when possible. The queue is flushed after the Annex monad action finishes. That makes it happen on program termination, and also whenever a nested Annex monad action finishes.
	Reads are optimised by checking once (per AnnexState) if the database exists. If the database doesn't exist yet, all reads return mempty.
	Reads also cause queued writes to be flushed, so reads will always be consistent with writes (as long as they're made inside the same Annex monad). A future optimisation path would be to determine when that's not necessary, which is probably most of the time, and avoid flushing unnecessarily.
	Design notes for this commit:
	- separate reads from writes
	- reuse a handle which is left open until program exit or until the MVar goes out of scope (and autoclosed then)
	- writes are queued
	- queue is flushed periodically
	- immediate queue flush before any read
	- auto-flush queue when database handle is garbage collected
	- flush queue on exit from Annex monad (Note that this may happen repeatedly for a single database connection; or a connection may be reused for multiple Annex monad actions, possibly even concurrent ones.)
	- if database does not exist (or is empty) the handle is not opened by reads; reads instead return empty results
	- writes open the handle if it was not open previously
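The queue-then-flush-before-read behaviour in the design notes above can be sketched roughly as follows. This is only an illustration in Python using the standard sqlite3 module, not git-annex's Haskell code; the class `QueuedDb`, its method names, and the `associated` table are hypothetical and chosen for the example.

```python
import sqlite3


class QueuedDb:
    """Hypothetical wrapper: writes are queued, and any read flushes first."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS associated (key TEXT, file TEXT)"
        )
        self.conn.commit()
        self.queue = []  # pending (key, file) rows, not yet sent to sqlite

    def queue_write(self, key, file):
        # Cheap: only remembers the row; nothing touches the database yet.
        self.queue.append((key, file))

    def flush(self):
        # Send every queued row to sqlite in a single transaction.
        if self.queue:
            with self.conn:  # commits on success, rolls back on error
                self.conn.executemany(
                    "INSERT INTO associated VALUES (?, ?)", self.queue
                )
            self.queue.clear()

    def files_for(self, key):
        # Flush before reading so the read is consistent with queued writes.
        self.flush()
        cur = self.conn.execute(
            "SELECT file FROM associated WHERE key = ? ORDER BY file", (key,)
        )
        return [row[0] for row in cur.fetchall()]
```

A caller would queue several writes and only pay for a transaction when it reads (or when it calls `flush()` on exit).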
*	auto-close database connections when MVar is GCed	Joey Hess	2015-12-23
*	split out Database.Queue from Database.Handle	Joey Hess	2015-12-23
	Fsck can use the queue for efficiency since it is write-heavy, and only reads a value before writing it. But, the queue is not suited to the Keys database.
*	reorder database shutdown to be concurrency safe	Joey Hess	2015-12-16
	If a DbHandle is in use by another thread, it could be queueing changes while shutdown is running. So, wait for the worker to finish before flushing the queue, so that any last-minute writes are included. Before this fix, they would be silently dropped.
	Of course, if the other thread continues to try to use a DbHandle once it's closed, it will block forever as the worker is no longer reading from the jobs MVar. So, that would crash with "thread blocked indefinitely in an MVar operation".
*	stash DbHandle in Annex state	Joey Hess	2015-12-09
*	avoid ugly error about MVar if the sqlite worker thread crashes	Joey Hess	2015-10-12
*	fsck: Commit incremental fsck database after every 1000 files fscked, or every 5 minutes, whichever comes first.	Joey Hess	2015-07-31
	Previously, commits were made every 1000 files fscked. Also, improve docs
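A "whichever comes first" policy like the one described in this commit can be sketched as below. This is a hedged illustration in Python, not the Haskell code in git-annex; the class name `CommitTrigger`, its methods, and the injectable `clock` parameter are assumptions made for the example. The defaults mirror the 1000 files / 5 minutes thresholds from the commit message.

```python
import time


class CommitTrigger:
    """Hypothetical helper: commit after N changes or T seconds, whichever is first."""

    def __init__(self, max_changes=1000, max_seconds=300.0, clock=time.monotonic):
        self.max_changes = max_changes
        self.max_seconds = max_seconds
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self.pending = 0
        self.last_commit = clock()

    def should_commit(self):
        # Commit when enough changes piled up, or when changes are pending
        # and enough time has passed since the last commit.
        too_many = self.pending >= self.max_changes
        too_old = (
            self.pending > 0
            and self.clock() - self.last_commit >= self.max_seconds
        )
        return too_many or too_old

    def record_change(self):
        # Call once per fscked file; returns True when a commit is due.
        self.pending += 1
        return self.should_commit()

    def committed(self):
        # Call after the database transaction has actually been committed.
        self.pending = 0
        self.last_commit = self.clock()
```

The caller stays in charge of the actual commit: it calls `record_change()` per file, commits when that returns True, and then calls `committed()`.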
*	avoid closing db handle when reconnecting to do a write	Joey Hess	2015-02-22
*	complete work around for sqlite SELECT ErrorBusy on new connection bug	Joey Hess	2015-02-22
*	WIP	Joey Hess	2015-02-18
*	deal with rare SELECT ErrorBusy failures	Joey Hess	2015-02-18
	I think they might be a sqlite bug. In discussions with sqlite devs.
*	use WAL mode to ensure read from db always works, even when it's being written to	Joey Hess	2015-02-18
	Also, moved the database to a subdir, as there are multiple files.
	This seems to work well with concurrent fscks, although they still do redundant work due to the commit granularity. Occasionally two writes will conflict, and one is then deferred and happens later.
	Except, with 3 concurrent fscks, I got failures:
	git-annex: user error (SQLite3 returned ErrorBusy while attempting to perform prepare "SELECT \"fscked\".\"key\"\nFROM \"fscked\"\nWHERE \"fscked\".\"key\" = ?\n": database is locked)
	Argh!!!
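For readers unfamiliar with WAL mode, the following sketch shows the property this commit relies on: a reader on a second connection can still query the database while a writer has an uncommitted transaction open. It is written in Python with the standard sqlite3 module purely as an illustration; the function names and the temporary directory are assumptions for the example, and it does not attempt to reproduce the ErrorBusy failures reported above. In WAL mode sqlite keeps `-wal` and `-shm` files next to the database file, which is why a dedicated subdirectory is convenient.

```python
import os
import sqlite3
import tempfile


def count_rows(conn):
    # Fetch everything and close the cursor so no read snapshot is held open.
    cur = conn.execute("SELECT count(*) FROM fscked")
    rows = cur.fetchall()
    cur.close()
    return rows[0][0]


def wal_demo():
    # Keep the database in its own directory, since WAL mode adds extra files.
    db_path = os.path.join(tempfile.mkdtemp(), "db")

    writer = sqlite3.connect(db_path, timeout=1.0)
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    writer.execute("CREATE TABLE fscked (key TEXT)")
    writer.commit()

    # This INSERT opens an implicit transaction that is left uncommitted.
    writer.execute("INSERT INTO fscked VALUES ('k1')")

    reader = sqlite3.connect(db_path, timeout=1.0)
    seen_before = count_rows(reader)  # read succeeds; uncommitted row is invisible

    writer.commit()
    seen_after = count_rows(reader)  # the committed row is now visible

    reader.close()
    writer.close()
    return mode, seen_before, seen_after
```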
*	more robust handling of deferred commits	Joey Hess	2015-02-18
	Still not robust enough. I have 3 fscks running concurrently, and am seeing:
	("commit deferred",user error (SQLite3 returned ErrorBusy while attempting to perform step.))
	and
	git-annex: user error (SQLite3 returned ErrorBusy while attempting to perform prepare "SELECT \"fscked\".\"key\"\nFROM \"fscked\"\nWHERE \"fscked\".\"key\" = ?\n": database is locked)
*	allow for concurrent incremental fsck processes again (sorta)	Joey Hess	2015-02-17
	Sqlite doesn't support multiple concurrent writers at all. One of them will fail to write. It's not even possible to have two processes building up separate transactions at the same time.
	Before using sqlite, incremental fsck could work perfectly well with multiple fsck processes running concurrently. I'd like to keep that working.
	My partial solution, so far, is to make git-annex buffer writes, and every so often send them all to sqlite at once, in a transaction. So most of the time, nothing is writing to the database. (And if it gets unlucky and a write fails due to a collision with another writer, it can just wait and retry the write later.) This lets multiple processes write to the database successfully.
	But, for the purposes of concurrent, incremental fsck, it's not ideal. Each process doesn't immediately learn of files that another process has checked. So they'll tend to do redundant work. Only way I can see to improve this is to use some other mechanism for short-term IPC between the fsck processes. Not yet done.
	----
	Also, make addDb check if an item is in the database already, and not try to re-add it. That fixes an intermittent crash with "SQLite3 returned ErrorConstraint while attempting to perform step."
	I am not 100% sure why; it only started happening when I moved write buffering into the queue. It seemed to generally happen on the same file each time, so could just be due to multiple files having the same key. However, I doubt my sound repo has many duplicate keys, and I suspect something else is going on.
	----
	Updated benchmark, with the 1000 item queue: 6m33.808s
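The buffer-and-retry approach described above might look roughly like this. It is a sketch in Python with the standard sqlite3 module, not the actual Haskell implementation; `BufferedWriter`, `retry_demo`, and the schema are hypothetical. Note one deliberate substitution: instead of checking whether a key is already present before adding it (as addDb does per the message), the sketch uses a PRIMARY KEY with `INSERT OR IGNORE`, which is a different way to avoid the constraint error.

```python
import os
import sqlite3
import tempfile


class BufferedWriter:
    """Hypothetical sketch: buffer writes, send them in one transaction, retry if busy."""

    def __init__(self, path, max_queue=1000):
        # timeout=0 makes a lock collision fail at once instead of waiting.
        self.conn = sqlite3.connect(path, timeout=0)
        self.conn.execute("CREATE TABLE IF NOT EXISTS fscked (key TEXT PRIMARY KEY)")
        self.conn.commit()
        self.queue = []
        self.max_queue = max_queue

    def add(self, key):
        self.queue.append(key)
        if len(self.queue) >= self.max_queue:
            self.flush()

    def flush(self):
        # Returns True if the queue was written, False if the commit is deferred.
        try:
            with self.conn:  # one transaction; rolled back if it fails
                self.conn.executemany(
                    "INSERT OR IGNORE INTO fscked VALUES (?)",
                    [(k,) for k in self.queue],
                )
        except sqlite3.OperationalError:
            return False  # another writer holds the lock; keep the queue for later
        self.queue.clear()
        return True


def retry_demo():
    path = os.path.join(tempfile.mkdtemp(), "db")
    w = BufferedWriter(path)
    # A second connection simulates another fsck process holding the write lock.
    other = sqlite3.connect(path, timeout=0, isolation_level=None)
    other.execute("BEGIN IMMEDIATE")
    w.add("k1")
    deferred = w.flush()  # collides with the other writer
    kept = list(w.queue)
    other.execute("COMMIT")
    retried = w.flush()  # succeeds once the lock is released
    w.add("k1")  # duplicate key is ignored rather than raising
    w.flush()
    count = w.conn.execute("SELECT count(*) FROM fscked").fetchall()[0][0]
    other.close()
    w.conn.close()
    return deferred, kept, retried, count
```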
*	show error when sqlite crashes worker thread	Joey Hess	2015-02-17
	Better than "blocked indefinitely in MVar"..
*	avoid fromIntegral overhead	Joey Hess	2015-02-16
*	commit new transaction after 60 seconds	Joey Hess	2015-02-16
	Database.Handle can now be given a CommitPolicy, making it easy to specify transaction granularity.
	Benchmarking the old git-annex incremental fsck that flips sticky bits to the new that uses sqlite, running in a repo with 37000 annexed files, both from cold cache:
	old: 6m6.906s
	new: 6m26.913s
	This commit was sponsored by TasLUG.
*	commit more transactions when fscking	Joey Hess	2015-02-16
	This makes interrupt and resume work, robustly. But, incremental fsck is slowed down by all those transactions..
*	convert incremental fsck to using sqlite database	Joey Hess	2015-02-16
	Did not keep backwards compat for sticky bit records. An incremental fsck that is already in progress will start over on upgrade to this version.
	This is not yet ready for merging. The autobuilders need to have sqlite installed.
	Also, interrupting a fsck --incremental does not commit the database. So, resuming with fsck --more restarts from beginning.
	Memory: Constant during a fsck of tens of thousands of files. (But, it does seem to buffer whole transaction in memory, so may really scale with number of files.)
	CPU: ?