summaryrefslogtreecommitdiff
path: root/Database
Commit message (Collapse)AuthorAge
* fix build with old version of persistentGravatar Joey Hess2017-09-25
|
* refactorGravatar Joey Hess2017-09-20
|
* fix bug that prevented db being written to disk in SingleWriter modeGravatar Joey Hess2017-09-18
| | | | | | | | | | | The bug occurred when closeDb was not called, and garbage collection of the DbHandle didn't give the workerThread time to shut down. Fixed by exiting the runSqlite action when a commit is made. (MultiWriter mode already forked off a runSqlite action, so avoided the problem.) This commit was sponsored by Brock Spratlen on Patreon.
* merge changes made on other repos into ExportTreeGravatar Joey Hess2017-09-18
| | | | | | | | | | | Now when one repository has exported a tree, another repository can get files from the export, after syncing. There's a bug: While the database update works, somehow the database on disk does not get updated, and so the database update is run the next time, etc. Wasn't able to figure out why yet. This commit was sponsored by Ole-Morten Duesund on Patreon.
* update ExportTree table efficientlyGravatar Joey Hess2017-09-18
| | | | | | | Use same diff and key lookup except when the whole tree has to be scanned. This commit was sponsored by Peter Hogg on Patreon.
* add ExportTree table to export dbGravatar Joey Hess2017-09-18
| | | | | | | | | | | | New table needed to look up what filenames are used in the currently exported tree, for reasons explained in export.mdwn. Also, added smart constructors for ExportLocation and ExportDirectory to make sure they contain filepaths with the right direction slashes. And some code refactoring. This commit was sponsored by Francois Marier on Patreon.
* split out Types.ExportGravatar Joey Hess2017-09-15
|
* remove empty directories when removing from exportGravatar Joey Hess2017-09-15
| | | | | | | | | | | | | | | The subtle part of this is what happens when the remote fails to remove an empty directory. The removal from the export needs to fail in that case, so the removal will be tried again later. However, removeExportLocation has already been run and changed the export db, so if the next run checks getExportLocation, it might decide nothing remains to be done, leaving the empty directory. Dealt with that by making removeEmptyDirectories, handle a failure by calling addExportLocation, reverting the database changes so the next run will be guaranteed to try deleting the empty directory again. This commit was sponsored by Thomas Hochstein on Patreon.
* add table to keep track of what subdirectories are populated in the exportGravatar Joey Hess2017-09-15
| | | | | | So empty subdirectories can be identified and removed. This commit was sponsored by Jochen Bartl on Patreon.
* fix consistency bug reading from export databaseGravatar Joey Hess2017-09-06
| | | | | | | | | | | | | | | | | | | The export database has writes made to it and then expects to read back the same data immediately. But, the way that Database.Handle does writes, in order to support multiple writers, makes that not work, due to caching issues. This resulted in export re-uploading files it had already successfully renamed into place. Fixed by allowing databases to be opened in MultiWriter or SingleWriter mode. The export database only needs to support a single writer; it does not make sense for multiple exports to run at the same time to the same special remote. All other databases still use MultiWriter mode. And by inspection, nothing else in git-annex seems to be relying on being able to immediately query for changes that were just written to the database. This commit was supported by the NSF-funded DataLad project.
* use export db to correctly handle duplicate filesGravatar Joey Hess2017-09-04
| | | | | | | | | | | | | | | | | Removed uncorrect UniqueKey key in db schema; a key can appear multiple times with different files. The database has to be flushed after each removal. But when adding files to the export, lots of changes are able to be queued up w/o flushing. So it's still fairly efficient. If large removals of files from exports are too slow, an alternative would be to make two passes over the diff, one pass queueing deletions from the database, then a flush and the a second pass updating the location log. But that would use more memory, and need to look up exportKey twice per removed file, so I've avoided such optimisation yet. This commit was supported by the NSF-funded DataLad project.
* flush queued changes to export db on exitGravatar Joey Hess2017-09-04
|
* track exported files in a sqlite databaseGravatar Joey Hess2017-09-04
| | | | | | | | | Went with a separate db per export remote, rather than a single export database. Mostly because there will probably not be a lot of separate export remotes, and it might be convenient to be able to delete a given remote's export database. This commit was supported by the NSF-funded DataLad project.
* factor non-type stuff out of KeyGravatar Joey Hess2017-02-24
|
* Work around sqlite's incorrect handling of umask when creating databases.Gravatar Joey Hess2017-02-13
| | | | | | | | | Refactored some common code into initDb. This only deals with the problem when creating new databases. If a repo got bad permissions into it, it's up to the user to deal with it. This commit was sponsored by Ole-Morten Duesund on Patreon.
* work around ghc segfaultGravatar Joey Hess2016-12-30
| | | | | | | | | | | | | hSetEncoding of a closed handle segfaults. https://ghc.haskell.org/trac/ghc/ticket/7161 3b9d9a267b7c9247d36d9b622e1b836724ca5fb0 introduced the crash. In particular, stdin may get closed (by eg, getContents) and then trying to set its encoding will crash. We didn't need to adjust stdin's encoding anyway, but only stderr, to work around https://github.com/yesodweb/persistent/issues/474 Thanks to Mesar Hameed for assistance related to reproducing this bug.
* Always use filesystem encoding for all file and handle reads and writes.Gravatar Joey Hess2016-12-24
| | | | | This is a big scary change. I have convinced myself it should be safe. I hope!
* Avoid backtraces on expected failures when built with ghc 8; only use ↵Gravatar Joey Hess2016-11-15
| | | | | | | | | | | | | backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for a error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.
* remove unused importsGravatar Joey Hess2016-10-18
|
* refactorGravatar Joey Hess2016-10-17
|
* slightly more efficient checking of versionUsesKeysDatabaseGravatar Joey Hess2016-07-19
| | | | | It's a mvar lookup either way, but I think this way will be slightly more efficient. And it reduces the number of places where it's checked to 1.
* Avoid any access to keys database in v5 mode repositories, which are not ↵Gravatar Joey Hess2016-07-19
| | | | supposed to use that database.
* assistant: Fix race in v6 mode that caused downloaded file content to ↵Gravatar Joey Hess2016-05-16
| | | | | | | | | | | | sometimes not replace pointer files. The keys database handle needs to be closed after merging, because the smudge filter, in another process, updates the database. Old cached info can be read for a while from the open database handle; closing it ensures that the info written by the smudge filter is available. This is pretty horribly ad-hoc, and it's especially nasty that the transferrer closes the database every time.
* fix bug in unlocked file scanner that skipped over executable unlocked filesGravatar Joey Hess2016-04-14
|
* fix typoGravatar Joey Hess2016-02-14
|
* devblogGravatar Joey Hess2016-02-14
|
* Fix storing of filenames of v6 unlocked files when the filename is not ↵Gravatar Joey Hess2016-02-14
| | | | | | | | | | | | | | | | | | | | | | | representable in the current locale. This is a mostly backwards compatable change. I broke backwards compatability in the case where a filename starts with double-quote. That seems likely to be very rare, and v6 unlocked files are a new feature anyway, and fsck needs to fix missing associated file mappings anyway. So, I decided that is good enough. The encoding used is to just show the String when it contains a problem character. While that adds some overhead to addAssociatedFile and removeAssociatedFile, those are not called very often. This approach has minimal decode overhead, because most filenames won't be encoded that way, and it only has to look for the leading double-quote to skip the expensive read. So, getAssociatedFiles remains fast. I did consider using ByteString instead, but getting a FilePath converted with all chars intact, even surrigates, is difficult, and it looks like instance PersistField ByteString uses Text, which I don't trust for problem encoded data. It would probably be slower too, and it would make the database less easy to inspect manually.
* if keys database cannot be opened due to permissions, ignoreGravatar Joey Hess2016-02-12
| | | | | | This lets readonly repos be used. If a repo is readonly, we can ignore the keys database, because nothing that we can do will change the state of the repo anyway.
* remove 163 lines of code without changing anything except importsGravatar Joey Hess2016-01-20
|
* fix import warningsGravatar Joey Hess2016-01-14
|
* another fix for old ghcGravatar Joey Hess2016-01-13
|
* change keys database to use IKey type with more efficient serializationGravatar Joey Hess2016-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This breaks any existing keys database! IKey serializes more efficiently than SKey, although this limits the use of its Read/Show instances. This makes the keys database use less disk space, and so should be a win. Updated benchmark: benchmarking keys database/getAssociatedFiles from 1000 (hit) time 64.04 μs (63.95 μs .. 64.13 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 64.02 μs (63.96 μs .. 64.08 μs) std dev 218.2 ns (172.5 ns .. 299.3 ns) benchmarking keys database/getAssociatedFiles from 1000 (miss) time 52.53 μs (52.18 μs .. 53.21 μs) 0.999 R² (0.998 R² .. 1.000 R²) mean 52.31 μs (52.18 μs .. 52.91 μs) std dev 734.6 ns (206.2 ns .. 1.623 μs) benchmarking keys database/getAssociatedKey from 1000 (hit) time 64.60 μs (64.46 μs .. 64.77 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 64.74 μs (64.57 μs .. 65.20 μs) std dev 900.2 ns (389.7 ns .. 1.733 μs) benchmarking keys database/getAssociatedKey from 1000 (miss) time 52.46 μs (52.29 μs .. 52.68 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 52.63 μs (52.35 μs .. 53.37 μs) std dev 1.362 μs (562.7 ns .. 2.608 μs) variance introduced by outliers: 24% (moderately inflated) benchmarking keys database/addAssociatedFile to 1000 (old) time 487.3 μs (484.7 μs .. 490.1 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 490.9 μs (487.8 μs .. 496.5 μs) std dev 13.95 μs (6.841 μs .. 22.03 μs) variance introduced by outliers: 20% (moderately inflated) benchmarking keys database/addAssociatedFile to 1000 (new) time 6.633 ms (5.741 ms .. 7.751 ms) 0.905 R² (0.850 R² .. 0.965 R²) mean 8.252 ms (7.803 ms .. 8.602 ms) std dev 1.126 ms (900.3 μs .. 1.430 ms) variance introduced by outliers: 72% (severely inflated) benchmarking keys database/getAssociatedFiles from 10000 (hit) time 65.36 μs (64.71 μs .. 66.37 μs) 0.998 R² (0.995 R² .. 1.000 R²) mean 65.28 μs (64.72 μs .. 66.45 μs) std dev 2.576 μs (920.8 ns .. 4.122 μs) variance introduced by outliers: 42% (moderately inflated) benchmarking keys database/getAssociatedFiles from 10000 (miss) time 52.34 μs (52.25 μs .. 52.45 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 52.49 μs (52.42 μs .. 52.59 μs) std dev 255.4 ns (205.8 ns .. 312.9 ns) benchmarking keys database/getAssociatedKey from 10000 (hit) time 64.76 μs (64.67 μs .. 64.84 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 64.67 μs (64.62 μs .. 64.72 μs) std dev 177.3 ns (148.1 ns .. 217.1 ns) benchmarking keys database/getAssociatedKey from 10000 (miss) time 52.75 μs (52.66 μs .. 52.82 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 52.69 μs (52.63 μs .. 52.75 μs) std dev 210.6 ns (173.7 ns .. 265.9 ns) benchmarking keys database/addAssociatedFile to 10000 (old) time 489.7 μs (488.7 μs .. 490.7 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 490.4 μs (489.6 μs .. 492.2 μs) std dev 3.990 μs (2.435 μs .. 7.604 μs) benchmarking keys database/addAssociatedFile to 10000 (new) time 9.994 ms (9.186 ms .. 10.74 ms) 0.959 R² (0.928 R² .. 0.979 R²) mean 9.906 ms (9.343 ms .. 10.40 ms) std dev 1.384 ms (1.051 ms .. 2.100 ms) variance introduced by outliers: 69% (severely inflated)
* cleanupGravatar Joey Hess2016-01-12
|
* add FileKeyIndex to Keys db to optimize getAssociatedKeyGravatar Joey Hess2016-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a schema change so will break any existing keys databases. But, it's not been released yet, so I'm still able to make such changes. This speeds up the benchmark quite nicely: benchmarking keys database/getAssociatedKey from 1000 (hit) time 91.65 μs (91.48 μs .. 91.81 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 91.78 μs (91.66 μs .. 91.94 μs) std dev 468.3 ns (353.1 ns .. 624.3 ns) benchmarking keys database/getAssociatedKey from 1000 (miss) time 53.33 μs (53.23 μs .. 53.40 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 53.43 μs (53.36 μs .. 53.53 μs) std dev 274.2 ns (211.7 ns .. 361.5 ns) benchmarking keys database/getAssociatedKey from 10000 (hit) time 92.99 μs (92.74 μs .. 93.27 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 92.90 μs (92.76 μs .. 93.16 μs) std dev 608.7 ns (404.1 ns .. 963.5 ns) benchmarking keys database/getAssociatedKey from 10000 (miss) time 53.12 μs (52.91 μs .. 53.39 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 52.84 μs (52.68 μs .. 53.16 μs) std dev 715.4 ns (400.4 ns .. 1.370 μs)
* add database benchmarkGravatar Joey Hess2016-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The benchmark shows that the database access is quite fast indeed! And, it scales linearly to the number of keys, with one exception, getAssociatedKey. Based on this benchmark, I don't think I need worry about optimising for cases where all files are locked and the database is mostly empty. In those cases, database access will be misses, and according to this benchmark, should add only 50 milliseconds to runtime. (NB: There may be some overhead to getting the database opened and locking the handle that this benchmark doesn't see.) joey@darkstar:~/src/git-annex>./git-annex benchmark setting up database with 1000 setting up database with 10000 benchmarking keys database/getAssociatedFiles from 1000 (hit) time 62.77 μs (62.70 μs .. 62.85 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 62.81 μs (62.76 μs .. 62.88 μs) std dev 201.6 ns (157.5 ns .. 259.5 ns) benchmarking keys database/getAssociatedFiles from 1000 (miss) time 50.02 μs (49.97 μs .. 50.07 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 50.09 μs (50.04 μs .. 50.17 μs) std dev 206.7 ns (133.8 ns .. 295.3 ns) benchmarking keys database/getAssociatedKey from 1000 (hit) time 211.2 μs (210.5 μs .. 212.3 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 211.0 μs (210.7 μs .. 212.0 μs) std dev 1.685 μs (334.4 ns .. 3.517 μs) benchmarking keys database/getAssociatedKey from 1000 (miss) time 173.5 μs (172.7 μs .. 174.2 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 173.7 μs (173.0 μs .. 175.5 μs) std dev 3.833 μs (1.858 μs .. 6.617 μs) variance introduced by outliers: 16% (moderately inflated) benchmarking keys database/getAssociatedFiles from 10000 (hit) time 64.01 μs (63.84 μs .. 64.18 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 64.85 μs (64.34 μs .. 66.02 μs) std dev 2.433 μs (547.6 ns .. 4.652 μs) variance introduced by outliers: 40% (moderately inflated) benchmarking keys database/getAssociatedFiles from 10000 (miss) time 50.33 μs (50.28 μs .. 50.39 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 50.32 μs (50.26 μs .. 50.38 μs) std dev 202.7 ns (167.6 ns .. 252.0 ns) benchmarking keys database/getAssociatedKey from 10000 (hit) time 1.142 ms (1.139 ms .. 1.146 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.142 ms (1.140 ms .. 1.144 ms) std dev 7.142 μs (4.994 μs .. 10.98 μs) benchmarking keys database/getAssociatedKey from 10000 (miss) time 1.094 ms (1.092 ms .. 1.096 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.095 ms (1.095 ms .. 1.097 ms) std dev 4.277 μs (2.591 μs .. 7.228 μs)
* split out raw sql interfaceGravatar Joey Hess2016-01-11
|
* fix inverted logic in old associated files cleanupGravatar Joey Hess2016-01-07
|
* clarify absPathFromGravatar Joey Hess2016-01-05
| | | | | | | | | | | | The repo path is typically relative, not absolute, so providing it to absPathFrom doesn't yield an absolute path. This is not a bug, just unclear documentation. Indeed, there seem to be no reason to simplifyPath here, which absPathFrom does, so instead just combine the repo path and the TopFilePath. Also, removed an export of the TopFilePath constructor; asTopFilePath is provided to construct one as-is.
* use TopFilePath for associated filesGravatar Joey Hess2016-01-05
| | | | | | | | | | | | | | | Fixes several bugs with updates of pointer files. When eg, running git annex drop --from localremote it was updating the pointer file in the local repository, not the remote. Also, fixes drop ../foo when run in a subdir, and probably lots of other problems. Test suite drops from ~30 to 11 failures now. TopFilePath is used to force thinking about what the filepath is relative to. The data stored in the sqlite db is still just a plain string, and TopFilePath is a newtype, so there's no overhead involved in using it in DataBase.Keys.
* improve data typeGravatar Joey Hess2016-01-01
|
* wait for git lstree to exitGravatar Joey Hess2016-01-01
|
* only do scan when there's a branch, not in freshly created new repoGravatar Joey Hess2016-01-01
|
* scan for unlocked files on init/upgrade of v6 repoGravatar Joey Hess2016-01-01
|
* fix build with pre-AMP ghcGravatar Joey Hess2015-12-28
|
* fix build with pre-AMP GHCGravatar Joey Hess2015-12-28
|
* unused importGravatar Joey Hess2015-12-24
|
* typoGravatar Joey Hess2015-12-24
|
* optimise read and write for Keys database (untested)Gravatar Joey Hess2015-12-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Writes are optimised by queueing up multiple writes when possible. The queue is flushed after the Annex monad action finishes. That makes it happen on program termination, and also whenever a nested Annex monad action finishes. Reads are optimised by checking once (per AnnexState) if the database exists. If the database doesn't exist yet, all reads return mempty. Reads also cause queued writes to be flushed, so reads will always be consistent with writes (as long as they're made inside the same Annex monad). A future optimisation path would be to determine when that's not necessary, which is probably most of the time, and avoid flushing unncessarily. Design notes for this commit: - separate reads from writes - reuse a handle which is left open until program exit or until the MVar goes out of scope (and autoclosed then) - writes are queued - queue is flushed periodically - immediate queue flush before any read - auto-flush queue when database handle is garbage collected - flush queue on exit from Annex monad (Note that this may happen repeatedly for a single database connection; or a connection may be reused for multiple Annex monad actions, possibly even concurrent ones.) - if database does not exist (or is empty) the handle is not opened by reads; reads instead return empty results - writes open the handle if it was not open previously
* allow flushDbQueue to be run repeatedlyGravatar Joey Hess2015-12-23
|
* auto-close database connections when MVar is GCedGravatar Joey Hess2015-12-23
|