path: root/doc/todo/smudge.mdwn
Commit message (Author, Age)
* fix now-dead gmane links (Joey Hess, 2017-12-26)
  gmane's disk crashed, I found one thread in another archive, but could not find my whole patch set in any archive (perhaps some of the messages were too long), so pulled it out of my personal mail archives.

  This commit was supported by the NSF-funded DataLad project.
* close as dup (Joey Hess, 2017-06-09)
* todo (Joey Hess, 2016-12-12)
* update (Joey Hess, 2016-07-12)
* link to patch (Joey Hess, 2016-06-16)
* found a bad memory use in git (Joey Hess, 2016-05-12)
* link to my post "proposal for extending smudge/clean filters with raw file access" (Joey Hess, 2016-05-12)
* update (Joey Hess, 2016-04-13)
* update (Joey Hess, 2016-04-12)
* comment (Joey Hess, 2016-04-09)
* update (Joey Hess, 2016-04-08)
* init: Automatically enter the adjusted unlocked branch when in a v6 repo on a filesystem not supporting symlinks. (Joey Hess, 2016-03-29)
* comment (Joey Hess, 2016-02-12)
* pointer (Joey Hess, 2016-02-09)
* small formatting fix. (fiatjaf, 2016-01-19)
* avoid hard linking object from other repository when annex.thin is set (Joey Hess, 2016-01-13)
  This is simpler and less expensive than checking if the src file has a link count >= 2, and also is unlocked.
* add database benchmark (Joey Hess, 2016-01-12)
  The benchmark shows that the database access is quite fast indeed! And, it scales linearly to the number of keys, with one exception, getAssociatedKey.

  Based on this benchmark, I don't think I need worry about optimising for cases where all files are locked and the database is mostly empty. In those cases, database access will be misses, and according to this benchmark, should add only 50 milliseconds to runtime.

  (NB: There may be some overhead to getting the database opened and locking the handle that this benchmark doesn't see.)

    joey@darkstar:~/src/git-annex>./git-annex benchmark
    setting up database with 1000
    setting up database with 10000
    benchmarking keys database/getAssociatedFiles from 1000 (hit)
    time                 62.77 μs   (62.70 μs .. 62.85 μs)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 62.81 μs   (62.76 μs .. 62.88 μs)
    std dev              201.6 ns   (157.5 ns .. 259.5 ns)

    benchmarking keys database/getAssociatedFiles from 1000 (miss)
    time                 50.02 μs   (49.97 μs .. 50.07 μs)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 50.09 μs   (50.04 μs .. 50.17 μs)
    std dev              206.7 ns   (133.8 ns .. 295.3 ns)

    benchmarking keys database/getAssociatedKey from 1000 (hit)
    time                 211.2 μs   (210.5 μs .. 212.3 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 211.0 μs   (210.7 μs .. 212.0 μs)
    std dev              1.685 μs   (334.4 ns .. 3.517 μs)

    benchmarking keys database/getAssociatedKey from 1000 (miss)
    time                 173.5 μs   (172.7 μs .. 174.2 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 173.7 μs   (173.0 μs .. 175.5 μs)
    std dev              3.833 μs   (1.858 μs .. 6.617 μs)
    variance introduced by outliers: 16% (moderately inflated)

    benchmarking keys database/getAssociatedFiles from 10000 (hit)
    time                 64.01 μs   (63.84 μs .. 64.18 μs)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 64.85 μs   (64.34 μs .. 66.02 μs)
    std dev              2.433 μs   (547.6 ns .. 4.652 μs)
    variance introduced by outliers: 40% (moderately inflated)

    benchmarking keys database/getAssociatedFiles from 10000 (miss)
    time                 50.33 μs   (50.28 μs .. 50.39 μs)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 50.32 μs   (50.26 μs .. 50.38 μs)
    std dev              202.7 ns   (167.6 ns .. 252.0 ns)

    benchmarking keys database/getAssociatedKey from 10000 (hit)
    time                 1.142 ms   (1.139 ms .. 1.146 ms)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 1.142 ms   (1.140 ms .. 1.144 ms)
    std dev              7.142 μs   (4.994 μs .. 10.98 μs)

    benchmarking keys database/getAssociatedKey from 10000 (miss)
    time                 1.094 ms   (1.092 ms .. 1.096 ms)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 1.095 ms   (1.095 ms .. 1.097 ms)
    std dev              4.277 μs   (2.591 μs .. 7.228 μs)
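The figures in the benchmark commit can be checked with a little arithmetic. The sketch below is Python for illustration only (git-annex itself is written in Haskell). It copies the reported `time` estimates and computes how each lookup grows when the database goes from 1000 to 10000 keys. The helper names `growth` and `total_ms` are hypothetical. Reading the "50 milliseconds" figure as roughly 1000 lookups that each miss at about 50 μs is an assumption, not something the commit message states.

```python
# Point estimates ("time" rows) copied from the benchmark output, in microseconds.
TIMES_US = {
    ("getAssociatedFiles", "hit"): {1000: 62.77, 10000: 64.01},
    ("getAssociatedFiles", "miss"): {1000: 50.02, 10000: 50.33},
    ("getAssociatedKey", "hit"): {1000: 211.2, 10000: 1142.0},
    ("getAssociatedKey", "miss"): {1000: 173.5, 10000: 1094.0},
}


def growth(times):
    """Ratio of the 10000-key time to the 1000-key time."""
    return times[10000] / times[1000]


def total_ms(per_lookup_us, lookups):
    """Total time in milliseconds for a number of lookups (hypothetical helper)."""
    return per_lookup_us * lookups / 1000.0


for (name, kind), times in TIMES_US.items():
    print(f"{name} ({kind}): x{growth(times):.2f} for 10x the keys")

# Assumption: the 50 ms estimate corresponds to about 1000 missed lookups.
print(f"1000 misses at 50.02 us each: {total_ms(50.02, 1000):.1f} ms")
```

On these numbers getAssociatedFiles barely changes between the two database sizes, while getAssociatedKey takes several times longer with ten times the keys.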
* update (Joey Hess, 2016-01-08)
* devblog (Joey Hess, 2016-01-07)
* update (Joey Hess, 2016-01-07)
* test: Added --keep-failures option. (Joey Hess, 2016-01-06)
* update (Joey Hess, 2016-01-05)
* scan for unlocked files on init/upgrade of v6 repo (Joey Hess, 2016-01-01)
* update (Joey Hess, 2016-01-01)
* add test: conflict resolution (mixed locked and unlocked file) (Joey Hess, 2015-12-30)
* test suite 100% pass in v6, finally! (Joey Hess, 2015-12-30)
  Set annex.largefiles when adding the conflicting non-annexed file, otherwise it would be added as an annexed file.
* update (Joey Hess, 2015-12-29)
* automatic conflict resolution for v6 unlocked files (Joey Hess, 2015-12-29)
  Several tricky parts:

  * When the conflict is just between the same key being locked and unlocked, the unlocked version wins, and the file is not renamed in this case.

  * Need to update associated file map when conflict resolution renames an unlocked file.

  * git merge runs the smudge filter on the conflicting file, and actually overwrites the file with the same content it had before, and so invalidates its inode cache. This makes it difficult to know when it's safe to remove such files as conflict cruft, without going so far as to compare their entire contents.

    Dealt with this by preventing the smudge filter from populating the file when a merge is run. However, that also prevents the smudge filter being run for non-conflicting files, so eg moving a file won't put its new content into place.

  * Ideally, if a merge or a merge conflict resolution renames an unlocked file, the file in the work tree can just be moved, rather than copying the content to a new worktree file.

    This is attempted to be done in merge conflict resolution, but due to git merge's behavior of running smudge filters, what actually seems to happen is the old worktree file with the content is deleted and rewritten as a pointer file, so doesn't get reused.

    So, this is probably not as efficient as it optimally could be. If that becomes a problem, could look into running the merge in a separate worktree and updating the real worktree more efficiently, similarly to the direct mode merge. However, the direct mode merge had a lot of bugs, and I'd rather not use that more error-prone method unless really needed.
* annex.thin (Joey Hess, 2015-12-27)
  Decided it's too scary to make v6 unlocked files have 1 copy by default, but that should be available to those who need it. This is consistent with git-annex not dropping unused content without --force, etc.

  * Added annex.thin setting, which makes unlocked files in v6 repositories be hard linked to their content, instead of a copy. This saves disk space but means any modification of an unlocked file will lose the local (and possibly only) copy of the old version.
  * Enable annex.thin by default on upgrade from direct mode to v6, since direct mode made the same tradeoff.
  * fix: Adjusts unlocked files as configured by annex.thin.
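The tradeoff described for annex.thin follows from how hard links work, and can be shown in a few lines. This is an illustrative Python sketch, not git-annex code. The names `populate` and `demo` are hypothetical, and the file names merely stand in for an annex object and a work tree file. It writes an "old version" into the object, populates the work tree file either as a copy or as a hard link, modifies the work tree file in place, and then reads the object back.

```python
import os
import shutil
import tempfile


def populate(annex_object, worktree_file, thin):
    """Place the object's content at worktree_file: hard link if thin, else copy."""
    if thin:
        os.link(annex_object, worktree_file)
    else:
        shutil.copyfile(annex_object, worktree_file)


def demo(thin):
    """Return what the object contains after the work tree file is modified."""
    with tempfile.TemporaryDirectory() as d:
        obj = os.path.join(d, "object")
        work = os.path.join(d, "file")
        with open(obj, "w") as f:
            f.write("old version")
        populate(obj, work, thin)
        # Modify the work tree file in place, as an editor might.
        with open(work, "w") as f:
            f.write("new version")
        with open(obj) as f:
            return f.read()


print("copy:", demo(thin=False))
print("thin:", demo(thin=True))
```

With a copy, the object still holds the old version after the edit. With a hard link, both names refer to the same file, so the old version is gone unless another copy exists elsewhere.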
* reorg (Joey Hess, 2015-12-26)
* update (Joey Hess, 2015-12-26)
* optimise read and write for Keys database (untested) (Joey Hess, 2015-12-23)
  Writes are optimised by queueing up multiple writes when possible. The queue is flushed after the Annex monad action finishes. That makes it happen on program termination, and also whenever a nested Annex monad action finishes.

  Reads are optimised by checking once (per AnnexState) if the database exists. If the database doesn't exist yet, all reads return mempty.

  Reads also cause queued writes to be flushed, so reads will always be consistent with writes (as long as they're made inside the same Annex monad). A future optimisation path would be to determine when that's not necessary, which is probably most of the time, and avoid flushing unnecessarily.

  Design notes for this commit:

  - separate reads from writes
  - reuse a handle which is left open until program exit or until the MVar goes out of scope (and autoclosed then)
  - writes are queued
  - queue is flushed periodically
  - immediate queue flush before any read
  - auto-flush queue when database handle is garbage collected
  - flush queue on exit from Annex monad (Note that this may happen repeatedly for a single database connection; or a connection may be reused for multiple Annex monad actions, possibly even concurrent ones.)
  - if database does not exist (or is empty) the handle is not opened by reads; reads instead return empty results
  - writes open the handle if it was not open previously
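The queue-then-flush-before-read idea in these design notes can be sketched in miniature. The Python below is only an illustration under stated assumptions: the class `QueuedDb` and its methods are hypothetical names, a dict stands in for the on-disk database, and none of the handle, MVar, or Annex monad machinery is modelled. It shows two properties from the notes: writes accumulate without touching the store, and any read first flushes the queue, so it sees earlier writes.

```python
class QueuedDb:
    """Toy model: queued writes, flushed before any read (hypothetical API)."""

    def __init__(self):
        self.store = {}    # stands in for the on-disk database
        self.queue = []    # pending (key, path) writes
        self.flushes = 0   # how many times the queue was actually flushed

    def add_associated_file(self, key, path):
        # Writes are only queued; nothing reaches the store yet.
        self.queue.append((key, path))

    def flush(self):
        # Apply all queued writes in order; do nothing if the queue is empty.
        if not self.queue:
            return
        for key, path in self.queue:
            self.store.setdefault(key, []).append(path)
        self.queue.clear()
        self.flushes += 1

    def get_associated_files(self, key):
        # Flush first so the read is consistent with earlier writes.
        self.flush()
        # A missing key (or an empty store) yields an empty result.
        return list(self.store.get(key, []))


db = QueuedDb()
db.add_associated_file("KEY1", "a.txt")
db.add_associated_file("KEY1", "b.txt")
print(db.flushes)                        # still queued, nothing flushed
print(db.get_associated_files("KEY1"))   # the read forces one flush
print(db.get_associated_files("KEY2"))   # miss: empty result
```

Batching the two writes into a single flush is where the saving comes from. The cost is that every read pays for a flush check, which is the overhead the commit message suggests avoiding in future.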
* update (Joey Hess, 2015-12-22)
* wip v6 support for assistant (Joey Hess, 2015-12-21)
  Files are not yet added to v6 repos in unlocked mode.
* interaction with shared clones (Joey Hess, 2015-12-17)
* update (Joey Hess, 2015-12-16)
* update todo list (Joey Hess, 2015-12-16)
* starting to work on test suite for v6 (Joey Hess, 2015-12-15)
* update todo list (Joey Hess, 2015-12-15)
* implemented upgrade of direct mode repo to v6 (Joey Hess, 2015-12-15)
* have clean filter check if the filename was already in use by an old key (Joey Hess, 2015-12-15)
  The annex object for it may have been modified due to hard link, and that should be cleaned up when the new version is added. If another associated file has the old key's content, that's linked into the annex object. Otherwise, update location log to reflect that content has been lost.
* todo (Joey Hess, 2015-12-15)
* update (Joey Hess, 2015-12-11)
* checked getKeysPresent; it's ok for v6 unlocked files (Joey Hess, 2015-12-11)
  When a v6 unlocked file is removed from the work tree, unused doesn't show it. When it gets removed from the index, unused does show it. This is the same as a locked file.
* fsck for v6 unlocked files (Joey Hess, 2015-12-11)
  This only adds 1 stat to each file fscked for locked files, so added overhead is minimal. For unlocked files it has to access the database to see if a file is modified.
* finish v6 git-annex lock (Joey Hess, 2015-12-11)
  This was a doozy!
* only make 1 hardlink max between pointer file and annex object (Joey Hess, 2015-12-11)
  If multiple files point to the same annex object, the user may want to modify them independently, so don't use a hard link.

  Also, check diskreserve when copying.
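One way to picture the "1 hardlink max" rule is a link-or-copy helper that looks at the object's link count. This Python sketch is an assumption about the shape of such a check, not the actual git-annex logic: the function `link_or_copy` and its `reserve_bytes` parameter (standing in for diskreserve) are hypothetical names. The first work tree file gets a hard link. Any further file pointing at the same object gets an independent copy, and the copy is refused if it would eat into the reserved free space.

```python
import os
import shutil
import tempfile


def link_or_copy(annex_object, dest, reserve_bytes=0):
    """Hard link the object to dest if it has no other link yet, else copy it."""
    st = os.stat(annex_object)
    if st.st_nlink == 1:
        # No work tree file shares this object yet: a hard link is acceptable.
        os.link(annex_object, dest)
        return "link"
    # Already linked once: copy, so the files can be modified independently.
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(dest))).free
    if free - st.st_size < reserve_bytes:
        raise OSError("copy would leave less free space than the reserve")
    shutil.copyfile(annex_object, dest)
    return "copy"


with tempfile.TemporaryDirectory() as d:
    obj = os.path.join(d, "object")
    with open(obj, "w") as f:
        f.write("content")
    first = link_or_copy(obj, os.path.join(d, "one"))
    second = link_or_copy(obj, os.path.join(d, "two"))
    print(first, second, os.stat(obj).st_nlink)
```

Editing the second file then leaves the object and the first file untouched, which is the independence the commit message asks for.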
* wip (Joey Hess, 2015-12-11)
* v6 git-annex unlock (Joey Hess, 2015-12-10)
  Note that the implementation uses replaceFile, so that the actual replacement of the work tree file is atomic. This seems a good property to have!

  It would be possible for unlock in v6 mode to be run on files that do not have their content present. However, that would be a behavior change from before, and I don't see any immediate need to support it, so I didn't implement it.
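The atomicity mentioned for unlock usually comes from writing the new content to a temporary file and renaming it over the target. The Python sketch below shows that general technique. It is not the replaceFile function from git-annex, whose details are not given here. The name `replace_file` and the pointer text are hypothetical, and the atomic rename relies on the temporary file living in the same directory (and so the same filesystem) as the target.

```python
import os
import tempfile


def replace_file(path, data):
    """Replace path with data so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # os.replace renames over the destination in one step.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "file")
    with open(target, "wb") as f:
        f.write(b"pointer placeholder\n")   # hypothetical stand-in for a pointer file
    replace_file(target, b"full file content\n")
    with open(target, "rb") as f:
        print(f.read())
    print(os.listdir(d))
```

If the process is interrupted before the rename, the work tree file still has its previous content and at worst a stray temporary file remains.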
* add generalized linkAnnex' (Joey Hess, 2015-12-10)