aboutsummaryrefslogtreecommitdiff
path: root/Command/Smudge.hs
Commit message (Collapse)AuthorAge
* add KeyVariety typeGravatar Joey Hess2017-02-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.
* Make import --deduplicate and --skip-duplicates only hash once, not twiceGravatar Joey Hess2017-02-09
| | | | | | | | | | | | | | | | | | | | import: --deduplicate and --skip-duplicates were implemented inneficiently; they unncessarily hashed each file twice. They have been improved to only hash once. The new approach is to lock down (minimally) and hash files, and then reuse that information when importing them. This was rather tricky, especially in detecting changes to files while they are being imported. The output of import changed slightly. While before it silently skipped over files with eg --skip-duplicates, now it shows each file as it starts to act on it. Since every file is hashed first thing, it would otherwise not be clear what file import is chewing on. (Actually, it wasn't clear before when any of the duplicates switches were used.) This commit was sponsored by Alexander Thompson on Patreon.
* Make git clean filter preserve the backend that was used for a file.Gravatar Joey Hess2016-06-09
|
* reorder associated file db updateGravatar Joey Hess2016-05-16
| | | | | | | | | | | | | | | There's a potential race where the smudge filter is run at the same time an object is being downloaded. If the download finished after the inAnnex check, and before the keys db was updated, the associated file would not get updated with the downloaded content. I'm not sure this closes the race; it may only narrow the window. Problem is, the keys database needs to communicate between two different processes. In the case of the assistant, the transferkeys command is the other process, and it closes the db handle after getting the file. So, it should re-open the database and so see the update that the smudge filter has written to it. But, what if the smudge filter takes a while to update the database?
* Fix bug that sometimes prevented git-annex smudge --clean from consuming all ↵Gravatar Joey Hess2016-05-02
| | | | its input, which resulted in git add bypassing git-annex.
* smudge: Print a warning when annex.thin is set, as git's smudge interface ↵Gravatar Joey Hess2016-04-13
| | | | does not allow honoring that configuration.
* remove 163 lines of code without changing anything except importsGravatar Joey Hess2016-01-20
|
* avoid confusing git with a modified ctime in clean filterGravatar Joey Hess2016-01-07
| | | | | | | Linking the file to the tmp dir was not necessary in the clean filter, and it caused the ctime to change, which caused git to think the file was changed. This caused git status to get slow as it kept re-cleaning unchanged files.
* use TopFilePath for associated filesGravatar Joey Hess2016-01-05
| | | | | | | | | | | | | | | Fixes several bugs with updates of pointer files. When eg, running git annex drop --from localremote it was updating the pointer file in the local repository, not the remote. Also, fixes drop ../foo when run in a subdir, and probably lots of other problems. Test suite drops from ~30 to 11 failures now. TopFilePath is used to force thinking about what the filepath is relative to. The data stored in the sqlite db is still just a plain string, and TopFilePath is a newtype, so there's no overhead involved in using it in DataBase.Keys.
* switch to using main ingest codeGravatar Joey Hess2016-01-01
| | | | | Fixes at least one bug, in populating existing worktree files that use the same key that's ingested.
* don't disable smudge filter while mergingGravatar Joey Hess2015-12-29
| | | | | | | | | | | The smudge filter does need to be run, because if the key is in the local annex already (due to renaming, or a copy of a file added, or a new file added and its content has already arrived), git merge smudges the file and this should provide its content. This does probably mean that in merge conflict resolution, git smudges the existing file, re-copying all its content to it, and then the file is deleted. So, not efficient.
* automatic conflict resolution for v6 unlocked filesGravatar Joey Hess2015-12-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several tricky parts: * When the conflict is just between the same key being locked and unlocked, the unlocked version wins, and the file is not renamed in this case. * Need to update associated file map when conflict resolution renames an unlocked file. * git merge runs the smudge filter on the conflicting file, and actually overwrites the file with the same content it had before, and so invalidates its inode cache. This makes it difficult to know when it's safe to remove such files as conflict cruft, without going so far as to compare their entire contents. Dealt with this by preventing the smudge filter from populating the file when a merge is run. However, that also prevents the smudge filter being run for non-conflicting files, so eg moving a file won't put its new content into place. * Ideally, if a merge or a merge conflict resolution renames an unlocked file, the file in the work tree can just be moved, rather than copying the content to a new worktree file. This is attempted to be done in merge conflict resolution, but due to git merge's behavior of running smudge filters, what actually seems to happen is the old worktree file with the content is deleted and rewritten as a pointer file, so doesn't get reused. So, this is probably not as efficient as it optimally could be. If that becomes a problem, could look into running the merge in a separate worktree and updating the real worktree more efficiently, similarly to the direct mode merge. However, the direct mode merge had a lot of bugs, and I'd rather not use that more error-prone method unless really needed.
* annex.thinGravatar Joey Hess2015-12-27
| | | | | | | | | | | | | | Decided it's too scary to make v6 unlocked files have 1 copy by default, but that should be available to those who need it. This is consistent with git-annex not dropping unused content without --force, etc. * Added annex.thin setting, which makes unlocked files in v6 repositories be hard linked to their content, instead of a copy. This saves disk space but means any modification of an unlocked file will lose the local (and possibly only) copy of the old version. * Enable annex.thin by default on upgrade from direct mode to v6, since direct mode made the same tradeoff. * fix: Adjusts unlocked files as configured by annex.thin.
* lost some bookkeeping infoGravatar Joey Hess2015-12-24
| | | | I forgot to convert this to use Annex.Ingest, todo later.
* move cleanOldKey into ingestGravatar Joey Hess2015-12-22
|
* make linkAnnex detect when the file changes as it's being copied/linked inGravatar Joey Hess2015-12-22
| | | | | | | | | This fixes a race where the modified file ended up in annex/objects, and the InodeCache stored in the database was for the modified version, so git-annex didn't know it had gotten modified. The race could occur when the smudge filter was running; now it gets the InodeCache before generating the Key, which avoids the race.
* have clean filter check if the filename was already in use by an old keyGravatar Joey Hess2015-12-15
| | | | | | | | The annex object for it may have been modified due to hard link, and that should be cleaned up when the new version is added. If another associated file has the old key's content, that's linked into the annex object. Otherwise, update location log to reflect that content has been lost.
* avoid smudge filter returning invalid contentGravatar Joey Hess2015-12-11
| | | | | | | | | | | | 1. git add file 2. git commit 3. modify file 4. git commit 5. git reset HEAD^ Before this fix, that resulted in git saying the file was modified. And indeed, it didn't have the content it should in the just checked out ref, because step 3 modified the object file for the old key.
* only make 1 hardlink max between pointer file and annex objectGravatar Joey Hess2015-12-11
| | | | | | | If multiple files point to the same annex object, the user may want to modify them independently, so don't use a hard link. Also, check diskreserve when copying.
* always format pointer file with a trailing newlineGravatar Joey Hess2015-12-10
| | | | | | | Before the smudge filter added a trailing newline, but other things that wrote formatPointer to a file did not. also some new pointer staging code to use later
* use InodeCache when dropping a key to see if a pointer file can be safely resetGravatar Joey Hess2015-12-09
| | | | | | | | | | | | | | | | The Keys database can hold multiple inode caches for a given key. One for the annex object, and one for each pointer file, which may not be hard linked to it. Inode caches for a key are recorded when its content is added to the annex, but only if it has known pointer files. This is to avoid the overhead of maintaining the database when not needed. When the smudge filter outputs a file's content, the inode cache is not updated, because git's smudge interface doesn't let us write the file. So, dropping will fall back to doing an expensive verification then. Ideally, git's interface would be improved, and then the inode cache could be updated then too.
* add inode cache to the dbGravatar Joey Hess2015-12-09
| | | | | | | | | Renamed the db to keys, since it is various info about a Keys. Dropping a key will update its pointer files, as long as their content can be verified to be unmodified. This falls back to checksum verification, but I want it to use an InodeCache of the key, for speed. But, I have not made anything populate that cache yet.
* avoid clean filter trying to annex a pointer fileGravatar Joey Hess2015-12-09
|
* stash DbHandle in Annex stateGravatar Joey Hess2015-12-09
|
* refactor and improve pointer file handling codeGravatar Joey Hess2015-12-09
|
* require "annex/objects/" before key in pointer filesGravatar Joey Hess2015-12-07
| | | | | | | | | | | | | | This removes ambiguity, because while someone might have "WORM--foo" in a file that's not intended to be a git-annex pointer file, "annex/objects/WORM--foo" is less likely. Also, ee0c34c8f2f94775b39ef10ed580cab47d2f929c had a caveat about symlink targets being parsed as pointer files, and now the same parser is used for both. I did not include any hash directories before the key in the pointer file, as they're not needed. However, if they were included, the parser would still work ok.
* support pointer filesGravatar Joey Hess2015-12-07
| | | | | | | | | | | | | | | | | | | | | Backend.lookupFile is changed to always fall back to catKey when operating on a file that's not a symlink. catKey is changed to understand pointer files, as well as annex symlinks. Before, catKey needed a file mode witness, to be sure it was looking at a symlink. That was complicated stuff. Now, it doesn't actually care if a file in git is a symlink or not; in either case asking git for the content of the file will get the pointer to the key. This does mean that git-annex will treat a link foo -> WORM--bar as a git-annex file, and also treats a regular file containing annex/objects/WORM--bar as a git-annex file. Calling catKey could make git-annex commands need to do more work than before. This would especially be the case if a repo contained many regular files, and only a few annexed files, as now git-annex will need to ask git about the contents of the regular files.
* update associated files database on smudge and cleanGravatar Joey Hess2015-12-07
|
* refactorGravatar Joey Hess2015-12-04
|
* commentsGravatar Joey Hess2015-12-04
|
* merge clean into smudge commandGravatar Joey Hess2015-12-04
| | | | | | The git filter config can be used to map the single git-annex command to the 2 actions, and this avoids "git annex clean" being used for this thing, it might have a better use for that name later.
* avoid commit and messages for smudge filterGravatar Joey Hess2015-12-04
|
* smudge filter workingGravatar Joey Hess2015-12-04
|
* skeleton smudge/clean filtersGravatar Joey Hess2015-12-04