aboutsummaryrefslogtreecommitdiff
path: root/Annex/Ingest.hs
Commit message (Collapse)AuthorAge
* enable LambdaCase and convert around 10% of places that could use itGravatar Joey Hess2017-11-15
| | | | | | | | | | | Needs ghc 7.6.1, so minimum base version increased slightly. All builds are well above this version of ghc, and debian oldstable is as well. Code that could use lambdacase can be found by running: git grep -B 1 'case ' | less and searching in less for "<-" This commit was sponsored by andrea rota.
* add: Replace work tree file atomically.Gravatar Joey Hess2017-10-16
| | | | | | | Before, there was a window where interrupting an add could result in the file being moved into the annex, with no symlink yet created. This commit was supported by the NSF-funded DataLad project.
* annex.securehashesonlyGravatar Joey Hess2017-02-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cryptographically secure hashes can be forced to be used in a repository, by setting annex.securehashesonly. This does not prevent the git repository from containing files with insecure hashes, but it does prevent the content of such files from being pulled into .git/annex/objects from another repository. We want to make sure that at no point does git-annex accept content into .git/annex/objects that is hashed with an insecure key. Here's how it was done: * .git/annex/objects/xx/yy/KEY/ is kept frozen, so nothing can be written to it normally * So every place that writes content must call, thawContent or modifyContent. We can audit for these, and be sure we've considered all cases. * The main functions are moveAnnex, and linkToAnnex; these were made to check annex.securehashesonly, and are the main security boundary for annex.securehashesonly. * Most other calls to modifyContent deal with other files in the KEY directory (inode cache etc). The other ones that mess with the content are: - Annex.Direct.toDirectGen, in which content already in the annex directory is moved to the direct mode file, so not relevant. - fix and lock, which don't add new content - Command.ReKey.linkKey, which manually unlocks it to make a copy. * All other calls to thawContent appear safe. Made moveAnnex return a Bool, so checked all callsites and made them deal with a failure in appropriate ways. linkToAnnex simply returns LinkAnnexFailed; all callsites already deal with it failing in appropriate ways. This commit was sponsored by Riku Voipio.
* Make import --deduplicate and --skip-duplicates only hash once, not twiceGravatar Joey Hess2017-02-09
| | | | | | | | | | | | | | | | | | | | import: --deduplicate and --skip-duplicates were implemented inneficiently; they unncessarily hashed each file twice. They have been improved to only hash once. The new approach is to lock down (minimally) and hash files, and then reuse that information when importing them. This was rather tricky, especially in detecting changes to files while they are being imported. The output of import changed slightly. While before it silently skipped over files with eg --skip-duplicates, now it shows each file as it starts to act on it. Since every file is hashed first thing, it would otherwise not be clear what file import is chewing on. (Actually, it wasn't clear before when any of the duplicates switches were used.) This commit was sponsored by Alexander Thompson on Patreon.
* lock: Fix edge cases where data loss could occur in v6 mode.Gravatar Joey Hess2016-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case where the pointer file is in place, and not the content of the object, lock's performNew was called with filemodified=True, which caused it to try to repopulate the object from an unmodified associated file, of which there were none. So, the content of the object got thrown away incorrectly. This was the cause (although not the root cause) of data loss in https://github.com/datalad/datalad/issues/1020 The same problem could also occur when the work tree file is modified, but the object is not, and lock is called with --force. Added a test case for this, since it's excercising the same code path and is easier to set up than the problem above. Note that this only occurred when the keys database did not have an inode cache recorded for the annex object. Normally, the annex object would be in there, but there are of course circumstances where the inode cache is out of sync with reality, since it's only a cache. Fixed by checking if the object is unmodified; if so we don't need to try to repopulate it. This does add an additional checksum to the unlock path, but it's already checksumming the worktree file in another case, so it doesn't slow it down overall. Further investigation found a similar problem occurred when smudge --clean is called on a file and the inode cache is not populated. cleanOldKeys deleted the unmodified old object file in this case. This was also fixed by checking if the object is unmodified. In general, use of getInodeCaches and sameInodeCache is potentially dangerous if the inode cache has not gotten populated for some reason. Better to use isUnmodified. I breifly auited other places that check the inode cache, and did not see any immediate problems, but it would be easy to miss this kind of problem.
* Make git clean filter preserve the backend that was used for a file.Gravatar Joey Hess2016-06-09
|
* Preserve execute bits of unlocked files in v6 mode.Gravatar Joey Hess2016-04-14
| | | | | | | | | | | | | | When annex.thin is set, adding an object will add the execute bits to the work tree file, and this does mean that the annex object file ends up executable. This doesn't add any complexity that wasn't already present, because git annex add of an executable file has always ingested it so that the annex object ends up executable. But, since an annex object file can be executable or not, when populating an unlocked file from one, the executable bit is always added or removed to match the mode of the pointer file.
* git annex add in adjusted unlocked branchGravatar Joey Hess2016-03-29
| | | | | Cached the current branch lookup just because it seems unnecessary overhead to run an extra git command per add to query the current branch.
* Merge branch 'no-cbits'Gravatar Joey Hess2016-03-05
|\
* | annex.addunlockedGravatar Joey Hess2016-02-16
| | | | | | | | | | | | | | * add, addurl, import, importfeed: When in a v6 repository on a crippled filesystem, add files unlocked. * annex.addunlocked: New configuration setting, makes files always be added unlocked. (v6 only)
| * move old ghc compat code into separate module; eliminate WITH_CLIBSGravatar Joey Hess2016-02-15
|/ | | | | This avoids hsc2hs being run except when building for the old version of ghc. Should speed up builds.
* remove 163 lines of code without changing anything except importsGravatar Joey Hess2016-01-20
|
* avoid confusing git with a modified ctime in clean filterGravatar Joey Hess2016-01-07
| | | | | | | Linking the file to the tmp dir was not necessary in the clean filter, and it caused the ctime to change, which caused git to think the file was changed. This caused git status to get slow as it kept re-cleaning unchanged files.
* use TopFilePath for associated filesGravatar Joey Hess2016-01-05
| | | | | | | | | | | | | | | Fixes several bugs with updates of pointer files. When eg, running git annex drop --from localremote it was updating the pointer file in the local repository, not the remote. Also, fixes drop ../foo when run in a subdir, and probably lots of other problems. Test suite drops from ~30 to 11 failures now. TopFilePath is used to force thinking about what the filepath is relative to. The data stored in the sqlite db is still just a plain string, and TopFilePath is a newtype, so there's no overhead involved in using it in DataBase.Keys.
* annex.thinGravatar Joey Hess2015-12-27
| | | | | | | | | | | | | | Decided it's too scary to make v6 unlocked files have 1 copy by default, but that should be available to those who need it. This is consistent with git-annex not dropping unused content without --force, etc. * Added annex.thin setting, which makes unlocked files in v6 repositories be hard linked to their content, instead of a copy. This saves disk space but means any modification of an unlocked file will lose the local (and possibly only) copy of the old version. * Enable annex.thin by default on upgrade from direct mode to v6, since direct mode made the same tradeoff. * fix: Adjusts unlocked files as configured by annex.thin.
* clean up cruft in assistant fast rename code pathGravatar Joey Hess2015-12-22
|
* move cleanOldKey into ingestGravatar Joey Hess2015-12-22
|
* populate unlocked files with newly available content when ingestingGravatar Joey Hess2015-12-22
| | | | | | This can happen when ingesting a new file in either locked or unlocked mode, when some unlocked files in the repo use the same key, and the content was not locally available before.
* finish v6 support for assistantGravatar Joey Hess2015-12-22
| | | | Seems to basically work now!
* refactoringGravatar Joey Hess2015-12-22
no behavior changes