aboutsummaryrefslogtreecommitdiff
path: root/Annex/CatFile.hs
Commit message (Collapse)AuthorAge
* Always use filesystem encoding for all file and handle reads and writes.Gravatar Joey Hess2016-12-24
| | | | | This is a big scary change. I have convinced myself it should be safe. I hope!
* Avoid using a lot of memory when large objects are present in the git repositoryGravatar Joey Hess2016-10-05
| | | | | | | | | | | | | | | | | | | | | | .. and have to be checked to see if they are a pointed to an annexed file. Cases where such memory use could occur included, but were not limited to: - git commit -a of a large unlocked file (in v5 mode) - git-annex adjust when a large file was checked into git directly Generally, any use of catKey was a potential problem. Fix by using git cat-file --batch-check to check size before catting. This adds another git batch process, which is included in the CatFileHandle for simplicity. There could be performance impact, anywhere catKey is used. Particularly likely to affect adjusted branch generation speed, and operations on unlocked files in v6 mode. Hopefully since the --batch-check and --batch read the same data, disk buffering will avoid most overhead. Leaving only the overhead of talking to the process over the pipe and whatever computation --batch-check needs to do. This commit was sponsored by Bruno BEAUFILS on Patreon.
* Optimisations to git-annex branch query and setting, avoiding repeated ↵Gravatar Joey Hess2016-09-29
| | | | | | | | | | | | | | | | | | | | | | copies of the environment. Speeds up commands like "git-annex find --in remote" by over 50%. Profiling showed that adjustGitEnv was 21% of the time and 37% of the allocations of that command. It copied the environment each time with getEnvironment. The only repeated use of adjustGitEnv is in withIndexFile, which tends to be run at least once per file. So, it was optimised by keeping a cache of the environment, which can be reused. There could be other better ways to optimise this. Maybe get the while environment once at startup. But, then it would have to be serialized back out each time running a child process, so I doubt that would be a net win. It might be better to cache a version of the environment that is pre-modified to use .git-annex/index. But, profiling doesn't show that modifying the enviroment is taking any significant time.
* use indexEnvGravatar Joey Hess2016-05-17
|
* add catCommitGravatar Joey Hess2016-02-25
|
* remove 163 lines of code without changing anything except importsGravatar Joey Hess2016-01-20
|
* improve commentGravatar Joey Hess2015-12-26
|
* refactor and improve pointer file handling codeGravatar Joey Hess2015-12-09
|
* require "annex/objects/" before key in pointer filesGravatar Joey Hess2015-12-07
| | | | | | | | | | | | | | This removes ambiguity, because while someone might have "WORM--foo" in a file that's not intended to be a git-annex pointer file, "annex/objects/WORM--foo" is less likely. Also, ee0c34c8f2f94775b39ef10ed580cab47d2f929c had a caveat about symlink targets being parsed as pointer files, and now the same parser is used for both. I did not include any hash directories before the key in the pointer file, as they're not needed. However, if they were included, the parser would still work ok.
* support pointer filesGravatar Joey Hess2015-12-07
| | | | | | | | | | | | | | | | | | | | | Backend.lookupFile is changed to always fall back to catKey when operating on a file that's not a symlink. catKey is changed to understand pointer files, as well as annex symlinks. Before, catKey needed a file mode witness, to be sure it was looking at a symlink. That was complicated stuff. Now, it doesn't actually care if a file in git is a symlink or not; in either case asking git for the content of the file will get the pointer to the key. This does mean that git-annex will treat a link foo -> WORM--bar as a git-annex file, and also treats a regular file containing annex/objects/WORM--bar as a git-annex file. Calling catKey could make git-annex commands need to do more work than before. This would especially be the case if a repo contained many regular files, and only a few annexed files, as now git-annex will need to ask git about the contents of the regular files.
* update my email address and homepage urlGravatar Joey Hess2015-01-21
|
* fix some mixed space+tab indentationGravatar Joey Hess2014-10-09
| | | | | | | | | This fixes all instances of " \t" in the code base. Most common case seems to be after a "where" line; probably vim copied the two space layout of that line. Done as a background task while listening to episode 2 of the Type Theory podcast.
* When accessing a local remote, shut down git-cat-file processes afterwards, ↵Gravatar Joey Hess2014-08-20
| | | | | | | | | | | | | | | to ensure that remotes on removable media can be unmounted. Closes: #758630 This does mean that eg, copying multiple files to a local remote will become slightly slower, since it now restarts git-cat-file after each copy. Should not be significant slowdown. The reason git-cat-file is run on the remote at all is to update its location log. In order to add an item to it, it needs to get the current content of the log. Finding a way to avoid needing to do that would be a good path to avoiding this slowdown if it does become a problem somehow. This commit was sponsored by Evan Deaubl.
* Fix bug in automatic merge conflict resolutionGravatar Joey Hess2014-07-08
| | | | | | | | | | | | | | | | When one side is an annexed symlink, and the other side is a non-annexed symlink. In this case, git-merge does not replace the annexed symlink in the work tree with the non-annexed symlink, which is different from it's handling of conflicts between annexed symlinks and regular files or directories. So, while git-annex generated the correct merge commit, the work tree didn't get updated to reflect it. See comments on bug for additional analysis. Did not add this to the test suite yet; just unloaded a truckload of firewood and am feeling lazy. This commit was sponsored by Adam Spiers.
* Windows: Fix some filename encoding bugs.Gravatar Joey Hess2014-03-19
| | | | | | http://git-annex.branchable.com/bugs/Unicode_file_names_ignored_on_Windows/ Not a complete fix yet.
* sync: Fix bug in direct mode that caused a file not checked into git to be ↵Gravatar Joey Hess2014-03-03
| | | | deleted when merging with a remote that added a file by the same name. (Thanks, jkt)
* Preserve metadata when staging a new version of an annexed file.Gravatar Joey Hess2014-02-24
| | | | | | | | | | | | | | | | | Performance impact: When adding a large tree of new files, this needs to do some git cat-file queries to check if any of the files already existed and might need a metadata copy. I tried a benchmark in a copy of my sound repository (so there was already a significant git tree to check against. Adding 10000 small files, with a cold cache: before: 1m48.539s after: 1m52.791s So, impact is 0.0004 seconds per file added. Which seems acceptable, so did not add some kind of configuration to enable/disable this. This commit was sponsored by Lisa Feilen.
* Add plumbing-level lookupkey command.Gravatar Joey Hess2013-12-15
|
* add new status commandGravatar Joey Hess2013-11-07
| | | | | | | | | | | | | | | This works for both direct and indirect mode. It may need some performance tuning. Note that unlike git status, it only shows the status of the work tree, not the status of the index. So only one status letter, not two .. and since files that have been added and not yet committed do not differ between the work tree and the index, they are not shown. Might want to add display of the index vs the last commit eventually. This commit was sponsored by an unknown bitcoin contributor, whose contribution as been going up lately! ;)
* git-recover-repository 1/2 doneGravatar Joey Hess2013-10-20
|
* completely solve catKey memory leakGravatar Joey Hess2013-09-19
| | | | | | | | | | | Since 4aaa584eb632a981f5364c844f9293d4cdedaa65 was incomplete, not being able to get the right mode of the file when the index differs from HEAD, this is a final workaround. Only buffering the start of the file in this case avoids leaking memory. This does not prevent git-cat-file being asked to output the whole file, which needs to be consumed, and can be slow. But this only happens in a rare edge case.
* more completely solve catKey memory leakGravatar Joey Hess2013-09-19
| | | | | | | | | | | | | | | | | | | Done using a mode witness, which ensures it's fixed everywhere. Fixing catFileKey was a bear, because git cat-file does not provide a nice way to query for the mode of a file and there is no other efficient way to do it. Oh, for libgit2.. Note that I am looking at tree objects from HEAD, rather than the index. Because I cat-file cannot show a tree object for the index. So this fix is technically incomplete. The only cases where it matters are: 1. A new large file has been directly staged in git, but not committed. 2. A file that was committed to HEAD as a symlink has been staged directly in the index. This could be fixed a lot better using libgit2.
* Fix bug that caused typechanged symlinks to be assumed to be unlocked files, ↵Gravatar Joey Hess2013-08-22
| | | | so they were added to the annex by the pre-commit hook.
* assistant: Work around git-cat-file's not reloading the index after files ↵Gravatar Joey Hess2013-05-25
| | | | | | are staged. Argh.
* Merge branch 'master' into windowsGravatar Joey Hess2013-05-15
|\
| * start one git-cat-file per index fileGravatar Joey Hess2013-05-15
| | | | | | | | | | | | | | This reverts a5031031f0d596b2381a785925beb574d90a862e and properly fixes the issue discussed there. This makes git-annex behave much nicer in direct mode.
* | fix the day's windows permissions damageGravatar Joey Hess2013-05-12
| |
* | deal with git using / internally, even on DOSGravatar Joey Hess2013-05-12
|/
* work around a very strange git-cat-file behaviorGravatar Joey Hess2013-01-05
| | | | | | | | | | | | | Sometimes it seems that git-cat-file --batch stops getting info for files in the current repo, when ":file" is fed to it. I have not reproduced this at the command line, but only when using git annex whereis and git annex move inside a direct mode repo. Those failed, because cat-file returned "file missing". OTOH, git annex find works fine, despite passing the same file to cat-file. It seems that the failing commands first asked cat-file to show a file on the git-annex branch. Perhaps it got "stuck" on that branch? But I cannot repoduce it running cat-file by hand. Most strange. HEAD is a workaround for this extreme weirdness, since I spent a good 2 hours struggling with it already.
* assistant: Make expensive transfer scan work fully in direct mode.Gravatar Joey Hess2013-01-05
| | | | | | | | | | | | | The expensive scan uses lookupFile, but in direct mode, that doesn't work for files that are present. So the scan was not finding things that are present that need to be uploaded. (It did find things not present that needed to be downloaded.) Now lookupFile also works in direct mode. Note that it still prefers symlinks on disk to info committed to git, in direct mode. This is necessary to make things like Assistant.Threads.Watcher.onAddSymlink work correctly, when given a new symlink not yet checked into git (or replacing a file checked into git).
* assistant direct mode file add/change bookkeepingGravatar Joey Hess2012-12-25
| | | | | | | | | | | | | | When a file is changed in direct mode, the old content is probably lost (at least from the local repo), and bookeeping needs to be updated to reflect this. Also, synthetic add events are generated at assistant startup, so make it detect when the file has not really changed, and avoid re-adding it. This does add the overhead of querying the runing git cat-file for the key that's recorded in git for the file, each time a file is added or modified in direct mode.
* Merge branch 'master' into desymlinkGravatar Joey Hess2012-12-13
|\ | | | | | | | | | | | | | | Conflicts: Annex/CatFile.hs Annex/Content.hs Git/LsFiles.hs Git/LsTree.hs
| * finished where indentation changesGravatar Joey Hess2012-12-13
| |
* | direct mode committingGravatar Joey Hess2012-12-12
|/
* Revert "add catFileIndex"Gravatar Joey Hess2012-09-15
| | | | | This interface is not a good idea, because a running git cat-file --batch does not notice when existing files in the index are changed.
* add catFileIndexGravatar Joey Hess2012-09-15
|
* avoid ByteString.Char8 where not neededGravatar Joey Hess2012-06-20
| | | | | Its truncation behavior is a red flag, so avoid using it in these places where only raw ByteStrings are used, without looking at the data inside.
* crazy optimisationGravatar Joey Hess2012-06-10
| | | | Crazy like a fox..
* detect and recover from branch push/commit raceGravatar Joey Hess2011-12-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dealing with a race without using locking is exceedingly difficult and tricky. Fully tested, I hope. There are three places left where the branch can be updated, that are not covered by the race recovery code. Let's prove they're all immune to the race: 1. tryFastForwardTo checks to see if a fast-forward can be done, and then does git-update-ref on the branch to fast-forward it. If a push comes in before the check, then either no fast-forward will be done (ok), or the push set the branch to a ref that can still be fast-forwarded (also ok) If a push comes in after the check, the git-update-ref will undo the ref change made by the push. It's as if the push did not come in, and the next git-push will see this, and try to re-do it. (acceptable) 2. When creating the branch for the very first time, an empty index is created, and a commit of it made to the branch. The commit's ref is recorded as the current state of the index. If a push came in during that, it will be noticed the next time a commit is made to the branch, since the branch will have changed. (ok) 3. Creating the branch from an existing remote branch involves making the branch, and then getting its ref, and recording that the index reflects that ref. If a push creates the branch first, git-branch will fail (ok). If the branch is created and a racing push is then able to change it (highly unlikely!) we're still ok, because it first records the ref into the index.lck, and then updating the index. The race can cause the index.lck to have the old branch ref, while the index has the newly pushed branch merged into it, but that only results in an unnecessary update of the index file later on.
* improve type signatures with a Ref newtypeGravatar Joey Hess2011-11-16
| | | | | | | | | | | In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for those. Note that this does not prevent mixing up of eg, refs and branches at the type level. Since git really doesn't care, except rare cases like git update-ref, or git tag -d, that seems ok for now. There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref may or may not be a tree-ish, depending on the object type, so there seems no point in trying to represent it at the type level.
* Optimised union merging; now only runs git cat-file once.Gravatar Joey Hess2011-11-12
|
* lintGravatar Joey Hess2011-11-11
|
* reorder repo parameters lastGravatar Joey Hess2011-11-08
| | | | | | | | | | | | | Many functions took the repo as their first parameter. Changing it consistently to be the last parameter allows doing some useful things with currying, that reduce boilerplate. In particular, g <- gitRepo is almost never needed now, instead use inRepo to run an IO action in the repo, and fromRepo to get a value from the repo. This also provides more opportunities to use monadic and applicative combinators.
* renameGravatar Joey Hess2011-10-05
|
* renameGravatar Joey Hess2011-10-04