aboutsummaryrefslogtreecommitdiff
path: root/Git/CatFile.hs
Commit message (Collapse)AuthorAge
* adeiu, MissingHGravatar Joey Hess2017-05-16
| | | | | | | | | | | | | | | | Removed dependency on MissingH, instead depending on the split library. After laying groundwork for this since 2015, it was mostly straightforward. Added Utility.Tuple and Utility.Split. Eyeballed System.Path.WildMatch while implementing the same thing. Since MissingH's progress meter display was being used, I re-implemented my own. Bonus: Now progress is displayed for transfers of files of unknown size. This commit was sponsored by Shane-o on Patreon.
* Always use filesystem encoding for all file and handle reads and writes.Gravatar Joey Hess2016-12-24
| | | | | This is a big scary change. I have convinced myself it should be safe. I hope!
* restart coprocess in raw modeGravatar Joey Hess2016-11-01
| | | | | | | | | | | Restarting a crashing git process could result in filename encoding issues when not in a unicode locale, as the restarted processes's handles were not read in raw mode. Since rawMode is always used when starting a coprocess, didn't bother to parameterise it and just always enable it for simplicity. This commit was sponsored by Jake Vosloo on Patreon.
* Fix reversion in 6.20161012 that prevented adding files with a space in ↵Gravatar Joey Hess2016-10-31
| | | | their name.
* Avoid using a lot of memory when large objects are present in the git repositoryGravatar Joey Hess2016-10-05
| | | | | | | | | | | | | | | | | | | | | | .. and have to be checked to see if they are a pointed to an annexed file. Cases where such memory use could occur included, but were not limited to: - git commit -a of a large unlocked file (in v5 mode) - git-annex adjust when a large file was checked into git directly Generally, any use of catKey was a potential problem. Fix by using git cat-file --batch-check to check size before catting. This adds another git batch process, which is included in the CatFileHandle for simplicity. There could be performance impact, anywhere catKey is used. Particularly likely to affect adjusted branch generation speed, and operations on unlocked files in v6 mode. Hopefully since the --batch-check and --batch read the same data, disk buffering will avoid most overhead. Leaving only the overhead of talking to the process over the pipe and whatever computation --batch-check needs to do. This commit was sponsored by Bruno BEAUFILS on Patreon.
* fix parsing of commit with no parentsGravatar Joey Hess2016-03-31
|
* extract commit parent(s)Gravatar Joey Hess2016-03-11
|
* add catCommit, with commit object parserGravatar Joey Hess2016-02-25
|
* remove old TODOGravatar Joey Hess2016-01-01
|
* removed all uses of undefined from code baseGravatar Joey Hess2015-04-19
| | | | It's a code smell, can lead to hard to diagnose error messages.
* update my email address and homepage urlGravatar Joey Hess2015-01-21
|
* fix some mixed space+tab indentationGravatar Joey Hess2014-10-09
| | | | | | | | | This fixes all instances of " \t" in the code base. Most common case seems to be after a "where" line; probably vim copied the two space layout of that line. Done as a background task while listening to episode 2 of the Type Theory podcast.
* Windows: Fix some filename encoding bugs.Gravatar Joey Hess2014-03-19
| | | | | | http://git-annex.branchable.com/bugs/Unicode_file_names_ignored_on_Windows/ Not a complete fix yet.
* sync: Fix bug in direct mode that caused a file not checked into git to be ↵Gravatar Joey Hess2014-03-03
| | | | deleted when merging with a remote that added a file by the same name. (Thanks, jkt)
* remove Read instance for RefGravatar Joey Hess2014-02-19
| | | | | | | | Removed instance, got it all to build using fromRef. (With a few things that really need to show something using a ref for debugging stubbed out.) Then added back Read instance, and made Logs.View use it for serialization. This changes the view log format.
* git-recover-repository 1/2 doneGravatar Joey Hess2013-10-20
|
* remove workaround for bug in git 1.8.4r0Gravatar Joey Hess2013-10-20
|
* Use cryptohash rather than SHA for hashing.Gravatar Joey Hess2013-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a massive win on OSX, which doesn't have a sha256sum normally. Only use external hash commands when the file is > 1 mb, since cryptohash is quite close to them in speed. SHA is still used to calculate HMACs. I don't quite understand cryptohash's API for those. Used the following benchmark to arrive at the 1 mb number. 1 mb file: benchmarking sha256/internal mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950 std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950 found 5 outliers among 100 samples (5.0%) 4 (4.0%) high mild 1 (1.0%) high severe variance introduced by outliers: 10.415% variance is moderately inflated by outliers benchmarking sha256/external mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950 std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950 found 3 outliers among 100 samples (3.0%) 2 (2.0%) high mild 1 (1.0%) high severe 2 mb file: benchmarking sha256/internal mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950 std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950 variance introduced by outliers: 35.540% variance is moderately inflated by outliers benchmarking sha256/external mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950 std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950 found 6 outliers among 100 samples (6.0%) import Crypto.Hash import Data.ByteString.Lazy as L import Criterion.Main import Common testfile :: FilePath testfile = "/run/shm/data" -- on ram disk main = defaultMain [ bgroup "sha256" [ bench "internal" $ whnfIO internal , bench "external" $ whnfIO external ] ] sha256 :: L.ByteString -> Digest SHA256 sha256 = hashlazy internal :: IO String internal = show . sha256 <$> L.readFile testfile external :: IO String external = do s <- readProcess "sha256sum" [testfile] return $ fst $ separate (== ' ') s
* more completely solve catKey memory leakGravatar Joey Hess2013-09-19
| | | | | | | | | | | | | | | | | | | Done using a mode witness, which ensures it's fixed everywhere. Fixing catFileKey was a bear, because git cat-file does not provide a nice way to query for the mode of a file and there is no other efficient way to do it. Oh, for libgit2.. Note that I am looking at tree objects from HEAD, rather than the index. Because I cat-file cannot show a tree object for the index. So this fix is technically incomplete. The only cases where it matters are: 1. A new large file has been directly staged in git, but not committed. 2. A file that was committed to HEAD as a symlink has been staged directly in the index. This could be fixed a lot better using libgit2.
* interface to parse git tree objectsGravatar Joey Hess2013-09-19
|
* avoid more build warnings on WindowsGravatar Joey Hess2013-08-04
|
* Slow and ugly work around for bug #718517 in git, which broke git-cat-file ↵Gravatar Joey Hess2013-08-01
| | | | | | | | | | | | | --batch for filenames containing spaces. This runs git-cat-file in non-batch mode for all files with spaces. If a directory tree has a lot of them, and is in direct mode, even "git annex add" when there are few new files will need a *lot* of forks! The only reason buffering the whole file content to get the sha is not a memory leak is that git-annex only ever uses this on symlinks. This needs to be reverted as soon as a fix is available in git!
* Can now restart certain long-running git processes if they crash, and ↵Gravatar Joey Hess2013-05-31
| | | | | | | | | | | | | | | | | | | continue working. Fuzz tests have shown that git cat-file --batch sometimes stops running. It's not yet known why (no error message; repo seems ok). But this is something we can deal with in the CoProcess framework, since all 3 types of long-running git processes should be restartable if they fail. Note that, as implemented, only IO errors are caught. So an error thrown by the reveiver, when it sees something that is not valid output from git cat-file (etc) will not cause a restart. I don't want it to retry if git commands change their output or are just outputting garbage. This does mean that if the command did a partial output and crashed in the middle, it would still not be restarted. There is currently no guard against restarting a command repeatedly, if, for example, it crashes repeatedly on startup.
* fix the day's windows permissions damageGravatar Joey Hess2013-05-12
|
* deal with git using / internally, even on DOSGravatar Joey Hess2013-05-12
|
* fix permission damage (thanks, Windows)Gravatar Joey Hess2013-05-11
|
* refactoringGravatar Joey Hess2013-05-11
|
* git annex init works on Windows!Gravatar Joey Hess2013-05-11
| | | | git hash-object and cat-file both only use \n at ends of line, even on Windows.
* catFile expects no \r, even on WindowsGravatar Joey Hess2013-05-11
|
* finished where indentation changesGravatar Joey Hess2012-12-13
|
* Revert "add catFileIndex"Gravatar Joey Hess2012-09-15
| | | | | This interface is not a good idea, because a running git cat-file --batch does not notice when existing files in the index are changed.
* run git coprocesses with gitEnvGravatar Joey Hess2012-09-15
|
* add catFileIndexGravatar Joey Hess2012-09-15
|
* avoid ByteString.Char8 where not neededGravatar Joey Hess2012-06-20
| | | | | Its truncation behavior is a red flag, so avoid using it in these places where only raw ByteStrings are used, without looking at the data inside.
* crazy optimisationGravatar Joey Hess2012-06-10
| | | | Crazy like a fox..
* move hashObject to HashObject library and generalize it to support all git ↵Gravatar Joey Hess2012-06-06
| | | | object types
* fix filename encoding for git cat-fileGravatar Joey Hess2012-02-26
| | | | | | | | | | | The filename sent to git cat-file needs to be sent on a File encoded handle. Also set the read handle to use the File encoding, so that any error message mentioning the filename is received properly. The actual file content is read using Data.ByteString.Char8, which will ignore the read handle's encoding, so this won't change that. (Whether that is entirely correct remains to be seen.)
* cleanupGravatar Joey Hess2012-02-21
|
* refactorGravatar Joey Hess2012-02-20
|
* switch to the strict state monadGravatar Joey Hess2012-01-29
| | | | | | | | | | I had not realized what a memory leak the lazy state monad could be, although I have not seen much evidence of actual leaking in git-annex. However, if running git-annex on a great many files, this could matter. The additional Utility.State.changeState adds even more strictness, avoiding a problem I saw in github-backup where repeatedly modifying state built up a huge pile of thunks.
* use Common in a few more modulesGravatar Joey Hess2011-12-20
|
* split out Git/Command.hsGravatar Joey Hess2011-12-14
|
* split more stuff out of Git.hsGravatar Joey Hess2011-12-14
|
* improve type signatures with a Ref newtypeGravatar Joey Hess2011-11-16
| | | | | | | | | | | In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for those. Note that this does not prevent mixing up of eg, refs and branches at the type level. Since git really doesn't care, except rare cases like git update-ref, or git tag -d, that seems ok for now. There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref may or may not be a tree-ish, depending on the object type, so there seems no point in trying to represent it at the type level.
* Optimised union merging; now only runs git cat-file once.Gravatar Joey Hess2011-11-12
|
* reorder repo parameters lastGravatar Joey Hess2011-11-08
| | | | | | | | | | | | | Many functions took the repo as their first parameter. Changing it consistently to be the last parameter allows doing some useful things with currying, that reduce boilerplate. In particular, g <- gitRepo is almost never needed now, instead use inRepo to run an IO action in the repo, and fromRepo to get a value from the repo. This also provides more opportunities to use monadic and applicative combinators.
* refactor catfile codeGravatar Joey Hess2011-09-28
split into generic IO code, and a thin Annex wrapper