summaryrefslogtreecommitdiff
path: root/Git/CatFile.hs
Commit message (Collapse)AuthorAge
* git-recover-repository 1/2 doneGravatar Joey Hess2013-10-20
|
* remove workaround for bug in git 1.8.4r0Gravatar Joey Hess2013-10-20
|
* Use cryptohash rather than SHA for hashing.Gravatar Joey Hess2013-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a massive win on OSX, which doesn't have a sha256sum normally. Only use external hash commands when the file is > 1 mb, since cryptohash is quite close to them in speed. SHA is still used to calculate HMACs. I don't quite understand cryptohash's API for those. Used the following benchmark to arrive at the 1 mb number. 1 mb file: benchmarking sha256/internal mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950 std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950 found 5 outliers among 100 samples (5.0%) 4 (4.0%) high mild 1 (1.0%) high severe variance introduced by outliers: 10.415% variance is moderately inflated by outliers benchmarking sha256/external mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950 std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950 found 3 outliers among 100 samples (3.0%) 2 (2.0%) high mild 1 (1.0%) high severe 2 mb file: benchmarking sha256/internal mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950 std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950 variance introduced by outliers: 35.540% variance is moderately inflated by outliers benchmarking sha256/external mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950 std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950 found 6 outliers among 100 samples (6.0%) import Crypto.Hash import Data.ByteString.Lazy as L import Criterion.Main import Common testfile :: FilePath testfile = "/run/shm/data" -- on ram disk main = defaultMain [ bgroup "sha256" [ bench "internal" $ whnfIO internal , bench "external" $ whnfIO external ] ] sha256 :: L.ByteString -> Digest SHA256 sha256 = hashlazy internal :: IO String internal = show . sha256 <$> L.readFile testfile external :: IO String external = do s <- readProcess "sha256sum" [testfile] return $ fst $ separate (== ' ') s
* more completely solve catKey memory leakGravatar Joey Hess2013-09-19
| | | | | | | | | | | | | | | | | | | Done using a mode witness, which ensures it's fixed everywhere. Fixing catFileKey was a bear, because git cat-file does not provide a nice way to query for the mode of a file and there is no other efficient way to do it. Oh, for libgit2.. Note that I am looking at tree objects from HEAD, rather than the index. Because I cat-file cannot show a tree object for the index. So this fix is technically incomplete. The only cases where it matters are: 1. A new large file has been directly staged in git, but not committed. 2. A file that was committed to HEAD as a symlink has been staged directly in the index. This could be fixed a lot better using libgit2.
* interface to parse git tree objectsGravatar Joey Hess2013-09-19
|
* avoid more build warnings on WindowsGravatar Joey Hess2013-08-04
|
* Slow and ugly work around for bug #718517 in git, which broke git-cat-file ↵Gravatar Joey Hess2013-08-01
| | | | | | | | | | | | | --batch for filenames containing spaces. This runs git-cat-file in non-batch mode for all files with spaces. If a directory tree has a lot of them, and is in direct mode, even "git annex add" when there are few new files will need a *lot* of forks! The only reason buffering the whole file content to get the sha is not a memory leak is that git-annex only ever uses this on symlinks. This needs to be reverted as soon as a fix is available in git!
* Can now restart certain long-running git processes if they crash, and ↵Gravatar Joey Hess2013-05-31
| | | | | | | | | | | | | | | | | | | continue working. Fuzz tests have shown that git cat-file --batch sometimes stops running. It's not yet known why (no error message; repo seems ok). But this is something we can deal with in the CoProcess framework, since all 3 types of long-running git processes should be restartable if they fail. Note that, as implemented, only IO errors are caught. So an error thrown by the reveiver, when it sees something that is not valid output from git cat-file (etc) will not cause a restart. I don't want it to retry if git commands change their output or are just outputting garbage. This does mean that if the command did a partial output and crashed in the middle, it would still not be restarted. There is currently no guard against restarting a command repeatedly, if, for example, it crashes repeatedly on startup.
* fix the day's windows permissions damageGravatar Joey Hess2013-05-12
|
* deal with git using / internally, even on DOSGravatar Joey Hess2013-05-12
|
* fix permission damage (thanks, Windows)Gravatar Joey Hess2013-05-11
|
* refactoringGravatar Joey Hess2013-05-11
|
* git annex init works on Windows!Gravatar Joey Hess2013-05-11
| | | | git hash-object and cat-file both only use \n at ends of line, even on Windows.
* catFile expects no \r, even on WindowsGravatar Joey Hess2013-05-11
|
* finished where indentation changesGravatar Joey Hess2012-12-13
|
* Revert "add catFileIndex"Gravatar Joey Hess2012-09-15
| | | | | This interface is not a good idea, because a running git cat-file --batch does not notice when existing files in the index are changed.
* run git coprocesses with gitEnvGravatar Joey Hess2012-09-15
|
* add catFileIndexGravatar Joey Hess2012-09-15
|
* avoid ByteString.Char8 where not neededGravatar Joey Hess2012-06-20
| | | | | Its truncation behavior is a red flag, so avoid using it in these places where only raw ByteStrings are used, without looking at the data inside.
* crazy optimisationGravatar Joey Hess2012-06-10
| | | | Crazy like a fox..
* move hashObject to HashObject library and generalize it to support all git ↵Gravatar Joey Hess2012-06-06
| | | | object types
* fix filename encoding for git cat-fileGravatar Joey Hess2012-02-26
| | | | | | | | | | | The filename sent to git cat-file needs to be sent on a File encoded handle. Also set the read handle to use the File encoding, so that any error message mentioning the filename is received properly. The actual file content is read using Data.ByteString.Char8, which will ignore the read handle's encoding, so this won't change that. (Whether that is entirely correct remains to be seen.)
* cleanupGravatar Joey Hess2012-02-21
|
* refactorGravatar Joey Hess2012-02-20
|
* switch to the strict state monadGravatar Joey Hess2012-01-29
| | | | | | | | | | I had not realized what a memory leak the lazy state monad could be, although I have not seen much evidence of actual leaking in git-annex. However, if running git-annex on a great many files, this could matter. The additional Utility.State.changeState adds even more strictness, avoiding a problem I saw in github-backup where repeatedly modifying state built up a huge pile of thunks.
* use Common in a few more modulesGravatar Joey Hess2011-12-20
|
* split out Git/Command.hsGravatar Joey Hess2011-12-14
|
* split more stuff out of Git.hsGravatar Joey Hess2011-12-14
|
* improve type signatures with a Ref newtypeGravatar Joey Hess2011-11-16
| | | | | | | | | | | In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for those. Note that this does not prevent mixing up of eg, refs and branches at the type level. Since git really doesn't care, except rare cases like git update-ref, or git tag -d, that seems ok for now. There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref may or may not be a tree-ish, depending on the object type, so there seems no point in trying to represent it at the type level.
* Optimised union merging; now only runs git cat-file once.Gravatar Joey Hess2011-11-12
|
* reorder repo parameters lastGravatar Joey Hess2011-11-08
| | | | | | | | | | | | | Many functions took the repo as their first parameter. Changing it consistently to be the last parameter allows doing some useful things with currying, that reduce boilerplate. In particular, g <- gitRepo is almost never needed now, instead use inRepo to run an IO action in the repo, and fromRepo to get a value from the repo. This also provides more opportunities to use monadic and applicative combinators.
* refactor catfile codeGravatar Joey Hess2011-09-28
split into generic IO code, and a thin Annex wrapper