git-annex-gpl - git-annex without the AGPL

	Commit message	Author	Age
*	Always use filesystem encoding for all file and handle reads and writes.	Joey Hess	2016-12-24
\| \| \| \| \|	This is a big scary change. I have convinced myself it should be safe. I hope!
*	Sped up git-annex merge by using git hash-object --batch.	Joey Hess	2016-03-14
\| \| \| \| \| \| \|	This does mean that it has to write out temp files containing updated objects for the merge. So may use more disk space, and disk IO, but that should generally win out over needing to launch N separate git hash-object processes.
*	avoid nul-truncation	Joey Hess	2015-08-11
\| \| \| \| \| \| \|	This might be a little slower, but it's safer, in the event that a union-merged file contains a NUL. AFAIK, no files in the git-annex branch do.
*	spotted a few more places where diff-tree needed --	Joey Hess	2015-04-09
\| \| \| \| \| \|	None of these are very likely at all to ever be ambiguous, since tree refs almost never have symbolic names and the sha is very unlikely to be in the work tree.. But, let's get it right!
*	fix union merge to call diff-index with -- after the ref	Joey Hess	2015-04-09
\| \| \| \| \|	Otherwise, if there's a file in the repo with a name matching the ref, git could get confused and the merge not work.
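A minimal sketch of the fix described above, assuming a hypothetical `Ref` wrapper and a hypothetical `diffIndexParams` helper (neither name is taken from the git-annex source). It only builds the parameter list; the point is the trailing `--`, which tells git that the argument before it is a ref and that anything after it would be a path.

```haskell
-- Hypothetical wrapper for a git ref; not the real git-annex type.
newtype Ref = Ref String

-- | Parameters for `git diff-index`, comparing the index against a ref.
-- The trailing "--" ends the revision arguments, so a work tree file
-- whose name matches the ref cannot make the command ambiguous.
diffIndexParams :: Ref -> [String]
diffIndexParams (Ref r) = ["diff-index", "--cached", "--raw", "-z", r, "--"]
```

The same pattern would apply to the diff-tree calls mentioned in the later commit: put the `--` after the last ref.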
*	update my email address and homepage url	Joey Hess	2015-01-21
\|
*	union merge bugfix	Joey Hess	2013-01-16
\| \| \| \| \| \| \| \| \| \| \| \|	Union merges involving two or more repositories could sometimes result in data from one repository getting lost. This could result in the location log data becoming wrong, and fsck being needed to fix it. NB: I audited for any other occurrences of this problem. There are other places than union merge where multiple changes are fed into update-index in a stream, but they all involve working copy files being staged, or their deletion being staged, and in this case it's fine for the later changes to override the earlier ones.
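A toy model of the failure mode described above, not the actual git-annex code. It assumes changes arrive as a stream of (file, lines) pairs, one per repository, and contrasts "later change overrides earlier" with a true union. The names `lastWins` and `unionAll` are made up for illustration.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type LogFile = FilePath
type LogLine = String

-- | Later entries for the same file replace earlier ones.
-- This is fine when staging work tree files, but loses data in a union merge.
lastWins :: [(LogFile, [LogLine])] -> M.Map LogFile (S.Set LogLine)
lastWins changes = M.fromList [(f, S.fromList ls) | (f, ls) <- changes]

-- | Entries for the same file are combined, so no repository's lines are lost.
unionAll :: [(LogFile, [LogLine])] -> M.Map LogFile (S.Set LogLine)
unionAll changes = M.fromListWith S.union [(f, S.fromList ls) | (f, ls) <- changes]
```

With two changes to the same log file, `lastWins` keeps only the second repository's lines, while `unionAll` keeps both.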
*	finished where indentation changes	Joey Hess	2012-12-13
\|
*	fix slightly incorrect comment	Joey Hess	2012-10-12
\|
*	more zombie fighting	Joey Hess	2012-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm down to 9 places in the code that can produce unwaited for zombies. Most of these are pretty innocuous, at least for now, are only used in short-running commands, or commands that run a set of actions and explicitly reap zombies after each one. The one from Annex.Branch.files could be trouble later, since both Command.Fsck and Command.Unused can trigger it, and the assistant will be doing those eventually. Ditto the one in Git.LsTree.lsTree, which Command.Unused uses. The only ones currently affecting the assistant though, are in Git.LsFiles. Several threads use several of those. (And yeah, using pipes or ResourceT would be a less ad-hoc approach, but I don't really feel like ripping my entire code base apart right now to change a foundation monad. Maybe one of these days..)
*	Got rid of the last place that did utf8 decoding.	Joey Hess	2012-06-26
\| \| \| \| \|	Probably fixes bugs/git-annex:_Cannot_decode_byte___39____92__xfc__39__/ although I don't know how to reproduce that bug.
*	refactor and function name cleanup	Joey Hess	2012-06-08
\| \| \| \|	(oops, I had a calcMerge and a calc_merge!)
*	make watch use the queue	Joey Hess	2012-06-07
\| \| \| \| \|	May not work. Certainly needs to flush the queue from time to time when only symlink changes are being made.
*	add support for staging other types of blobs, like symlinks, into the index	Joey Hess	2012-06-06
\| \| \| \| \|	Also added a utility TopFilePath type, which could stand to be used more widely.
*	move hashObject to HashObject library and generalize it to support all git object types	Joey Hess	2012-06-06
\|
*	factor out generic update-index code from unionmerge code	Joey Hess	2012-06-06
\|
*	noop	Joey Hess	2012-04-21
\|
*	wording	Joey Hess	2012-02-09
\|
*	use fileEncoding for git-update-index input handle	Joey Hess	2012-02-04
\|
*	support all filename encodings with ghc 7.4	Joey Hess	2012-02-03
\|	Under ghc 7.4, this seems to be able to handle all filename encodings again. Including filename encodings that do not match the LANG setting.
\|	I think this will not work with earlier versions of ghc, it uses some ghc internals.
\|	Turns out that ghc 7.4 has a special filesystem encoding that it uses when reading/writing filenames (as FilePaths). This encoding is documented to allow "arbitrary undecodable bytes to be round-tripped through it". So, to get FilePaths from eg, git ls-files, set the Handle that is reading from git to use this encoding. Then things basically just work.
\|	However, I have not found a way to make Text read using this encoding. Text really does assume unicode. So I had to switch back to using String when reading/writing data to git. Which is a pity, because it's some percent slower, but at least it works.
\|	Note that stdout and stderr also have to be set to this encoding, or printing out filenames that contain undecodable bytes causes a crash. IMHO this is a misfeature in ghc, that the user can pass you a filename, which you can readFile, etc, but that by default, putStr of the filename may cause a crash!
\|	Git.CheckAttr gave me special trouble, because the filenames I got back from git, after feeding them in, had further encoding breakage. Rather than try to deal with that, I just zip up the input filenames with the attributes. Which must be returned in the same order queried for this to work.
\|	Also of note is an apparent GHC bug I worked around in Git.CheckAttr. It used to forkProcess and feed git from the child process. Unfortunately, after this forkProcess, accessing the `files` variable from the parent returns []. Not the value that was passed into the function. This screams of a bad bug, that's clobbering a variable, but for now I just avoid forkProcess there to work around it. That forkProcess was itself only added because of a ghc bug, #624389. I've confirmed that the test case for that bug doesn't reproduce it with ghc 7.4. So that's ok, except for the new ghc bug I have not isolated and reported. Why does this simple bit of code magnet the ghc bugs? :)
\|	Also, the symlink touching code is currently broken, when used on utf-8 filenames in a non-utf-8 locale, or probably on any filename containing undecodable bytes, and I temporarily commented it out.
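The handle setup this commit describes can be sketched roughly as follows. The name `fileEncoding` appears in a later commit subject in this log, but its body here is an assumption; `getFileSystemEncoding` and `hSetEncoding` are the standard functions from GHC's base library (base 4.5 and later, i.e. ghc 7.4).

```haskell
import GHC.IO.Encoding (getFileSystemEncoding)
import System.IO (Handle, hSetEncoding, stderr, stdout)

-- | Make a handle use GHC's filesystem encoding, so that bytes which do
-- not decode in the current locale survive a round trip through String.
fileEncoding :: Handle -> IO ()
fileEncoding h = hSetEncoding h =<< getFileSystemEncoding

-- | Assumed startup step: stdout and stderr need the same treatment,
-- or printing such a filename can fail.
setupConsole :: IO ()
setupConsole = mapM_ fileEncoding [stdout, stderr]
```

A handle reading from a git process such as git ls-files would be passed to `fileEncoding` before any filenames are read from it.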
*	attempt at a quick, utf-8 only fix to the ghc 7.4 problem	Joey Hess	2012-02-01
\| \| \| \| \|	If you have only utf-8 filenames, and need to build git-annex with ghc 7.4, this will work. But, it will crash on non-utf-8 filenames.
*	log --after=date	Joey Hess	2012-01-06
\|
*	more partial function removal	Joey Hess	2011-12-15
\| \| \| \| \|	Left a few Prelude.head's in where it was checked not null and too hard to remove, etc.
*	split out Git/Command.hs	Joey Hess	2011-12-14
\|
*	split more stuff out of Git.hs	Joey Hess	2011-12-14
\|
*	always find optimal merge	Joey Hess	2011-12-12
\| \| \| \| \| \| \| \| \| \|	Testing b9ac5854549636493449fea6830364a01159fbf6, it didn't find the optimal union merge, the second sha was the one to use, at least in the case I tried. Let's just try all shas to see if any can be reused. I stopped using the expensive nub, so despite the use of sets to sort/uniq file contents, this is probably as fast or faster than it was before.
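A rough sketch of the idea in this commit and the earlier "more efficient union merges" commit, under the assumption that two objects count as equivalent when they contain the same set of lines. `mergeResult` and its `Either` result are invented for illustration and are not the real git-annex interface.

```haskell
import Data.List (find)
import qualified Data.Set as S

type Sha = String

-- | Union merge the given old objects (sha, content).
-- Left sha:     an existing object already has exactly the merged lines,
--               so it can be reused and no new object needs to be hashed.
-- Right content: no old object matches; this new content must be written.
mergeResult :: [(Sha, String)] -> Either Sha String
mergeResult olds =
    case find sameLines olds of
        Just (sha, _) -> Left sha
        Nothing -> Right (unlines (S.toList merged))
  where
    -- Sets sort and deduplicate the lines, avoiding the quadratic nub.
    merged = S.fromList (concatMap (lines . snd) olds)
    sameLines (_, content) = S.fromList (lines content) == merged
```

Every old sha is tried, not only the first, which matches the observation above that the second sha was the one that could be reused.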
*	refactor	Joey Hess	2011-12-12
\|
*	more efficient union merges	Joey Hess	2011-12-11
\|	Tries to avoid generating a new object when the merged content has the same lines that were in the old object.
\|	I've noticed some merge commits that only move lines around, like this:
\|	- 1323478057.181191s 1 be23c3ac-0ee5-11e0-b185-3b0f9b5b00c5
\|	  1323204972.062151s 1 87e06c7a-7388-11e0-ba07-03cdf300bd87
\|	++1323478057.181191s 1 be23c3ac-0ee5-11e0-b185-3b0f9b5b00c5
\|	Unsure if this will really save anything in practice, since it only looks at one of the two old objects, and maybe I didn't pick the best one.
*	hslint	Joey Hess	2011-12-09
\|
*	improve type signatures with a Ref newtype	Joey Hess	2011-11-16
\| \| \| \| \| \| \| \| \| \| \|	In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for those. Note that this does not prevent mixing up of eg, refs and branches at the type level. Since git really doesn't care, except rare cases like git update-ref, or git tag -d, that seems ok for now. There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref may or may not be a tree-ish, depending on the object type, so there seems no point in trying to represent it at the type level.
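The newtype plus aliases arrangement described above might look like this. The `describeSha` helper and the `master` value are hypothetical and exist only to show the trade-off the message mentions.

```haskell
-- A single newtype keeps refs distinct from plain Strings.
newtype Ref = Ref String
    deriving (Eq, Show)

-- Aliases document intent, but are interchangeable to the type checker.
type Branch = Ref
type Sha = Ref
type Tag = Ref

-- Hypothetical helper that is documented as wanting a Sha.
describeSha :: Sha -> String
describeSha (Ref s) = "sha " ++ s

-- Hypothetical value that is documented as a Branch.
master :: Branch
master = Ref "refs/heads/master"
```

Because `Branch` and `Sha` are only aliases, `describeSha master` compiles without complaint, which is exactly the mixing the commit message accepts. What the newtype does prevent is passing an arbitrary String where a Ref is expected.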
*	better name	Joey Hess	2011-11-16
\|
*	cleanup	Joey Hess	2011-11-15
\|
*	merge: Now runs in constant space.	Joey Hess	2011-11-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before, a merge was first calculated, by running various actions that called git and built up a list of lines, which were at the end sent to git update-index. This necessarily used space proportional to the size of the diff between the trees being merged. Now, lines are streamed into git update-index from each of the actions in turn. Runtime size of git-annex merge when merging 50000 location log files drops from around 100 mb to a constant 4 mb. Presumably it runs quite a lot faster, too.
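One way to picture the streaming design described above, as a sketch under assumptions: each producer is handed a "write one line" callback instead of returning a list, so nothing accumulates in memory. The names `Streamer`, `runStreamers` and `streamList` are illustrative and may not match the real code.

```haskell
-- | A producer that pushes its lines, one at a time, into a callback.
type Streamer = (String -> IO ()) -> IO ()

-- | Run each producer in turn against the same writer. No intermediate
-- list of all lines is built, so memory use does not grow with the
-- total amount of output.
runStreamers :: (String -> IO ()) -> [Streamer] -> IO ()
runStreamers write = mapM_ (\streamer -> streamer write)

-- | Example producer (hypothetical): stream the elements of a list.
streamList :: [String] -> Streamer
streamList ls write = mapM_ write ls
```

In the setting of this commit, the writer would send each line to the stdin of a running git update-index process, and each producer would be one of the actions that previously returned a list of lines.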
*	cleanup	Joey Hess	2011-11-15
\|
*	avoid space leak writing merge	Joey Hess	2011-11-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reduces the memory use of a merge by 1/3rd. The space leak was apparently because the whole update-index input was generated strictly, not lazily. I wondered if the change to ByteStrings contributed to this, due to the need to convert with L.pack here. But going back to the old code, I still see a much similar leak, and worse performance besides due to it not using ByteStrings. The fix is to just hPutStr the lines repeatedly. (Note the \0 is written separately, to avoid allocation overheads in adding it to the string.) The Git.pipeWrite interface is probably just wrong for any large inputs to git. This was the only place using it for input of any size. There is still at least one other space leak in the merge code.
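The fix described above, writing each line with hPutStr and the NUL terminator as a separate write, could be sketched like this. `writeNulTerminated` is a made-up name, and the handle is assumed to be the input of a git update-index -z process (any writable handle works for experimenting).

```haskell
import System.IO

-- | Write each line followed by a NUL byte.
-- The terminator is written with its own hPutStr call, rather than
-- appended to the line, so no new String is allocated per line.
writeNulTerminated :: Handle -> [String] -> IO ()
writeNulTerminated h = mapM_ writeLine
  where
    writeLine l = do
        hPutStr h l
        hPutStr h "\0"
```

Since the list is consumed one element at a time, a lazily produced list of lines can be garbage collected as it is written.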
*	Optimised union merging; now only runs git cat-file once.	Joey Hess	2011-11-12
\|
*	lint	Joey Hess	2011-11-11
\|
*	reorder repo parameters last	Joey Hess	2011-11-08
\| \| \| \| \| \| \| \| \| \| \| \| \|	Many functions took the repo as their first parameter. Changing it consistently to be the last parameter allows doing some useful things with currying, that reduce boilerplate. In particular, g <- gitRepo is almost never needed now, instead use inRepo to run an IO action in the repo, and fromRepo to get a value from the repo. This also provides more opportunities to use monadic and applicative combinators.
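A self-contained sketch of why putting the repo parameter last helps. `inRepo` and `fromRepo` are named in the message above; everything else here (the simplified `Repo` and `Annex` types, and the `commandLine` helper, which only builds a string and does not run git) is an assumption made to keep the example small.

```haskell
-- Simplified stand-ins for the real types.
newtype Repo = Repo { repoPath :: FilePath }

newtype Annex a = Annex { runAnnex :: Repo -> IO a }

instance Functor Annex where
    fmap f (Annex g) = Annex (fmap f . g)

instance Applicative Annex where
    pure = Annex . const . pure
    Annex f <*> Annex g = Annex (\r -> f r <*> g r)

instance Monad Annex where
    Annex g >>= k = Annex (\r -> g r >>= \a -> runAnnex (k a) r)

-- | Run an IO action that takes the repo as its last parameter.
inRepo :: (Repo -> IO a) -> Annex a
inRepo = Annex

-- | Get a pure value from the repo.
fromRepo :: (Repo -> a) -> Annex a
fromRepo f = Annex (pure . f)

-- | Hypothetical git helper, with the repo as the last parameter.
commandLine :: [String] -> Repo -> IO String
commandLine params r = return (unwords ("git" : "-C" : repoPath r : params))

-- | Partial application leaves exactly the (Repo -> ...) shape that
-- inRepo and fromRepo expect, so no explicit repo variable is needed.
example :: Annex (FilePath, String)
example = (,) <$> fromRepo repoPath <*> inRepo (commandLine ["status"])
```

With the repo as the first parameter, `commandLine` would need a lambda or a flip at every call site. With it last, currying does the work, and applicative combinators compose the results.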
*	faster union merge of multiple branches into index	Joey Hess	2011-10-07
\| \| \| \|	only write index once
*	convert all git read/write functions to use ByteStrings	Joey Hess	2011-09-29
\| \| \| \| \| \| \| \| \| \|	This yields a second or so speedup in unused, find, etc. Seems that even when the ByteString is immediately split and then converted to Strings, it's faster. I may try to push ByteStrings out into more of git-annex gradually, although I suspect most of the time-critical parts are already covered now, and many of the rest rely on libraries that only support Strings.
*	use ByteStrings when reading content of files	Joey Hess	2011-09-29
\| \| \| \|	didn't bother to benchmark this
*	split groups of related functions out of Utility	Joey Hess	2011-08-22
\|
*	hlint tweaks	Joey Hess	2011-07-15
\| \| \| \|	Did all sources except Remotes/* and Command/*
*	rename GitUnionMerge to Git.UnionMerge	Joey Hess	2011-06-30
	Also, moved commit function into Git proper, it's not union merge specific.