git-annex-gpl - git-annex without the AGPL

	Commit message	Author	Age
*	import: Add --skip-duplicates option.	Joey Hess	2013-12-04
\| \| \| \| \| \| \|	Note that the hash backends were made to stop printing a (checksum..) message as part of this, since it showed up without a file when deciding whether to act on a file. Should have probably removed that message a while ago anyway, I suppose.
*	Better sanitization of problem characters when generating URL and WORM keys.	Joey Hess	2013-10-05
\|	FAT has a lot of characters it does not allow in filenames, like ? and *. It's probably the worst offender, but other filesystems also have limitations.
\|	In 2011, I made keyFile escape : to handle FAT, but missed the other characters. It also turns out that when I did that, I was also living dangerously; any existing keys that contained a : had their object location change. Oops.
\|	So, adding new characters to escape to keyFile is out. Well, it would be possible to make keyFile behave differently on a per-filesystem basis, but this would be a real nightmare to get right. Consider that a rsync special remote uses keyFile to determine the filenames to use, and we don't know the underlying filesystem on the rsync server.
\|	Instead, I have gone for a solution that is backwards compatible and simple. Its only downside is that already generated URL and WORM keys might not be able to be stored on FAT or some other filesystem that dislikes a character used in the key. (In this case, the user can just migrate the problem keys to a checksumming backend. If this became a big problem, fsck could be made to detect these and suggest a migration.)
\|	Going forward, new keys that are created will escape all characters that are likely to cause problems. And if some filesystem comes along that's even worse than FAT (seems unlikely, but here it is 2013, and people are still using FAT!), additional characters can be added to the set that are escaped without difficulty.
\|	(Also, made WORM limit the part of the filename that is embedded in the key, to deal with filesystem filename length limits. This could have already been a problem, but is more likely now, since the escaping of the filename can make it longer.)
\|	This commit was sponsored by Ian Downes
*	allow building w/o cryptohash	Joey Hess	2013-10-03
\| \| \| \| \|	Mostly for the debian stable autobuilds, which have a too old version to use the Crypto.Hash module.
*	better name	Joey Hess	2013-10-01
\|
*	ensure that hash representations don't change in future	Joey Hess	2013-10-01
\|
*	Added SKEIN256 and SKEIN512 backends	Joey Hess	2013-10-01
\|	SHA3 is still waiting for final standardization. Although this is looking less likely given https://www.cdt.org/blogs/joseph-lorenzo-hall/2409-nist-sha-3
\|	In the meantime, cryptohash implements skein, and it's used by some of the haskell ecosystem (for yesod sessions, IIRC), so this implementation is likely to continue working. Also, I've talked with the cryptohash author and he's a reasonable guy.
\|	It makes sense to have an alternate high security hash, in case some horrible attack is found against SHA2 tomorrow, or in case SHA3 comes out and worst fears are realized.
\|	I'd also like to support using skein for HMAC. But no hurry there and a new version of cryptohash has much nicer HMAC code, so I will probably wait until I can use that version.
*	hlint	Joey Hess	2013-09-25
\| \| \| \|	test suite still passes
*	Use cryptohash rather than SHA for hashing.	Joey Hess	2013-09-22
\|	This is a massive win on OSX, which doesn't have a sha256sum normally.
\|	Only use external hash commands when the file is > 1 mb, since cryptohash is quite close to them in speed.
\|	SHA is still used to calculate HMACs. I don't quite understand cryptohash's API for those.
\|	Used the following benchmark to arrive at the 1 mb number.
\|
\|	1 mb file:
\|
\|	benchmarking sha256/internal
\|	mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950
\|	std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950
\|	found 5 outliers among 100 samples (5.0%)
\|	  4 (4.0%) high mild
\|	  1 (1.0%) high severe
\|	variance introduced by outliers: 10.415%
\|	variance is moderately inflated by outliers
\|
\|	benchmarking sha256/external
\|	mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950
\|	std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950
\|	found 3 outliers among 100 samples (3.0%)
\|	  2 (2.0%) high mild
\|	  1 (1.0%) high severe
\|
\|	2 mb file:
\|
\|	benchmarking sha256/internal
\|	mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950
\|	std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950
\|	variance introduced by outliers: 35.540%
\|	variance is moderately inflated by outliers
\|
\|	benchmarking sha256/external
\|	mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950
\|	std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950
\|	found 6 outliers among 100 samples (6.0%)
\|
\|	import Crypto.Hash
\|	import Data.ByteString.Lazy as L
\|	import Criterion.Main
\|	import Common
\|
\|	testfile :: FilePath
\|	testfile = "/run/shm/data" -- on ram disk
\|
\|	main = defaultMain
\|	    [ bgroup "sha256"
\|	        [ bench "internal" $ whnfIO internal
\|	        , bench "external" $ whnfIO external
\|	        ]
\|	    ]
\|
\|	sha256 :: L.ByteString -> Digest SHA256
\|	sha256 = hashlazy
\|
\|	internal :: IO String
\|	internal = show . sha256 <$> L.readFile testfile
\|
\|	external :: IO String
\|	external = do
\|	    s <- readProcess "sha256sum" [testfile]
\|	    return $ fst $ separate (== ' ') s
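The size cutoff from the commit above can be sketched as a small pure function. This is only an illustration: `HashMethod`, `externalThreshold`, and `chooseHashMethod` are hypothetical names, and reading "1 mb" as 1048576 bytes is an assumption. Only the "> 1 mb" rule itself comes from the commit message.

```haskell
-- Hypothetical names; only the "> 1 mb" rule comes from the commit message.
data HashMethod = Internal | External
  deriving (Eq, Show)

-- Assumption: "1 mb" is read as 1048576 bytes.
externalThreshold :: Integer
externalThreshold = 1048576

-- Use the external command only when one is available and the file is
-- larger than the threshold; otherwise hash in-process.
chooseHashMethod :: Bool -> Integer -> HashMethod
chooseHashMethod haveExternal size
  | haveExternal && size > externalThreshold = External
  | otherwise = Internal
```

A file of exactly the threshold size stays on the in-process path, since the rule is a strict "greater than".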
*	Fix a few bugs involving filenames that are at or near the filesystem's ↵	Joey Hess	2013-07-30
\|	maximum filename length limit.
\|	Started with a problem when running addurl on a really long url, because the whole url is munged into the filename. Ended up doing a fairly extensive review for places where filenames could get too large, although it's hard to say I've not missed any.
\|	Backend.Url had a 128 character limit, which is fine when the limit is 255, but not if it's a lot shorter on some systems. So check the pathconf() limit. Note that this could result in fromUrl creating different keys for the same url, if run on systems with different limits. I don't see this as likely to cause any problems. That can already happen when using addurl --fast, or if the content of an url changes.
\|	Both Command.AddUrl and Backend.Url assumed that urls don't contain a lot of multi-byte unicode, and would fail to truncate an url that did properly.
\|	A few places use a filename as the template to make a temp file. While that's nice in that the temp file name can be easily related back to the original filename, it could lead to `git annex add` failing to add a filename that was at or close to the maximum length.
\|	Note that in Command.Add.lockdown, the template is still derived from the filename, just with enough space left to turn it into a temp file. This is an important optimisation, because the assistant may lock down a bunch of files all at once, and using the same template for all of them would cause openTempFile to iterate through the same set of names, looking for an unused temp file. I'm not very happy with the relatedTemplate hack, but it avoids that slowdown.
\|	Backend.WORM does not limit the filename stored in the key. I have not tried to change that; so git annex add will fail on really long filenames when using the WORM backend. It seems better to preserve the invariant that a WORM key always contains the complete filename, since the filename is the only unique material in the key, other than mtime and size. Since nobody has complained about add failing (I think I saw it once?) on WORM, probably it's ok, or nobody but me uses it.
\|	There may be compatibility problems if using git annex addurl --fast or the WORM backend on a system with the 255 limit and then trying to use that repo in a system with a smaller limit. I have not tried to deal with those.
\|	This commit was sponsored by Alexander Brem. Thanks!
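The multi-byte truncation problem mentioned above can be illustrated with a sketch that cuts a string by its UTF-8 byte length rather than by character count. `utf8Width` and `truncateToBytes` are hypothetical helpers, not the functions used in the commit, and the byte limit is passed in as a plain parameter (the pathconf() lookup is not shown).

```haskell
import Data.Char (ord)

-- Number of bytes a character occupies when encoded as UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80 = 1
  | n < 0x800 = 2
  | n < 0x10000 = 3
  | otherwise = 4
  where n = ord c

-- Keep the longest prefix whose UTF-8 encoding fits in `limit` bytes,
-- never splitting a character.
truncateToBytes :: Int -> String -> String
truncateToBytes limit = go 0
  where
    go _ [] = []
    go used (c:cs)
      | used' > limit = []
      | otherwise = c : go used' cs
      where used' = used + utf8Width c
```

Truncating by `take n` would count characters, so a name made of three-byte characters could end up three times longer in bytes than the filesystem allows.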
*	fix permission damage (thanks, Windows)	Joey Hess	2013-05-11
\|
*	clean up from windows porting	Joey Hess	2013-05-11
\|
*	git-annex now builds on Windows (doesn't work)	Joey Hess	2013-05-11
\|
*	configure: Better checking that sha commands output in the desired format.	Joey Hess	2013-05-08
\| \| \| \| \| \|	Run the same code git-annex uses to get the sha, including its sanity checking. Much better than old grep. Should detect FreeBSD systems with sha commands that output in a strange format.
*	SHA: Add a runtime sanity check that sha commands output something that ↵	Joey Hess	2013-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	appears to be a real sha. This after fielding a bug where git-annex was built with a sha256 program whose output checked out, but was then run with one that output lines like: SHA256 (file) = <sha here> Which it then parsed as having a SHA256 of "SHA256"! Now the output of the command is required to be of the right length, and contain only the right characters.
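A check of the kind described above (right length, only the right characters) might look roughly like the following. `parseShaOutput` is a hypothetical name and this is a sketch, not the code from the commit.

```haskell
import Data.Char (isHexDigit)

-- Hypothetical helper: take the first word of a checksum command's output
-- and accept it only if it has the expected length (64 for SHA256) and
-- consists solely of hexadecimal digits.
parseShaOutput :: Int -> String -> Maybe String
parseShaOutput expectedLen output =
  case words output of
    (candidate:_)
      | length candidate == expectedLen && all isHexDigit candidate -> Just candidate
    _ -> Nothing
```

With this check, a line such as `SHA256 (file) = <sha here>` is rejected, because its first word "SHA256" is neither 64 characters long nor purely hexadecimal.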
*	expose Control.Monad.join	Joey Hess	2013-04-22
\| \| \| \| \|	I think I've been looking for that function for some time. Ie, I remember wanting to collapse Just Nothing to Nothing.
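For reference, this is the collapsing behaviour the commit message refers to. `flatten` is just an illustrative alias for `join` specialised to `Maybe`.

```haskell
import Control.Monad (join)

-- join removes one layer of nesting, so a value that can be missing at
-- two levels collapses into a single Maybe.
flatten :: Maybe (Maybe a) -> Maybe a
flatten = join
```

Here `flatten (Just Nothing)` is `Nothing`, and `flatten (Just (Just 1))` is `Just 1`.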
*	SHA*E backends: Exclude non-alphanumeric characters from extensions.	Joey Hess	2012-12-20
\| \| \| \| \| \|	* SHAE backends: Exclude non-alphanumeric characters from extensions. migrate: Remove leading \ in SHA* checksums, and non-alphanumerics from extensions of SHA*E keys.
*	handle sha*sum's leading \ in checksum with certain unusual filenames	Joey Hess	2012-12-20
\| \| \| \| \| \| \| \|	* Bugfix: Remove leading \ from checksums output by shasum commands, when the filename contains \ or a newline. Closes: #696384 fsck: Still accept checksums with a leading \ as valid, now that above bug is fixed. * migrate: Remove leading \ in checksums
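The fix described above amounts to dropping one leading backslash from the checksum. A minimal sketch, with `stripLeadingBackslash` as a hypothetical name:

```haskell
-- Hypothetical helper: drop the single leading backslash that the shasum
-- commands put in front of the checksum when the filename contains a
-- backslash or a newline.
stripLeadingBackslash :: String -> String
stripLeadingBackslash ('\\':rest) = rest
stripLeadingBackslash checksum = checksum
```

Checksums without the prefix pass through unchanged, so the function is safe to apply to all command output.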
*	where indenting	Joey Hess	2012-11-11
\|
*	Avoid crashing on encoding errors in filenames when writing transfer info ↵	Joey Hess	2012-09-16
\| \| \| \|	files and reading from checksum commands.
*	SHA256E is new default backend	Joey Hess	2012-09-12
\| \| \| \| \| \| \| \|	The default backend used when adding files to the annex is changed from SHA256 to SHA256E, to simplify interoperability with OSX, media players, and various programs that needlessly look at symlink targets. To get old behavior, add a .gitattributes containing: * annex.backend=SHA256
*	Bugfix: Fix fsck in SHA*E backends, when the key contains composite ↵	Joey Hess	2012-08-24
\| \| \| \|	extensions, as added in 3.20120721.
*	better readProcess	Joey Hess	2012-07-19
\|
*	add back debug logging	Joey Hess	2012-07-19
\| \| \| \| \| \| \| \| \| \| \| \| \|	Make Utility.Process wrap the parts of System.Process that I use, and add debug logging to them. Also wrote some higher-level code that allows running an action with handles to a process's stdin or stdout (or both), and checking its exit status, all in a single function call. As a bonus, the debug logging now indicates whether the process is being run to read from it, feed it data, chat with it (writing and reading), or just call it for its side effect.
*	switch from System.Cmd.Utils to System.Process	Joey Hess	2012-07-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test suite now passes with -threaded! I traced back all the hangs with -threaded to System.Cmd.Utils. It seems it's just crappy/unsafe/outdated, and should not be used. System.Process seems to be the cool new thing, so converted all the code to use it instead. In the process, --debug stopped printing commands it runs. I may try to bring that back later. Note that even SafeSystem was switched to use System.Process. Since that was a modified version of code from System.Cmd.Utils, it needed to be converted too. I also got rid of nearly all calls to forkProcess, and all calls to executeFile, which I'm also doubtful about working well with -threaded.
*	fix leading period before two-element extensions	Joey Hess	2012-07-06
\|
*	SHAnE backends are now smarter about composite extensions, such as .tar.gz ↵	Joey Hess	2012-07-05
\| \| \| \|	Closes: #680450
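One way to pick a composite extension such as .tar.gz is sketched below. The name `keyExtension` and both limits (at most two components, each at most four characters) are assumptions made for illustration; the commit message does not state the actual rules. The function expects a bare file name without directory components.

```haskell
-- Split a string on a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, []) -> [chunk]
  (chunk, _:rest) -> chunk : splitOn d rest

-- Assumed limits, chosen only for illustration.
maxExtensions, maxExtensionLen :: Int
maxExtensions = 2
maxExtensionLen = 4

-- Return the extension to embed in the key, e.g. ".tar.gz", or "" when
-- the file name has no short trailing extension.
keyExtension :: FilePath -> String
keyExtension file = concatMap ('.' :) (reverse kept)
  where
    parts = drop 1 (splitOn '.' file)  -- components after the first dot
    kept = take maxExtensions (takeWhile short (reverse parts))
    short e = not (null e) && length e <= maxExtensionLen
```

Walking the components from the end and stopping at the first long one keeps ".tar.gz" for "archive.backup.tar.gz" while ignoring the "backup" part.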
*	Use SHA library for files less than 50 kb in size, at which point it's ↵	Joey Hess	2012-07-04
\| \| \| \|	faster than forking the more optimised external program.
*	When shaNsum commands cannot be found, use the Haskell SHA library (already ↵	Joey Hess	2012-07-04
\| \| \| \| \| \| \| \|	a dependency) to do the checksumming. This may be slower, but avoids portability problems. Using Crypto's version of the hashes would be another option. I need to benchmark it. The SHA2 library (which provides SHA1 also, confusing name) may be the fastest option, but is not currently in Debian.
*	maintain set of files pendingAdd	Joey Hess	2012-06-20
\| \| \| \| \| \| \| \|	Kqueue needs to remember which files failed to be added due to being open, and retry them. This commit gets the data in place for such a retry thread. Broke KeySource out into its own file, and added Eq and Ord instances so it can be stored in a Set.
*	separate source of content from the filename associated with the key when ↵	Joey Hess	2012-06-05
\| \| \| \| \| \|	generating a key This already made migrate's code a lot simpler.
*	Require that the SHA256 backend can be used when building, since it's the ↵	Joey Hess	2012-05-31
\| \| \| \|	default.
*	handle really long urls	Joey Hess	2012-02-16
\| \| \| \| \|	Using the whole url as a key can make the filename too long. Truncate and use a md5sum for uniqueness if necessary.
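The truncate-and-hash idea above can be sketched as follows. To keep the example free of dependencies it uses a 32-bit FNV-1a hash over code points as a stand-in for the md5sum mentioned in the commit; that substitution, the names `fnv1a` and `urlKeyName`, and the caller-supplied length limit are all assumptions.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.Word (Word32)
import Numeric (showHex)

-- Stand-in for md5: 32-bit FNV-1a applied to character code points.
-- This is NOT the hash named in the commit message.
fnv1a :: String -> Word32
fnv1a = foldl step 2166136261
  where step h c = (h `xor` fromIntegral (ord c)) * 16777619

-- Short urls are used whole; long ones keep a readable prefix and get a
-- hash of the full url appended, so that urls sharing a prefix are very
-- likely to map to different names.
urlKeyName :: Int -> String -> String
urlKeyName maxLen url
  | length url <= maxLen = url
  | otherwise = take maxLen url ++ "-" ++ showHex (fnv1a url) ""
```

A real implementation would want a much wider hash than 32 bits, which is one reason the commit uses md5.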
*	addurl --fast: Verifies that the url can be downloaded (only getting its ↵	Joey Hess	2012-02-10
\| \| \| \|	head), and records the size in the key.
*	fsck --from	Joey Hess	2012-01-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fscking a remote is now supported. It's done by retrieving the contents of the specified files from the remote, and checking them, so can be an expensive operation. (Several optimisations are possible, to speed it up, of course.. This is the slow and stupid remote fsck to start with.) Still, if the remote is a special remote, or a git repository that you cannot run fsck in locally, it's nice to have the ability to fsck it. If you have any directory special remotes, now would be a good time to fsck them, in case you were hit by the data loss bug fixed in the previous release!
*	convert fsckKey to a Maybe	Joey Hess	2012-01-19
\| \| \| \|	This way it's clear when a backend does not implement its own fsck checks.
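The effect of making the field a Maybe can be shown with a heavily simplified record. The `Backend` type below is a hypothetical stand-in (keys are plain strings here), and treating a missing check as a pass is an assumption of this sketch.

```haskell
-- Hypothetical, heavily simplified backend record.
data Backend = Backend
  { backendName :: String
  , fsckKey :: Maybe (String -> FilePath -> IO Bool)  -- key, file -> content ok?
  }

-- Assumption: a backend with no check of its own simply passes.
runFsck :: Backend -> String -> FilePath -> IO Bool
runFsck backend key file = case fsckKey backend of
  Nothing -> return True
  Just check -> check key file
```

The type now forces every caller to handle the `Nothing` case explicitly, instead of relying on a dummy check that always succeeds.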
*	type alias cleanup	Joey Hess	2011-12-31
\|
*	more partial function removal	Joey Hess	2011-12-15
\| \| \| \| \|	Left a few Prelude.head's in where it was checked not null and too hard to remove, etc.
*	Prevent key names from containing newlines.	Joey Hess	2011-12-06
\| \| \| \| \| \| \| \| \|	There are several places where it's assumed a key can be written on one line. One is in the format of the .git/annex/unused files. The difficult one is that filenames derived from keys are fed into git cat-file --batch, which has a line based input. (And no -z option.) So, for now it's best to block such keys being created.
*	add support for using hashDirLower in addition to hashDirMixed	Joey Hess	2011-11-28
\| \| \| \| \| \| \| \| \| \|	Supporting multiple directory hash types will allow converting to a different one, without a flag day. gitAnnexLocation now checks which of the possible locations have a file. This means more statting of files. Several places currently use gitAnnexLocation and immediately check if the returned file exists; those need to be optimised.
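Checking which of several candidate locations holds a file can be sketched generically. `firstM` and `locateObject` are hypothetical names; in real use the candidates would be the hashDirMixed and hashDirLower paths for a key, and the test would be something like `doesFileExist` from System.Directory.

```haskell
-- Return the first candidate for which the monadic test succeeds.
firstM :: Monad m => (a -> m Bool) -> [a] -> m (Maybe a)
firstM _ [] = return Nothing
firstM test (x:xs) = do
  ok <- test x
  if ok then return (Just x) else firstM test xs

-- Hypothetical wrapper: find the first candidate path that exists.
locateObject :: Monad m => (FilePath -> m Bool) -> [FilePath] -> m (Maybe FilePath)
locateObject = firstM
```

Each candidate costs one existence test, which is the extra statting the commit message warns about.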
*	reorder repo parameters last	Joey Hess	2011-11-08
\| \| \| \| \| \| \| \| \| \| \| \| \|	Many functions took the repo as their first parameter. Changing it consistently to be the last parameter allows doing some useful things with currying, that reduce boilerplate. In particular, g <- gitRepo is almost never needed now, instead use inRepo to run an IO action in the repo, and fromRepo to get a value from the repo. This also provides more opportunities to use monadic and applicative combinators.
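Why putting the repo last helps with currying can be seen in a toy model. Everything below is a simplified stand-in: `Repo`, the function-based `Annex` type, and `gitCommandLine` are hypothetical, and only the names `inRepo` and `fromRepo` come from the commit message.

```haskell
-- Hypothetical, simplified stand-ins for the real types.
newtype Repo = Repo { repoPath :: FilePath }
type Annex a = Repo -> IO a

-- Run an IO action that needs the repo.
inRepo :: (Repo -> IO a) -> Annex a
inRepo = id

-- Get a pure value from the repo.
fromRepo :: (Repo -> a) -> Annex a
fromRepo f = return . f

-- Repo is the last parameter, so partial application leaves exactly the
-- function that inRepo expects.
gitCommandLine :: [String] -> Repo -> IO [String]
gitCommandLine args repo =
  return ("git" : ("--git-dir=" ++ repoPath repo) : args)

example :: Annex [String]
example = inRepo (gitCommandLine ["status"])
```

If the repo were the first parameter, `example` would need a lambda or an explicit `g <- gitRepo` binding to reorder the arguments.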
*	use SHA256 by default	Joey Hess	2011-11-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To get old behavior, add a .gitattributes containing: * annex.backend=WORM I feel that SHA256 is a better default for most people, as long as their systems are fast enough that checksumming their files isn't a problem. git-annex should default to preserving the integrity of data as well as git does. Checksum backends also work better with editing files via unlock/lock. I considered just using SHA1, but since that hash is believed to be somewhat near to being broken, and git-annex deals with large files which would be a perfect exploit medium, I decided to go to a SHA-2 hash. SHA512 is annoyingly long when displayed, and git-annex displays it in a few places (and notably it is shown in ls -l), so I picked the shorter hash. Considered SHA224 as it's even shorter, but feel it's a bit weird. I expect git-annex will use SHA-3 at some point in the future, but probably not soon! Note that systems without a sha256sum (or sha256) program will fall back to defaulting to SHA1.
*	Record uuid when auto-initializing a remote so it shows in status.	Joey Hess	2011-11-02
\|
*	playing with >=>	Joey Hess	2011-10-31
\| \| \| \| \|	Apparently in haskell if you teach a man to fish, he'll write more pointfree code.
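For readers unfamiliar with the operator, `>=>` from Control.Monad composes two monadic functions left to right. The functions below are made up for illustration and are unrelated to the git-annex code.

```haskell
import Control.Monad ((>=>))
import Data.Char (isDigit)

-- Parse a non-empty string of digits.
parseNumber :: String -> Maybe Int
parseNumber s
  | not (null s) && all isDigit s = Just (read s)
  | otherwise = Nothing

-- Halve a number, failing on odd input.
halveEven :: Int -> Maybe Int
halveEven n
  | even n = Just (n `div` 2)
  | otherwise = Nothing

-- Pointful equivalent: \s -> parseNumber s >>= halveEven
parseAndHalve :: String -> Maybe Int
parseAndHalve = parseNumber >=> halveEven
```

The pointfree form never names the string argument, which is the style the commit message jokes about.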
*	minor syntax changes	Joey Hess	2011-10-11
\|
*	rename	Joey Hess	2011-10-05
\|
*	rename	Joey Hess	2011-10-04
\|
*	factor out common imports	Joey Hess	2011-10-03
\| \| \| \|	no code changes
*	go go gadget hlint	Joey Hess	2011-09-20
\|
*	split groups of related functions out of Utility	Joey Hess	2011-08-22
\|
*	moved files around	Joey Hess	2011-08-20
\|