path: root/Utility/FileSystemEncoding.hs
Commit message (author, date)
* adieu, MissingH (Joey Hess, 2017-05-16)
    Removed dependency on MissingH, instead depending on the split library.

    After laying groundwork for this since 2015, it was mostly straightforward. Added Utility.Tuple and Utility.Split. Eyeballed System.Path.WildMatch while implementing the same thing.

    Since MissingH's progress meter display was being used, I re-implemented my own. Bonus: Now progress is displayed for transfers of files of unknown size.

    This commit was sponsored by Shane-o on Patreon.
* stop using MissingH for MD5 (Joey Hess, 2017-05-15)
    Cryptonite is faster and allocates less, and I want to get rid of MissingH use.

    Note that the new dependency on memory is free; it's a dependency of cryptonite.

    This commit was supported by the NSF-funded DataLad project.
* work around ghc segfault (Joey Hess, 2016-12-30)
    hSetEncoding of a closed handle segfaults.
    https://ghc.haskell.org/trac/ghc/ticket/7161

    3b9d9a267b7c9247d36d9b622e1b836724ca5fb0 introduced the crash. In particular, stdin may get closed (by, e.g., getContents) and then trying to set its encoding will crash. We didn't need to adjust stdin's encoding anyway, but only stderr, to work around https://github.com/yesodweb/persistent/issues/474

    Thanks to Mesar Hameed for assistance related to reproducing this bug.
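    The commit above avoids the crash by no longer touching stdin's encoding at all. As a separate illustration of the underlying hazard, here is a minimal sketch of a guard that skips hSetEncoding when a handle is already closed. The helper name setEncodingIfOpen is hypothetical and is not part of git-annex; the check is also not atomic, so another thread could still close the handle between the test and the call.

```haskell
import System.IO

-- Hypothetical helper, not from git-annex: only set the encoding of a
-- handle that is still open. Returns True when the encoding was set,
-- False when the handle was already closed and was left alone.
setEncodingIfOpen :: Handle -> TextEncoding -> IO Bool
setEncodingIfOpen h enc = do
  closed <- hIsClosed h
  if closed
    then return False
    else hSetEncoding h enc >> return True
```

    For example, `setEncodingIfOpen stderr utf8` sets the encoding and reports True, while the same call on a stdin that has been closed with hClose is skipped.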
* Always use filesystem encoding for all file and handle reads and writes. (Joey Hess, 2016-12-24)
    This is a big scary change. I have convinced myself it should be safe. I hope!
* optimise read and write for Keys database (untested) (Joey Hess, 2015-12-23)
    Writes are optimised by queueing up multiple writes when possible. The queue is flushed after the Annex monad action finishes. That makes it happen on program termination, and also whenever a nested Annex monad action finishes.

    Reads are optimised by checking once (per AnnexState) if the database exists. If the database doesn't exist yet, all reads return mempty.

    Reads also cause queued writes to be flushed, so reads will always be consistent with writes (as long as they're made inside the same Annex monad). A future optimisation path would be to determine when that's not necessary, which is probably most of the time, and avoid flushing unnecessarily.

    Design notes for this commit:

    - separate reads from writes
    - reuse a handle which is left open until program exit or until the MVar goes out of scope (and autoclosed then)
    - writes are queued
      - queue is flushed periodically
      - immediate queue flush before any read
      - auto-flush queue when database handle is garbage collected
      - flush queue on exit from Annex monad (Note that this may happen repeatedly for a single database connection; or a connection may be reused for multiple Annex monad actions, possibly even concurrent ones.)
    - if database does not exist (or is empty) the handle is not opened by reads; reads instead return empty results
    - writes open the handle if it was not open previously
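    The queue-then-flush-before-read idea in the design notes can be sketched roughly as follows. This is a toy in-memory model under stated assumptions, not the Keys database code: the Db type and the functions newDb, queueWrite, flushQueue and readKey are all hypothetical names, and an association list stands in for the real database.

```haskell
import Data.IORef

-- Hypothetical toy model: 'stored' stands in for the on-disk database,
-- 'queue' holds writes that have not been flushed yet (newest first).
data Db = Db
  { stored :: IORef [(String, String)]
  , queue  :: IORef [(String, String)]
  }

newDb :: IO Db
newDb = Db <$> newIORef [] <*> newIORef []

-- Writes are only queued; nothing reaches 'stored' yet.
queueWrite :: Db -> String -> String -> IO ()
queueWrite db k v = modifyIORef' (queue db) ((k, v) :)

-- Move all queued writes into 'stored'. The queue is newest-first, so
-- prepending it keeps the newest value for a key at the front.
flushQueue :: Db -> IO ()
flushQueue db = do
  pending <- atomicModifyIORef' (queue db) (\q -> ([], q))
  modifyIORef' (stored db) (pending ++)

-- Reads flush first, so they are always consistent with earlier writes.
readKey :: Db -> String -> IO (Maybe String)
readKey db k = do
  flushQueue db
  lookup k <$> readIORef (stored db)
```

    In this model a read issued right after two queued writes to the same key sees the later value, which is the consistency property the commit message describes.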
* use intercalate instead of MissingH's join (Joey Hess, 2015-11-17)
    The two functions are identical.
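    A small sketch of the equivalence. To avoid depending on MissingH, the stand-in joinLike below is written under the assumption that MissingH's join concatenates the list with the delimiter interspersed; the name joinLike is hypothetical.

```haskell
import Data.List (intercalate, intersperse)

-- Stand-in for MissingH's join, defined here as an assumption about its
-- behaviour so the example does not need MissingH installed.
joinLike :: [a] -> [[a]] -> [a]
joinLike delim = concat . intersperse delim

-- Both produce the same result on the same input.
example :: Bool
example = intercalate ", " ["a", "b", "c"] == joinLike ", " ["a", "b", "c"]
```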
* avoid throwing exception when String is not encoded using the filesystem encoding (Joey Hess, 2015-08-12)
    Since _encodeFilePath generates a String that doesn't use the filesystem encoding, when this exception is caught, we know we already have such a String, and can just return it as-is.
* fix test suite fail in LANG=C (Joey Hess, 2015-08-12)
    This was caused by 88aeb849f620a13da47508045daae461a223c997: an Arbitrary String is not necessarily encoded using the filesystem encoding, and in a non-utf8 locale, encodeBS throws an exception on such a string.

    All I could think to do is limit test data to ascii. This shouldn't be a problem in practice, because all Strings in git-annex that are not generated by Arbitrary should be loaded in a way that does apply the filesystem encoding.
* Fix setting/viewing metadata that contains unicode or other special characters, when in a non-unicode locale. (Joey Hess, 2015-08-11)
    Oh boy, not again. So, another place that the filesystem encoding needs to be applied. Yay.

    In passing, I changed decodeBS so if a NUL is embedded in the input, the resulting FilePath doesn't get truncated at that NUL. This was needed to make prop_b64_roundtrips pass, and on reviewing the callers of decodeBS, I didn't see any where this wouldn't make sense. When a FilePath is used to operate on the filesystem, it'll get truncated at a NUL anyway, whereas if a String is being used for something else, it might conceivably have a NUL in it, and we wouldn't want it to get truncated when going through decodeBS.

    (NB: There may be a speed impact from this change.)
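    The NUL behaviour described above can be illustrated with a minimal sketch that decodes raw bytes using the filesystem encoding and an explicit length, so an embedded NUL does not end the string. The function decodeBytes is a hypothetical name, and this is not claimed to be how git-annex's decodeBS is implemented; it only uses base.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding)

-- Hypothetical helper: decode bytes with the filesystem encoding.
-- Passing the length explicitly (peekCStringLen) means decoding does not
-- stop at the first NUL byte, unlike a NUL-terminated peekCString.
decodeBytes :: [Word8] -> IO String
decodeBytes bytes = do
  enc <- getFileSystemEncoding
  withArrayLen bytes $ \len ptr ->
    GHC.peekCStringLen enc (castPtr ptr, len)
```

    With this approach the bytes 97, 0, 98 decode to a three-character String with the NUL preserved in the middle.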
* disable horrible tab warning, needed in every file that Setup.hs pulls in (Joey Hess, 2015-05-10)
    This is certainly a cabal bug for not passing the build options in the cabal file when building Setup.hs.

    And, why oh why did ghc enable this warning by default? So unhappy with this choice.
* metadata: Fix encoding problem that led to mojibake when storing metadata strings that contained both unicode characters and a space (or '!') character. (Joey Hess, 2015-03-04)
    The fix is to stop using w82s, which does not properly reconstitute unicode strings. Instead, use utf8 bytestring to get the [Word8] to base64. This passes unicode through perfectly, including any invalid filesystem encoded characters.

    Note that toB64 / fromB64 are also used for creds and cipher embedding. It would be unfortunate if this change broke those uses.

    For cipher embedding, note that ciphers can contain arbitrary bytes (should really be using ByteString.Char8 there). Testing indicated it's not safe to use the new fromB64 there; I think that characters were incorrectly combined.

    For credpair embedding, the username or password could contain unicode. Before, that unicode would fail to round-trip through the b64. So, I guess this is not going to break any embedded creds that worked before. This bug may have affected some creds before, and if so, this change will not fix old ones, but should fix new ones at least.
* update my email address and homepage url (Joey Hess, 2015-01-21)
* fix some mixed space+tab indentation (Joey Hess, 2014-10-09)
    This fixes all instances of " \t" in the code base. Most common case seems to be after a "where" line; probably vim copied the two space layout of that line.

    Done as a background task while listening to episode 2 of the Type Theory podcast.
* relicense general utility library code to BSD (Joey Hess, 2014-05-10)
    Omitted a couple of files that have had significant contributions from others.
* Windows: Fix some filename encoding bugs. (Joey Hess, 2014-03-19)
    http://git-annex.branchable.com/bugs/Unicode_file_names_ignored_on_Windows/

    Not a complete fix yet.
* Fix a few bugs involving filenames that are at or near the filesystem's maximum filename length limit. (Joey Hess, 2013-07-30)
    Started with a problem when running addurl on a really long url, because the whole url is munged into the filename. Ended up doing a fairly extensive review for places where filenames could get too large, although it's hard to say I've not missed any.

    Backend.Url had a 128 character limit, which is fine when the limit is 255, but not if it's a lot shorter on some systems. So check the pathconf() limit. Note that this could result in fromUrl creating different keys for the same url, if run on systems with different limits. I don't see this as likely to cause any problems. That can already happen when using addurl --fast, or if the content of an url changes.

    Both Command.AddUrl and Backend.Url assumed that urls don't contain a lot of multi-byte unicode, and would fail to properly truncate an url that did.

    A few places use a filename as the template to make a temp file. While that's nice in that the temp file name can be easily related back to the original filename, it could lead to `git annex add` failing to add a filename that was at or close to the maximum length.

    Note that in Command.Add.lockdown, the template is still derived from the filename, just with enough space left to turn it into a temp file. This is an important optimisation, because the assistant may lock down a bunch of files all at once, and using the same template for all of them would cause openTempFile to iterate through the same set of names, looking for an unused temp file. I'm not very happy with the relatedTemplate hack, but it avoids that slowdown.

    Backend.WORM does not limit the filename stored in the key. I have not tried to change that; so git annex add will fail on really long filenames when using the WORM backend. It seems better to preserve the invariant that a WORM key always contains the complete filename, since the filename is the only unique material in the key, other than mtime and size. Since nobody has complained about add failing (I think I saw it once?) on WORM, probably it's ok, or nobody but me uses it.

    There may be compatibility problems if using git annex addurl --fast or the WORM backend on a system with the 255 limit and then trying to use that repo in a system with a smaller limit. I have not tried to deal with those.

    This commit was sponsored by Alexander Brem. Thanks!
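    The multi-byte truncation problem mentioned above can be sketched as follows: cutting by character count can overshoot a byte limit, so the cut has to count encoded bytes and stop at a character boundary. This sketch assumes the filename is encoded as UTF-8, and the names utf8Len and truncateToBytes are hypothetical; git-annex's own truncation code may work differently.

```haskell
import Data.Char (ord)

-- Number of bytes a character occupies when encoded as UTF-8.
utf8Len :: Char -> Int
utf8Len c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Hypothetical helper: keep the longest prefix whose UTF-8 encoding fits
-- within 'limit' bytes, never splitting a multi-byte character.
truncateToBytes :: Int -> String -> String
truncateToBytes limit = go 0
  where
    go _ [] = []
    go used (c:cs)
      | used' > limit = []
      | otherwise     = c : go used' cs
      where
        used' = used + utf8Len c
```

    A caller would pass the limit reported for the target filesystem (for example from pathconf) rather than a fixed constant.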
* add decodeW8 (Joey Hess, 2012-09-13)
* [Word8] to filesystem encoded String (Joey Hess, 2012-06-20)
    My, GHC makes this hard.
* perhaps more clear type (Joey Hess, 2012-03-10)
* fix key directory hash calculation code (Joey Hess, 2012-03-09)
    Fix Key directory hash calculation code to behave as it did before version 3.20120227 when a key contains non-ascii.

    The hash directories for a given Key are based on its md5sum. Prior to ghc 7.4, Keys contained raw, undecoded bytes, so the md5sum was taken of each byte in turn. With the ghc 7.4 filename encoding change, keys contain decoded unicode characters (possibly with surrogates for undecodable bytes). This changes the result of the md5sum, since the md5sum used is pure haskell and supports unicode. And that won't do, as git-annex will start looking in a different hash directory for the content of a key.

    The surrogates are particularly bad, since that's essentially a ghc implementation detail, so could change again at any time. Also, changing the locale changes how the bytes are decoded, which can also change the md5sum.

    Symptoms would include things like:

    * git annex fsck would complain that no copies existed of a file, despite its symlink pointing to the content that was locally present
    * git annex fix would change the symlink to use the wrong hash directory.

    Only WORM backend is likely to have been affected, since only it tends to include much filename data (SHA1E could in theory also be affected).

    I have not tried to support the hash directories used by git-annex versions 3.20120227 to 3.20120308, so things added with those versions with WORM will require manual fixups. Sorry for the inconvenience!
* factor out Utility.FileSystemEncoding (Joey Hess, 2012-03-09)
* refactor (Joey Hess, 2012-03-09)