4 files changed, 34 insertions, 1 deletions
diff --git a/Backend/Hash.hs b/Backend/Hash.hs
index da0f7df9b..1d5436823 100644
--- a/Backend/Hash.hs
+++ b/Backend/Hash.hs
@@ -94,7 +94,7 @@ selectExtension f
 	| otherwise = intercalate "." ("":es)
   where
 	es = filter (not . null) $ reverse $
-		take 2 $ map (filter validInExtension) $
+		take 2 $ filter (all validInExtension) $
 		takeWhile shortenough $
 		reverse $ splitc '.' $ takeExtensions f
 	shortenough e = length e <= 4 -- long enough for "jpeg"
@@ -3,6 +3,9 @@ git-annex (6.20180228) UNRELEASED; urgency=medium
   * Support exporttree=yes for rsync special remotes.
   * Dial back optimisation when building on arm, which prevents
     ghc and llc from running out of memory when optimising some files.
+  * Improve SHA*E extension extraction code to not treat parts of the
+    filename that contain punctuation or other non-alphanumeric characters
+    as extensions. Before, such characters were filtered out.
 
  -- Joey Hess <id@joeyh.name>  Wed, 28 Feb 2018 11:53:03 -0400
 
diff --git a/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum.mdwn b/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum.mdwn
index 84ca70bea..0534925ea 100644
--- a/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum.mdwn
+++ b/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum.mdwn
@@ -3,6 +3,8 @@ Files with special unicode characters(in this case japanese) for some reason hav
 
 This is an issue because it causes errors when using glacier-cli when uploading copies to Glacier vault.
 
+[[!meta title="kanji in key extension cause glacier-cli upload error"]]
+
 ### What steps will reproduce the problem?
 
 Here's how it looks for me:
diff --git a/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum/comment_5_7f5a6ba6ed7b6f720874f8ded6edaa3c._comment b/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum/comment_5_7f5a6ba6ed7b6f720874f8ded6edaa3c._comment
new file mode 100644
index 000000000..1d8e1cabe
--- /dev/null
+++ b/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum/comment_5_7f5a6ba6ed7b6f720874f8ded6edaa3c._comment
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2018-03-05T14:47:20Z"
+ content="""
+The easy workaround to bugs like this migrate the file to the
+SHA256 backend rather than SHA256E.
+
+It may be obvious to us that a file ending in "(feat. xy).mp3"
+has an extension of ".mp3" and not of ". xy).mp3", but this is not very
+obvious to git-annex, which would like to treat a file ending in ".tar.gz"
+as having that compound extension.
+
+The only rule I can think of that would help git-annex understand this is
+to not allow punctuation (other than "." in file extensions). Which it
+actually already filters out of extensions, which is why the extension it
+comes up with is ".xy.mp3". But it could notice the space and closing paren
+in the filename and assume those are not part of an extension. It might
+bite some file with an extension like .foo_", I can't recall seeing many
+such extensions. Ok, made this change.
+
+It remains a bug in the glacier special remote if unicode characters
+prevent uploading to it. We can't limit file
+extensions to ascii, it's perfectly reasonable to use your native language
+characters in a file extension. Leaving bug open since my change does
+nothing about whatever upload bug glacier-cli has. Is the python program
+failing?
+"""]]
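
The behavioural difference in the Backend/Hash.hs hunk can be tried in isolation with the sketch below. It is only an approximation of the patched function, not the git-annex source: `validInExtension` and `splitc` are not shown in the diff, so the versions here are stand-ins (the first assumed from the changelog's "non-alphanumeric" wording), and `selectWith`, `oldSelect` and `newSelect` are hypothetical names introduced to run the old and new step side by side.

```haskell
import Data.Char (isAlphaNum)
import Data.List (intercalate)
import System.FilePath (takeExtensions)

-- Assumption: the real validInExtension is not part of the diff. This
-- stand-in accepts any alphanumeric character (including non-ASCII
-- letters), following the changelog wording.
validInExtension :: Char -> Bool
validInExtension = isAlphaNum

-- Hypothetical stand-in for git-annex's splitc helper: split a string on
-- a separator character, keeping empty chunks.
splitc :: Char -> String -> [String]
splitc c s = case break (== c) s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitc c rest

-- The pipeline from the hunk, with the one changed step passed in as
-- `clean` so both versions can be compared.
selectWith :: ([String] -> [String]) -> FilePath -> String
selectWith clean f
    | null es = ""
    | otherwise = intercalate "." ("" : es)
  where
    es = filter (not . null) $ reverse $
        take 2 $ clean $
        takeWhile shortenough $
        reverse $ splitc '.' $ takeExtensions f
    shortenough e = length e <= 4 -- long enough for "jpeg"

-- Before the patch: invalid characters are stripped out of each part.
oldSelect :: FilePath -> String
oldSelect = selectWith (map (filter validInExtension))

-- After the patch: any part containing an invalid character is dropped.
newSelect :: FilePath -> String
newSelect = selectWith (filter (all validInExtension))

main :: IO ()
main = mapM_ report ["song (feat. xy).mp3", "archive.tar.gz"]
  where
    report n = putStrLn $
        n ++ ": " ++ show (oldSelect n) ++ " -> " ++ show (newSelect n)
```

Under these assumptions, "song (feat. xy).mp3" goes from ".xy.mp3" to ".mp3", matching the case described in the comment, while "archive.tar.gz" keeps its compound extension ".tar.gz" in both versions.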