author Joey Hess <joeyh@joeyh.name> 2018-03-05 11:25:01 -0400
committer Joey Hess <joeyh@joeyh.name> 2018-03-05 11:25:01 -0400
commit da3f2ee6994daafe58b890c3fb87ccf5ef61f3f2
tree 661efe702c741449882fd21e1840dae1b1548253 /doc
parent df575f0db7c945a26735d0944b05c7e989cdfcda
Improve SHA*E extension extraction code
Do not treat parts of the filename that contain punctuation or other non-alphanumeric characters as extensions. Previously, such characters were filtered out of the extension.

Note that in 38bd7ca3cce455c20edcee656c706939087c6a69 "foo.ba__________r" was munged to ".bar" and so incorrectly treated as an extension. That was fixed by changing the filter order, but not allowing punctuation seems a better fix.

This assumes that extensions containing punctuation are rare. "_" seems the most likely character; I used it in ikiwiki "._comment" files, but I can't recall seeing it anywhere else. It certainly seems that no commonly used extensions contain punctuation. If git-annex doesn't treat "._comment" as an extension, that's not likely to break software that expects to see that extension, the way some software expects to see .epub or .mp3.

This commit was sponsored by Jack Hill on Patreon.
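The extraction rule in the commit message can be sketched roughly as follows. This is a minimal Python sketch, not git-annex's actual Haskell code: the function name `select_extension` is hypothetical, and the cap of two dot-separated parts (`MAX_EXTENSION_PARTS`, enough for ".tar.gz") is an assumption made for illustration. A part of the name only counts as an extension if every character in it is alphanumeric.

```python
import os

# Hypothetical cap on how many dot-separated parts may form a compound
# extension (2 is enough for ".tar.gz"). Assumed for this sketch only.
MAX_EXTENSION_PARTS = 2


def select_extension(path: str) -> str:
    """Return the extension a SHA*E-style key would keep, or "".

    A dot-separated part of the file name counts as an extension only if
    every character in it is alphanumeric. str.isalnum() is Unicode-aware,
    so non-ASCII letters such as kanji are still accepted.
    """
    name = os.path.basename(path)
    # The first part is the base name and is never an extension.
    parts = name.split(".")[1:]
    kept: list[str] = []
    # Walk backwards from the end of the name, stopping at the first part
    # that is empty or contains punctuation or whitespace.
    for part in reversed(parts):
        if len(kept) == MAX_EXTENSION_PARTS or not part.isalnum():
            break
        kept.append(part)
    kept.reverse()
    return "".join("." + part for part in kept)


if __name__ == "__main__":
    for f in ["song (feat. xy).mp3", "backup.tar.gz", "foo.ba__________r",
              "comment_5_7f5a._comment", "写真.漢字"]:
        print(repr(f), "->", repr(select_extension(f)))
```

Stopping at the first rejected part, rather than skipping over it, is this sketch's own choice; it keeps the extension contiguous with the end of the name. Because `str.isalnum()` accepts any Unicode letter or digit, the rule rejects "(feat. xy)" style fragments and "._comment" without limiting extensions to ASCII.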
Diffstat (limited to 'doc')
-rw-r--r-- doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum.mdwn | 2
-rw-r--r-- doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum/comment_5_7f5a6ba6ed7b6f720874f8ded6edaa3c._comment | 28
2 files changed, 30 insertions, 0 deletions
diff --git a/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum.mdwn b/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum.mdwn
index 84ca70bea..0534925ea 100644
--- a/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum.mdwn
+++ b/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum.mdwn
@@ -3,6 +3,8 @@ Files with special unicode characters(in this case japanese) for some reason hav
This is an issue because it causes errors when using glacier-cli when uploading copies to Glacier vault.
+[[!meta title="kanji in key extension cause glacier-cli upload error"]]
+
### What steps will reproduce the problem?
Here's how it looks for me:
diff --git a/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum/comment_5_7f5a6ba6ed7b6f720874f8ded6edaa3c._comment b/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum/comment_5_7f5a6ba6ed7b6f720874f8ded6edaa3c._comment
new file mode 100644
index 000000000..1d8e1cabe
--- /dev/null
+++ b/doc/bugs/git-annex_adds_unicode_characters_at_end_of_checksum/comment_5_7f5a6ba6ed7b6f720874f8ded6edaa3c._comment
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2018-03-05T14:47:20Z"
+ content="""
+The easy workaround for bugs like this is to migrate the file to the
+SHA256 backend rather than SHA256E.
+
+It may be obvious to us that a file ending in "(feat. xy).mp3"
+has an extension of ".mp3" and not of ". xy).mp3", but this is not very
+obvious to git-annex, which would like to treat a file ending in ".tar.gz"
+as having that compound extension.
+
+The only rule I can think of that would help git-annex understand this is
+to not allow punctuation (other than ".") in file extensions. It already
+filters punctuation out of extensions, which is why the extension it
+comes up with is ".xy.mp3". But it could notice the space and closing paren
+in the filename and assume those are not part of an extension. That might
+bite some file with an extension like ".foo_", but I can't recall seeing many
+such extensions. Ok, made this change.
+
+It remains a bug in the glacier special remote if unicode characters
+prevent uploading to it. We can't limit file
+extensions to ASCII; it's perfectly reasonable to use your native language
+characters in a file extension. Leaving the bug open, since my change does
+nothing about whatever upload bug glacier-cli has. Is the Python program
+failing?
+"""]]
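The workaround suggested in the comment helps because only the "E" variant of the backend carries the file's extension in the key, so with plain SHA256 non-ASCII characters never end up after the checksum. A minimal sketch, assuming the key layout `BACKEND-sSIZE--HASH` with the extension appended only for SHA256E; `annex_key` is a hypothetical helper for illustration, not part of git-annex.

```python
import hashlib


def annex_key(data: bytes, backend: str, extension: str = "") -> str:
    """Build a git-annex-style key name for the SHA256 or SHA256E backend.

    Assumed layout: BACKEND-sSIZE--HEXDIGEST, with the extension appended
    only for the "E" variant.
    """
    if backend not in ("SHA256", "SHA256E"):
        raise ValueError(f"unsupported backend in this sketch: {backend}")
    digest = hashlib.sha256(data).hexdigest()
    key = f"{backend}-s{len(data)}--{digest}"
    if backend == "SHA256E":
        # Only the E backend keeps the extension, non-ASCII or not.
        key += extension
    return key


if __name__ == "__main__":
    content = b"example audio bytes"
    # With SHA256E the kanji extension ends up after the checksum...
    print(annex_key(content, "SHA256E", ".漢字"))
    # ...while a plain SHA256 key contains only ASCII characters.
    print(annex_key(content, "SHA256", ".漢字"))
```

The trade-off is that a SHA256 key loses the extension that some programs rely on to recognize a file type through the annex symlink, which is why SHA256E remains the default choice and SHA256 is only a workaround.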