| field | value | date |
|---|---|---|
| author | Peter De Wachter <pdewacht@gmail.com> | 2019-01-01 22:06:58 +0100 |
| committer | fguillot <fred@miniflux.net> | 2019-01-02 21:05:05 -0800 |
| commit | 15505ee4a2bd4963d0cbc9d1820e9be641b221ca | |
| tree | ceeff52dc41c012046642dde991722755f556ba5 /reader/atom | |
| parent | 31e2669c4db077b15fce496e92e20a05d7a979cb | |
Make UTF-8 the default encoding for XML feeds
Consider the feed http://planet.haskell.org/atom.xml
- This is a UTF-8 encoded XML file
- No encoding declaration in the XML header
- No Unicode byte order mark
- Served with HTTP Content-Type "text/xml" (no charset parameter)
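Miniflux leaves encoding detection to charset.NewReader. The charset package
implements the HTML5 encoding sniffing algorithm. When there is no byte order
mark and no declared charset, that algorithm falls back to windows-1252 unless
the first 1024 bytes contain non-ASCII UTF-8 sequences. The start of this feed
is plain ASCII, so we get the wrong encoding and any non-ASCII text further
down is decoded incorrectly.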
I added an explicit utf8.Valid() check, so input that is valid UTF-8 is now
decoded as UTF-8. This fixes the problem.
Diffstat (limited to 'reader/atom')
0 files changed, 0 insertions, 0 deletions