msg31755 - (view) |
Author: Christoph Zwerschke (cito) * |
Date: 2007-04-10 20:58 |
If a .po file has a BOM (byte order mark) at the beginning, as is often the case for UTF-8 files created on Windows, msgfmt.py complains about a syntax error.
The attached patch fixes this problem.
|
msg31756 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2007-04-11 16:07 |
Martin, is this your code?
|
msg31757 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2007-04-11 22:13 |
It's my code, but I will need to establish first whether it's a bug. That depends on what the PO specification says, and, if it is silent on the matter, what GNU gettext does.
|
msg31758 - (view) |
Author: Christoph Zwerschke (cito) * |
Date: 2007-04-12 09:10 |
It may well be that GNU gettext also chokes on a BOM, because they aren't used under Linux. But I think as a Python tool it should be more Windows-tolerant. The annoying thing is that you get a syntax error, but everything looks right because the BOM is usually invisible. Such error messages are really frustrating. Either the BOM should be silently ignored (as in the patch) or the users should get a friendly error message asking them to save the file without BOM. If GNU behaves badly to Windows users, that's not an excuse to do the same. They are already suffering enough because of their (or their bosses') bad choice of OS ;-)
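To illustrate why the BOM is so easy to miss, here is a minimal sketch (the sample bytes are made up for illustration and are not taken from a real PO file). Decoding with plain `utf-8` keeps the BOM as an invisible U+FEFF character at the start of the text, while Python's `utf-8-sig` codec drops it:

```python
import codecs

# Made-up sample: a UTF-8 BOM followed by the first line of a PO file.
raw = codecs.BOM_UTF8 + b'msgid ""\n'

# Plain UTF-8 decoding keeps the BOM as U+FEFF, which most editors do not display.
as_utf8 = raw.decode('utf-8')

# The 'utf-8-sig' codec strips a leading BOM if one is present.
as_utf8_sig = raw.decode('utf-8-sig')

print(repr(as_utf8[:6]))
print(repr(as_utf8_sig[:6]))
```

The first line of the file therefore does not start with `msgid` as far as a parser is concerned, even though it looks that way in an editor.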
|
msg70042 - (view) |
Author: Christoph Zwerschke (cito) * |
Date: 2008-07-19 16:17 |
Small improvement of the patch: Instead of hardcoding the BOM as
'\xef\xbb\xbf', we should use codecs.BOM_UTF8.
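A minimal sketch of that idea, assuming the file contents are read as bytes (`strip_utf8_bom` is a hypothetical helper name, not the code from the attached patch):

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    # codecs.BOM_UTF8 is the byte string b'\xef\xbb\xbf'.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```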
|
msg125940 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-01-10 22:18 |
Extract of the Unicode standard: "Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature".
See also the following section explaining issues with the UTF-8 BOM:
http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
I agree that Python should handle (UTF-8) BOM to read a CSV file (#7185), because the file format is common on Windows.
But msgfmt is a UNIX tool: I would expect Python to behave like the original msgfmt tool and fail with a fatal error on the BOM "invisible character". How do you explain to a user that msgfmt fails but msgfmt.py does not?
About the patch: *ignoring* the BOM is not a good idea. The BOM announces the encoding (e.g. UTF-8): if a Content-Type header announces another encoding, you should raise an error.
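A rough sketch of such a consistency check, assuming the raw bytes of the PO file are available. The helper name, the regular expression, and the error text are all hypothetical and are not how msgfmt.py parses the header:

```python
import codecs
import re

# Hypothetical pattern: finds the charset in a PO header line such as
# "Content-Type: text/plain; charset=UTF-8\n"
CHARSET_RE = re.compile(rb'Content-Type:[^\n]*?charset=([A-Za-z0-9_.:-]+)')

def check_bom_matches_charset(data: bytes) -> None:
    """Raise ValueError if a UTF-8 BOM contradicts the declared charset."""
    if not data.startswith(codecs.BOM_UTF8):
        return  # no BOM, nothing to cross-check
    match = CHARSET_RE.search(data)
    if match is None:
        return  # no charset declared in the header
    declared = match.group(1).decode('ascii')
    try:
        # Normalize aliases such as "UTF8" or "utf_8" to the codec name "utf-8".
        normalized = codecs.lookup(declared).name
    except LookupError:
        normalized = declared.lower()
    if normalized != 'utf-8':
        raise ValueError(
            'file starts with a UTF-8 BOM but declares charset=%s' % declared)
```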
|
msg125941 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-01-10 22:19 |
See also issue #7651: "Python3: guess text file charset using the BOM".
|
msg290519 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-03-26 09:47 |
Corresponding GNU gettext issue [1] was closed as "Not a Bug".
[1] https://savannah.gnu.org/bugs/?18345
|
msg290524 - (view) |
Author: Christoph Zwerschke (cito) * |
Date: 2017-03-26 10:53 |
> Corresponding GNU gettext issue [1] was closed as "Not a Bug".
Though I think the rationale given there pointing to RFC3629 section 6 is wrong, since that section explicitly refers to Internet protocols, but PO files are not an Internet protocol.
Anyway, if silently ignoring BOMs is considered a bad idea, then at least there should be a more helpful error message. Because the BOM is invisible, users - who may not even be aware that something like a BOM exists or that their editor saves files with a BOM - may be frustrated by the current error message because they don't see any invalid character when they open the PO file in their editor. A more explicit error message like "PO files should not be saved with a byte order mark" might point users in the right direction.
After all, these tools are supposed to be used directly by human beings on the command line. Who said that command line tools must not be user friendly?
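The more explicit error message suggested above could be sketched roughly as follows. This is only an illustration: `reject_bom` is a hypothetical helper, and the way msgfmt.py actually reports errors may differ.

```python
import codecs

# Wording taken from the suggestion above.
BOM_MESSAGE = 'PO files should not be saved with a byte order mark'

def reject_bom(data: bytes, filename: str) -> None:
    """Raise ValueError with an explicit message if data starts with a UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        raise ValueError('%s: %s' % (filename, BOM_MESSAGE))
```

A command line tool would catch this error and print the message instead of reporting a generic syntax error on the first line.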
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:23 | admin | set | github: 44827 |
2021-04-22 15:26:29 | iritkatriel | set | title: msgfmt cannot cope with BOM -> msgfmt cannot cope with BOM - improve error message resolution: not a bug -> versions:
+ Python 3.11, - Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6 |
2017-03-26 10:53:38 | cito | set | status: pending -> open
messages:
+ msg290524 versions:
+ Python 3.4, Python 3.5, Python 3.6 |
2017-03-26 09:47:55 | serhiy.storchaka | set | status: open -> pending
nosy:
+ serhiy.storchaka messages:
+ msg290519
resolution: not a bug |
2011-01-10 22:19:48 | vstinner | set | nosy:
loewis, rhettinger, cito, vstinner, eric.araujo messages:
+ msg125941 |
2011-01-10 22:18:38 | vstinner | set | nosy:
loewis, rhettinger, cito, vstinner, eric.araujo messages:
+ msg125940 |
2011-01-06 17:03:44 | pitrou | set | nosy:
+ vstinner stage: test needed -> needs patch
versions:
+ Python 2.7, Python 3.2, Python 3.3, - Python 2.6 |
2010-06-11 14:58:50 | eric.araujo | set | nosy:
+ eric.araujo
|
2009-05-15 02:21:09 | ajaksu2 | set | versions:
+ Python 2.6, Python 3.1, - Python 2.5 nosy:
loewis, rhettinger, cito components:
+ Unicode keywords:
+ patch type: behavior stage: test needed |
2008-07-19 16:17:29 | cito | set | messages:
+ msg70042 |
2007-04-10 20:58:04 | cito | create | |