Title: UTF-8 encoding not enforced
Type: behavior
Stage:
Components: Unicode
Versions: Python 3.4
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: benjamin.peterson, ezio.melotti, jwilk, loewis, serhiy.storchaka, vstinner
Priority: normal
Keywords:

Created on 2013-12-10 11:48 by jwilk, last changed 2022-04-11 14:57 by admin.

Files uploaded: jwilk, 2013-12-10 11:49; jwilk, 2013-12-11 10:42
Messages (3)
msg205795 - Author: Jakub Wilk (jwilk) Date: 2013-12-10 11:48
I created a Python file that contains a non-UTF-8 string literal (but no Unicode literals) and added a "UTF-8" encoding declaration to it. I expected Python to raise a SyntaxError when importing such a module, but it doesn't:

$ python --version
Python 2.7.6

$ python -c 'import test1' && echo ok

Curiously enough, if I change the declaration to "UTF8", then the exception is raised as expected:

$ sed -e 's/UTF-8/UTF8/' < >
$ python -c 'import test2'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "", line 2
SyntaxError: 'utf8' codec can't decode byte 0xa1 in position 5: invalid start byte
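
For reference, the error above can be reproduced outside the import machinery. The following is a minimal Python 3 sketch, not part of the original report: the helper name `first_utf8_error` is hypothetical, and the sample line `x = '\xa1'` is only an assumed reconstruction of line 2 of the test file (the attachment itself is not shown here). It was chosen because it puts byte 0xa1 at position 5, as in the traceback. 0xa1 is a UTF-8 continuation byte, so it cannot start a character.

```python
def first_utf8_error(data: bytes):
    """Return (offset, byte, reason) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding is the default
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start], exc.reason
    return None


# Assumed reconstruction of the offending source line (Latin-1 encoded literal).
latin1_line = b"x = '\xa1'"
print(first_utf8_error(latin1_line))

# The same character encoded as UTF-8 decodes cleanly.
utf8_line = "x = '\u00a1'".encode("utf-8")
print(first_utf8_error(utf8_line))
```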
msg205827 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2013-12-10 15:24
Yes, this is a silly bug where we "shortcut" decoding of UTF-8 files by not checking whether the content is valid UTF-8. However, this behavior has been around for a long time, so I'm not going to change it in 2.7.x.
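
Until the interpreter enforces this itself, a source file can be checked from the outside. The sketch below is one possible approach in Python 3, not the fix discussed in this issue: it reads the declared encoding with `tokenize.detect_encoding` from the standard library and then decodes the whole file strictly, so invalid bytes are rejected regardless of how the encoding name is spelled. The function name `decode_source_strictly` and the sample sources are hypothetical.

```python
import io
import tokenize


def decode_source_strictly(data: bytes) -> str:
    """Decode Python source bytes with their declared encoding, rejecting invalid bytes."""
    # detect_encoding inspects at most the first two lines for a BOM or coding cookie.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return data.decode(encoding)  # errors="strict" is the default


# Hypothetical sample: a UTF-8 declaration followed by a Latin-1 encoded literal.
bad_source = b"# -*- coding: UTF-8 -*-\nx = '\xa1'\n"
try:
    decode_source_strictly(bad_source)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

Decoding the whole file up front costs one extra pass over the data, which is the work the "shortcut" avoids.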
msg205901 - Author: Jakub Wilk (jwilk) Date: 2013-12-11 10:42
With a slightly adapted test case, I see the same behavior in Python 3.3.3. Perhaps it would be worth fixing the bug in Python 3.4?
History
Date                 User               Action  Args
2022-04-11 14:57:55  admin              set     github: 64141
2013-12-11 14:17:06  benjamin.peterson  set     status: closed -> open; resolution: wont fix -> ; versions: + Python 3.4, - Python 2.7
2013-12-11 10:42:12  jwilk              set     files: + ; messages: + msg205901
2013-12-10 15:24:52  benjamin.peterson  set     status: open -> closed; nosy: + benjamin.peterson; messages: + msg205827; resolution: wont fix
2013-12-10 11:59:02  serhiy.storchaka   set     nosy: + loewis, serhiy.storchaka
2013-12-10 11:53:30  ezio.melotti       set     nosy: + ezio.melotti, vstinner; type: behavior; components: + Unicode
2013-12-10 11:49:25  jwilk              set     files: +
2013-12-10 11:48:59  jwilk              create