classification
Title: UTF-8 encoding not enforced
Type: behavior
Stage:
Components: Unicode
Versions: Python 3.4
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: benjamin.peterson, ezio.melotti, jwilk, loewis, serhiy.storchaka, vstinner
Priority: normal
Keywords:

Created on 2013-12-10 11:48 by jwilk, last changed 2013-12-11 14:17 by benjamin.peterson.

Files
File name Uploaded Description
test1.py jwilk, 2013-12-10 11:49
test3.py jwilk, 2013-12-11 10:42
Messages (3)
msg205795 - Author: Jakub Wilk (jwilk) Date: 2013-12-10 11:48
I created a Python file that contains a non-UTF-8 string literal (but no Unicode literals) and added a "UTF-8" encoding declaration to it. I expected Python to raise a SyntaxError when importing such a module, but it doesn't:

$ python --version
Python 2.7.6

$ python -c 'import test1' && echo ok
ok


Curiously enough, if I change the declaration to "UTF8", then the exception is raised as expected:

$ sed -e 's/UTF-8/UTF8/' < test1.py > test2.py
$ python -c 'import test2'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "test2.py", line 2
SyntaxError: 'utf8' codec can't decode byte 0xa1 in position 5: invalid start byte
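
For reference, a minimal Python 3 sketch of why byte 0xa1 is rejected by a strict UTF-8 decoder: it is a continuation byte and cannot start a sequence. The sample line below is an assumption chosen to be consistent with the error message above; it is not the actual content of test1.py. The helper name `utf8_error` is hypothetical. The last check shows that "UTF-8" and "UTF8" name the same codec, so the difference in behavior is not caused by the codec itself.

```python
import codecs


def utf8_error(data: bytes):
    """Return the UnicodeDecodeError raised by strict UTF-8 decoding, or None."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc
    return None


# Hypothetical source line with a lone 0xa1 byte inside a string literal.
bad_line = b"x = '\xa1'"
good_line = "x = '\u00a1'".encode("utf-8")  # same character, properly encoded

err = utf8_error(bad_line)
print(err.start, err.reason)        # offset of the offending byte and the reason
print(utf8_error(good_line))        # None: valid UTF-8

# Both spellings of the declaration resolve to the same codec.
same_codec = codecs.lookup("UTF-8").name == codecs.lookup("UTF8").name
print(same_codec)
```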
msg205827 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-12-10 15:24
Yes, this is a silly bug where we "shortcut" decoding of UTF-8 files by not checking whether the content is valid UTF-8. However, this behavior has been around for a long time, so I'm not going to change it in 2.7.x.
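
Until the tokenizer enforces this, a source file can be checked from user code. The following is a minimal Python 3 sketch, not the CPython tokenizer's own logic: it reads the declared encoding with `tokenize.detect_encoding` and then strictly decodes the whole file with it. The function name `check_source_encoding` and the sample byte strings are hypothetical.

```python
import io
import tokenize


def check_source_encoding(data: bytes) -> str:
    """Decode source bytes strictly using the declared encoding.

    Returns the normalized encoding name, or raises UnicodeDecodeError
    if the bytes are not valid in that encoding.
    """
    # detect_encoding inspects at most the first two lines for a BOM or
    # a coding declaration and returns the normalized encoding name.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    data.decode(encoding)  # strict error handling is the default
    return encoding


# Hypothetical inputs: a valid file and one with a stray 0xa1 byte.
valid = b"# -*- coding: UTF-8 -*-\nx = 'ok'\n"
invalid = b"# -*- coding: UTF-8 -*-\nx = '\xa1'\n"

print(check_source_encoding(valid))

try:
    check_source_encoding(invalid)
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)
```

Decoding the whole file up front is the simplest way to make the check independent of which literals the file contains.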
msg205901 - Author: Jakub Wilk (jwilk) Date: 2013-12-11 10:42
With a slightly adapted test case, I see the same behavior in Python 3.3.3. Perhaps it would be worth fixing the bug in Python 3.4?
History
Date User Action Args
2013-12-11 14:17:06 benjamin.peterson set status: closed -> open
resolution: wont fix ->
versions: + Python 3.4, - Python 2.7
2013-12-11 10:42:12 jwilk set files: + test3.py
messages: + msg205901
2013-12-10 15:24:52 benjamin.peterson set status: open -> closed
nosy: + benjamin.peterson
messages: + msg205827
resolution: wont fix
2013-12-10 11:59:02 serhiy.storchaka set nosy: + loewis, serhiy.storchaka
2013-12-10 11:53:30 ezio.melotti set nosy: + ezio.melotti, vstinner
type: behavior
components: + Unicode
2013-12-10 11:49:25 jwilk set files: + test1.py
2013-12-10 11:48:59 jwilk create