Author nicolasg
Recipients docs@python, nicolasg
Date 2011-12-03.13:17:21
Message-id <1322918242.84.0.309365391343.issue13525@psf.upfronthosting.co.za>
Content
Current Behaviour

The tutorial of Python 3.2.x has an example to set an encoding in a source file: http://docs.python.org/py3k/tutorial/interpreter.html#source-code-encoding

It says to put the following line at the start of the source file:
# -*- coding: cp-1252 -*-

However, when this is done exactly as shown, Python raises the following exception:
SyntaxError: encoding problem: with BOM

The problem seems to be that Python knows Windows code page 1252 as windows-1252 (its IANA charset name, see http://www.iana.org/assignments/charset-reg/windows-1252 ) or alternatively as cp1252 (without a dash), but not as cp-1252 (with a dash).
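
The difference between the three spellings can be checked against the codec registry with the standard codecs module. This is only a minimal sketch: the helper name is_known_encoding is made up for illustration, and a registry lookup is assumed to be a reasonable stand-in for what the interpreter does with the declared name.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python's codec registry can resolve `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # codecs.lookup raises LookupError for names it cannot resolve.
        return False


# Compare the spelling from the tutorial with the two alternatives.
for spelling in ("cp1252", "windows-1252", "cp-1252"):
    print(spelling, is_known_encoding(spelling))
```

If the analysis above is right, only the dashed cp-1252 spelling should be reported as unknown.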

Because this is an example in the tutorial, it is particularly problematic: users who copy it get an error and might not understand how to declare the encoding correctly.

This is still the case in the tutorial of Python 3.3 alpha: http://docs.python.org/dev/tutorial/interpreter.html#source-code-encoding


Expected Behaviour

The tutorial should give a working example, such as:
# -*- coding: windows-1252 -*-
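
A declaration line like this can be checked without writing a file to disk by using tokenize.detect_encoding from the standard library. This is a sketch under assumptions: the helper name declared_encoding and the two sample sources are hypothetical, and tokenize mirrors the interpreter's handling of the declaration rather than being the exact code path that produced the error above.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding named in the coding declaration.

    tokenize.detect_encoding raises SyntaxError when the declared
    name is not a codec that Python knows.
    """
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# Hypothetical sample sources: one with the proposed declaration,
# one with the declaration currently shown in the tutorial.
good = b"# -*- coding: windows-1252 -*-\nprint('ok')\n"
bad = b"# -*- coding: cp-1252 -*-\nprint('ok')\n"

print(declared_encoding(good))
try:
    declared_encoding(bad)
except SyntaxError as exc:
    print("rejected:", exc)
```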

Alternatively, a completely different example, like the one in the Python 2.7 tutorial, would also be fine: http://docs.python.org/tutorial/interpreter.html#source-code-encoding


Notes:
I have tested this with the following Python versions:
- Python 3.2.1 (openSUSE 12.1) on Linux
- Python 3.2.2 on Windows 7 SP1 64-bit
- Python 3.2.2 on Mac OS X 10.5.8
(Always on the command line; I have not tested in IDLE.)
History
Date                 User      Action  Args
2011-12-03 13:17:22  nicolasg  set     recipients: + nicolasg, docs@python
2011-12-03 13:17:22  nicolasg  set     messageid: <1322918242.84.0.309365391343.issue13525@psf.upfronthosting.co.za>
2011-12-03 13:17:22  nicolasg  link    issue13525 messages
2011-12-03 13:17:21  nicolasg  create