Author ezio.melotti
Recipients bupjae, ezio.melotti
Date 2009-02-02.01:56:52
Message-id <1233539816.04.0.865880973678.issue5127@psf.upfronthosting.co.za>
Content
Here (WinXP SP2, Py3, cp850 terminal) license works fine:
>>> license
Type license() to see the full license text

and license() works as well.

This is the output I get for the chr() calls:
>>> chr(0x10000)
'\U00010000'
>>> chr(0x11000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "C:\Programs\Python30\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-2: character maps to <undefined>
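The traceback suggests the failure happens when the interactive interpreter writes the repr() of the result to the console stream, whose encoder is cp850 here. A minimal sketch, assuming a cp850 console can be approximated with io.TextIOWrapper over an in-memory buffer (display_like_repl is a hypothetical helper, not part of Python), reproduces the same path without a Windows console. It runs on a modern Python 3, where repr() already follows the PEP 3138 rule:

```python
import io

def display_like_repl(value, encoding="cp850"):
    """Hypothetical helper: write repr(value) to a text stream with the given
    encoding, roughly what the REPL does when echoing a result."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(repr(value))  # encoding to bytes happens on write/flush
    stream.flush()
    return buffer.getvalue()

# An escaped repr is pure ASCII, so cp850 can always encode it.
print(display_like_repl("\U0010ffff"))  # b"'\\U0010ffff'"

# An unescaped non-BMP character has no cp850 mapping, so the write fails.
try:
    display_like_repl(chr(0x10000))
except UnicodeEncodeError as exc:
    print(exc.reason)  # character maps to <undefined>
```

In other words, whether the echo succeeds depends only on whether repr() escaped the character, which is why the escaping decision in the next paragraph matters.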

I believe that chr(0x10000) and chr(0x11000) should behave the other way around.
U+10000 (LINEAR B SYLLABLE B008 A) belongs to the 'Lo' category and
should be printed (and possibly raise a UnicodeError; see issue5110
[1]), whereas U+11000 belongs to the 'Cn' category and should be
escaped [2].
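The PEP 3138 rule referenced above can be sketched with unicodedata.category(). This is a minimal illustration, not CPython's implementation; should_escape and NON_PRINTABLE_CATEGORIES are hypothetical names. One caveat: U+11000 was unassigned ('Cn') in the Unicode version used by Python 3.0, but later Unicode versions assigned it (Brahmi block), so the sketch uses U+10FFFF, a permanent noncharacter that is always 'Cn':

```python
import unicodedata

# PEP 3138: escape characters in the "Other" (Cc, Cf, Cs, Co, Cn) and
# "Separator" (Zl, Zp, Zs) categories, except the ASCII space U+0020.
NON_PRINTABLE_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"}

def should_escape(ch):
    """Hypothetical helper: True if repr() should escape ch under PEP 3138."""
    if ch == " ":
        return False
    return unicodedata.category(ch) in NON_PRINTABLE_CATEGORIES

for cp in (0x10000, 0x10FFFF):
    ch = chr(cp)
    # Compare the category-based rule with the built-in str.isprintable().
    print(f"U+{cp:04X}", unicodedata.category(ch), should_escape(ch), ch.isprintable())
# U+10000 Lo False True
# U+10FFFF Cn True False
```

The category lookup is the whole decision: 'Lo' means print the character as-is, 'Cn' means emit a \U escape.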

On Linux with Py3 and a UTF-8 terminal, chr(0x10000) prints '\U00010000'
and chr(0x11000) prints the character (actually I see two boxes, but
that shouldn't be Python's problem). license() works fine too.

Also note that with cp850 the error message is 'character maps to
<undefined>', while with cp949 it is 'illegal multibyte sequence'.
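The different messages come from different codec families: cp850 is a charmap codec, while cp949 is handled by CPython's multibyte CJK codecs. A minimal sketch (encode_error_reason is a hypothetical helper) shows that both report their own reason string for the same unencodable character:

```python
def encode_error_reason(text, encoding):
    """Hypothetical helper: return the reason of the UnicodeEncodeError raised
    when encoding text, or None if the text encodes cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.reason
    return None

ch = chr(0x10000)
print(encode_error_reason(ch, "cp850"))  # character maps to <undefined>
print(encode_error_reason(ch, "cp949"))  # illegal multibyte sequence
```

The exception type is the same in both cases; only the reason text differs, so code that handles the error should catch UnicodeEncodeError rather than match on the message.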

[1]: http://bugs.python.org/issue5110
[2]: http://www.python.org/dev/peps/pep-3138/#specification
History
Date                 User          Action  Args
2009-02-02 01:56:56  ezio.melotti  set     recipients: + ezio.melotti, bupjae
2009-02-02 01:56:56  ezio.melotti  set     messageid: <1233539816.04.0.865880973678.issue5127@psf.upfronthosting.co.za>
2009-02-02 01:56:54  ezio.melotti  link    issue5127 messages
2009-02-02 01:56:53  ezio.melotti  create