Issue1602
Created on 2007-12-12 09:56 by mark, last changed 2009-05-19 09:46 by amaury.forgeotdarc.
|
msg58487 |
Author: Mark Summerfield (mark) |
Date: 2007-12-12 09:56 |
|
I am not sure if this is a Python bug or simply a limitation of cmd.exe.
I am using Windows XP Home.
I run cmd.exe with the /u option and I have set my console font to
"Lucida Console" (the only TrueType font offered), and I run chcp 65001
to set the utf8 code page.
When I run the following program:
for x in range(32, 2000):
    print("{0:5X} {0:c}".format(x))
one blank line is output.
But if I do chcp 1252 the program prints up to 7F before hitting a
unicode encoding error.
This is different behaviour from Python 2.5.1 which (with a suitably
modified print line) after chcp 65001 prints up to 7F and then fails
with "IOError: [Errno 0] Error".
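The cp1252 result can be checked without a console by asking the codec directly. The following is a minimal sketch, assuming Python 3; `first_unencodable` is a hypothetical helper name, not something from the standard library. It only exercises the codec and says nothing about how cmd.exe behaves under code page 65001.

```python
def first_unencodable(encoding, start=32, stop=2000):
    """Return the first code point in [start, stop) that `encoding`
    cannot encode, or None if every code point encodes cleanly."""
    for code_point in range(start, stop):
        try:
            chr(code_point).encode(encoding)
        except UnicodeEncodeError:
            return code_point
    return None


if __name__ == "__main__":
    for name in ("cp1252", "utf-8"):
        result = first_unencodable(name)
        shown = "none" if result is None else "{0:X}".format(result)
        print("{0}: first unencodable code point: {1}".format(name, shown))
```

With cp1252 the scan stops at code point 80, which matches the report that output reaches 7F before the encoding error; with UTF-8 nothing in this range fails.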
|
|
msg58621 |
Author: Mark Summerfield (mark) |
Date: 2007-12-14 11:31 |
|
I've looked into this a bit more, and from what I can see, code page
65001 just doesn't work, so it is a Windows problem, not a Python problem.
A possible solution might be to read/write UTF-16, which "managed" Windows
applications can do.
|
|
msg58651 |
Author: Christian Heimes (christian.heimes) |
Date: 2007-12-15 02:08 |
|
We are aware of multiple Windows-related problems. We are planning to
rewrite parts of the Windows-specific API to use the widechar variants.
Maybe that will help.
|
|
msg87086 |
Author: Antoine Pitrou (pitrou) |
Date: 2009-05-03 23:57 |
|
Yes, it is a Windows problem. There simply doesn't seem to be a true
Unicode codepage for command-line apps. Recommend closing.
|
|
msg88059 |
Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) |
Date: 2009-05-19 00:08 |
|
Just in case it helps, this behaviour is on Win XP Pro, Python 2.5.1:
First, I added an alias for 'cp65001' to 'utf_8' in
Lib/encodings/aliases.py .
Then, I opened a command prompt with a bitmap font.
c:\windows\system32>python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print u"\N{EM DASH}"
—
I switched the font to Lucida Console and retried (without exiting the
Python interpreter, although the behaviour is the same after exiting and
re-entering):
>>> print u"\N{EM DASH}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 13] Permission denied
Then I tried (by pressing Alt+0233 for é, which is invalid in my normal
cp1253 codepage):
>>> print u"née"
and the interpreter exits without any information. The same happens for:
>>> a=u"née"
Then I created a UTF-8 text file named 'test65001.py':
# -*- coding: utf_8 -*-
a=u"néeα"
print a
and tried to run it directly from the command line:
c:\windows\system32>python d:\src\PYTHON\test65001.py
néeαTraceback (most recent call last):
  File "d:\src\PYTHON\test65001.py", line 4, in <module>
    print a
IOError: [Errno 2] No such file or directory
You see? It printed all the characters before failing.
Also the following works:
c:\windows\system32>echo heéε
heéε
and
c:\windows\system32>echo heéε >D:\src\PYTHON\dummy.txt
successfully creates a UTF-8 file (without a UTF-8 BOM at the
beginning).
So it's possible that it is a Python bug, or at least that something can
be done about it.
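For anyone who wants to try the alias without editing Lib/encodings/aliases.py, a codec search function can map an extra name onto the UTF-8 codec at runtime. This is only a sketch, written for Python 3. The name `cp65001_demo` is invented here so that it cannot clash with a `cp65001` name that a given Python version may already know. It affects codec lookup only and does not change how the console itself behaves.

```python
import codecs

# Hypothetical alias name, chosen for illustration only.
DEMO_NAME = "cp65001_demo"


def _search(name):
    """Codec search function: resolve DEMO_NAME to the UTF-8 codec."""
    if name == DEMO_NAME:
        return codecs.lookup("utf-8")
    return None  # let other search functions handle every other name


codecs.register(_search)

if __name__ == "__main__":
    encoded = "n\u00e9e\u03b1".encode(DEMO_NAME)
    print(encoded == "n\u00e9e\u03b1".encode("utf-8"))
```

After registration, any API that takes an encoding name (str.encode, bytes.decode, open) accepts the demo name and gets UTF-8 behaviour.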
|
|
msg88077 |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) |
Date: 2009-05-19 09:46 |
|
An immediate thing to do is to declare cp65001 as an encoding:
Index: Lib/encodings/aliases.py
===================================================================
--- Lib/encodings/aliases.py (revision 72757)
+++ Lib/encodings/aliases.py (working copy)
@@ -511,6 +511,7 @@
     'utf8' : 'utf_8',
     'utf8_ucs2' : 'utf_8',
     'utf8_ucs4' : 'utf_8',
+    'cp65001' : 'utf_8',
     ## uu_codec codec
     #'uu' : 'uu_codec',
Unfortunately, this is not enough, because the win32 API function
WriteFile() returns the number of characters written, not the number of
(UTF-8) bytes:
>>> print("\u0124\u0102" + 'abc')
ĤĂabc
c
[44420 refs]
>>>
Additionally, there is a bug in ReadFile(), which returns an empty
string (and no error) when a non-ASCII character is entered, which is
indistinguishable from an EOF condition...
Maybe the solution is to use the win32 console API directly...
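The duplicated trailing "c" in the session above is consistent with a writer that treats the returned count as bytes and retries with what it believes is the unwritten tail. The sketch below simulates that in pure Python, assuming the console displays the whole buffer but reports a character count. `simulate_console_write` is a hypothetical name and no Windows API is called, so this illustrates the suspected mechanism and does not reproduce the actual bug.

```python
def simulate_console_write(text, encoding="utf-8"):
    """Model a retry loop around a write call that shows every byte it is
    given but reports the number of *characters* instead of bytes."""
    data = text.encode(encoding)
    shown = bytearray()  # what the simulated console displays
    pos = 0
    while pos < len(data):
        chunk = data[pos:]
        shown += chunk  # the whole chunk reaches the screen
        # The simulated call reports characters, which undercounts bytes
        # whenever the chunk contains multi-byte sequences.
        reported = len(chunk.decode(encoding, errors="replace"))
        pos += reported  # the caller assumes this many bytes were written
    return shown.decode(encoding, errors="replace")


if __name__ == "__main__":
    print(repr(simulate_console_write("\u0124\u0102" + "abc" + "\n")))
```

For the string from the session, the encoded line is 8 bytes but 6 characters, so the loop sends the last 2 bytes ("c" and the newline) a second time. Pure ASCII text passes through unchanged because the character and byte counts agree.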
|
|
| Date | User | Action | Args |
| 2009-05-19 09:46:13 | amaury.forgeotdarc | set | messages: + msg88077 |
| 2009-05-19 07:54:22 | pitrou | set | nosy: + amaury.forgeotdarc |
| 2009-05-19 00:09:03 | tzot | set | nosy: + tzot; messages: + msg88059 |
| 2009-05-03 23:57:04 | pitrou | set | nosy: + pitrou; messages: + msg87086 |
| 2009-05-03 23:51:10 | haypo | set | nosy: haypo, christian.heimes, mark, ezio.melotti; components: + Windows |
| 2009-05-03 23:50:37 | haypo | set | nosy: haypo, christian.heimes, mark, ezio.melotti; components: - Windows |
| 2009-04-27 23:38:12 | ajaksu2 | set | nosy: + haypo, ezio.melotti; versions: + Python 3.1; stage: test needed |
| 2008-01-06 22:29:44 | admin | set | keywords: - py3k; versions: Python 3.0 |
| 2007-12-15 02:08:14 | christian.heimes | set | priority: low; keywords: + py3k; messages: + msg58651; nosy: + christian.heimes |
| 2007-12-14 11:31:28 | mark | set | messages: + msg58621 |
| 2007-12-12 09:56:30 | mark | create | |