msg58965 - (view) |
Author: Vlastimil Brom (vbr) |
Date: 2007-12-22 21:09 |
While testing the 3.0a2 build (on Win XPh SP2, Czech), I found a
possible bug in the input() function:
if the prompt text contains non-ASCII characters (even those present in
the default charset of the system locale, Czech in this case), the
prompt is displayed incorrectly; however, the entered value is handled
as expected.
The print() function deals with these characters correctly.
This bug occurs only in the system console (cmd.exe); in IDLE
everything works OK.
============ a minimal snapshot of the session follows ==========
Python 3.0a2 (r30a2:59397:59399, Dec 6 2007, 22:34:52) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> input("ěšč: ")
─Ť┼í─Ź: 7
'7'
>>> print("ěšč: ")
ěšč:
>>>
==================================
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> input(u"ěšč: ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print u"ěšč: "
ěšč:
>>>
|
msg58969 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2007-12-23 10:11 |
Would you like to work on a patch?
|
msg59039 - (view) |
Author: Vlastimil Brom (vbr) |
Date: 2007-12-29 19:53 |
First, sorry for the delayed response. I fear that preparing a
patch would be far beyond my programming competence; sorry about that.
|
msg59125 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2008-01-03 06:14 |
I think I understand what's going on. The trail leads from the last "if
(tty) {" block in builtin_input() to PyOS_Readline() which in turn ends
up calling PyOS_StdioReadline() (because that's the most likely
initialization of PyOS_ReadlineFunctionPointer). And this, finally,
uses fprintf() to stderr to print the prompt. That apparently doesn't
use the same encoding, or perhaps by now the string has been encoded as
UTF-8.
This is clearly a problem. But what to do about it...
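
The garbled prompt in the report is consistent with the second guess above: the prompt reaches the console as UTF-8 bytes, and the console renders those bytes in its own code page. A minimal Python sketch of that effect, assuming the Czech cmd.exe console uses code page 852 (the thread does not name the code page, so `cp852` is an assumption):

```python
# Assumption: the console code page is cp852 (not stated in the thread).
prompt = "ěšč: "

# What the C-level prompt printing receives if the str was converted to UTF-8.
utf8_bytes = prompt.encode("utf-8")

# What a cp852 console displays when handed those UTF-8 bytes unchanged.
garbled = utf8_bytes.decode("cp852")
```

Under that assumption, `garbled` is the same `─Ť┼í─Ź: ` string that appears in the session snapshot, with each two-byte UTF-8 sequence shown as two unrelated cp852 characters.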
|
msg59140 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2008-01-03 18:19 |
Windows needs its own PyOS_StdioReadline() function in order to support
wide chars. We can either use the low-level functions _putwch() and
_getwche(), or we could probably use the higher-level functions
_cwprintf_s() (secure console wide char print format, oh I love MS'
naming schema) and _cgetws_s().
|
msg59141 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2008-01-03 18:40 |
Cool.
I suspect Unix will also require a customized version to be used in case
GNU readline isn't present.
And I wouldn't be surprised if GNU readline itself doesn't handle UTF-8
properly either!
|
msg59142 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2008-01-03 18:51 |
Guido van Rossum wrote:
> I suspect Unix will also require a customized version to be used in case
> GNU readline isn't present.
>
> And I wouldn't be surprised if GNU readline itself doesn't handle UTF-8
> properly either!
GNU readline can handle UTF-8 chars fine on my system:
äßé: ä
ä
My locales are set to de_DE.UTF-8
Christian
|
msg59144 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2008-01-03 19:18 |
If possible, I would like to see the C library phased out of Python on
Windows, for file I/O. In this case, it would mean that ReadConsoleW is
used directly for character input. Notice that _cgetws does not take a
file handle as a parameter, but implicitly uses _coninpfh.
As a consequence, PyOS_StdioReadline probably should change its
parameter from FILE* to "file handle", and consequently rename it to,
say, PyOS_Readline.
|
msg59685 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2008-01-11 00:19 |
Isn't it enough to encode the prompt with the console encoding, instead
of relying on the default UTF-8 conversion? This patch corrects the issue
on Windows:
Index: ../Python/bltinmodule.c
===================================================================
--- ../Python/bltinmodule.c (revision 59843)
+++ ../Python/bltinmodule.c (working copy)
@@ -1358,12 +1358,19 @@
         else
             Py_DECREF(tmp);
         if (promptarg != NULL) {
-            po = PyObject_Str(promptarg);
+            PyObject *stringpo = PyObject_Str(promptarg);
+            if (stringpo == NULL) {
+                Py_DECREF(stdin_encoding);
+                return NULL;
+            }
+            po = PyUnicode_AsEncodedString(stringpo,
+                PyUnicode_AsString(stdin_encoding), NULL);
+            Py_DECREF(stringpo);
             if (po == NULL) {
                 Py_DECREF(stdin_encoding);
                 return NULL;
             }
-            prompt = PyUnicode_AsString(po);
+            prompt = PyString_AsString(po);
             if (prompt == NULL) {
                 Py_DECREF(stdin_encoding);
                 Py_DECREF(po);
|
msg59695 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2008-01-11 08:36 |
> Isn't it enough to encode the prompt with the console encoding, instead
> of letting the default utf-8 conversion? This patch corrects the issue
> on Windows:
Sounds right. Technically, you should be using the stdout encoding, but
I don't think it should ever differ from the stdin_encoding.
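
In Python terms, the approach discussed here amounts to encoding the prompt with the console's own encoding before the bytes are written. A rough sketch, not the committed C code: `encode_prompt` is a hypothetical helper, and `cp852` again stands in for the console encoding as an assumption.

```python
def encode_prompt(prompt, console_encoding):
    """Hypothetical helper: str() the prompt, then encode it for the console."""
    return str(prompt).encode(console_encoding)


# Assumption: a Czech cmd.exe console using code page 852.
console_encoding = "cp852"

raw = encode_prompt("ěšč: ", console_encoding)

# The console interprets the bytes in its own code page, so the text survives.
shown = raw.decode(console_encoding)
```

Here `shown` is the original prompt again, because the bytes were produced in the same encoding the console uses to display them. A prompt containing characters the console encoding cannot represent would raise UnicodeEncodeError in this sketch; how the real fix treats that case is not covered in the thread.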
|
msg73458 - (view) |
Author: Vlastimil Brom (vbr) |
Date: 2008-09-20 07:38 |
While I am not sure about the status of this somewhat older issue, I
just wanted to mention that the behaviour remains the same in Python
3.0rc1 (XPh SP3, Czech):
Python 3.0rc1 (r30rc1:66507, Sep 18 2008, 14:47:08) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> input("ěšč: ")
─Ť┼í─Ź: řžý
'řžý'
>>> print("ěšč: ")
ěšč:
>>>
Is the patch above supposed to have been committed, or are there still
other difficulties?
(Not that it is a huge problem for me, as applications dealing with
non-ASCII text would probably use a GUI rather than rely on a
console, but it is somewhat surprising.)
|
msg73462 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2008-09-20 10:46 |
Amaury, what further review of the patch do you desire? I had already
commented that I consider the patch correct, except that it might use
stdout_encoding instead.
Also, I wouldn't consider this a release blocker. It is somewhat
annoying that input produces mojibake in certain cases (i.e. non-ASCII
characters in the prompt, and a non-UTF-8 terminal), but if the patch
doesn't make it into 3.0, we can still fix it in 3.0.1.
|
msg73464 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2008-09-20 15:04 |
Given MvL's review, assuming it fixes the Czech problem, I'm all for
applying it.
|
msg73471 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2008-09-20 20:20 |
Here is a new version of the patch: the PyString* functions were renamed
to PyBytes*, and it now uses stdout_encoding.
About the "release blocker" status: I agree it is not so important, I
just wanted to express my "it's been here for long, it's almost ready,
it would be a pity not to have it in the final 3.0" feelings.
|
msg73527 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2008-09-21 20:32 |
I'm ok with this patch.
|
msg73536 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2008-09-21 22:11 |
Committed r66545.
|
|
Date | User | Action | Args |
2022-04-11 14:56:29 | admin | set | github: 46029 |
2008-09-21 22:11:31 | amaury.forgeotdarc | set | status: open -> closed; resolution: fixed; messages: + msg73536 |
2008-09-21 20:32:00 | benjamin.peterson | set | keywords: - needs review; nosy: + benjamin.peterson; messages: + msg73527 |
2008-09-20 20:20:22 | amaury.forgeotdarc | set | files: + inputprompt.patch; keywords: + patch; messages: + msg73471 |
2008-09-20 15:04:33 | gvanrossum | set | messages: + msg73464 |
2008-09-20 10:46:40 | loewis | set | messages: + msg73462 |
2008-09-20 08:59:17 | amaury.forgeotdarc | set | priority: normal -> release blocker; keywords: + needs review |
2008-09-20 07:38:10 | vbr | set | messages: + msg73458 |
2008-01-11 08:36:32 | loewis | set | messages: + msg59695 |
2008-01-11 00:19:28 | amaury.forgeotdarc | set | messages: + msg59685 |
2008-01-06 22:29:44 | admin | set | keywords: - py3k; versions: Python 3.0 |
2008-01-03 19:18:31 | loewis | set | messages: + msg59144 |
2008-01-03 18:51:15 | christian.heimes | set | messages: + msg59142 |
2008-01-03 18:40:34 | gvanrossum | set | messages: + msg59141 |
2008-01-03 18:19:44 | christian.heimes | set | messages: + msg59140 |
2008-01-03 06:15:00 | gvanrossum | set | priority: normal; nosy: + gvanrossum, christian.heimes; messages: + msg59125; keywords: + py3k |
2007-12-30 22:24:01 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc |
2007-12-29 19:53:16 | vbr | set | messages: + msg59039 |
2007-12-23 10:11:16 | loewis | set | nosy: + loewis; messages: + msg58969 |
2007-12-22 21:09:32 | vbr | create | |