This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: msvcrt bytes cleanup
Type: behavior
Stage:
Components: Extension Modules, Windows
Versions: Python 3.1
process
Status: closed
Resolution: fixed
Dependencies: 5499
Superseder:
Assigned To:
Nosy List: benjamin.peterson, ocean-city, pitrou, vstinner
Priority: release blocker
Keywords: patch

Created on 2009-03-03 11:56 by ocean-city, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
py3k_fix_msvcrt.patch ocean-city, 2009-03-03 11:56
msvcrt_wchar.patch vstinner, 2009-03-17 17:22
msvcrt_wchar-2.patch vstinner, 2009-04-18 17:29
Messages (11)
msg83071 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-03-03 11:56
I came here from issue5391. Here is a quote of Victor's message:

>* msvcrt.putch(char), msvcrt.ungetch(char): msvcrt has also:
>  - msvcrt.getch()->byte string of 1 byte
>  - msvcrt.getwch()->unicode string of 1 character
>  - msvcrt.putwch(unicode string of 1 character)
>  - msvcrt.ungetwch(unicode string of 1 character)
>  Hum, putch(), ungetch(), getch() use inconsistent types 
>(unicode/bytes) and should be fixed. Another issue should be open for 
>that.
>
>Notes: msvcrt.putwch() accepts string of length > 1 and 
>msvcrt.ungetwch() doesn't check string length (and so may crash with 
>length=0 or length > 1?).

And msvcrt.ungetwch() calls _ungetch not _ungetwch. Here is a patch that
hopefully fixes these issues. (I cannot test the wide versions of the
functions because VC6 doesn't have them.)
msg83685 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-17 17:12
> msvcrt.ungetwch() calls _ungetch not _ungetwch

... are you sure anyone has ever used these functions? :-)

If we assume issue5499 is fixed, you can leave msvcrt_putch() 
and msvcrt_ungetch() unchanged and use the "C" format in msvcrt_ungetwch() 
("Py_UNICODE ch;" has to be replaced by "int ch;" for the 
"C" format).
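To make the "c"/"C" split concrete, here is a small pure-Python model of how the two argument formats treat their input once issue5499 is in place: "c" takes a byte string of length 1 and yields a C char, "C" takes a str of length 1 and yields a C int holding the code point (which is why the C variable must become "int ch;"). Both reject strings of length 0 or greater than 1, which covers the length checks missing from putwch() and ungetwch(). The helper names `parse_c` and `parse_C` are hypothetical; this is a sketch of the behaviour, not CPython's getargs.c code.

```python
# Hypothetical pure-Python model of the "c" and "C" argument formats after
# issue5499; illustrative only, not CPython's actual implementation.

def parse_c(arg):
    """Model of "c": bytes or bytearray of length 1 -> C char value (int)."""
    if isinstance(arg, (bytes, bytearray)) and len(arg) == 1:
        return arg[0]
    raise TypeError("expected a byte string of length 1, got %r" % (arg,))


def parse_C(arg):
    """Model of "C": str of length 1 -> C int holding the code point."""
    if isinstance(arg, str) and len(arg) == 1:
        return ord(arg)
    raise TypeError("expected a unicode character, got %r" % (arg,))


print(parse_c(b"a"))       # 97
print(parse_C("\u20ac"))   # 8364
```

With this split, the byte-oriented functions (getch, putch, ungetch) consistently use bytes and the wide ones (getwch, putwch, ungetwch) consistently use str, and an empty or multi-character string raises TypeError instead of reaching the C runtime.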
msg83686 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-17 17:22
Patch implementing my proposal (depends on issue5499).
msg85182 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-02 08:32
issue5499 is fixed, so msvcrt_wchar.patch can now be used :-) Is anyone 
available for a review and/or _a test_? I don't have Windows, so it's 
hard for me to test my patch.
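For anyone with Windows who wants to try the patch, here is a minimal unittest sketch. It assumes the post-patch behaviour discussed above: ungetch()/getch() work with bytes of length 1, ungetwch()/getwch() work with a str of length 1, and ungetwch() raises TypeError for strings of any other length. The class name is hypothetical, and the round trips use only ASCII characters.

```python
import sys
import unittest


@unittest.skipUnless(sys.platform == "win32", "msvcrt is only available on Windows")
class MsvcrtCharRoundTrip(unittest.TestCase):
    def setUp(self):
        import msvcrt  # imported here so this module still loads on other platforms
        self.msvcrt = msvcrt

    def test_ungetch_getch_bytes(self):
        # The byte-oriented pair should use bytes of length 1 on both sides.
        self.msvcrt.ungetch(b"a")
        self.assertEqual(self.msvcrt.getch(), b"a")

    def test_ungetwch_getwch_str(self):
        # The wide pair should use a str of length 1 on both sides.
        self.msvcrt.ungetwch("a")
        self.assertEqual(self.msvcrt.getwch(), "a")

    def test_ungetwch_rejects_wrong_length(self):
        # Strings of length 0 or > 1 should be rejected instead of crashing.
        for bad in ("", "ab"):
            with self.assertRaises(TypeError):
                self.msvcrt.ungetwch(bad)
```

Run it with `python -m unittest` on Windows; on other platforms the whole class is reported as skipped.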
msg85410 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-04-04 16:56
There seems to be a problem with ungetwch():

>>> s = msvcrt.getwch()
# Here I type the Euro sign (€)
>>> ascii(s)
"'\\u20ac'"
>>> msvcrt.ungetwch(s)
>>> u = msvcrt.getwch()
>>> ascii(u)
"'\\xac'"
msg85426 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-04-04 18:58
I think this can wait until the first beta.
msg85455 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-05 00:39
> There seems to be a problem with ungetwch()

I tested with Visual C++ Express 2008, and it looks like _ungetwch() only 
keeps the lower 8 bits (as if it called _ungetwch(x & 255)). But that's a bug 
in Microsoft's library, not in the Python code (I added some printf() calls 
to be sure).

My patch (msvcrt_wchar.patch) makes the situation better, but it's not 
perfect because of a bug in Microsoft's library.

msvcrt.getwch() works correctly with characters whose code point is > 255 
(e.g. the euro sign, U+20AC, 8364 in decimal).
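The arithmetic behind Antoine's transcript can be checked directly: if the C runtime keeps only the lower 8 bits, U+20AC becomes 0x20AC & 0xFF = 0xAC, which is exactly the '\xac' that getwch() returned. The function below is a hypothetical model of that observed masking, not a wrapper around msvcrt.

```python
# Hypothetical model of the reported behaviour: the C runtime's _ungetwch()
# appears to keep only the lower 8 bits of the pushed-back character.

def ungetwch_as_observed(ch: str) -> str:
    """Return what getwch() would yield after ungetwch(ch) under that masking."""
    return chr(ord(ch) & 0xFF)


euro = "\u20ac"                                 # U+20AC, 8364 in decimal
print(hex(ord(euro)))                           # 0x20ac
print(ascii(ungetwch_as_observed(euro)))        # '\xac'
print(ungetwch_as_observed("\xe9") == "\xe9")   # True: code points <= 255 survive
```

Under this model every character up to U+00FF round-trips unchanged, which matches the observation that only characters with code points above 255 are affected.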
msg85461 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-04-05 01:59
MSDN says _ungetwch returns WEOF instead of EOF when error occurs.
http://msdn.microsoft.com/en-us/library/yezzac74(VS.80).aspx

I cannot see any remarks about masking behavior. :-(
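Why the WEOF/EOF distinction matters can be shown with a small Python model of the C comparison, assuming the MSVC definitions (`wint_t` is `unsigned short`, `WEOF` is `((wint_t)0xFFFF)`, `EOF` is `-1`). In C the `wint_t` result is promoted to `int` before being compared, so an error result of 65535 never equals -1 and a check against EOF silently misses the failure. The helper `to_wint_t` is hypothetical.

```python
# Model of the MSVC definitions (assumed from the MSVC CRT headers):
# wint_t is unsigned short and WEOF is ((wint_t)0xFFFF); EOF is (int)-1.
EOF = -1
WEOF = 0xFFFF


def to_wint_t(value: int) -> int:
    """Convert a value to a 16-bit unsigned wint_t, as a C cast would."""
    return value & 0xFFFF


# A failing _ungetwch() returns WEOF as a wint_t. In a C comparison the
# unsigned short is promoted to int, so the result is 65535, never -1.
failure = to_wint_t(WEOF)
print(failure == EOF)          # False: an EOF check misses the error
print(failure == WEOF)         # True: a WEOF check catches it
print(to_wint_t(EOF) == WEOF)  # True: only the truncated bit pattern matches
```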
msg86124 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-18 17:27
> I cannot see any remarks about masking behavior. :-(

I asked on a French Windows developer channel. The answer is that the 
Windows terminal uses the "ANSI" charset even though it's possible to use 
Unicode. So it's a bug in Microsoft's msvcrt library (in the terminal 
implementation itself), not in Python.

Anyway I think that my patch (msvcrt_wchar.patch) makes the situation 
better ;-)
msg86125 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-18 17:29
> MSDN says _ungetwch returns WEOF instead of EOF when error occurs.

Ok, I updated my patch (to use WEOF).
msg86916 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-05-01 21:42
Applied in r72185.
History
Date User Action Args
2022-04-11 14:56:46 admin set github: 49660
2009-05-01 21:42:45 benjamin.peterson set status: open -> closed
  resolution: fixed
  messages: + msg86916
2009-04-18 17:29:44 vstinner set files: + msvcrt_wchar-2.patch
  messages: + msg86125
2009-04-18 17:27:41 vstinner set messages: + msg86124
2009-04-05 01:59:39 ocean-city set messages: + msg85461
2009-04-05 00:39:56 vstinner set messages: + msg85455
2009-04-04 23:34:52 benjamin.peterson set priority: deferred blocker -> release blocker
2009-04-04 18:58:31 benjamin.peterson set priority: release blocker -> deferred blocker
  nosy: + benjamin.peterson
  messages: + msg85426
2009-04-04 16:56:46 pitrou set type: behavior
  messages: + msg85410
  nosy: + pitrou
2009-04-02 08:32:20 vstinner set messages: + msg85182
2009-04-02 00:41:39 ocean-city set priority: release blocker
2009-03-31 19:23:46 ocean-city set dependencies: + only accept byte for getarg('c') and unicode for getarg('C')
2009-03-17 17:22:46 vstinner set files: + msvcrt_wchar.patch
  messages: + msg83686
2009-03-17 17:12:20 vstinner set nosy: + vstinner
  messages: + msg83685
2009-03-03 11:56:13 ocean-city create