classification
Title: py3k fails under Windows if "-c" or "-m" is given a non-ascii value
Type: behavior Stage:
Components: Versions: Python 3.0
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, benjamin.peterson, loewis, pitrou
Priority: release blocker Keywords: patch

Created on 2008-08-27 18:05 by pitrou, last changed 2019-01-10 12:01 by ossdev07. This issue is now closed.

Files
File name Uploaded Description
convert_args.patch pitrou, 2008-08-27 19:17
find_module_unicode.patch amaury.forgeotdarc, 2008-09-06 22:20
find_module_unicode_2.patch amaury.forgeotdarc, 2008-09-08 08:59
command_unicode.patch amaury.forgeotdarc, 2008-09-08 12:24
command_unicode_2.patch amaury.forgeotdarc, 2008-09-21 21:36
Pull Requests
URL Status Linked
PR 11497 closed ossdev07, 2019-01-10 12:01
Messages (15)
msg72036 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-27 18:05
The explanation is quite simple: in Py_Main, the arguments are converted
from wide to byte strings, but the required length of the byte string is
assumed equal to that of the wide string.

Which gives:

$ ./python -c "print('à')"
Fatal Python error: not enough memory to copy -c argument
Segmentation fault (core dumped)
$ ./python -m à
Fatal Python error: not enough memory to copy -m argument
Segmentation fault (core dumped)
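The arithmetic behind this crash can be modelled in a few lines of Python. This is a minimal sketch, not the `Py_Main` code: it assumes the byte string is produced in a UTF-8 locale, and the helper names `chars_allocated` and `bytes_required` are hypothetical. It compares what the old code reserved (one byte per wide character) with what the conversion actually needs.

```python
def chars_allocated(arg: str) -> int:
    # What the buggy code reserved: one byte per wide character.
    return len(arg)


def bytes_required(arg: str, encoding: str = "utf-8") -> int:
    # What the conversion actually writes (terminating NUL not counted).
    return len(arg.encode(encoding))


for arg in ["print('a')", "print('à')", "ሀ"]:
    have, need = chars_allocated(arg), bytes_required(arg)
    verdict = "fits" if need <= have else "too small"
    # !a prints the argument with ASCII escapes, so any terminal can show it.
    print(f"{arg!a}: {have} chars, {need} UTF-8 bytes -> buffer {verdict}")
```

Any non-ASCII character makes the UTF-8 form longer than the character count, which is exactly the overflow reported above. In C, the usual remedy is to size the buffer from the conversion itself: under POSIX, `wcstombs(NULL, src, 0)` returns the number of bytes the conversion needs, not counting the terminating NUL.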
msg72040 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-27 19:17
Here is a patch which works under Linux. Under Windows it no longer chokes
when converting the arguments, but it fails later in the process (in
the parser for '-c', and in the importing logic for '-m').

Here is an example:

$ ./python -c "print(ord('ሀ'))"
4608
$ cat > ሀ.py
print(__file__)

$ ./python -m ሀ
/home/antoine/py3k/mbstowcs/ሀ.py
$ ./python ሀ.py 
ሀ.py
msg72682 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-09-06 18:41
Hmm. I suppose anything is better than segfaulting. I think the patch is
fine for now, though.
msg72692 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-06 20:47
Committed in r66269.
msg72714 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-06 22:20
This patch corrects the "-m" case on Windows: the path has to be
decoded/re-encoded using the filesystem encoding, not the default UTF-8.
Review is needed, of course.
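The encoding point can be seen from Python: a round trip through the filesystem encoding is lossless, while a path encoded with one codec and decoded with another no longer names the same file. This is a hedged sketch using the modern `os.fsencode`/`os.fsdecode` helpers, not the C code in find_module_unicode.patch; `latin-1` merely stands in for a non-UTF-8 filesystem encoding.

```python
import os
import sys

name = "ሀ.py"

# Round-tripping through the interpreter's own filesystem encoding is lossless.
raw = os.fsencode(name)
assert os.fsdecode(raw) == name
print("filesystem encoding:", sys.getfilesystemencoding())

# Mixing codecs is not: latin-1 stands in for a non-UTF-8 filesystem encoding.
fs_bytes = "à.py".encode("latin-1")                  # b'\xe0.py'
misdecoded = fs_bytes.decode("utf-8", errors="replace")
print(ascii(misdecoded))                              # '\ufffd.py'
```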
msg72720 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-06 22:50
Looks good and works under Linux.
One small nit: you could just as well use "NN(ssi)" for the
Py_BuildValue and remove the Py_DECREF(fob), to be more consistent.
msg72771 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-08 08:59
Updated patch.
msg72773 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-08 12:24
./python -c "print('à')"
does not work on my Linux machine with the latest py3k (r66303), most likely
because my terminal uses a latin-1 encoding: wcstombs will convert the
argument back to the terminal encoding, whereas PyRun_SimpleString
expects a UTF-8 string.
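The Linux failure can be reproduced in Python without a latin-1 terminal. A minimal sketch: `latin-1` stands in for the terminal's locale encoding, and a plain `bytes.decode("utf-8")` stands in for a consumer such as `PyRun_SimpleString` that expects UTF-8; neither is the interpreter's actual code path.

```python
source = "print('à')"

# Bytes as wcstombs() would produce them in a latin-1 locale.
locale_bytes = source.encode("latin-1")   # b"print('\xe0')"

# A consumer that expects UTF-8 rejects them.
try:
    locale_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# Encoding the wide string to UTF-8 directly gives bytes such a consumer accepts.
utf8_bytes = source.encode("utf-8")       # b"print('\xc3\xa0')"
assert utf8_bytes.decode("utf-8") == source
```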

I attach another patch, which propagates the wchar_t string as far as possible
and encodes it as UTF-8; a test is included.

This also corrects the Windows case.
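A regression check for both switches can drive a child interpreter through `subprocess`. This is a sketch, not the test from the patch: it assumes a modern Python 3 where `sys.executable` accepts non-ASCII `-c` and `-m` arguments, reuses the module name `ሀ` from the earlier session, and the helpers `run_c` and `run_m` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_c(code: str) -> str:
    """Run `python -c code` in a child interpreter and return its stripped stdout."""
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_m(module: str, source: str) -> str:
    """Write `<module>.py` to a temporary directory and run `python -m module`."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, module + ".py").write_text(source, encoding="utf-8")
        # PYTHONPATH keeps the module importable even if the cwd is not on sys.path.
        env = dict(os.environ, PYTHONPATH=tmp)
        result = subprocess.run([sys.executable, "-m", module], cwd=tmp, env=env,
                                capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_c("print(ord('ሀ'))"))  # 4608, as in the session above
    print(run_m("ሀ", "print('ok')"))  # ok
```

The child's output is kept ASCII on purpose, so the check does not depend on the encoding of the pipe.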
msg72800 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-09-08 22:31
I think the patch is good; go ahead.
msg72826 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-09 07:07
Applied both patches as r66331.
msg72828 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-09 07:37
Unfortunately, my patch does not work: see the compile warnings in "main.c":
http://www.python.org/dev/buildbot/3.0/x86%20osx.5%203.0/builds/344/step-compile/0

I reverted the change, and will try something else...
msg73533 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-21 21:36
Today I learned something: wchar_t can be 2 or 4 bytes, Py_UNICODE can be
2 or 4 bytes, and all combinations are possible.
My error was to use PyUnicode_FromUnicode on a wchar_t*; PyUnicode_FromWideChar is the obvious function to use.

Attached a new patch (command_unicode_2.patch) for review.
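The width mismatch is easy to observe from Python through `ctypes`, which exposes the platform's `wchar_t` as `ctypes.c_wchar`. A hedged sketch: reading a buffer of 4-byte units as if it held 2-byte units mimics treating a `wchar_t*` as a `Py_UNICODE*` of a different width, while `ctypes.create_unicode_buffer` converts using the real `wchar_t` width, loosely analogous to what `PyUnicode_FromWideChar` does. The `misread` helper and `UNIT_CODECS` table are hypothetical.

```python
import ctypes

# Width of wchar_t on this platform: typically 2 on Windows, 4 on Linux/macOS.
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)

UNIT_CODECS = {2: "utf-16-le", 4: "utf-32-le"}


def misread(text: str, written_as: int, read_as: int) -> str:
    """Encode with `written_as`-byte code units, then decode as `read_as`-byte units."""
    return text.encode(UNIT_CODECS[written_as]).decode(UNIT_CODECS[read_as])


# A 4-byte-unit buffer read as 2-byte units: the character gains a stray NUL.
print(ascii(misread("ሀ", 4, 2)))  # '\u1200\x00'

# Converting with the actual wchar_t width keeps the text intact.
buf = ctypes.create_unicode_buffer("ሀ")
print(WCHAR_SIZE, ctypes.sizeof(buf), ascii(buf.value))
```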
msg75764 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-11-11 22:23
Raising to release blocker, just to trigger another review...
msg75765 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-11-11 22:47
Go ahead.
msg75768 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-11-11 23:05
Fixed as r67190. Thanks for the review.
History
Date User Action Args
2019-01-10 12:01:43 ossdev07 set pull_requests: + pull_request11035
2019-01-10 12:01:34 ossdev07 set pull_requests: + pull_request11034
2019-01-10 12:01:27 ossdev07 set pull_requests: + pull_request11033
2008-11-11 23:05:34 amaury.forgeotdarc set status: open -> closed
resolution: fixed
messages: + msg75768
2008-11-11 22:47:35 benjamin.peterson set keywords: - needs review
messages: + msg75765
2008-11-11 22:23:05 amaury.forgeotdarc set priority: critical -> release blocker
messages: + msg75764
2008-09-21 21:36:56 amaury.forgeotdarc set priority: high -> critical
keywords: + needs review
messages: + msg73533
files: + command_unicode_2.patch
2008-09-09 07:37:16 amaury.forgeotdarc set status: closed -> open
resolution: fixed -> (no value)
messages: + msg72828
2008-09-09 07:07:13 amaury.forgeotdarc set status: open -> closed
resolution: fixed
messages: + msg72826
2008-09-08 22:31:28 benjamin.peterson set messages: + msg72800
2008-09-08 12:24:52 amaury.forgeotdarc set files: + command_unicode.patch
messages: + msg72773
2008-09-08 09:00:00 amaury.forgeotdarc set files: + find_module_unicode_2.patch
messages: + msg72771
2008-09-06 22:50:45 pitrou set messages: + msg72720
2008-09-06 22:20:19 amaury.forgeotdarc set files: + find_module_unicode.patch
nosy: + amaury.forgeotdarc
messages: + msg72714
2008-09-06 20:47:48 pitrou set priority: deferred blocker -> high
type: crash -> behavior
messages: + msg72692
title: py3k aborts if "-c" or "-m" is given a non-ascii value -> py3k fails under Windows if "-c" or "-m" is given a non-ascii value
2008-09-06 18:41:36 benjamin.peterson set keywords: - needs review
nosy: + benjamin.peterson
messages: + msg72682
2008-09-04 01:20:10 benjamin.peterson set priority: release blocker -> deferred blocker
2008-08-28 00:18:27 pitrou set keywords: + needs review
nosy: + loewis
2008-08-27 20:43:27 amaury.forgeotdarc set priority: release blocker
2008-08-27 19:17:16 pitrou set files: + convert_args.patch
keywords: + patch
messages: + msg72040
2008-08-27 18:05:40 pitrou create