classification
Title: subprocess.call fails with unicode strings in command line
Components: Library (Lib)
Versions: Python 2.5
process
Status: open
Nosy List: amaury.forgeotdarc, andersjm, brotch, gregcouch, kcwu, mclausch, ocean-city, xianyiteng (8)
Priority: normal
Keywords: patch

Created on 2007-07-24 18:24 by mclausch, last changed 2009-05-12 03:33 by kcwu.

Files
File name, Uploaded, Description
CreateProcessW.patch, ocean-city, 2008-03-02 07:58
Python-2.5.2-subprocess.patch, gregcouch, 2008-10-01 19:40, Alternate Python-only patch
Messages (9)
msg32546 - Author: Matt (mclausch) Date: 2007-07-24 18:24
On Windows, subprocess.call() fails with an exception if either the executable or any of the arguments contains non-ASCII characters (code points above 127). See below:

>>> cmd = [ u'test_\xc5_exec.bat', u'arg1', u'arg2' ]
>>> subprocess.call(cmd)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python25\lib\subprocess.py", line 443, in call
    return Popen(*popenargs, **kwargs).wait()
  File "C:\Python25\lib\subprocess.py", line 593, in __init__
    errread, errwrite)
  File "C:\Python25\lib\subprocess.py", line 815, in _execute_child
    startupinfo)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 5: ordinal not in range(128)
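
The traceback can be narrowed down without spawning a process. The following is a minimal sketch, assuming the failure comes from an implicit conversion of each unicode argument with the 'ascii' codec, as the error message suggests. The helper name `first_unencodable` is hypothetical, and the `except ... as` syntax needs Python 2.6 or later.

```python
# Illustrative only: imitates an implicit ASCII conversion of the command
# line. It does not call subprocess or any Windows API.
cmd = [u'test_\xc5_exec.bat', u'arg1', u'arg2']


def first_unencodable(args, encoding='ascii'):
    """Return (arg, error) for the first argument that cannot be encoded
    with `encoding`, or None if every argument encodes cleanly."""
    for arg in args:
        try:
            arg.encode(encoding)
        except UnicodeEncodeError as exc:
            return arg, exc
    return None


failure = first_unencodable(cmd)
if failure is not None:
    bad_arg, error = failure
    # error.start is the index of the offending character within bad_arg
    print(repr(bad_arg), error.start)
```

The reported index points at the character `\xc5`, which is the position named in the traceback above.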
msg32547 - Author: brotchie (brotch) Date: 2007-08-05 08:36
Python's default encoding is 'ascii', which cannot encode unicode characters above 127 into a byte string.

Forcing each unicode string in the list to encode as 'iso-8859-1', e.g.

subprocess.call([arg.encode('iso-8859-1') for arg in cmd])

resolves the problem and runs the correct command.
msg32548 - Author: Matt (mclausch) Date: 2007-08-20 21:12
Sorry, I should have been more specific. I'm looking for a general solution, not just one for characters in iso-8859-1. For instance, I need to execute a subprocess where the executable or the arguments may contain Japanese characters.

So another example would be:
cmd = [ u'test_\u65e5\u672c\u8a9e_exec.bat', u'arg1', u'arg2' ]
subprocess.call(cmd)
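
Why a single-byte encoding is not a general solution can be checked directly. This is a rough sketch, not part of any proposed patch. The helper `can_encode` and the variable names are hypothetical, and 'utf-16-le' is used only as a stand-in for the UTF-16 text that the wide Windows APIs accept.

```python
# Compare which encodings can represent each example command line.
latin_cmd = [u'test_\xc5_exec.bat', u'arg1', u'arg2']
japanese_cmd = [u'test_\u65e5\u672c\u8a9e_exec.bat', u'arg1', u'arg2']


def can_encode(args, encoding):
    """Return True if every argument can be encoded with `encoding`."""
    try:
        for arg in args:
            arg.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for name, args in [('latin', latin_cmd), ('japanese', japanese_cmd)]:
    for encoding in ('ascii', 'iso-8859-1', 'utf-16-le'):
        print(name, encoding, can_encode(args, encoding))
```

Only the UTF-16 form covers both commands, which is the motivation for passing unicode through to a wide API rather than choosing one byte encoding.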
msg63176 - Author: Hirokazu Yamamoto (ocean-city) Date: 2008-03-02 07:58
I tried to fix this problem using CreateProcessW.
(Environment variables are still ANSI.)

I don't know the Python C API well, so I may be doing
something wrong. (I confirmed that test_subprocess.py
passes.)
msg74142 - Author: Greg Couch (gregcouch) Date: 2008-10-01 19:40
We're having the same problem.  My quick fix was to patch subprocess.py
so the command line and executable are converted to the filesystem
encoding (mbcs).
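
The attached patch is not reproduced in this thread, so the following is only a sketch of the idea described above, not the patch itself. It assumes a hypothetical helper, `encode_command`, that encodes unicode arguments and leaves byte strings untouched. `sys.getfilesystemencoding()` is the real call; the explicit `encoding` parameter is an addition here so the behaviour can be tried on any platform.

```python
import sys


def encode_command(args, encoding=None):
    """Encode unicode arguments to byte strings; pass byte strings through.

    `encoding` defaults to the filesystem encoding. A UnicodeEncodeError is
    still raised for characters the chosen encoding cannot represent.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return [arg.encode(encoding) if isinstance(arg, text_type) else arg
            for arg in args]


# cp1252 is used purely as an example ANSI code page.
encoded = encode_command([u'test_\xc5_exec.bat', b'arg1'], 'cp1252')
print(encoded)
```

This conversion only helps when every character exists in the target code page; anything outside it still raises an error.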
msg87566 - Author: Kuang-che Wu (kcwu) Date: 2009-05-11 07:56
ocean-city's patch applies cleanly to trunk and it works for me.
Could anybody review and commit it? I can help if any refinement is required.
msg87580 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) Date: 2009-05-11 17:26
The first patch will introduce regressions for strings that cannot be
decoded with the filesystem encoding. It is necessary to provide a
fallback to the CreateProcessA function.

I'd prefer the python-only patch, except for the "sys=sys" argument to
the function. Is it really needed?
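
The fallback suggested above could look roughly like the sketch below. It is written in Python for illustration only; the actual patch is in C and is not shown in this thread. The function name `plan_create_process` and the 'W' / 'A' labels are hypothetical, and cp1252 stands in for the system code page.

```python
def plan_create_process(args, encoding):
    """Decide which API variant a caller would use.

    Returns ('W', unicode_args) if every byte string decodes with
    `encoding`, otherwise ('A', original_args) as the fallback.
    Unicode arguments are passed through unchanged.
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    decoded = []
    for arg in args:
        if isinstance(arg, text_type):
            decoded.append(arg)
            continue
        try:
            decoded.append(arg.decode(encoding))
        except UnicodeDecodeError:
            # Undecodable byte string: keep the original arguments and
            # signal that the ANSI variant should be tried instead.
            return 'A', list(args)
    return 'W', decoded


print(plan_create_process([b'ok.bat', u'\u65e5\u672c\u8a9e'], 'cp1252'))
print(plan_create_process([b'bad\x81.bat'], 'cp1252'))
```

Byte 0x81 is undefined in Python's cp1252 codec, so the second call takes the fallback branch.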
msg87597 - Author: Greg Couch (gregcouch) Date: 2009-05-12 00:41
I like the C patch better.  It only tries to decode non-unicode objects
with the filesystem (mbcs) encoding.  This fits in with Python 3.0
perfectly where all strings are unicode.  In 2.5, strings are assumed to
be in the mbcs encoding, to match the Windows ANSI API, so decoding
those with the mbcs encoding shouldn't alter the set of acceptable
strings (which is what the C patch is doing if I read the code correctly).
msg87605 - Author: Kuang-che Wu (kcwu) Date: 2009-05-12 03:33
There is a slight difference between the C and Python patches.
C version: converts mbcs arguments to unicode
py version: converts unicode arguments to mbcs

Actually, the Python version of the patch may not work if the string is
unicode and cannot be encoded with mbcs. For example, my Windows system is
Chinese (cp950) and the program I want to execute contains Japanese characters.
Encoding Japanese characters with mbcs (in this case, cp950) will
fail. This is also what Matt (mclausch) said.

As for the C version of the patch, I don't think a fallback is
necessary. If the string fails to convert from mbcs to unicode, it
would eventually fail inside CreateProcessA() anyway, because CreateProcessA
internally (since Win2k) tries to convert from mbcs to unicode and
calls CreateProcessW.
History
Date User Action Args
2009-05-12 03:33:21 kcwu set messages: + msg87605
2009-05-12 00:41:07 gregcouch set messages: + msg87597
2009-05-11 17:26:26 amaury.forgeotdarc set nosy: + amaury.forgeotdarc; messages: + msg87580
2009-05-11 07:56:31 kcwu set nosy: + kcwu; messages: + msg87566
2008-12-14 19:53:42 xianyiteng set nosy: + xianyiteng
2008-10-01 19:40:51 gregcouch set files: + Python-2.5.2-subprocess.patch; nosy: + gregcouch; messages: + msg74142
2008-08-26 13:35:10 andersjm set nosy: + andersjm
2008-03-02 07:58:17 ocean-city set files: + CreateProcessW.patch; keywords: + patch; messages: + msg63176; nosy: + ocean-city
2007-07-24 18:24:11 mclausch create