classification
Title: [2.7] subprocess.call fails with unicode strings in command line
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: tim.golden Nosy List: Safihre, amaury.forgeotdarc, andersjm, brotch, gregcouch, jnoller, kcwu, mclausch, ocean-city, terry.reedy, tim.golden, xianyiteng
Priority: normal Keywords: patch

Created on 2007-07-24 18:24 by mclausch, last changed 2017-10-04 10:00 by vstinner. This issue is now closed.

Files
File name Uploaded Description
CreateProcessW.patch ocean-city, 2008-03-02 07:58
Python-2.5.2-subprocess.patch gregcouch, 2008-10-01 19:40 Alternate Python-only patch
Messages (16)
msg32546 - (view) Author: Matt (mclausch) Date: 2007-07-24 18:24
On Windows, subprocess.call() fails with an exception if either the executable or any of the arguments contains non-ASCII characters (code points above 127). See below:

>>> cmd = [ u'test_\xc5_exec.bat', u'arg1', u'arg2' ]
>>> subprocess.call(cmd)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python25\lib\subprocess.py", line 443, in call
    return Popen(*popenargs, **kwargs).wait()
  File "C:\Python25\lib\subprocess.py", line 593, in __init__
    errread, errwrite)
  File "C:\Python25\lib\subprocess.py", line 815, in _execute_child
    startupinfo)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 5: ordinal not in range(128)
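The traceback shows the unicode command line being pushed through the 'ascii' codec before the process is created. A minimal sketch, written in Python 3 syntax and assuming only that this implicit step behaves like an explicit `.encode('ascii')`, reproduces the same codec error without spawning anything (`try_ascii` is a hypothetical helper, not part of subprocess):

```python
# Sketch: reproduce the codec failure from the traceback (no process is started).
name = 'test_\xc5_exec.bat'  # same executable name as in the report


def try_ascii(text):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc


result = try_ascii(name)
print(result)
```

The error position reported by the codec is the index of the first character outside the ASCII range, which is why the traceback points at position 5.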
msg32547 - (view) Author: brotchie (brotch) Date: 2007-08-05 08:36
Python's default encoding is 'ascii', which cannot encode unicode characters with code points above 127 into byte strings.

Encoding each unicode argument explicitly as 'iso-8859-1', e.g.

subprocess.call([arg.encode('iso-8859-1') for arg in cmd])

resolves the problem and runs the correct command.
msg32548 - (view) Author: Matt (mclausch) Date: 2007-08-20 21:12
Sorry, I should have been more specific. I'm looking for a general solution, not just one for characters in iso-8859-1. For instance, I need to execute a subprocess where the executable or the arguments may contain Japanese characters.

So another example would be:
cmd = [ u'test_\u65e5\u672c\u8a9e_exec.bat', u'arg1', u'arg2' ]
subprocess.call(cmd)
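The limitation can be checked without running anything: a single-byte codec such as iso-8859-1 can represent the first file name but not the Japanese one. A small sketch in Python 3 syntax (`can_encode` is a hypothetical helper introduced here for illustration):

```python
def can_encode(text, encoding):
    """Report whether every character of text exists in the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


latin_name = 'test_\xc5_exec.bat'
japanese_name = 'test_\u65e5\u672c\u8a9e_exec.bat'

# Print each name (ASCII-escaped) and whether iso-8859-1 can represent it.
for name in (latin_name, japanese_name):
    print(ascii(name), can_encode(name, 'iso-8859-1'))
```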
msg63176 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-03-02 07:58
I tried to fix this problem using CreateProcessW.
(environment variables are still ANSI)

I don't know the Python C API well, so I may be doing
something wrong. (I confirmed that test_subprocess.py
passes.)
msg74142 - (view) Author: Greg Couch (gregcouch) Date: 2008-10-01 19:40
We're having the same problem.  My quick fix was to patch subprocess.py
so the command line and executable are converted to the filesystem
encoding (mbcs).
msg87566 - (view) Author: Kuang-che Wu (kcwu) Date: 2009-05-11 07:56
ocean-city's patch applies cleanly to trunk and it works for me.
Could anybody review and commit it? I can help if any refinement is required.
msg87580 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-05-11 17:26
The first patch will introduce regressions for strings that cannot be
decoded with the filesystem encoding. It is necessary to provide a
fallback to the CreateProcessA function.

I'd prefer the python-only patch, except for the "sys=sys" argument to
the function. Is it really needed?
msg87597 - (view) Author: Greg Couch (gregcouch) Date: 2009-05-12 00:41
I like the C patch better.  It only tries to decode non-unicode objects
with the filesystem (mbcs) encoding.  This fits in with Python 3.0
perfectly where all strings are unicode.  In 2.5, strings are assumed to
be in the mbcs encoding, to match the Windows ANSI API, so decoding
those with the mbcs encoding shouldn't alter the set of acceptable
strings (which is what the C patch is doing if I read the code correctly).
msg87605 - (view) Author: Kuang-che Wu (kcwu) Date: 2009-05-12 03:33
There is a slight difference between the C and Python patches.
C version: converts mbcs arguments to unicode
py version: converts unicode arguments to mbcs

Actually, the Python patch may not work if the string is unicode and
cannot be encoded in mbcs. For example, my Windows system is Chinese
(cp950) and the program I want to execute contains Japanese characters.
Encoding Japanese characters with mbcs (in this case, cp950) will
fail. This is also what Matt (mclausch) said.

As for the C patch, I don't think a fallback is
necessary. If the string fails to convert from mbcs to unicode, the
call would eventually fail inside CreateProcessA() anyway, because CreateProcessA
internally (since Windows 2000) tries to convert from mbcs to unicode and
then calls CreateProcessW.
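The two conversion directions can be sketched as follows. This is an illustration in Python 3 syntax, not the code of either patch: the function names are hypothetical, and the code page is passed explicitly, with cp1252 standing in as an assumed example of an ANSI code page that lacks the characters in question.

```python
def to_text(arg, codepage):
    """C-patch direction: decode byte strings, pass text through unchanged."""
    return arg.decode(codepage) if isinstance(arg, bytes) else arg


def to_bytes(arg, codepage):
    """Python-patch direction: encode text, pass byte strings through unchanged."""
    return arg.encode(codepage) if isinstance(arg, str) else arg


japanese = '\u65e5\u672c\u8a9e'

# Converting towards unicode never requires the text to fit into the code page.
print(ascii(to_text(japanese, 'cp1252')))

# Converting towards bytes fails as soon as one character is missing from it.
try:
    to_bytes(japanese, 'cp1252')
except UnicodeEncodeError as exc:
    print('cannot encode:', exc.reason)
```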
msg112739 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-04 02:48
I fail to see why subprocess.call(cmd.encode('whatever')) is not a general solution. Auto-encoding strikes me as wrong. Someone who wants that should write their own wrapper. In any case, 2.7 is out and closed to new features, while 3.x fixes this and numerous other unicode issues.
msg112767 - (view) Author: Kuang-che Wu (kcwu) Date: 2010-08-04 07:11
> I fail to see why subprocess.call(cmd.encode('whatever')) is not a general solution.
Because no suitable 'whatever' encoding exists.

Assume cmd contains Japanese characters and my system is Chinese Windows. subprocess.call expects the argument to be encoded in mbcs, which is cp950. However, cp950 doesn't contain Japanese characters.

subprocess.call(cmd.encode('cp950')) will fail because cp950 doesn't contain Japanese characters.
subprocess.call(cmd.encode('cp932')) will fail because subprocess.call will either fail to decode the bytes or decode them incorrectly.
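The second failure mode (bytes produced with one code page but interpreted with another) can be illustrated with Python's own codecs. The sketch below is in Python 3 syntax and only mimics the mismatch; it does not call subprocess, and `errors='replace'` is an assumption made here so that undecodable bytes show up as U+FFFD instead of raising:

```python
text = '\u65e5\u672c\u8a9e'  # Japanese text, as in the earlier file name

# Bytes as a Japanese (cp932) system would write them.
as_cp932 = text.encode('cp932')

# The same bytes read back with the Traditional Chinese code page.
misread = as_cp932.decode('cp950', errors='replace')

print(ascii(misread))
print(misread == text)
```

Whatever the decoder produces for these bytes, it is not the original text, so the child process would be looked up under the wrong name.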
msg112825 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-04 15:56
Thanks for the simple explanation.
msg112835 - (view) Author: Greg Couch (gregcouch) Date: 2010-08-04 17:28
So Terry, can you reopen this bug then?  It's not out of date.
msg112854 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-04 18:35
I will not reopen this now for the reasons I already stated after "In any case ...". To expand on that.

1. 2.7 is in maintenance (bug-fix only) mode and I view this as a feature request. To persuade someone otherwise, quote some doc that clearly says subprocess should behave as requested. I nosy-ed Jesse Noller so he can contradict me if he wishes.

2. The underlying issue seems to be the use of limited encodings, which was and is being fixed as well as possible in 3.x. Since there has been no mention of this issue being a problem with subprocess in 3.1, I presume there is none. If there is, say so and I will reopen.

The discussion shows disagreement on both the goal and the approach to a change. I am dubious that there will be an acceptable general solution. Even if this is persuasively seen as a bug and there is a good patch, I am dubious that any of the current developers will want to spend the necessary time to properly review a workaround to an issue that was already fixed the right way in 3.x.
msg113288 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2010-08-08 17:26
To confirm the situation on 3.x: a unicode string with non-ascii-encodable characters is fine. The easy test here in the UK is a pound sign:

<code>
import subprocess

FILENAME = "abc£.bat"
try:
    FILENAME.encode("ascii")
except UnicodeEncodeError:
    # Raised as expected: "£" has no ASCII encoding
    pass

with open(FILENAME, "w") as f:
    f.write("echo hello\n")

subprocess.call([FILENAME])
#
# "hello" output as expected
#

</code>

So no action for 3.x. I'm sympathetic (in principle) to making a change to 2.7 but I haven't looked over the "competing" patches and assessed the ins-and-outs.
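A platform-neutral variant of the same check can be sketched by passing a non-ASCII argument to a child Python interpreter instead of a .bat file. This assumes Python 3.7 or later and, on POSIX systems, a locale in which the argument can be encoded; the child echoes the argument in ASCII-escaped form so the result does not depend on the console encoding:

```python
import subprocess
import sys

ARG = 'abc\xa3'  # "abc£", the same non-ASCII text as in the file name above

# The child prints ascii(argv[1]), so its output contains ASCII characters only.
child_code = 'import sys; print(ascii(sys.argv[1]))'
completed = subprocess.run(
    [sys.executable, '-c', child_code, ARG],
    capture_output=True,
    text=True,
    check=True,
)
echoed = completed.stdout.strip()
print(echoed)
```

If the argument survives the round trip, `echoed` equals `ascii(ARG)` in the parent.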
msg303677 - (view) Author: Safihre (Safihre) Date: 2017-10-04 09:12
Although this issue is very old, in case anyone else needs this functionality as we do, I created a package that implements the proposed C fix.
https://pypi.python.org/pypi/subprocessww
Simply "import subprocessww" and Popen is patched. We tested it and it does the job pretty well; we haven't run into any special situations yet.

We really want to upgrade our app to Python 3, but currently lack the manpower to go over our app line by line. It's not a simple 2to3 conversion, unfortunately.
History
Date User Action Args
2017-10-04 10:00:39 vstinner set title: subprocess.call fails with unicode strings in command line -> [2.7] subprocess.call fails with unicode strings in command line
2017-10-04 09:12:44 Safihre set nosy: + Safihre; messages: + msg303677
2010-08-08 17:26:02 tim.golden set assignee: tim.golden; messages: + msg113288; nosy: + tim.golden
2010-08-08 12:05:47 mightyiam set nosy: - mightyiam
2010-08-04 18:35:59 terry.reedy set nosy: + jnoller; messages: + msg112854
2010-08-04 17:28:45 gregcouch set messages: + msg112835
2010-08-04 15:56:03 terry.reedy set messages: + msg112825
2010-08-04 07:11:29 kcwu set messages: + msg112767
2010-08-04 02:48:10 terry.reedy set status: open -> closed; type: enhancement; versions: + Python 2.7, - Python 2.5; nosy: + terry.reedy; messages: + msg112739; resolution: out of date
2009-12-31 12:29:38 mightyiam set nosy: + mightyiam
2009-05-12 03:33:21 kcwu set messages: + msg87605
2009-05-12 00:41:07 gregcouch set messages: + msg87597
2009-05-11 17:26:26 amaury.forgeotdarc set nosy: + amaury.forgeotdarc; messages: + msg87580
2009-05-11 07:56:31 kcwu set nosy: + kcwu; messages: + msg87566
2008-12-14 19:53:42 xianyiteng set nosy: + xianyiteng
2008-10-01 19:40:51 gregcouch set files: + Python-2.5.2-subprocess.patch; nosy: + gregcouch; messages: + msg74142
2008-08-26 13:35:10 andersjm set nosy: + andersjm
2008-03-02 07:58:17 ocean-city set files: + CreateProcessW.patch; keywords: + patch; messages: + msg63176; nosy: + ocean-city
2007-07-24 18:24:11 mclausch create