classification
Title: subprocess.Popen(universal_newlines=True) does not work for certain locales
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, chris.jerdonek, haypo, loewis, python-dev
Priority: normal Keywords: easy, patch

Created on 2012-08-08 21:36 by chris.jerdonek, last changed 2012-08-19 19:22 by asvetlov. This issue is now closed.

Files
File name Uploaded Description Edit
failing-test-case-1.patch chris.jerdonek, 2012-08-08 21:36 review
issue-15595-1.patch chris.jerdonek, 2012-08-08 21:43 review
issue-15595-2.patch chris.jerdonek, 2012-08-09 01:02 review
issue-15595-3.patch chris.jerdonek, 2012-08-09 01:48 review
issue-15595-4.patch chris.jerdonek, 2012-08-19 18:48 review
Messages (15)
msg167719 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-08 21:36
subprocess.Popen() with universal_newlines=True does not convert line breaks correctly when the preferred encoding is UTF-16.  For example, the following code--

code = r"import sys; sys.stdout.buffer.write('a\r\nb'.encode('utf-16'))"
args = [sys.executable, '-c', code]
popen = Popen(args, universal_newlines=True, stdin=PIPE, stdout=PIPE)
print(popen.communicate(input=''))

yields--

('a\n\nb', None)

instead of--

('a\nb', None)

The reason is that the code attempts to convert newlines before decoding to unicode instead of after:

http://hg.python.org/cpython/file/85266c6f9ae4/Lib/subprocess.py#l830

I am attaching a failing test case.  I will upload a patch shortly.

Also see the related documentation issue 15561.
msg167720 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-08-08 21:38
> when the preferred encoding is UTF-16

What is your operating system? How do you set the locale encoding to UTF-16? Do you mean UTF-16, UTF-16-LE or UTF-16-BE?
msg167721 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-08 21:43
Attaching patch for review.
msg167722 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-08 21:48
Victor, I have not personally experienced this issue.  I just noticed that the order of operations is wrong or not portable in the _translate_newlines() method when I was looking at the code for another reason.
msg167727 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-08 22:07
> How do you set the locale encoding to UTF-16? Do you mean UTF-16, UTF-16-LE or UTF-16-BE?

I confirmed that the issue occurs for all of these.  For testing purposes, you can do--

locale.getpreferredencoding = lambda do_setlocale: 'utf-16'
msg167741 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-08-09 00:17
There is a bug, but I'm not conviced that multibyte encodings are used as locale encoding.

About your patch: you should test the 3 types of newlines, so use a string like: '1\r\n2\r3\n4'.

+            # Popen() defaults to locale.getpreferredencoding(False).
+            locale.getpreferredencoding = lambda do_setlocale: 'utf-16'

FYI it's not directly Popen() which uses the locale encoding, but TextIOWrapper.
msg167744 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-09 00:24
Thanks for your comments.  

> FYI it's not directly Popen() which uses the locale encoding, but TextIOWrapper.

Yes, I will note that to be more clear.  Would you like me to add tests for UTF-16-LE, etc. (via a helper test method), or will UTF-16 alone suffice?
msg167745 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-08-09 00:37
> Would you like me to add tests for UTF-16-LE, etc. (via a helper test method), or will UTF-16 alone suffice?

You can use a loop insteadd of an helper function.

I fail to see other encoding having the same issue, except the UTF-32 family (utf-32, utf-32-le, utf-32-be).

I don't think that writing a test for the 6 codecs is required, you can use utf-16 and utf-32-be. So you test without BOM and with BOM, and utf-16 and utf-32.
msg167748 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-09 01:02
Updating patch with Victor's comments.
msg167750 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-09 01:20
It looks like this also affects 3.2, but I will need to modify the test slightly because in 3.2, TextIOWrapper calls locale.getpreferredencoding() without any arguments.
msg167753 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-09 01:48
Here is a patch also suitable for applying to Python 3.2.
msg168585 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-08-19 18:27
Looks good for me, but please use assertEqual instead assertEquals.
msg168587 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-19 18:48
Thank you, Andrew.  Here is a patch updated with that change (and also merging with tip).
msg168588 - (view) Author: Roundup Robot (python-dev) Date: 2012-08-19 19:20
New changeset a0f7c2f79bce by Andrew Svetlov in branch '3.2':
Issue #15595: Fix subprocess.Popen(universal_newlines=True)
http://hg.python.org/cpython/rev/a0f7c2f79bce

New changeset aceb820154c3 by Andrew Svetlov in branch 'default':
Issue #15595: Fix subprocess.Popen(universal_newlines=True)
http://hg.python.org/cpython/rev/aceb820154c3
msg168589 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-08-19 19:22
Fixed. Thank you.
History
Date User Action Args
2012-08-19 19:22:45asvetlovsetstatus: open -> closed
resolution: fixed
messages: + msg168589

stage: patch review -> resolved
2012-08-19 19:20:17python-devsetnosy: + python-dev
messages: + msg168588
2012-08-19 18:48:40chris.jerdoneksetfiles: + issue-15595-4.patch

messages: + msg168587
2012-08-19 18:27:36asvetlovsetmessages: + msg168585
2012-08-13 12:54:31chris.jerdoneksetnosy: + asvetlov
2012-08-09 01:48:43chris.jerdoneksetfiles: + issue-15595-3.patch

messages: + msg167753
2012-08-09 01:20:02chris.jerdoneksetmessages: + msg167750
versions: + Python 3.2
2012-08-09 01:02:04chris.jerdoneksetfiles: + issue-15595-2.patch

messages: + msg167748
2012-08-09 00:37:21hayposetmessages: + msg167745
2012-08-09 00:24:42chris.jerdoneksetmessages: + msg167744
2012-08-09 00:17:40hayposetmessages: + msg167741
2012-08-08 23:19:27hayposetnosy: + loewis
2012-08-08 22:07:04chris.jerdoneksetmessages: + msg167727
2012-08-08 21:48:22chris.jerdoneksetmessages: + msg167722
2012-08-08 21:43:42chris.jerdoneksetfiles: + issue-15595-1.patch

messages: + msg167721
stage: needs patch -> patch review
2012-08-08 21:38:06hayposetnosy: + haypo
messages: + msg167720
2012-08-08 21:36:14chris.jerdonekcreate