classification
Title: CharacterEncoderError when reading from sys.stdin from piped input in cmd.exe
Type: enhancement
Stage:
Components: Windows
Versions: Python 3.2
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.araujo, loewis, pbos, r.david.murray, vstinner
Priority: normal
Keywords:

Created on 2010-08-04 13:58 by pbos, last changed 2010-08-13 01:09 by vstinner. This issue is now closed.

Files
File name: pycat.py
Uploaded: pbos, 2010-08-04 13:58
Description: Small program which repeats what's read from stdin.
Messages (5)
msg112810 - Author: Peter Boström (pbos) Date: 2010-08-04 13:58
When reading from piped stdin, Python fails to decode some non-ASCII characters.

To reproduce, run the following command from cmd.exe:

  echo ü | C:\Python31\python.exe pycat.py

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 0: character maps to <undefined>

I've been able to reproduce this on a German version of Windows Vista, which I use at work. I found the error when trying to pipe (and parse) output from the ping command, which contains non-ASCII characters. If I don't pipe and just type into the program, it works fine, even with "strange" characters.
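
For illustration only, here is a small sketch of how the single byte from the traceback behaves under a few codecs. The choice of codecs is an assumption: the report does not say which code page the console used or which encoding Python selected for the pipe. cp850 and cp437 are common OEM console code pages, and cp1252 is a common ANSI code page on Western European Windows.

```python
RAW = b"\x81"  # the byte named in the UnicodeDecodeError above


def try_decode(data, encoding):
    """Decode data with the given codec, or return the codec's error reason."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


# The codec names below are assumptions chosen for comparison,
# not values taken from the report.
for name in ("cp850", "cp437", "cp1252", "ascii"):
    # ascii() keeps the output printable whatever the console encoding is.
    print(name, "->", ascii(try_decode(RAW, name)))
```

Under the two OEM code pages the byte decodes to "ü", while cp1252 and ascii both reject it. The cp1252 failure reports "character maps to <undefined>", the same reason given in the traceback.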
msg113125 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-06 19:32
This is because Python doesn't know the encoding of stdin, and so uses ASCII (I assume that's what 'charmap' is on Windows; on my Unix box the error message mentions ascii, not charmap). You can tell Python to use a different default encoding via the PYTHONIOENCODING environment variable:

rdmurray:py3k>export PYTHONIOENCODING='utf8'
rdmurray:py3k>echo ü | ./python pycat.py    
ü

Of course, you'll need to use the Windows way of setting environment variables.
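
A platform-neutral way to try the same thing is to set the variable only for a child process. The sketch below is a minimal illustration, not the attached script: `CHILD` is a hypothetical one-line stand-in for pycat.py, and `run_cat` is a helper name invented here.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for pycat.py: echo stdin back to stdout.
CHILD = "import sys; sys.stdout.write(sys.stdin.read())"


def run_cat(data, encoding):
    """Pipe raw bytes into a child Python started with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=data,
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# "\u00fc" is the character from the report, sent here as UTF-8 bytes.
echoed = run_cat("\u00fc\n".encode("utf-8"), "utf8")
print(echoed)
```

The value passed as `encoding` has to match the bytes that actually arrive on the pipe. When the producer is a cmd.exe built-in such as echo, that is more likely the console code page than UTF-8.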

I think some consideration is being given to making this simpler, but I couldn't find an issue number for it, so I'm adding haypo as nosy since I think he was involved in that discussion.  If my memory is right, and there's no existing issue, maybe we could adopt this one for it.
msg113240 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-08 08:40
If stdin is a pipe, stdin uses ASCII encoding. Do you consider that a bug?
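
To check what a given interpreter actually picks, one can inspect the stream directly. This is a minimal sketch; `describe_stream` is a helper name invented here, and the in-memory stream only imitates a non-terminal stream with an ASCII encoding.

```python
import io


def describe_stream(stream):
    """Return (is_a_tty, encoding) for a text stream such as sys.stdin."""
    return stream.isatty(), stream.encoding


# An in-memory stand-in for a piped stdin (assumption: ASCII, as described above).
fake_pipe = io.TextIOWrapper(io.BytesIO(b"abc"), encoding="ascii")
print(describe_stream(fake_pipe))
```

Calling `describe_stream(sys.stdin)` in a script that is run once interactively and once behind a pipe shows whether the two cases get different encodings on a particular platform.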
msg113272 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-08 14:55
It is not a bug as it stands, no, so I've changed the type to feature request.  I thought you and...Ezio? were talking about some way to improve the encoding situation when reading from/writing to a pipe.  If I'm wrong, you can just close this issue as won't fix.
msg113732 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 01:09
> I thought you and...Ezio? were talking about some way to improve 
> the encoding situation when reading from/writing to a pipe.

I don't want to change that. If you come up with arguments in favor of changing it (and maybe some ideas for choosing the encoding), please open a new issue.

I am closing *this* particular issue because it is not a bug. Please open a new issue if you would like to change the current behaviour. (Or reopen the issue if you completely disagree with me ;-))
History
Date User Action Args
2010-08-13 01:09:21  vstinner  set  status: open -> closed; resolution: not a bug; messages: + msg113732
2010-08-08 14:55:41  r.david.murray  set  type: behavior -> enhancement; messages: + msg113272; versions: + Python 3.2, - Python 3.1
2010-08-08 08:40:35  vstinner  set  messages: + msg113240
2010-08-08 03:05:25  eric.araujo  set  nosy: + loewis, eric.araujo
2010-08-06 19:32:17  r.david.murray  set  nosy: + vstinner, r.david.murray; type: behavior; messages: + msg113125
2010-08-04 13:58:47  pbos  create