
Author davispuh
Recipients davispuh, eryksun, ezio.melotti, martin.panter, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Date 2016-06-09.01:07:01
Message-id <1465434424.79.0.411072479352.issue27179@psf.upfronthosting.co.za>
In-reply-to
Content
> Note that patch 3 requires setting `encoding` for even python.exe as a child process, because sys.std* default to ANSI when isatty(fd) isn't true.

I've updated my patch so that Python also uses the console's encoding when writing to pipes.

So now, in PowerShell:

>[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
>python -c "print('ā')" | Out-String
ā
> python -c "import subprocess; print(subprocess.getoutput('python -c ""print(\'ā\')""'))"
ā
>[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(775)
>python -c "print('ā')" | Out-String
ā
> python -c "import subprocess; print(subprocess.getoutput('python -c ""print(\'ā\')""'))"
ā


> What I wish is for Python to default to using UTF-8 for its own pipe and disk file I/O. The old behavior could be selected by setting some hypothetical environment variable, such as PYTHONIOUSELOCALE.

I don't really see the need for this: if you specify PYTHONIOENCODING="UTF-8", it will be used for pipes.
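A minimal sketch of that point: a child Python started with PYTHONIOENCODING set writes to a pipe in that encoding. The helper name `child_stdout_bytes` is hypothetical, and the sketch assumes a Python version where `subprocess.check_output` accepts `env`.

```python
import os
import subprocess
import sys


def child_stdout_bytes(text, io_encoding):
    """Run a child Python that prints `text` to a pipe; return the raw bytes."""
    # Copy the parent's environment and override only PYTHONIOENCODING.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # ascii() keeps the command line pure ASCII, so only the child's
    # stdout encoding is being exercised here.
    code = "print({}, end='')".format(ascii(text))
    return subprocess.check_output([sys.executable, "-c", code], env=env)


raw = child_stdout_bytes("\u0101", "utf-8")  # "\u0101" is the character ā
print(raw)
```

The bytes on the pipe are the UTF-8 encoding of the character, whatever the console code page or locale happens to be.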


> If subprocess defaults to the console's current codepage (when available), it would be nice to have a way to conveniently select the OEM or ANSI codepage. The codecs module could define string constants based on GetOEMCP() and GetACP(), such as codecs.CP_OEMCP (e.g. 'cp437') and codecs.CP_ACP (e.g. 'cp1252'). subprocess could import these constants on Windows.

I've also updated this in my patch and implemented something similar but, IMO, simpler: "ansi" and "oem" are valid encodings on Windows and can be used anywhere an encoding can be specified as a parameter. Look at the patch to see how it's implemented.


OK, so does my patch look acceptable now? What issues would there be with it? IMO it greatly improves the current situation (fixes #27048 and solves #6135), and I don't see any problems with it.

Things that are changed:
* "ansi" and "oem" are valid encodings on Windows
* the console's code page is used for both console and pipe I/O (if there is no console, ANSI is used, as now)
* subprocess uses "ansi" for DETACHED_PROCESS and "oem" for CREATE_NEW_CONSOLE, CREATE_NO_WINDOW
* encoding and errors parameters can be specified for Popen
* custom parameters (including encoding and errors) can be specified for subprocess.getstatusoutput and getoutput
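As a rough illustration of the `encoding` and `errors` parameters for Popen listed above, here is a sketch that assumes a Python version whose `subprocess.Popen` accepts them (3.6 and later do). The helper `run_decoded` and the constant `CHILD_CODE` are hypothetical names; "utf-8" and "ascii" are used instead of "ansi"/"oem" so that the example is not Windows-specific.

```python
import subprocess
import sys

# The child writes the UTF-8 bytes of "ā" (U+0101) straight to its binary
# stdout, bypassing its own text layer.
CHILD_CODE = "import sys; sys.stdout.buffer.write('\\u0101'.encode('utf-8'))"


def run_decoded(encoding, errors="strict"):
    """Run the child and decode its piped stdout with the given settings."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        encoding=encoding,
        errors=errors,
    )
    out, _ = proc.communicate()
    return out


print(run_decoded("utf-8"))
# A mismatched encoding with errors="replace" yields replacement characters
# instead of raising UnicodeDecodeError.
print(ascii(run_decoded("ascii", errors="replace")))
```

The second call shows why the parent has to know which encoding the child used: the same two bytes decode to one character or to two replacement characters depending on that choice.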

Also, if it's needed, I could easily add support for separate encodings and errors for stdin/stdout/stderr,
for example with:

    if isinstance(encoding, str):
        # one encoding shared by all three streams
        encoding_stdin = encoding_stdout = encoding_stderr = encoding
    elif isinstance(encoding, tuple):
        # separate encodings, in (stdin, stdout, stderr) order
        encoding_stdin, encoding_stdout, encoding_stderr = encoding
    else:
        encoding_stdin = encoding_stdout = encoding_stderr = None

then one could use
    subprocess.check_output('', encoding='oem')
and
    subprocess.check_output('', encoding=('oem', 'ansi', 'ansi'))
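The str-or-tuple idea can be packaged as a small standalone helper, sketched below. The name `split_encoding` is hypothetical and not part of the patch. Unlike the snippet above, this version rejects unsupported values with a TypeError instead of silently falling back to None, which is a design choice of the sketch.

```python
def split_encoding(encoding):
    """Return an (stdin, stdout, stderr) encoding triple.

    Accepts None, a single encoding name, or a 3-tuple of names.
    """
    if encoding is None:
        return (None, None, None)
    if isinstance(encoding, str):
        # One name applies to all three streams.
        return (encoding, encoding, encoding)
    if isinstance(encoding, tuple) and len(encoding) == 3:
        return encoding
    raise TypeError("encoding must be None, a str, or a 3-tuple of str")


print(split_encoding("oem"))
print(split_encoding(("oem", "ansi", "ansi")))
```

Checking the tuple length up front gives a clearer error than the unpacking failure a two-element tuple would otherwise cause.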



Known issues (present both with and without my patch):
* when using cmd.exe and Python is writing to a pipe, an error occurs for some unknown reason

with cmd.exe
> python -c "print('\n')" | echo
ECHO is on.
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp775'>
OSError: [Errno 22] Invalid argument

It doesn't matter which code page is set for the console or what is being output.
It happens with both the released 3.5.1 and the repo's default branch, but it doesn't happen when PowerShell is used.

I looked into it but couldn't find why it happens, only that
    n = write(fd, buf, (int)count);
in _Py_write_impl (fileutils.c) returns -1 and errno is EINVAL.
I verified that all parameters are correct: fd, buf (it isn't NULL) and count are the same as when running without a pipe,
so I have no idea what causes it.


* Python corrupts characters when reading from stdin

with PowerShell
> Out-String -InputObject "ā" | python -c "import sys; print(sys.stdin.encoding,sys.stdin.read())"
cp1257 ?

It happens with both the released 3.5.1 and the repo's default branch.
With my patch, the encoding used will be based on the console's code page, but that doesn't matter because the character seems to get corrupted even before the encoding is applied. I tested with the console encodings oem, ansi and utf-8, and also with the same values set via PYTHONIOENCODING, and in all cases it was corrupted (replaced with "?").

I didn't look further into this.
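One way to check whether the character is already damaged before Python decodes it is to look at the raw bytes the child receives on stdin. The sketch below does that with a hypothetical helper, `stdin_bytes_seen_by_child`. It feeds the bytes from Python rather than from PowerShell, so it only demonstrates the diagnostic and does not reproduce the problem.

```python
import subprocess
import sys


def stdin_bytes_seen_by_child(payload):
    """Pipe `payload` (bytes) to a child Python; return the repr of what it read."""
    # The child reads stdin in binary mode, so no decoding happens, and
    # reports the bytes as a pure-ASCII repr.
    code = "import sys; sys.stdout.write(ascii(sys.stdin.buffer.read()))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=payload,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii")


print(stdin_bytes_seen_by_child("\u0101".encode("utf-8")))
```

If the same child, fed from the PowerShell pipeline, reports `b'?'` (plus any line ending), the substitution happened on the writing side of the pipe, and no choice of `sys.stdin.encoding` could recover the character.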