classification
Title: restore accepting detached stdin in fileinput binary mode
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4, Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: akira, python-dev, r.david.murray, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2014-10-23 07:16 by akira, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
fileinput-detached-stdin.diff (uploaded by akira, 2014-10-23 07:16)
Messages (8)
msg229859 - Author: Akira Li (akira) Date: 2014-10-23 07:16
The patch for Issue #21075: "fileinput.FileInput now reads bytes from standard stream if binary mode is specified" broke code that used
sys.stdin = sys.stdin.detach() with FileInput(mode='rb') in Python 3.3.

I've attached a patch that makes FileInput accept a detached sys.stdin
(one without a 'buffer' attribute) in binary mode.
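
A minimal sketch of the pattern described above. To keep it runnable without a real pipe, it assumes an io.BytesIO object as a stand-in for the result of sys.stdin.detach(): like a detached stdin, it is a binary stream with no 'buffer' attribute. The helper name read_stdin_lines_binary is hypothetical. This should work on Python versions that accept a detached stdin; on 3.4.1 and 3.4.2 (the versions named later in this thread) the lookup of sys.stdin.buffer would be expected to fail instead.

```python
import fileinput
import io
import sys


def read_stdin_lines_binary(stdin_replacement):
    """Read all lines from '-' (stdin) in binary mode via FileInput.

    stdin_replacement is installed as sys.stdin for the duration of the
    call; it stands in for the object returned by sys.stdin.detach().
    """
    saved_stdin = sys.stdin
    sys.stdin = stdin_replacement
    try:
        # '-' is the special file name FileInput uses for stdin.
        with fileinput.FileInput(files=("-",), mode="rb") as lines:
            return list(lines)
    finally:
        # Always restore the real stdin, even if reading fails.
        sys.stdin = saved_stdin


# Assumption: BytesIO models a detached stdin (binary, no 'buffer' attribute).
detached_like = io.BytesIO(b"first\nsecond\n")
result = read_stdin_lines_binary(detached_like)
```

Each item in result is a bytes object, one per input line.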
msg229865 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-10-23 09:12
The code

    sys.stdin = sys.stdin.detach()

is incorrect because sys.stdin should be a text stream, but detach() returns a binary stream.
msg229868 - Author: Akira Li (akira) Date: 2014-10-23 11:26
It is incorrect that sys.stdin is *always* a text stream. It often is,
but not always.

There are cases when it is not e.g., 

   $ tar zcf - stuff | gpg -e | ssh user@server 'cat - > stuff.tar.gz.gpg'

tar's stdout is *not* a text stream.
gpg's stdin/stdout are *not* text streams.
ssh's stdin is *not* a text stream.
etc.

If any of the steps are implemented in Python then it is useful to
consider sys.stdin as a binary stream.

Any script written before Python 3.4.1 (#21075) that used FileInput binary mode
*had to* use sys.stdin = sys.stdin.detach()

A bugfix release should not break working code.
msg229869 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2014-10-23 11:49
> It is incorrect that sys.stdin is *always* a text stream. It often is,
> but not always.
> 
> There are cases when it is not e.g.,
> 
>    $ tar zcf - stuff | gpg -e | ssh user@server 'cat - > stuff.tar.gz.gpg'
> 
> tar's stdout is *not* a text stream.
> gpg's stdin/stdout are *not* text streams.
> ssh's stdin is *not* a text stream.
> etc.

This is not related to Python. The terms "character", "string", "text", and
"file" can have different meanings in different domains. In Python we use
Python terminology. There is no such thing as sys.stdin in a POSIX-compatible
shell, because a POSIX-compatible shell has no sys module and doesn't use a
dot to access attributes.

> Any script written before Python 3.4.1 (#21075) that used FileInput binary
> mode *had to* use sys.stdin = sys.stdin.detach()
> 
> A bugfix release should not break working code.

The correct solution in this case would be to use the workaround "sys.stdin =
sys.stdin.detach()" conditionally, only in Python versions which have the bug.
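
A rough sketch of what such a conditional workaround could look like. The version boundary (3, 4, 1) is taken from this thread's statement that #21075 landed in Python 3.4.1; the names FIRST_FIXED_VERSION, needs_detach_workaround and prepare_stdin_for_binary_fileinput are hypothetical and not part of any library.

```python
import sys

# Assumption from this thread: starting with Python 3.4.1 (#21075),
# FileInput reads bytes from stdin itself when binary mode is requested.
FIRST_FIXED_VERSION = (3, 4, 1)


def needs_detach_workaround(version_info=None):
    """Return True if this Python predates the #21075 change."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) < FIRST_FIXED_VERSION


def prepare_stdin_for_binary_fileinput():
    """Hypothetical helper: detach stdin only where the old behavior applies."""
    if needs_detach_workaround():
        # Replaces the text-mode sys.stdin with its underlying binary stream.
        sys.stdin = sys.stdin.detach()
```

A script would call prepare_stdin_for_binary_fileinput() once, before constructing FileInput(mode='rb').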
msg229870 - Author: Akira Li (akira) Date: 2014-10-23 12:38
> This is not related to Python. The terms "character", "string", "text", and "file" can have different meanings in different domains. In Python we use Python terminology. There is no such thing as sys.stdin in a POSIX-compatible shell, because a POSIX-compatible shell has no sys module and doesn't use a dot to access attributes.

I use Python terminology (text - Unicode string, binary data - bytes).

The text vs. binary data distinction is language-independent, though
(it doesn't matter what the Unicode type is called in a particular language).

Python can be used to implement `tar`, `gpg`, `ssh`, `7z`, etc. I don't
see what POSIX has to do with that fact.

It is very simple actually: 

  text -> encode <character encoding> -> bytes
  bytes -> decode <character encoding> -> text

In most cases text should be human readable.

It doesn't make sense to encode/decode input/output of gpg-like utilities using a character encoding. *Therefore* the notion of 
sys.stdin being a bytes stream (io.BufferedReader) can be useful
in this case.
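
The encode/decode relationship above can be illustrated with a short sketch. The byte sequence in opaque is an arbitrary assumption standing in for the output of a gpg-like utility; it is not real encrypted data.

```python
# Text round-trips through a character encoding.
text = "héllo"
encoded = text.encode("utf-8")      # text -> bytes
decoded = encoded.decode("utf-8")   # bytes -> text

# Arbitrary binary data is generally not valid in a given character encoding.
opaque = bytes([0xFF, 0xFE, 0x00, 0x80])
try:
    opaque.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    # 0xFF can never appear in well-formed UTF-8.
    decodable = False
```

Here the text survives the round trip, while the opaque bytes cannot be decoded as UTF-8 at all, which is why such data is better handled as bytes end to end.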

The lines produced by FileInput are often (after optional processing)
written to sys.stdout. If binary mode is used, then FileInput(mode='rb')
yields bytes; therefore it is also useful to consider sys.stdout
a binary stream (io.BufferedWriter) in this case.

It introduces a nice symmetry:

  text FileInput mode -> text streams
  binary FileInput mode -> binary streams

By design, FileInput treats stdin like any other file. It
even supports a special name for it: '-'. A file may be in
binary mode; stdin should be able to be too.

sys.stdout is used outside of FileInput, so no changes in
FileInput itself are necessary for it. But sys.stdin is used inside
FileInput, which is why the change is needed.
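
The symmetry argued for above can be sketched as a pass-through filter. This is only an illustration under stated assumptions: io.BytesIO objects stand in for a detached sys.stdin and sys.stdout, and copy_stdin_to_stdout_binary is a hypothetical name. It relies on FileInput accepting a stdin that has no 'buffer' attribute.

```python
import fileinput
import io
import sys


def copy_stdin_to_stdout_binary():
    """Copy stdin to stdout line by line as bytes.

    Assumes sys.stdout accepts bytes, i.e. it is a binary stream.
    """
    with fileinput.FileInput(files=("-",), mode="rb") as lines:
        for line in lines:
            sys.stdout.write(line)


# Assumption: BytesIO objects model detached (binary) standard streams.
source = io.BytesIO(b"\x00\x01\n\xff\xfe\n")
sink = io.BytesIO()
saved_streams = sys.stdin, sys.stdout
sys.stdin, sys.stdout = source, sink
try:
    copy_stdin_to_stdout_binary()
finally:
    # Restore the real streams regardless of the outcome.
    sys.stdin, sys.stdout = saved_streams
copied = sink.getvalue()
```

No character encoding is involved at any point, so bytes that are not valid text pass through unchanged.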

> Correct solution in this case would be to use the workaround "sys.stdin = 
> sys.stdin.detach()" conditionally, only in Python versions which have a bug.

Do you mean every Python 3 version before Python 3.4.1?

The correct solution is to avoid blaming users
(your fault -> you change your programs) for our mistakes
and to fix the bug in Python itself. The patch is attached.
msg229874 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-10-23 14:32
I actually agree that this should be applied, not only for backward compatibility reasons but because it is better duck typing.  Unfortunately it still leaves code potentially having to deal with "if python version is 3.4.1 or 3.4.2", but there is nothing that can be done about that.
msg257361 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-01-02 20:45
New changeset ded1336bff49 by R David Murray in branch '3.5':
#22709: Use stdin as-is if it does not have a buffer attribute.
https://hg.python.org/cpython/rev/ded1336bff49

New changeset 688d32cdbc0c by R David Murray in branch 'default':
Merge: #22709: Use stdin as-is if it does not have a buffer attribute.
https://hg.python.org/cpython/rev/688d32cdbc0c
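
The behavior named in the changeset summary ("Use stdin as-is if it does not have a buffer attribute") can be sketched roughly as follows. This is an illustration of the duck-typing idea only, not the actual CPython source; stdin_for_mode is a hypothetical name.

```python
import io


def stdin_for_mode(mode, stdin):
    """Pick the stream to read for the file name '-'.

    In binary mode, prefer the underlying binary layer when the stream
    exposes one; otherwise use the stream as-is (for example, an already
    detached stdin).
    """
    if "b" in mode:
        return getattr(stdin, "buffer", stdin)
    return stdin


# Stand-ins: a text wrapper (has .buffer) and a raw binary stream (does not).
raw = io.BytesIO(b"data\n")
wrapper = io.TextIOWrapper(raw, encoding="utf-8")

from_text_stdin = stdin_for_mode("rb", wrapper)   # the wrapper's binary layer
from_detached_stdin = stdin_for_mode("rb", raw)   # used as-is
from_text_mode = stdin_for_mode("r", wrapper)     # text mode: unchanged
```

The getattr fallback avoids checking the stream's type, so any object that behaves like a binary file is accepted.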
msg257363 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2016-01-02 20:46
Hopefully 'better late than never' applies to this.  Sigh.
History
Date                 User              Action  Args
2022-04-11 14:58:09  admin             set     github: 66898
2016-01-02 20:46:26  r.david.murray    set     status: open -> closed; resolution: fixed; messages: + msg257363; stage: commit review -> resolved
2016-01-02 20:45:09  python-dev        set     nosy: + python-dev; messages: + msg257361
2015-12-03 19:38:58  r.david.murray    set     stage: commit review
2014-10-23 14:32:22  r.david.murray    set     nosy: + r.david.murray; messages: + msg229874; versions: - Python 3.6
2014-10-23 12:38:24  akira             set     messages: + msg229870
2014-10-23 11:49:35  serhiy.storchaka  set     messages: + msg229869
2014-10-23 11:26:21  akira             set     messages: + msg229868
2014-10-23 09:12:08  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg229865
2014-10-23 07:16:00  akira             create