restore accepting detached stdin in fileinput binary mode #66898

Closed
4kir4 mannequin opened this issue Oct 23, 2014 · 8 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@4kir4
Mannequin

4kir4 mannequin commented Oct 23, 2014

BPO 22709
Nosy @bitdancer, @4kir4, @serhiy-storchaka
Files
  • fileinput-detached-stdin.diff
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-01-02.20:46:26.241>
    created_at = <Date 2014-10-23.07:16:00.767>
    labels = ['type-bug', 'library']
    title = 'restore accepting detached stdin in fileinput binary mode'
    updated_at = <Date 2016-01-02.20:46:26.240>
    user = 'https://github.com/4kir4'

    bugs.python.org fields:

    activity = <Date 2016-01-02.20:46:26.240>
    actor = 'r.david.murray'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-01-02.20:46:26.241>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2014-10-23.07:16:00.767>
    creator = 'akira'
    dependencies = []
    files = ['36997']
    hgrepos = []
    issue_num = 22709
    keywords = ['patch']
    message_count = 8.0
    messages = ['229859', '229865', '229868', '229869', '229870', '229874', '257361', '257363']
    nosy_count = 4.0
    nosy_names = ['r.david.murray', 'akira', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue22709'
    versions = ['Python 3.4', 'Python 3.5']

    @4kir4
    Mannequin Author

    4kir4 mannequin commented Oct 23, 2014

    The patch for Issue bpo-21075: "fileinput.FileInput now reads bytes from standard stream if binary mode is specified" broke code that used
    sys.stdin = sys.stdin.detach() with FileInput(mode='rb') in Python 3.3.

    I've attached a patch that makes FileInput accept a detached sys.stdin
    (one without a 'buffer' attribute) in binary mode.
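
A minimal sketch of the idea behind the patch, not the patch itself: prefer the `buffer` attribute when stdin is a text wrapper, and fall back to the stream as-is when it has been detached. The helper name `binary_stdin` is hypothetical, and the in-memory streams below only stand in for a real stdin.

```python
import io
import sys


def binary_stdin(stream=None):
    """Return a bytes-producing stream for stdin (hypothetical helper)."""
    if stream is None:
        stream = sys.stdin
    # A text wrapper exposes its underlying binary stream as .buffer.
    # A detached stdin is already binary and has no such attribute,
    # so it is used as-is.
    return getattr(stream, "buffer", stream)


# Stand-in for an ordinary text-mode stdin.
text_like = io.TextIOWrapper(io.BytesIO(b"a\nb\n"), encoding="utf-8")
# Stand-in for the result of sys.stdin.detach(): a plain binary stream.
detached_like = io.BytesIO(b"x\ny\n")

first_from_text = binary_stdin(text_like).readline()
first_from_detached = binary_stdin(detached_like).readline()
```

Both calls yield bytes, so code that iterates over the result does not need to know whether the caller detached stdin beforehand.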

    @4kir4 4kir4 mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Oct 23, 2014
    @serhiy-storchaka
    Member

    The code

        sys.stdin = sys.stdin.detach()

    is incorrect because sys.stdin should be a text stream, but detach() returns a binary stream.

    @4kir4
    Mannequin Author

    4kir4 mannequin commented Oct 23, 2014

    It is incorrect that sys.stdin is *always* a text stream. It often is,
    but not always.

    There are cases when it is not, e.g.,

       $ tar zcf - stuff | gpg -e | ssh user@server 'cat - > stuff.tar.gz.gpg'

    tar's stdout is *not* a text stream.
    gpg's stdin/stdout are *not* text streams.
    ssh's stdin is *not* a text stream.
    etc.

    If any of the steps are implemented in Python then it is useful to
    consider sys.stdin as a binary stream.

    Any script written before Python 3.4.1 (bpo-21075) that used FileInput binary mode
    *had to* use sys.stdin = sys.stdin.detach().

    A bugfix release should not break working code.

    @serhiy-storchaka
    Member

    > It is incorrect that sys.stdin is *always* a text stream. It often is,
    > but not always.
    >
    > There are cases when it is not, e.g.,
    >
    > $ tar zcf - stuff | gpg -e | ssh user@server 'cat - > stuff.tar.gz.gpg'
    >
    > tar's stdout is *not* a text stream.
    > gpg's stdin/stdout are *not* text streams.
    > ssh's stdin is *not* a text stream.
    > etc.

    This is not related to Python. The terms "character", "string", "text", and "file" can
    have different meanings in different domains. In Python we use Python
    terminology. There is no such thing as sys.stdin in a POSIX-compatible shell,
    because a POSIX-compatible shell has no sys module and doesn't use a dot to
    access attributes.

    > Any script written before Python 3.4.1 (bpo-21075) that used FileInput binary
    > mode *had to* use sys.stdin = sys.stdin.detach().
    >
    > A bugfix release should not break working code.

    The correct solution in this case would be to use the workaround "sys.stdin =
    sys.stdin.detach()" conditionally, only in Python versions which have the bug.
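
The conditional workaround suggested here could look roughly like the sketch below. The version range is an assumption taken from this thread (Python 3 releases before 3.4.1, where bpo-21075 landed), and both function names are hypothetical.

```python
import sys


def needs_detach_workaround(version_info=sys.version_info):
    """Hypothetical check: True for Python 3 releases before 3.4.1.

    The range is an assumption based on the discussion in this issue,
    not something verified against every release.
    """
    return (3, 0, 0) <= tuple(version_info[:3]) < (3, 4, 1)


def apply_detach_workaround():
    """Replace sys.stdin with its binary stream only where it is needed."""
    # hasattr() guards against a stdin that has already been detached
    # or replaced with an object that has no detach() method.
    if needs_detach_workaround() and hasattr(sys.stdin, "detach"):
        sys.stdin = sys.stdin.detach()
```

On interpreters outside the assumed range, `apply_detach_workaround()` leaves `sys.stdin` untouched.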

    @4kir4
    Mannequin Author

    4kir4 mannequin commented Oct 23, 2014

    > This is not related to Python. The terms "character", "string", "text", and "file" can have different meanings in different domains. In Python we use Python terminology. There is no such thing as sys.stdin in a POSIX-compatible shell, because a POSIX-compatible shell has no sys module and doesn't use a dot to access attributes.

    I use Python terminology (text - Unicode string, binary data - bytes).

    Though the text vs. binary data distinction is language independent
    (it doesn't matter what the Unicode type is called in a particular language).

    Python can be used to implement tar, gpg, ssh, 7z, etc. I don't
    see what POSIX has to do with that fact.

    It is very simple actually:

    text -> encode <character encoding> -> bytes
    bytes -> decode <character encoding> -> text

    In most cases text should be human readable.

    It doesn't make sense to encode/decode input/output of gpg-like utilities using a character encoding. *Therefore* the notion of
    sys.stdin being a bytes stream (io.BufferedReader) can be useful
    in this case.
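
A small illustration of the point above, as a sketch: text survives an encode/decode round trip, while arbitrary binary data generally cannot be decoded with a character encoding at all. The helper `is_decodable` and the sample byte string are hypothetical, chosen only for the demonstration.

```python
def is_decodable(data, encoding="utf-8"):
    """Return True if the bytes can be decoded with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# text -> encode -> bytes -> decode -> text
text = "naïve café"
encoded = text.encode("utf-8")
round_tripped = encoded.decode("utf-8")

# Arbitrary non-text bytes, such as compressed or encrypted output might contain.
blob = b"\x1f\x8b\xff\xfe"
blob_is_text = is_decodable(blob)
```

Here `round_tripped` equals `text`, while `blob_is_text` is false, which is why such data is better passed through as bytes.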

    The lines produced by FileInput are often (after optional processing)
    written to sys.stdout. If binary mode is used then FileInput(mode='rb')
    yields bytes therefore it is also useful to consider sys.stdout
    a binary stream (io.BufferedWriter) in this case.

    It introduces a nice symmetry:

    text FileInput mode -> text streams
    binary FileInput mode -> binary streams

    By design, FileInput treats stdin like any other file. It
    even supports a special name for it: '-'. A file may be in
    binary mode; stdin should be able to be as well.

    sys.stdout is used outside of FileInput, so no changes in
    FileInput itself are necessary for it; sys.stdin, however, is used
    inside FileInput, which is why the change is needed.
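
The symmetry described above can be sketched as follows: lines read in binary mode are bytes, so they are written to a binary stream. The function names are hypothetical, and a temporary file plus an in-memory stream stand in for real input and for `sys.stdout.buffer`.

```python
import fileinput
import io
import os
import tempfile


def copy_lines_binary(paths, out):
    """Copy every line of the given files to a binary stream, unchanged."""
    with fileinput.FileInput(files=paths, mode="rb") as lines:
        for line in lines:
            out.write(line)  # in binary mode each line is a bytes object


def demo():
    """Round-trip non-text data through FileInput in binary mode."""
    payload = b"\xff\xfe binary\n\x00 more\n"
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        out = io.BytesIO()  # stand-in for sys.stdout.buffer
        copy_lines_binary([path], out)
        return payload, out.getvalue()
    finally:
        os.remove(path)
```

In a real script the `out` argument would typically be `sys.stdout.buffer`, or `sys.stdout` itself if it has been replaced by a binary stream.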

    > Correct solution in this case would be to use the workaround "sys.stdin =
    > sys.stdin.detach()" conditionally, only in Python versions which have a bug.

    Do you mean every Python 3 version before Python 3.4.1?

    Correct solution is to avoid blaming users
    (your fault -> you change your programs) for our mistakes
    and fix the bug in Python itself. The patch is attached.

    @bitdancer
    Member

    I actually agree that this should be applied not only for backward compatibility reasons, but because it is better duck typing. It unfortunately leaves code still having to potentially deal with "if python version is 3.4.1 or 3.4.2", but there is nothing that can be done about that.

    @python-dev
    Mannequin

    python-dev mannequin commented Jan 2, 2016

    New changeset ded1336bff49 by R David Murray in branch '3.5':
    bpo-22709: Use stdin as-is if it does not have a buffer attribute.
    https://hg.python.org/cpython/rev/ded1336bff49

    New changeset 688d32cdbc0c by R David Murray in branch 'default':
    Merge: bpo-22709: Use stdin as-is if it does not have a buffer attribute.
    https://hg.python.org/cpython/rev/688d32cdbc0c

    @bitdancer
    Member

    Hopefully 'better late than never' applies to this. Sigh.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022