Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gzip module does the wrong thing with an os.fdopen()'ed fileobj #57990

Closed
gpshead opened this issue Jan 13, 2012 · 9 comments
Closed

gzip module does the wrong thing with an os.fdopen()'ed fileobj #57990

gpshead opened this issue Jan 13, 2012 · 9 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@gpshead
Copy link
Member

gpshead commented Jan 13, 2012

BPO 13781
Nosy @gpshead, @seirl
Files
  • gzip_fdopen_prob.py
  • gzip-fdopen.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-01-18.23:16:40.961>
    created_at = <Date 2012-01-13.22:31:55.847>
    labels = ['type-bug']
    title = "gzip module does the wrong thing with an os.fdopen()'ed fileobj"
    updated_at = <Date 2014-04-05.20:49:17.490>
    user = 'https://github.com/gpshead'

    bugs.python.org fields:

    activity = <Date 2014-04-05.20:49:17.490>
    actor = 'ezio.melotti'
    assignee = 'nadeem.vawda'
    closed = True
    closed_date = <Date 2012-01-18.23:16:40.961>
    closer = 'nadeem.vawda'
    components = []
    creation = <Date 2012-01-13.22:31:55.847>
    creator = 'gregory.p.smith'
    dependencies = []
    files = ['24228', '24258']
    hgrepos = []
    issue_num = 13781
    keywords = ['patch']
    message_count = 9.0
    messages = ['151203', '151414', '151446', '151511', '151520', '151521', '151568', '151572', '151581']
    nosy_count = 5.0
    nosy_names = ['gregory.p.smith', 'nadeem.vawda', 'python-dev', 'jld', 'antoine.pietri']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue13781'
    versions = ['Python 2.7']

    @gpshead
    Copy link
    Member Author

    gpshead commented Jan 13, 2012

    gzip.GzipFile accepts a fileobj parameter with an open file object.

    Unfortunately gzip requires a filename be embedded in the gzip file and the gzip module code uses fileobj.name to get that.

    This results in the fake "<fdopen>" name from posixmodule.c being embedded in the output gzipped file when using Python 2.x. This causes problems when ungzipping these files with gzip -d or ungzip implementations that always rely on the embedded filename when writing their output file rather than stripping a suffix from the input filename as they cannot open a file called "<fdopen>" or if they do, each successive ungzip overwrites the previous...

    On Python 3.x the problem is different, the gzip module fails entirely when given an os.fdopen()'ed file object:

    $ ./python gzip_fdopen_prob.py 
    out_file <_io.BufferedWriter name='FOO.gz'>
    out_fd 3
    fd_out_file <_io.BufferedWriter name=3>
    fd_out_file.name 3
    Traceback (most recent call last):
      File "gzip_fdopen_prob.py", line 13, in <module>
        gz_out_file = gzip.GzipFile(fileobj=fd_out_file)
      File "/home/gps/oss/cpython/default/Lib/gzip.py", line 184, in __init__
        self._write_gzip_header()
      File "/home/gps/oss/cpython/default/Lib/gzip.py", line 221, in _write_gzip_header
        fname = os.path.basename(self.name)
      File "/home/gps/oss/cpython/default/Lib/posixpath.py", line 132, in basename
        i = p.rfind(sep) + 1
    AttributeError: 'int' object has no attribute 'rfind'

    (code attached)

    The os.fdopen()'ed file object is kindly using the integer file descriptor as its .name attribute. That might or might not be an issue, but regardless of that:

    1. GzipFile should not fail in this case.
    2. GzipFile should never embed a fake made up filename in its output.

    Fixing the gzip module to catch errors and use an empty b'' filename for the gzip code in the above error is easy.

    What should be done about the .name attribute on fake file objects? I don't think it should exist at all.

    (another quick test shows that gzip in python 3.x can't output to a BytesIO fileobj at all, it thinks it is readonly)

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Jan 16, 2012

    For 3.x, I think that ignoring non-string names is a reasonable fix. The docs
    for io.FileIO specify that its name attribute can be either a path or an integer
    file descriptor, and changing this doesn't seem to serve any purpose.

    As for the case of 2.7's bogus "<fdopen>" name attribute, I'm not sure what the
    best course of action is. I agree that ideally we would want to get rid of the
    attribute altogether (for objects returned by fdopen), or change the semantics
    to those used by FileIO in 3.x, but making that sort of change in a bugfix
    release seems unwise.

    One alternative would be for GzipFile to specifically check whether a file
    object was returned by fdopen(), and if so ignore the fake name. I'm not sure
    how this could be accomplished, though - just checking for name == "<fdopen>" is
    too fragile for my liking, and I can't see any other obvious way of
    distinguishing objects created by fdopen() from those created by open().

    (another quick test shows that gzip in python 3.x can't output to a BytesIO
    fileobj at all, it thinks it is readonly)

    Are you sure about this? I can't reproduce the problem. Running this script:

        import gzip, io
    
        b = io.BytesIO()
        with gzip.GzipFile(fileobj=b, mode="w") as g:
            g.write(b"asdf ghjk")
        print(b.getvalue())
        b.seek(0)
        with gzip.GzipFile(fileobj=b, mode="r") as g:
            print(g.read())

    I get the following output:

    b'\x1f\x8b\x08\x00\xe1\xa4\x14O\x02\xffK,NISH\xcf\xc8\xca\x06\x00P\xd2\x1cJ\t\x00\x00\x00'
    b'asdf ghjk'
    

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Jan 17, 2012

    Attached is a fix for 3.x.

    @gpshead
    Copy link
    Member Author

    gpshead commented Jan 18, 2012

    thanks that looks good.

    As far as fixing this for 2.7 goes, i don't like the _sound_ of it because it is gross... But i'm actually okay with having special case code in the gzip module that rejects '<fdopen>' as an actual filename and uses '' instead in that case. It is VERY unlikely that anyone ever intentionally wants to use that as a filename.

    Anything more than that (changing the actual '<fdopen>' string for example) seems too invasive and might break someone's doctests and does genuinely make it more difficult to see what a fdopened file object is from its repr.

    @gpshead gpshead self-assigned this Jan 18, 2012
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 18, 2012

    New changeset 7d405058e458 by Nadeem Vawda in branch '3.2':
    Issue bpo-13781: Fix GzipFile to work with os.fdopen()'d file objects.
    http://hg.python.org/cpython/rev/7d405058e458

    New changeset fe36edf3a341 by Nadeem Vawda in branch 'default':
    Merge: bpo-13781: Fix GzipFile to work with os.fdopen()'d file objects.
    http://hg.python.org/cpython/rev/fe36edf3a341

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Jan 18, 2012

    As far as fixing this for 2.7 goes, i don't like the _sound_ of it
    because it is gross... But i'm actually okay with having special case
    code in the gzip module that rejects '<fdopen>' as an actual filename
    and uses '' instead in that case. It is VERY unlikely that anyone ever
    intentionally wants to use that as a filename.

    I agree - it sounds ugly, but pragmatically it seems like the best option.
    Given that the output will still be a valid gzip file even in this rare
    case, it seems unlikely to cause trouble even then.

    @gpshead
    Copy link
    Member Author

    gpshead commented Jan 18, 2012

    Looks like you've got commit privs (yay) so i'm assigning this to you to take care of that way for 2.7 as well.

    I'd add a comment to the fdopen C code where the "<fdopen>" constant lives as well as to the gzip.py module around the special case for this mentioning that they should be kept in sync. (not that either is _ever_ likely to be changed in 2.7)

    @gpshead gpshead assigned nadeemvawda and unassigned gpshead Jan 18, 2012
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 18, 2012

    New changeset a08e9e84f33f by Nadeem Vawda in branch '2.7':
    Issue bpo-13781: Fix GzipFile to work with os.fdopen()'d file objects.
    http://hg.python.org/cpython/rev/a08e9e84f33f

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Jan 18, 2012

    Done.

    @nadeemvawda nadeemvawda mannequin closed this as completed Jan 18, 2012
    @nadeemvawda nadeemvawda mannequin added the type-bug An unexpected behavior, bug, or error label Jan 18, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant