Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgi.FieldStorage has different (wrong?) behavior on Python3 than Python2 #67989

Closed
dstufft opened this issue Mar 29, 2015 · 11 comments
Closed

cgi.FieldStorage has different (wrong?) behavior on Python3 than Python2 #67989

dstufft opened this issue Mar 29, 2015 · 11 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@dstufft
Copy link
Member

dstufft commented Mar 29, 2015

BPO 23801
Nosy @benjaminp, @bitdancer, @berkerpeksag, @dstufft
Files
  • reproduce.py
  • cgi-read-until-boundary.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-03-29.20:43:32.564>
    created_at = <Date 2015-03-29.02:22:20.895>
    labels = ['type-bug']
    title = 'cgi.FieldStorage has different (wrong?) behavior on Python3 than Python2'
    updated_at = <Date 2015-03-29.20:51:22.006>
    user = 'https://github.com/dstufft'

    bugs.python.org fields:

    activity = <Date 2015-03-29.20:51:22.006>
    actor = 'dstufft'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-03-29.20:43:32.564>
    closer = 'python-dev'
    components = []
    creation = <Date 2015-03-29.02:22:20.895>
    creator = 'dstufft'
    dependencies = []
    files = ['38721', '38722']
    hgrepos = []
    issue_num = 23801
    keywords = ['patch']
    message_count = 11.0
    messages = ['239471', '239472', '239478', '239480', '239490', '239497', '239516', '239517', '239518', '239519', '239520']
    nosy_count = 5.0
    nosy_names = ['benjamin.peterson', 'r.david.murray', 'python-dev', 'berker.peksag', 'dstufft']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue23801'
    versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']

    @dstufft
    Copy link
    Member Author

    dstufft commented Mar 29, 2015

    While working on PyPI 2.0 (which is currently running Python 3) I discovered that setup.py upload was causing an exception. After tracing things I determined that the reason for this is that Python 3 fails to handle leading whitespace in a multipart body.

    I've attached a minimum reproducer that runs without error on Python 2.6 and Python 2.7 which fails on Python 3.2, 3.3, and 3.4.

    If I go into the cgi.py module and add a print() statement that will print the header of each part, I get output that looks like:

    b'----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\r\nContent-Disposition: form-data; name="protcol_version"\r\n\r\n'
    b'Content-Disposition: form-data; name="summary"\r\n\r\n'
    b'Content-Disposition: form-data; name="home_page"\r\n\r\n'
    b'Content-Disposition: form-data; name="filetype"\r\n\r\n'
    b'Content-Disposition: form-data; name="content"; filename="jasmin-13.13.13.tar.gz"\r\n\r\n'

    The first line of that is obviously suspicious since it includes the inner boundary marker. Looking at the Python 3.x code it throws away the first line off the fp and then proceeds to process the rest of the fp. However in this case the first line is just a blank b'\r\n'. Looking at the Python 2.7 code it throws away an entire first part, not just the first line.

    I'm guessing that the "read first line and throw it away code" needs to continue reading lines until it's read enough lines to get to the boundary marker.

    @dstufft dstufft added the type-bug An unexpected behavior, bug, or error label Mar 29, 2015
    @dstufft
    Copy link
    Member Author

    dstufft commented Mar 29, 2015

    Added a patch that fixes this issue by reading lines until we find the line that is our expected boundary marker.

    @dstufft
    Copy link
    Member Author

    dstufft commented Mar 29, 2015

    Added R David Murray to the nosy list because this is kinda similar to the email stuff and there doesn't seem to be anyone better to look at this patch that I can find...

    @dstufft
    Copy link
    Member Author

    dstufft commented Mar 29, 2015

    Also adding Berker Peksag because they've touched this module recently :)

    @benjaminp
    Copy link
    Contributor

    The loop might be more elegantly expressed as

    for line in self.fp:
        self.bytes_read += len(line)
        if line.strip() != b"--" + self.innerboundary:
            break

    but otherwise lgtm

    @bitdancer
    Copy link
    Member

    The code you are modifying looks completely wrong-headed to me. Unfortunately I don't have time to rewrite the cgi module right now :). Given that, your fix looks fine: the first part of a multipart (up to the first inner boundary) is the 'preamble', which can contain pretty much arbitrary content, and in the context of cgi should indeed be completely ignored.

    Benjamin's suggested modification has the if logic reversed.

    @dstufft
    Copy link
    Member Author

    dstufft commented Mar 29, 2015

    @benjamin

    The reason I didn't do that to begin with, was the code currently checks if the first line is a bytes object or not in order to be able to raise an error if it's returning str instead of bytes. I didn't want to redo that check on every iteration, so I left the original part alone and then used the while loop to handle doing more if needed.

    Would you prefer the code written your way and either drop the bytes check or incur the cost of doing the type check on every line?

    @benjaminp
    Copy link
    Contributor

    Oh, I see. I think your way is fine then.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 29, 2015

    New changeset 3af77d1c5c47 by Donald Stufft in branch '3.4':
    Closes bpo-23801 - Ignore entire preamble to multipart in cgi.FieldStorage
    https://hg.python.org/cpython/rev/3af77d1c5c47

    @python-dev python-dev mannequin closed this as completed Mar 29, 2015
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 29, 2015

    New changeset b9364903d91c by Benjamin Peterson in branch 'default':
    merge 3.4 (bpo-23801)
    https://hg.python.org/cpython/rev/b9364903d91c

    @dstufft
    Copy link
    Member Author

    dstufft commented Mar 29, 2015

    Thanks everyone for taking a look at this!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants