New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cgi.FieldStorage has different (wrong?) behavior on Python3 than Python2 #67989
Comments
While working on PyPI 2.0 (which is currently running Python 3) I discovered that I've attached a minimum reproducer that runs without error on Python 2.6 and Python 2.7 which fails on Python 3.2, 3.3, and 3.4. If I go into the cgi.py module and add a print() statement that will print the header of each part, I get output that looks like: b'----------------GHSKFJDLGDS7543FJKLFHRE75642756743254\r\nContent-Disposition: form-data; name="protcol_version"\r\n\r\n' The first line of that is obviously suspicious since it includes the inner boundary marker. Looking at the Python 3.x code it throws away the first line off the fp and then proceeds to process the rest of the fp. However in this case the first line is just a blank b'\r\n'. Looking at the Python 2.7 code it throws away an entire first part, not just the first line. I'm guessing that the "read first line and throw it away code" needs to continue reading lines until it's read enough lines to get to the boundary marker. |
Added a patch that fixes this issue by reading lines until we find the line that is our expected boundary marker. |
Added R David Murray to the nosy list because this is kinda similar to the email stuff and there doesn't seem to be anyone better to look at this patch that I can find... |
Also adding Berker Peksag because they've touched this module recently :) |
The loop might be more elegantly expressed as for line in self.fp:
self.bytes_read += len(line)
if line.strip() != b"--" + self.innerboundary:
break but otherwise lgtm |
The code you are modifying looks completely wrong-headed to me. Unfortunately I don't have time to rewrite the cgi module right now :). Given that, your fix looks fine: the first part of a multipart (up to the first inner boundary) is the 'preamble', which can contain pretty much arbitrary content, and in the context of cgi should indeed be completely ignored. Benjamin's suggested modification has the if logic reversed. |
The reason I didn't do that to begin with, was the code currently checks if the first line is a bytes object or not in order to be able to raise an error if it's returning str instead of bytes. I didn't want to redo that check on every iteration, so I left the original part alone and then used the while loop to handle doing more if needed. Would you prefer the code written your way and either drop the bytes check or incur the cost of doing the type check on every line? |
Oh, I see. I think your way is fine then. |
New changeset 3af77d1c5c47 by Donald Stufft in branch '3.4': |
New changeset b9364903d91c by Benjamin Peterson in branch 'default': |
Thanks everyone for taking a look at this! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: