Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgi.FieldStorage can't parse simple body with Content-Length and no Content-Disposition #71964

Closed
rr mannequin opened this issue Aug 16, 2016 · 34 comments
Closed

cgi.FieldStorage can't parse simple body with Content-Length and no Content-Disposition #71964

rr mannequin opened this issue Aug 16, 2016 · 34 comments
Assignees
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@rr
Copy link
Mannequin

rr mannequin commented Aug 16, 2016

BPO 27777
Nosy @jackjansen, @srittau, @Cito, @ned-deily, @merwok, @ethanfurman, @PierreQuentel, @bertjwregeer, @berkerpeksag, @AraHaan, @remilapeyre, @eykamp, @bertjwregeer, @kulikjak, @ar45, @bnmnetp
PRs
  • bpo-27777 Fix for several reported cgi.py issues #7804
  • bpo-27777 : cgi.FieldStorage can't parse simple body with Con… #10771
  • bpo-27777: cgi.FieldStorage can't parse simple body with Cont… #11764
  • bpo-27777: cgi.FieldStorage can't parse simple body with Cont… #11764
  • bpo-27777: cgi.FieldStorage can't parse simple body with Cont… #11764
  • bpo-27777: fix cgi.FieldStorage parsing for body with Content-Length and no Content-Disposition #21457
  • Files
  • test
  • cgi.py
  • bug27777.patch: patchset for bug27777
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ethanfurman'
    closed_at = None
    created_at = <Date 2016-08-16.13:55:45.934>
    labels = ['type-bug', 'library', '3.9', '3.10', '3.11']
    title = "cgi.FieldStorage can't parse simple body with Content-Length and no Content-Disposition"
    updated_at = <Date 2022-02-11.13:48:50.611>
    user = 'https://bugs.python.org/rr'

    bugs.python.org fields:

    activity = <Date 2022-02-11.13:48:50.611>
    actor = 'eric.araujo'
    assignee = 'ethan.furman'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2016-08-16.13:55:45.934>
    creator = 'rr-'
    dependencies = []
    files = ['44124', '44125', '45527']
    hgrepos = []
    issue_num = 27777
    keywords = ['patch']
    message_count = 29.0
    messages = ['272856', '272857', '272858', '277482', '277483', '277536', '281075', '281793', '281794', '319179', '319438', '319443', '319451', '319480', '319534', '319993', '320503', '320506', '330626', '334882', '334890', '342504', '349264', '366308', '368837', '377493', '386175', '387809', '413071']
    nosy_count = 20.0
    nosy_names = ['jackjansen', 'srittau', 'cito', 'ned.deily', 'eric.araujo', 'ethan.furman', 'quentel', 'X-Istence', 'berker.peksag', 'Decorater', 'rr-', 'remi.lapeyre', 'watusimoto', 'Bert JW Regeer', 'kulikjak', 'Aron Podrigal', 'elgow', 'Fran Boon', 'Tim Nyborg2', 'bnmnetp']
    pr_nums = ['7804', '10771', '11764', '11764', '11764', '21457']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue27777'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @rr
    Copy link
    Mannequin Author

    rr mannequin commented Aug 16, 2016

    Sending requests with Content-Length but without Content-Disposition headers causes following error:

    Traceback (most recent call last):
      File "./test", line 19, in <module>
        form = cgi.FieldStorage(fp=env['wsgi.input'], environ=env)
      File "/usr/lib/python3.5/cgi.py", line 561, in __init__
        self.read_single()
      File "/usr/lib/python3.5/cgi.py", line 740, in read_single
        self.read_binary()
      File "/usr/lib/python3.5/cgi.py", line 762, in read_binary
        self.file.write(data)
    TypeError: write() argument must be str, not bytes

    I've attached a test file that reproduces the issue.

    The issue is because read_single decides whether to read the content as binary or text depending on content-length - if it's > 0, it uses read_binary which assumes binary input, and rewrites this input to self.file, assuming self.file is opened in binary mode.

    At the same, self.file is opened in text mode, because self._binary_file is set to False, which in turn is because there's no Content-Disposition header.

    At very least, the decision whether to use binary or text should be consistent in both places (self.length >= 0 vs self._binary_file).

    Related: https://bugs.python.org/issue27308
    Note that unlike https://bugs.python.org/issue24764 this issue does NOT concern multipart requests.

    @rr rr mannequin added the stdlib Python modules in the Lib dir label Aug 16, 2016
    @AraHaan
    Copy link
    Mannequin

    AraHaan mannequin commented Aug 16, 2016

    hmm into looking it should check if it is in actuality a binary file the length of the file data does not really determine anything on encoding really.

    if self._binary_file:

    would suffice on determining binary mode or not.

    @AraHaan
    Copy link
    Mannequin

    AraHaan mannequin commented Aug 16, 2016

    Here is a patch to review (note I only had disc space to clone 3.6 so I had to manually download this version of the file).

    @bertjwregeer
    Copy link
    Mannequin

    bertjwregeer mannequin commented Sep 27, 2016

    On line #890 in self.make_file() the check for _binary_file should be changed to also check for self.length >= 0.

    https://github.com/python/cpython/blob/3.4/Lib/cgi.py#L890

    becomes:

    if self._binary_file or self.length >= 0:

    _binary_file is only ever set if there is a content disposition, which there is not in the test case provided. In the case of no content disposition we can use the content-length as a hint that we have a file that has been uploaded. All files uploaded should be treated as binary if they are not a text type.

    This is a duplicate of bpo-27308, however the patch in that report is incorrect IMHO.

    @bertjwregeer
    Copy link
    Mannequin

    bertjwregeer mannequin commented Sep 27, 2016

    Updated versions this applies to.

    @bertjwregeer bertjwregeer mannequin added the 3.7 (EOL) end of life label Sep 27, 2016
    @berkerpeksag
    Copy link
    Member

    Thanks for triaging this, Bert. Would you like to propose a patch with a test case?

    Note that we can't fix this in 3.3 and 3.4 because they are in security-fix-only mode. See https://docs.python.org/devguide/index.html#status-of-python-branches for details.

    @berkerpeksag berkerpeksag added the type-bug An unexpected behavior, bug, or error label Sep 27, 2016
    @bertjwregeer
    Copy link
    Mannequin

    bertjwregeer mannequin commented Nov 18, 2016

    @berker.peksag:

    Attached is a patch with a test case that exercises this issue.

    Code path is that read_single() checks if the length is greater than 0, and then it reads binary, otherwise it reads it as a single line. This fixes make_file so that if self.length is greater than or equal to 0, it opens the file in binary mode, this matches the checks that write stuff as binary.

    This also undoes the change that was made in https://bugs.python.org/issue24764. Fixing this issue fixed that one as well, and arguably throwing data away doesn't seem like a good idea.

    @ned-deily
    Copy link
    Member

    Berker asks in IRC whether this change should go into 3.6.0 (at rc1). While it is affecting a relatively self-contained part of the standard library (cgi), the issue doesn't seem to be "release critical". Further, it is changing behavior that was changed barely a year ago for bpo-24764. My preference would be to try to have this change reviewed and/or tested by at least some of the people involved with the earlier issue and, if there is a consensus for it, target the change for 3.6.1.

    @bertjwregeer
    Copy link
    Mannequin

    bertjwregeer mannequin commented Nov 26, 2016

    Unfortunately I need to spin another patch, the one I created didn't solve the issue for one of WebOb's users:

    Pylons/webob#300 (Thanks Julien Meyer!)

    I have his permission to grab his test/patch and update this patch, I will get this done later today.

    That being said, this is a real issue, and WebOb will be shipping a fix for Python less than 3.6 anyway, so whether this gets released in 3.6 or not doesn't matter to me. I'd prefer this to be fixed in the standard library for all users, rather than just for WebOb users.

    Even if this were released for 3.6.1, WebOb will have to carry the fix for the foreseeable future.

    @eykamp
    Copy link
    Mannequin

    eykamp mannequin commented Jun 9, 2018

    I've been experiencing the same issue, which is triggered in the exception handling of web.py.

    Bert's proposed fix, adding the zero byte check (if self._binary_file or self.length >= 0:) addresses the issue I'm seeing (tested on 3.5, it's what's available where I can reproduce the error).

    This issue seems to be languishing. Is there any way we could push this forward, even if it doesn't address every problem with the lib?

    @eykamp
    Copy link
    Mannequin

    eykamp mannequin commented Jun 13, 2018

    This also manifests itself when using web.py: if the underlying code throws an exception, this is emitted:

    File "/usr/local/lib/python3.5/dist-packages/web/webapi.py", line 364, in input
    out = rawinput(method)
    File "/usr/local/lib/python3.5/dist-packages/web/webapi.py", line 341, in rawinput
    a = cgi.FieldStorage(fp=fp, environ=e, keep_blank_values=1)
    File "/usr/lib/python3.5/cgi.py", line 561, in __init
    _
    self.read_single()
    File "/usr/lib/python3.5/cgi.py", line 740, in read_single
    self.read_binary()
    File "/usr/lib/python3.5/cgi.py", line 762, in read_binary
    self.file.write(data)
    TypeError: write() argument must be str, not bytes

    @berkerpeksag
    Copy link
    Member

    Thank you for the ping, Chris. I will try to combine Bert's and Julien's patches and prepare a PR this weekend.

    @berkerpeksag berkerpeksag added the 3.8 only security fixes label Jun 13, 2018
    @eykamp
    Copy link
    Mannequin

    eykamp mannequin commented Jun 13, 2018

    I've already got a PR based on the patch listed under the Files section (it's prepared, not yet submitted), but if you want to do something more, I'll step back and let you do it.

    @berkerpeksag
    Copy link
    Member

    That's even better! :) Please submit your work as a pull request.

    Did you take a look at Pylons/webob#300 as well? Can we use the test in the PR? Is it possible to adapt it solve both this and WebOb issues?

    @eykamp
    Copy link
    Mannequin

    eykamp mannequin commented Jun 14, 2018

    I'll get a PR submitted this weekend, and post back here. It will not explicitly address that other case, as I don't have the capacity or wherewithal for that. Alas.

    @eykamp
    Copy link
    Mannequin

    eykamp mannequin commented Jun 19, 2018

    Packaged patch offered below into PR 7804

    #7804

    @bertjwregeer
    Copy link
    Mannequin

    bertjwregeer mannequin commented Jun 26, 2018

    I'll take a look and see if I can get the other fixes from WebOb and add them to a patch, and create a follow-up PR.

    If I can stop carrying a monkey patch for the standard library I am all for it!

    Thanks for running with this!

    @eykamp
    Copy link
    Mannequin

    eykamp mannequin commented Jun 26, 2018

    I don't know if you've read the dialog on the PR (there was also some offline between Ned and myself), but the patch breaks a test when running under a fresh build of Python. I can't reproduce it here without setting up a build system, which I haven't had time to do, and won't be able to do at leat through the end of next week.

    If you can run the tests on a fresh build of Python, and confirm that they break, that would be helpful.

    @PierreQuentel
    Copy link
    Mannequin

    PierreQuentel mannequin commented Nov 28, 2018

    I have submitted another Pull Request (10771) that seems to fix the bug while passing all the tests in test_cgi.py

    @ar45
    Copy link
    Mannequin

    ar45 mannequin commented Feb 5, 2019

    I am experiencing the same issue.
    #10771 looks good.

    While were at it, and if PR 10771 is accepted, maybe we can change

    line = self.fp.readline(1<<16) # bytes
    to read instead of readline since we anyway read till EOF.

    @ar45
    Copy link
    Mannequin

    ar45 mannequin commented Feb 5, 2019

    A different approach. Always honor content length, and do not try to read more than.

    @elgow
    Copy link
    Mannequin

    elgow mannequin commented May 14, 2019

    This bug is triggered by xml-rpc calls from the xmlrpc.client in the Python 3.5 standard library to a mod_wsgi/Python 3.5 endpoint.

    @Cito
    Copy link
    Mannequin

    Cito mannequin commented Aug 8, 2019

    This also happens when sending POST requests with JSON payload from a browser with XMLHttpRequest to a Python 3.7 backend using FieldStorage. It seems XMLHttpRequest adds the content length automatically.

    @ethanfurman ethanfurman self-assigned this Dec 3, 2019
    @FranBoon
    Copy link
    Mannequin

    FranBoon mannequin commented Apr 13, 2020

    What is happening with this bug?
    I am amazed that nearly 4 years on it doesn't seem to have been resolved.
    The issue took me a fairly long time to debug the cause of, but once known the issue seems relatively simple to resolve & there are a couple of Pull Requests which fix the issue. This is my first time looking into the core of Python's own development (which I guess is a testament to how well this normally works), so I may be being naive, but what is the blocker here? Is there anything I can do to help? Test/Review existing PRs? (They both look good to me)
    Create a new PR? (Seems unnecessary)

    I really am genuinely keen to help resolve this for at least Python 3.7+ (Am aware that 3.6 is security fixes only)

    @TimNyborg2
    Copy link
    Mannequin

    TimNyborg2 mannequin commented May 14, 2020

    Echoing Fran Boon, I'm wondering what needs to happen to get the fixes merged and this issue resolved. It affects web servers run on several frameworks, which is more of a problem now, since so many of us migrated to py3 in advance of py2 EOL.

    @kulikjak
    Copy link
    Mannequin

    kulikjak mannequin commented Sep 25, 2020

    We internally tested the most recent PR and found some issues with it:
    #21457 (comment)

    We ended up using a much simpler patch, which seems to work as expected.

    --- Python-3.7.8/Lib/cgi.py
    +++ Python-3.7.8/Lib/cgi.py
    @@ -703,7 +703,10 @@
                     if not data:
                         self.done = -1
                         break
    -                self.file.write(data)
    +                if self._binary_file:
    +                    self.file.write(data)
    +                else:
    +                    self.file.write(data.decode())
                     todo = todo - len(data)
     
         def read_lines(self):

    @bnmnetp
    Copy link
    Mannequin

    bnmnetp mannequin commented Feb 3, 2021

    Thanks Jakub,

    Your patch fixed an increasingly frequent problem with my site.

    How can I help to get this merged so I don't have to have a custom version of cgi.py??

    @Cito
    Copy link
    Mannequin

    Cito mannequin commented Feb 28, 2021

    Just created a test case for this problem after a pentest provoked this error on one of my web apps. Then I found this bug report which already has a similar test case attached.

    The problem is that read_binary() as the name says reads binary data, but then writes it to a file which may or may not be binary, depending on whether self._binary_file is set, which depends on whether a filename was set via the content-disposition header.

    Jakub's patch looks good and works for me. Please merge this!

    @merwok
    Copy link
    Member

    merwok commented Feb 11, 2022

    Both active PRs have comments pointing out issues, that’s why this is still open. A clean fix with unit tests and no regression is needed.

    @merwok merwok added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Feb 11, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @AlexWaygood
    Copy link
    Member

    The cgi module is now deprecated following the acceptance of PEP 594, so bugfixes and improvements for this module will no longer be accepted. I am therefore closing this issue.

    @merwok
    Copy link
    Member

    merwok commented Apr 11, 2022

    Is it stated in the PEP that the deprecation for 3.13 means that fixes can’t be done in current bugfix branches?

    @AlexWaygood
    Copy link
    Member

    AlexWaygood commented Apr 11, 2022

    Is it stated in the PEP that the deprecation for 3.13 means that fixes can’t be done in current bugfix branches?

    It is not, but it is assumed that such fixes will be low priority. Given the large issue/PR backlog, and the fact that this module has been deprecated due to the fact that it effectively does not have an active maintainer in the core dev team at the moment, I felt that it would be unlikely that a fix would be merged for this issue. The dev guide also states that triagers may close issues and PRs proposing fixes for deprecated modules.

    But you're a core dev, while I'm only a triager — feel free to reopen the issue and/or the PR(s) if you'd like to shepherd some of the fixes to being merged!

    @merwok
    Copy link
    Member

    merwok commented Apr 11, 2022

    No, sounds fine to me in this case. The ticket is still here with its info, and someone motivated could make a PR that addresses the problems.

    Thanks for finding the devguide reference!

    @AlexWaygood
    Copy link
    Member

    I've been informed that "mannequin users" are not necessarily pinged if changes are made to an issue, even if they appear in the "nosy" field (@ezio-melotti is working on a fix).

    So: @jackjansen, @srittau, @Cito, @PierreQuentel, @bertjwregeer, @berkerpeksag, @AraHaan, @remilapeyre, @eykamp, @bertjwregeer, @kulikjak, @ar45, @bnmnetp: cc'ing you in to let you know that this issue has been closed.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants