Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgi.FieldStorage can't parse multipart part headers with Content-Length and no filename in Content-Disposition #68952

Closed
peterlandry opened this issue Jul 31, 2015 · 14 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@peterlandry
Copy link

BPO 24764
Nosy @vstinner, @ethanfurman, @PierreQuentel, @bertjwregeer, @berkerpeksag
Files
  • cgi_multipart.patch: Test case and naive bugfix
  • cgi_multipart.patch: Improved patch that removes the header when present
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ethanfurman'
    closed_at = <Date 2015-08-18.17:45:31.731>
    created_at = <Date 2015-07-31.16:07:21.508>
    labels = ['type-bug', 'library']
    title = "cgi.FieldStorage can't parse multipart part headers with Content-Length and no filename in Content-Disposition"
    updated_at = <Date 2020-07-20.20:52:04.982>
    user = 'https://bugs.python.org/PeterLandry'

    bugs.python.org fields:

    activity = <Date 2020-07-20.20:52:04.982>
    actor = 'Rhodri James'
    assignee = 'ethan.furman'
    closed = True
    closed_date = <Date 2015-08-18.17:45:31.731>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2015-07-31.16:07:21.508>
    creator = 'Peter Landry'
    dependencies = []
    files = ['40084', '40145']
    hgrepos = []
    issue_num = 24764
    keywords = ['patch']
    message_count = 14.0
    messages = ['247751', '247752', '247753', '247799', '248185', '248198', '248199', '248258', '248306', '248720', '248729', '248778', '248779', '281076']
    nosy_count = 8.0
    nosy_names = ['vstinner', 'ethan.furman', 'quentel', 'X-Istence', 'python-dev', 'berker.peksag', 'Peter Landry', 'pradeep']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue24764'
    versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']

    @peterlandry
    Copy link
    Author

    cgi.FieldStorage can't parse a multipart with a Content-Length header set on a part:

    [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.49)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import cgi
    >>> from io import BytesIO
    >>>
    >>> BOUNDARY = "JfISa01"
    >>> POSTDATA = """--JfISa01
    ... Content-Disposition: form-data; name="submit-name"
    ... Content-Length: 5
    ...
    ... Larry
    ... --JfISa01"""
    >>> env = {
    ...     'REQUEST_METHOD': 'POST',
    ...     'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY),
    ...     'CONTENT_LENGTH': str(len(POSTDATA))}
    >>> fp = BytesIO(POSTDATA.encode('latin-1'))
    >>> fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 571, in __init__
        self.read_multi(environ, keep_blank_values, strict_parsing)
      File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 726, in read_multi
        self.encoding, self.errors)
      File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 573, in __init__
        self.read_single()
      File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 736, in read_single
        self.read_binary()
      File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 758, in read_binary
        self.file.write(data)
    TypeError: must be str, not bytes
    >>>

    This happens because of a mismatch between the code that creates a temp file to write to and the code that chooses to read in binary mode or not:

    • the presence of filename in the Content-Disposition header triggers creation of a binary mode file
    • the present of a Content-Length header for the part triggers a binary read

    When Content-Length is present but filename is absent, bytes are written to the non-binary temp file, causing the error above.

    I've reviewed the relevant RFCs, and I'm not really sure what the correct way to handle this is. I don't believe Content-Length is addressed for part bodies in the MIME spec0, and HTTP has its own semantics1.

    At the very least, I think this behavior is confusing and unexpected. Some libraries, like Retrofit2, will by default include Content-Length, and break when submitting POST data to a python server.

    I've made an attempt to work in the way I'd expect, and attached a patch, but I'm really not sure if it's the proper decision. My patch kind of naively accepts the existing semantics of Content-Length that presume bytes, and treats the creation of a non-binary file as the "bug".

    @peterlandry peterlandry added the stdlib Python modules in the Lib dir label Jul 31, 2015
    @peterlandry
    Copy link
    Author

    I realized my formatting was poor, making it hard to quickly test the issue. Here's a cleaner version:

        import cgi
        from io import BytesIO
    
        BOUNDARY = "JfISa01"
        POSTDATA = """--JfISa01
        Content-Disposition: form-data; name="submit-name"
        Content-Length: 5
    Larry
    --JfISa01"""
    env = {
        'REQUEST_METHOD': 'POST',
        'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY),
        'CONTENT_LENGTH': str(len(POSTDATA))}
    fp = BytesIO(POSTDATA.encode('latin-1'))
    fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1")
    

    @vstinner
    Copy link
    Member

    @pierre Quentel: Hi! Are you still working on CGI? Can you please review this patch? Thanks.

    --

    Previous large change in the cgi module: issue bpo-4953. Pierre helped a lot on this one.

    @PierreQuentel
    Copy link
    Mannequin

    PierreQuentel mannequin commented Aug 1, 2015

    Yes, I will be able to review the patch next week

    2015-07-31 18:13 GMT+02:00 STINNER Victor <report@bugs.python.org>:

    STINNER Victor added the comment:

    @pierre Quentel: Hi! Are you still working on CGI? Can you please review
    this patch? Thanks.

    --

    Previous large change in the cgi module: issue bpo-4953. Pierre helped a lot
    on this one.

    ----------
    nosy: +quentel


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue24764\>


    @PierreQuentel
    Copy link
    Mannequin

    PierreQuentel mannequin commented Aug 7, 2015

    I don't really see why there is a Content-Length in the headers of a
    multipart form data. The specification at
    http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2 doesn't
    mention it, and it is absent in the example that looks like the one tested
    by Peter :

    Content-Type: multipart/form-data; boundary=AaB03x
    --AaB03x
    Content-Disposition: form-data; name="submit-name"

    Larry
    --AaB03x
    Content-Disposition: form-data; name="files"; filename="file1.txt"
    Content-Type: text/plain
    ... contents of file1.txt ...
    --AaB03x--

    In case a user agent would insert it, I think the best would be to
    ignore it. That is, inside read_multi(), after

                headers = parser.close()

    add these lines :

                if 'content-length' in headers:
                    del headers['content-length']

    It's hard to see the potential side effects but I think it's cleaner
    than the proposed patch, which is not correct anyway for another
    reason : the attribute value is set to a bytes objects, instead of a
    string.

    Peter, does this make sense ? If so, can you submit another patch ?

    @peterlandry
    Copy link
    Author

    Yeah, I think that makes the most sense to me as well. I tried to make a minimum-impact patch, but this feels cleaner.

    If we remove the Content-Length header, the limit kwarg might occur at an odd place in the part itself, but that feels unavoidable if someone submits an incorrect Content-Length for the request itself.

    I'll re-work the patch and make sure the tests I added still add value. Thanks!

    @peterlandry
    Copy link
    Author

    A new patch that simply removes Content-Length from part headers when present.

    @PierreQuentel
    Copy link
    Mannequin

    PierreQuentel mannequin commented Aug 8, 2015

    Victor, you can apply the patch and close the issue.
    Le 7 août 2015 17:12, "Peter Landry" <report@bugs.python.org> a écrit :

    Peter Landry added the comment:

    A new patch that simply removes Content-Length from part headers when
    present.

    ----------
    Added file: http://bugs.python.org/file40145/cgi_multipart.patch


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue24764\>


    @vstinner
    Copy link
    Member

    vstinner commented Aug 8, 2015

    Not today. I'm in holiday. Ping me in two weeks or find another core dev.

    @berkerpeksag berkerpeksag added the type-bug An unexpected behavior, bug, or error label Aug 10, 2015
    @pradeep
    Copy link
    Mannequin

    pradeep mannequin commented Aug 17, 2015

    Hi Victor,
    I found similar problem at
    https://bugs.launchpad.net/barbican/+bug/1485452,
    is problem mentioned in the bug is same with mentioned bug?

    @peterlandry
    Copy link
    Author

    Pradeep, that error seems to be in Barbican. This bug and patch only addresses content-length headers in MIME multipart messages.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 18, 2015

    New changeset 11e9f34169d1 by Victor Stinner in branch '3.4':
    cgi.FieldStorage.read_multi ignores Content-Length
    https://hg.python.org/cpython/rev/11e9f34169d1

    New changeset 5b9209e4c3e4 by Victor Stinner in branch '3.5':
    (Merge 3.4) cgi.FieldStorage.read_multi ignores Content-Length
    https://hg.python.org/cpython/rev/5b9209e4c3e4

    New changeset 0ff1acc89cf0 by Victor Stinner in branch 'default':
    (Merge 3.5) cgi.FieldStorage.read_multi ignores Content-Length
    https://hg.python.org/cpython/rev/0ff1acc89cf0

    @vstinner
    Copy link
    Member

    I applied the fix, thanks Peter for the report and the fix, thanks Pierre for the review.

    https://bugs.launchpad.net/barbican/+bug/1485452,
    is problem mentioned in the bug is same with mentioned bug?

    I don't know, but you can try to apply the patch locally if you want, or download the Python 3.4 using Mercurial.

    @bertjwregeer
    Copy link
    Mannequin

    bertjwregeer mannequin commented Nov 18, 2016

    I've uploaded a patchset to bug bpo-27777 that fixes this issue by fixing make_file, and doesn't cause Python to throw out the content-length information.

    It also fixes FieldStorage for when you provide it a non-multipart form submission and there is no list in FS.

    Please see http://bugs.python.org/issue27777

    @ethanfurman ethanfurman self-assigned this Dec 3, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants