classification
Title: support encoded filename in Content-Disposition for HTTP in cgi.FieldStorage
Type: enhancement Stage: patch review
Components: email, Library (Lib), Unicode Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Myroslav.Opyr, aclover, barry, demian.brecht, ezio.melotti, martin.panter, pawciobiel, piotr.dobrogost, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-02-10 14:54 by Myroslav.Opyr, last changed 2020-01-08 10:45 by aclover.

Files
File name Uploaded Description
test_cgi.py-v2.7.5-rfc6266_filename.patch Myroslav.Opyr, 2015-02-11 08:53 test revealing the issue
cgi.py-v2.7.5-rfc6266_filename.patch Myroslav.Opyr, 2015-02-11 10:32 rfc6266 powered fix
Pull Requests
URL Status Linked
PR 6027 pawciobiel, 2018-03-22 15:22
Messages (11)
msg235688 - (view) Author: Myroslav Opyr (Myroslav.Opyr) * Date: 2015-02-10 14:54
cgi.FieldStorage has problems parsing multipart/form-data requests that contain file fields with non-Latin filenames. It drops the filename parameter when it is formatted according to RFC6266 [1], as most modern browsers do. There is already a Python implementation of that RFC in the rfc6266 module [2].

Ref:
 [1] https://tools.ietf.org/html/rfc6266
 [2] https://pypi.python.org/pypi/rfc6266
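
For illustration, the extended parameter in question has the form filename*=UTF-8''%E2%82%AC%20rates.txt (charset, optional language tag, then percent-encoded bytes). A minimal sketch of decoding such a value by hand is below. The helper name decode_ext_value is hypothetical; it is not part of cgi or of the rfc6266 module, and it skips the validation a real parser would need.

```python
from urllib.parse import unquote


def decode_ext_value(ext_value):
    """Decode an RFC 5987 style ext-value: charset'language'pct-encoded.

    Hypothetical helper for illustration only; it does not validate the
    charset name or the language tag.
    """
    charset, _language, encoded = ext_value.split("'", 2)
    # Percent-decode to bytes and interpret them in the declared charset.
    return unquote(encoded, encoding=charset or "utf-8", errors="strict")


print(decode_ext_value("UTF-8''%E2%82%AC%20rates.txt"))
```

The language tag between the two apostrophes is parsed but ignored here.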
msg235732 - (view) Author: Myroslav Opyr (Myroslav.Opyr) * Date: 2015-02-11 08:53
In test_cgi.py-v2.7.5-rfc6266_filename.patch there is a patch to test_cgi.py (Python 2.7.5) that reveals the issue.
msg235734 - (view) Author: Myroslav Opyr (Myroslav.Opyr) * Date: 2015-02-11 10:32
As a proof of concept, here is a fix for the issue powered by the rfc6266 library [1]. See cgi.py-v2.7.5-rfc6266_filename.patch

References:
 [1] https://pypi.python.org/pypi/rfc6266
msg235909 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-02-13 18:39
Since that library is not part of the stdlib, this is not an appropriate patch for CPython.

Note that this issue is also relevant to the email library, which intends to support RFC2616 header parsing/generation, and therefore should also be enhanced to support RFC 6266.
msg236101 - (view) Author: Myroslav Opyr (Myroslav.Opyr) * Date: 2015-02-16 12:52
Hi David,

According to the "Test Cases for HTTP Content-Disposition header field" overview [1], this is not about email headers, but only about HTTP headers. It looks like the email standards and the HTTP standards are different in this area.

I do know that my patch is poor. It is just a proof of concept, to show that there is an issue in the stdlib and to offer one possible quick patch that provides the needed functionality.

Regards,

Myroslav

Ref:
 [1] http://greenbytes.de/tech/tc2231/
msg236365 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-02-21 14:47
I know it is called the 'email' package, but the intent is to support http header parsing as well (cf email.policy.HTTP).
msg314265 - (view) Author: Paweł (pawciobiel) * Date: 2018-03-22 15:22
I didn't find this issue and created a duplicate
https://bugs.python.org/issue33027

I've added similar/updated changes
https://github.com/python/cpython/pull/6027

@r.david.murray wouldn't it be wise to do one step at a time rather than implementing full support for RFC6266? Please tell me exactly what your expectations are so I can fix the patch if it needs to be fixed.

This is also related to RFC5987
https://tools.ietf.org/html/rfc5987
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
msg314297 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-03-23 02:26
I haven't read the http rfcs, but my understanding is that they follow the MIME standards, and the email library already has code to do proper parsing and decoding of encoded filenames in Content-Disposition headers.  It should be possible to call that code for this use case (the http libraries already depend on the email libraries, although I'm not sure if cgi itself does currently).  There may be additional considerations involved in fully supporting the http RFCs, but to determine that someone will need to read both and understand them, which is not a small undertaking :)

In the meantime, I'm pretty sure that using the existing mime header parsing code in the email library (see email.headerregistry) will provide better parsing than the only-handles-simple-cases heuristic in your PR.  Granted, I don't think you have to deal with multi-part headers in http, but I vaguely remember that there are other subtleties not handled by a simple split on '.
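
As a rough sketch of what that suggestion could look like, the snippet below parses a single Content-Disposition header with the email package under email.policy.HTTP and reads the decoded parameters from the resulting header object. The sample header value is made up for illustration, and this is not how cgi.FieldStorage is implemented; it only shows the existing email.headerregistry parsing applied to an RFC 2231 style filename* parameter.

```python
import email
from email import policy

# Made-up part header carrying an RFC 2231 / RFC 5987 style filename*.
raw = (
    'Content-Disposition: form-data; name="file"; '
    "filename*=UTF-8''%E2%82%AC%20rates.txt\n\n"
)

# With policy.HTTP, headers are parsed by email.headerregistry classes.
msg = email.message_from_string(raw, policy=policy.HTTP)
header = msg["Content-Disposition"]

print(header.content_disposition)  # the disposition type
print(header.params["filename"])   # the decoded filename parameter
```

Whether this parsing matches the HTTP specifications in every detail is exactly the open question in this discussion.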
msg359224 - (view) Author: And Clover (aclover) * Date: 2020-01-02 23:23
HTTP generally isn't an RFC 822-family standard. Its headers look a lot like it, but they have their own defined syntax that differs in niggling little details. Using mail parsing code for HTTP isn't usually the right thing.

HTTP has always used its own syntax definitions for the headers on the main request/response entities, but it has traditionally partially deferred to RFC 822-family specs for the definitions of structured entity bodies. This is moot, however, as the reality of what browsers support has rarely coincided with those specs.

Nowadays HTML5.2 explicitly defers to RFC 7578 for definition of multipart/form-data headers. (This RFC is a replacement for the vague and broken RFC 2388.) As is to be expected for an HTML5-related spec, RFC 7578 shrugs and documents existing browser behaviour [section 4.2]:

- some browsers do UTF-8
- some browsers do data mangling (IE's %-encoding sadness)
- some browsers might do something else

but it explicitly rules out the solution proposed here:

"The encoding method described in [RFC5987], which would add a 'filename*' parameter to the Content-Disposition header field, MUST NOT be used."

The introductions of both RFC 5987 and RFC 6266 explicitly exclude multipart/form-data headers from their remit.
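
To make the "some browsers do UTF-8" case concrete, here is a minimal sketch that assumes the part header arrives as raw UTF-8 bytes in a plain filename parameter, decodes the bytes first, and then lets the legacy email.message.Message API split the parameters. The header bytes are invented for the example, and the UTF-8 assumption is just that, an assumption; a real form submission may use another encoding.

```python
from email.message import Message

# Invented example: a browser sending the filename as raw UTF-8 bytes
# in a plain (non-extended) filename parameter.
raw_header = 'form-data; name="file"; filename="€ rates.txt"'.encode("utf-8")

msg = Message()
# Assumption: the bytes are UTF-8. Decode before handing them to the parser.
msg["Content-Disposition"] = raw_header.decode("utf-8")

print(msg.get_param("name", header="content-disposition"))
print(msg.get_filename())
```

No filename* parameter is involved here, which is in line with the RFC 7578 text quoted above.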

So in summary:

- we shouldn't do anything
- the situation with submitted filenames will continue to be broken for everyone indefinitely
msg359533 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2020-01-07 18:40
Are you saying there is no (http) RFC compliant way to fix this, or no way to fix it with the email library parsers?  If the latter, the library is pretty flexible and for internal stdlib use it would probably be permissible to directly call methods in the internal parsing module, if those would be useful.

I haven't re-read the issue to reload my brain, so this question may be off point (except for the first clause of the question).
msg359577 - (view) Author: And Clover (aclover) * Date: 2020-01-08 10:45
> Are you saying there is no (http) RFC compliant way to fix this

Sadly, yes.

And though RFCs aren't always a fair representation of real-world use, RFC 7578 is informative as well as normative: at present nothing produces "filename*=" in multipart/form-data.
History
Date User Action Args
2020-03-22 06:01:40 martin.panter link issue26527 superseder
2020-01-08 10:45:02 aclover set messages: + msg359577
2020-01-07 21:23:15 vstinner set nosy: - vstinner
2020-01-07 18:40:10 r.david.murray set messages: + msg359533
2020-01-02 23:23:52 aclover set nosy: + aclover
messages: + msg359224
2018-03-23 02:26:43 r.david.murray set messages: + msg314297
2018-03-22 15:22:11 pawciobiel set components: + Unicode
versions: + Python 2.7, Python 3.4, Python 3.5, Python 3.7, Python 3.8
nosy: + ezio.melotti, pawciobiel, vstinner
title: RFC6266 support (Content-Disposition for HTTP) -> support encoded filename in Content-Disposition for HTTP in cgi.FieldStorage
messages: + msg314265
stage: patch review
pull_requests: + pull_request5937
2016-04-09 12:13:29 piotr.dobrogost set nosy: + piotr.dobrogost
2015-02-21 14:47:13 r.david.murray set messages: + msg236365
2015-02-16 12:52:23 Myroslav.Opyr set messages: + msg236101
2015-02-13 18:39:28 r.david.murray set versions: - Python 2.7
nosy: + r.david.murray, barry
messages: + msg235909
components: + email
2015-02-11 10:50:29 Myroslav.Opyr set nosy: + serhiy.storchaka
2015-02-11 10:32:26 Myroslav.Opyr set files: + cgi.py-v2.7.5-rfc6266_filename.patch
messages: + msg235734
2015-02-11 08:53:52 Myroslav.Opyr set files: + test_cgi.py-v2.7.5-rfc6266_filename.patch
keywords: + patch
messages: + msg235732
2015-02-10 21:15:24 martin.panter set nosy: + martin.panter
title: RFC6266 support -> RFC6266 support (Content-Disposition for HTTP)
2015-02-10 18:59:15 demian.brecht set nosy: + demian.brecht
2015-02-10 14:54:47 Myroslav.Opyr create