Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multipart/form-data encoding #47494

Open
catlee mannequin opened this issue Jun 30, 2008 · 42 comments
Open

multipart/form-data encoding #47494

catlee mannequin opened this issue Jun 30, 2008 · 42 comments
Labels
stdlib Python modules in the Lib dir topic-email type-feature A feature request or enhancement

Comments

@catlee
Copy link
Mannequin

catlee mannequin commented Jun 30, 2008

BPO 3244
Nosy @warsaw, @orsenthil, @pitrou, @catlee, @devdanzin, @merwok, @bitdancer, @vadmium
Dependencies
  • bpo-3243: Support iterable bodies in httplib
  • Files
  • rfc2388.py: Simple implementation of multipart/form-data.
  • http_formdata.patch: Patch against py3k implementing http.formdata
  • http_formdata.patch: New patch with module docstring.
  • http_formdata.patch: New patch
  • Issue3244.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2008-06-30.18:04:19.325>
    labels = ['easy', 'type-feature', 'expert-email']
    title = 'multipart/form-data encoding'
    updated_at = <Date 2016-08-14.07:37:52.023>
    user = 'https://github.com/catlee'

    bugs.python.org fields:

    activity = <Date 2016-08-14.07:37:52.023>
    actor = 'martin.panter'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['email']
    creation = <Date 2008-06-30.18:04:19.325>
    creator = 'catlee'
    dependencies = ['3243']
    files = ['17543', '17546', '17547', '17548', '26146']
    hgrepos = []
    issue_num = 3244
    keywords = ['patch', 'easy']
    message_count = 42.0
    messages = ['69015', '76356', '76357', '81498', '93657', '93666', '97571', '107004', '107005', '107019', '107024', '107025', '107029', '107031', '107032', '107034', '107038', '107041', '107042', '107044', '107045', '107046', '107050', '107056', '107058', '108692', '128608', '128611', '128612', '128613', '128618', '128620', '128621', '141905', '141906', '161511', '161512', '161522', '161526', '163926', '163969', '272655']
    nosy_count = 26.0
    nosy_names = ['barry', 'guettli', 'orsenthil', 'pitrou', 'catlee', 'gotgenes', 'ajaksu2', 'jnoller', 'eric.araujo', 'forest_atq', 'fsteinel', 'r.david.murray', 'shazow', 'bgamari', 'daniel.ugra', 'alexz', 'tamentis', 'checat', 'catalin.iacob', 'Chris.Waigl', 'Johannes.Hoff', 'martin.panter', 'cco3', 'atommixz', 'piotr.dobrogost', 'raylu']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue3244'
    versions = ['Python 3.4']

    @catlee
    Copy link
    Mannequin Author

    catlee mannequin commented Jun 30, 2008

    The standard library should provide a way to encode data using the
    standard multipart/form-data encoding.

    This encoding is required to support file uploads via HTTP POST (or PUT)
    requests.

    Ideally file data could be streamed to the remote server if httplib
    supported iterable request bodies (see issue bpo-3243).

    Mailing list thread:
    http://mail.python.org/pipermail/python-dev/2008-June/080783.html

    @catlee catlee mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jun 30, 2008
    @fsteinel
    Copy link
    Mannequin

    fsteinel mannequin commented Nov 24, 2008

    see http://code.activestate.com/recipes/146306/ for a user contributed
    implementation.

    @catlee
    Copy link
    Mannequin Author

    catlee mannequin commented Nov 24, 2008

    I also wrote some software to handle this:
    http://atlee.ca/software/poster/poster.encode.html

    The reason I wrote this is to avoid having the load the entire file into
    memory before posting the request.

    This, along with Issue bpo-3243, would allow streaming uploads of files via
    HTTP POST.

    @devdanzin
    Copy link
    Mannequin

    devdanzin mannequin commented Feb 9, 2009

    So, what is the best way to go about this (beyond docs and tests)? Beat
    the linked recipe into a patch, adapt Chris' implementation?

    @devdanzin devdanzin mannequin added the easy label Feb 12, 2009
    @warsaw warsaw self-assigned this May 4, 2009
    @shazow
    Copy link
    Mannequin

    shazow mannequin commented Oct 6, 2009

    Once upon a time I wrote a library that did some of this among other things:

    http://code.google.com/p/urllib3/

    Or specifically:
    http://code.google.com/p/urllib3/source/browse/trunk/urllib3/filepost.py

    The code was borrowed from some of the recipes mentioned, but cleaned up
    and adjusted a bit. Feel free to use it or borrow from it in any way you
    like.

    @jnoller
    Copy link
    Mannequin

    jnoller mannequin commented Oct 6, 2009

    This request really does need a patch+tests+doc changes - I don't know if
    anyone with +commit has the time to distill the various implementations
    and generate something.

    @bitdancer
    Copy link
    Member

    Daniel suggested marking this as superseding bpo-727898, and I agree. But I want to note that in that issue Barry suggested that it was possible services from the email package could be useful in building this support, and that there might be a better place for it to live than urllib.

    @warsaw warsaw removed their assignment May 5, 2010
    @bitdancer bitdancer self-assigned this May 5, 2010
    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    Hi,

    I believe the attached implementation is reasonable. I'm not sure if it should be called "email.mime.formdata", "rfc2388", etc.

    I'd be happy to attach a proper patch with tests given some quick feedback.

    Thanks,
    Forest

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    Oh, hm, looks like I left a hard-coded name="files" in attach_file. I'll fix that in the patch after I've received any other feedback.

    @pitrou
    Copy link
    Member

    pitrou commented Jun 4, 2010

    You should write your patch against Python 3.x (py3k).

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    I haven't yet touched Python 3.0, and may not have time to dig in at the moment. It wouldn't be suitable to provide a patch against 2.7?

    @pitrou
    Copy link
    Member

    pitrou commented Jun 4, 2010

    I haven't yet touched Python 3.0, and may not have time to dig in at
    the moment. It wouldn't be suitable to provide a patch against 2.7?

    2.7 is almost in release candidate phase, which means it's much too late
    for new features now.

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    Okay, I'll submit against py3k.

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    Should the module be called rfc2388 or should it go into email.mime as formdata? It seems odd to put something HTML/HTTP related into email.mime, but maybe that would be fine. In any case, httplib docs should probably point to this module with an example, right?

    @merwok
    Copy link
    Member

    merwok commented Jun 4, 2010

    I think it belongs in the http package.

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    As http.formdata?

    @merwok
    Copy link
    Member

    merwok commented Jun 4, 2010

    Seems good to me, as long as the module docstring clearly stats whether it’s useful for the client side, the server side or both.

    BTW, isn’t there overlap with cgi.FieldStorage?

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    Hi,

    Patch attached. Let me know what needs fixing.

    I had to fix a bug in email.encoders for my tests to pass. I have not run the full test suite at this point (need to build py3k to do that, maybe I'll have time later today, but if someone else has time, feel free).

    Thanks,
    Forest

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    Éric,

    Sorry, I just read your message.

    I'll post a new patch with a module docstring.

    I believe cgi.FieldStorage is only useful for parsing (i.e. on the server side). MIMEMultipartFormData is for generating multipart/form-data messages (i.e. on the client side).

    Thanks,
    Forest

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    Here's a new patch.

    @merwok
    Copy link
    Member

    merwok commented Jun 4, 2010

    Would you please open another issue for the email fix? Bonus points if you test it on trunk too, since release candidate happens in some days :)

    Do you people think we could unify client and server-side code in the new module (with an alias from cgi for b/w compat), to prevent endless questions?

    Minor remark: I think we don’t have to follow the email naming scheme here. A simpler name like FormData could be just fine.

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    Hm, there is one issue. The example in the docstring wouldn't work.

    You have to get the headers *after* the body, because the boundary isn't generated until the body has been. So this would work:

      body = msg.get_body()
      headers = dict(msg)

    But this won't:

      headers = dict(msg)
      body = msg.get_body()

    I'm not sure what the best way to deal with this is. Maybe instead of get_body we should have get_request_data which returns both headers and body. That would provide simpler semantics.

    Thoughts?

    Thanks,
    Forest

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    New patch:

    • Renames class to FormData.
    • Replaces method get_body with get_request_data to simplify semantics.
    • Drops changes to email.encoders. I'll create a new ticket to deal with that bug. Note that tests here fail without that fix.

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    See bpo-8896 for email.encoders fix.

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 4, 2010

    I don't think Python trunk has the encoders issue, as that is related to the base64 moving to the bytes type.

    @checat
    Copy link
    Mannequin

    checat mannequin commented Jun 26, 2010

    Did you test it against server-side form-data parser implementation? It will be useless if it won't work with most widespread implementations: PHP's and at least some others (consider some popular python web frameworks).

    MIME-compliance is not enough, because browsers send only subset of MIME format and server-side parsers don't expect bodies full of MIME features.

    Particularly, I believe, most implementations don't expect any "Content-Transfer-Encoding", except "binary", because only 8-bit transfers are implemented in browsers, also you should check there will be no line-splitting and header-folding in headers and content, and make sure CR+LF (not plain LF, which is in patch) is always used.

    @bgamari
    Copy link
    Mannequin

    bgamari mannequin commented Feb 15, 2011

    Has there been any progress here?

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Feb 15, 2011

    Hi,

    Sorry for the long delay. I have tested against a Python web application using restish via various WSGI web servers (CherryPy, wsgiref) and I have not seen problems. It may cause problems with other server-side implementations.

    I will not have time to do broad testing against many different server-side implementations. Is there harm in applying the patch and fixing bugs that get reported?

    Thanks,
    Forest

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Feb 15, 2011

    Looks like bgamari and I stepped on each other's requests.

    @bitdancer
    Copy link
    Member

    This patch needs a Doc component.

    Having finally taken a quick look at the RFC (I haven't read it through yet), I think this does belong in email and not http. The RFC makes it clear that while the most common implementation is http, it is designed to be generic, and as such IMO the logical place for it in the stdlib is with the rest of the MIME types, in email. From a usability standpoint, however, it would be more convenient in http, so if most people think it should go into http I won't object.

    @jnoller
    Copy link
    Mannequin

    jnoller mannequin commented Feb 15, 2011

    Yeah, despite what the RFC says, the most common usage is in web clients, and stuffing it in the email module won't be obvious to 95% of the population I think, unless that's where the implementation lives, but we can add a doc stub in the http docs pointing to it and why.

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Feb 15, 2011

    Hi,

    So is the following enough to get this applied? If so, I'm game.

    • Review RFC and enforce Content-Encoding: binary if applicable [checat].
    • Generate CR+LF line endings [checat].
    • Review RFC and address "line-splitting and header-folding" if applicable [checat].
    • Write documentation.

    I can have this done in a week or so, but I'd like to have some confidence that it will be applied if I spend the time on it.

    Thanks,
    Forest

    @bitdancer
    Copy link
    Member

    In principle I think something like this should go in. Since it is a Message subclass, I'd like it to follow the current Message API whether or not it is located in the email package. __str__ and as_string have the right default for line length (no folding). The current default for line endings is \n, and I think the class should stick with that. You can't use __str__ or as_string to generate what you send on the wire if you are supporting binary data.

    I am planning additions to the email API that will make integrating this class and adjusting the generated line endings easier. For the latter (assuming the email-sig approves) I plan a __bytes__ method that will generate "wire format", which would include using \r\n line endings and should be just what you need.

    The current email package does not support the binary content transfer encoding, only 8bit. Support for the binary CTE is another planned addition for 3.3, and I think it can be prioritized ahead of most other features, given that this code needs it.

    So, you might want to wait until the email pieces are in place, and possibly even help me develop them :)

    @johanneshoff
    Copy link
    Mannequin

    johanneshoff mannequin commented Aug 11, 2011

    Forest Bond: Thanks for this patch - I hope it will go in soon. In the meantime, could I get permission to use it as is? (I notice there is a copyright in the file) I would of course keep the attributions in the file.

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Aug 11, 2011

    Hi, Johannes. You can assume the Python license for this patch.

    -Forest

    @bitdancer
    Copy link
    Member

    Forest, could you please submit a contributor agreement?

    http://www.python.org/psf/contrib/

    Life caught up with me and I haven't made enough progress to do anything with this yet, but I still want to.

    @bitdancer bitdancer added topic-email and removed stdlib Python modules in the Lib dir labels May 24, 2012
    @bitdancer bitdancer removed their assignment May 24, 2012
    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented May 24, 2012

    Sure thing. I'll send it via e-mail later today.

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented May 24, 2012

    Okay, Contributor Agreement sent.

    @bitdancer
    Copy link
    Member

    Thanks.

    @orsenthil
    Copy link
    Member

    Thanks for the patch, Forest Bond.

    However, the way I look at this feature, it could be added into urllib.request as a separate handler called MultiPostHandler and request object when it requires it should be able to add it and then use it.
    Here is a first version of this patch, which would give the idea of how it would be added to the urllib.request (Note this is python3 adaptation of PyPi package by name MultipartPostHandler). I shall see to add this in 3.3 and shall include the tests/docs/howto.

    @forestatq
    Copy link
    Mannequin

    forestatq mannequin commented Jun 25, 2012

    Hi Senthil Kumaran,

    Thanks for the feedback & patch.

    I agree having support in urllib probably makes some sense. But why not implement basic support elsewhere and then tie it into urllib so those of us using something else can also use it? I'm using httplib in my application.

    Thanks,
    Forest

    @vadmium
    Copy link
    Member

    vadmium commented Aug 14, 2016

    I think encoding the user’s IP address into the boundary is a bad idea. Forest’s version uses the existing “email” package, which calls random.randrange(sys.maxsize) and searches through the data for conflicts.

    I haven’t really researched this, but I suspect it would be even better to use a CSPRNG like the new “secrets” module, or uuid.uuid4(). Otherwise, perhaps there is the possibility of attacks by predicting the boundary and injecting HTTP headers, splitting up requests, etc via a file upload.

    Both Forest and Senthil’s patches look like they load all the data into memory, so would not be useful for streaming, which was the original request. Hence I am putting this back to “needs patch”. bpo-3243 has been resolved, meaning that we can stream upload data as long as the Content-Length has been pre-calculated. The length could be calculated based from the length of each piece (e.g. file sizes).

    Also, with bpo-12319 (chunked encoding) about to be resolved, if people only need to use HTTP 1.1, it may be easier to upload forms using chunked encoding, where you don’t have to worry about Content-Length.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @hugovk hugovk removed the easy label Sep 8, 2023
    @iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 23, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir topic-email type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants