Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgi module cannot handle POST with multipart/form-data in 3.x #49203

Closed
oopos mannequin opened this issue Jan 15, 2009 · 130 comments
Closed

cgi module cannot handle POST with multipart/form-data in 3.x #49203

oopos mannequin opened this issue Jan 15, 2009 · 130 comments
Labels
stdlib Python modules in the Lib dir topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@oopos
Copy link
Mannequin

oopos mannequin commented Jan 15, 2009

BPO 4953
Nosy @warsaw, @birkenfeld, @amauryfa, @vstinner, @merwok, @bitdancer, @florentx, @PierreQuentel
Files
  • full_source_and_error.zip: full source and error show
  • opsuper.pl: Perl code for process post files from form
  • unittest.zip: cgi.FieldStorage unittest for Multipart form-data and associated files.
  • http.zip: Module cgi_new.py and tests
  • cgitest-python3.py
  • cgi_tests.zip: cgi_test.py and associated files
  • adder.html: uses get, works
  • adderpost.html: just hanged get to post, makes adder.cgi hang
  • localCGIServer.py: my wrapper around http.server to handle localhost
  • adder.cgi: action script used by adder.html. adderpost.html, hangs with adderpost.html
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-01-14.13:12:27.362>
    created_at = <Date 2009-01-15.09:02:45.958>
    labels = ['type-bug', 'library', 'expert-unicode']
    title = 'cgi module cannot handle POST with multipart/form-data in 3.x'
    updated_at = <Date 2011-01-15.01:19:29.678>
    user = 'https://bugs.python.org/oopos'

    bugs.python.org fields:

    activity = <Date 2011-01-15.01:19:29.678>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-01-14.13:12:27.362>
    closer = 'vstinner'
    components = ['Library (Lib)', 'Unicode']
    creation = <Date 2009-01-15.09:02:45.958>
    creator = 'oopos'
    dependencies = []
    files = ['12753', '12768', '14235', '20217', '20270', '20385', '20393', '20394', '20395', '20396']
    hgrepos = []
    issue_num = 4953
    keywords = ['patch']
    message_count = 130.0
    messages = ['79892', '79893', '79894', '79896', '79898', '79901', '79939', '79981', '80000', '88960', '88962', '89112', '91444', '91449', '91711', '91741', '95288', '95292', '95330', '107959', '108221', '121864', '125035', '125065', '125086', '125088', '125100', '125105', '125106', '125108', '125114', '125152', '125153', '125158', '125159', '125178', '125181', '125201', '125217', '125235', '125237', '125241', '125402', '125403', '125410', '125419', '125426', '125428', '125474', '125481', '125483', '125501', '125524', '125533', '125543', '125546', '125556', '125557', '125558', '125563', '125570', '125583', '125629', '125633', '125637', '125698', '125839', '125840', '125885', '125886', '125892', '125893', '125900', '125901', '125921', '125926', '125928', '125930', '125931', '125935', '125952', '125992', '125993', '125994', '126035', '126060', '126062', '126065', '126066', '126075', '126117', '126124', '126140', '126145', '126152', '126160', '126161', '126162', '126164', '126165', '126167', '126173', '126175', '126199', '126205', '126207', '126210', '126212', '126214', '126215', '126219', '126222', '126232', '126233', '126242', '126249', '126251', '126253', '126256', '126257', '126262', '126264', '126266', '126267', '126268', '126269', '126289', '126293', '126301', '126307']
    nosy_count = 17.0
    nosy_names = ['barry', 'georg.brandl', 'amaury.forgeotdarc', 'ggenellina', 'vstinner', 'andyharrington', 'eric.araujo', 'grahamd', 'v+python', 'r.david.murray', 'oopos', 'tcourbon', 'tobias', 'flox', 'pebbe', 'quentel', 'erob']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4953'
    versions = ['Python 3.2']

    @oopos
    Copy link
    Mannequin Author

    oopos mannequin commented Jan 15, 2009

    Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit
    (Intel)] on win32
    ---------------------------------------------
    Hi,I user Python!

    UnicodeDecodeError Python 3.0: G:\cgi\python\python.exe
    Thu Jan 15 16:46:34 2009

    A problem occurred in a Python script. Here is the sequence of function
    calls leading up to the error, in the order they occurred.
    G:\webserver\xampp\cgi-bin\testupload.py in ()
    107
    108 # get form
    109 opsform = cgi.FieldStorage()
    110
    111 print ("<br>","form-data:","<br>",opsform,"<br>")
    opsform undefined, cgi = <module 'cgi' from 'G:\cgi\python\lib\cgi.py'>,
    cgi.FieldStorage = <class 'cgi.FieldStorage'>
    G:\cgi\python\lib\cgi.py in __init__(self=FieldStorage(None, None, []),
    fp=None, headers={'content-length': '671631', 'content-type':
    'multipart/form-data;
    boundary=---------------------------9699301019407'}, outerboundary='',
    environ=<os._Environ object at 0x00C90C50>, keep_blank_values=0,
    strict_parsing=0)
    477 self.read_urlencoded()
    478 elif ctype[:10] == 'multipart/':
    479 self.read_multi(environ, keep_blank_values,
    strict_parsing)
    480 else:
    481 self.read_single()
    self = FieldStorage(None, None, []), self.read_multi = <bound method
    FieldStorage.read_multi of FieldStorage(None, None, [])>, environ =
    <os._Environ object at 0x00C90C50>, keep_blank_values = 0,
    strict_parsing = 0
    G:\cgi\python\lib\cgi.py in read_multi(self=FieldStorage(None, None,
    []), environ=<os._Environ object at 0x00C90C50>, keep_blank_values=0,
    strict_parsing=0)
    597 # Create bogus content-type header for proper multipart
    parsing
    598 parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' %
    (self.type, ib))
    599 parser.feed(self.fp.read())
    600 full_msg = parser.close()
    601 # Get subparts
    parser = <email.feedparser.FeedParser object at 0x00DD5650>, parser.feed
    = <bound method FeedParser.feed of <email.feedparser.FeedParser object
    at 0x00DD5650>>, self = FieldStorage(None, None, []), self.fp =
    <io.TextIOWrapper object at 0x00BE3FB0>, self.fp.read = <bound method
    TextIOWrapper.read of <io.TextIOWrapper object at 0x00BE3FB0>>
    G:\cgi\python\lib\io.py in read(self=<io.TextIOWrapper object at
    0x00BE3FB0>, n=-1)
    1722 # Read everything.
    1723 result = (self._get_decoded_chars() +
    1724 decoder.decode(self.buffer.read(), final=True))
    1725 self._set_decoded_chars('')
    1726 self._snapshot = None
    decoder = <encodings.gbk.IncrementalDecoder object at 0x00DB7AB0>,
    decoder.decode = <built-in method decode of IncrementalDecoder object at
    0x00DB7AB0>, self = <io.TextIOWrapper object at 0x00BE3FB0>, self.buffer
    = <io.BufferedReader object at 0x00BE3F90>, self.buffer.read = <bound
    method BufferedReader.read of <io.BufferedReader object at 0x00BE3F90>>,
    final undefined

    UnicodeDecodeError: 'gbk' codec can't decode bytes in position 157-158:
    illegal multibyte sequence
    args = ('gbk',
    b'-----------------------------9699301019407\r\n...-----------------------------9699301019407--\r\n',
    157, 159, 'illegal multibyte sequence')
    encoding = 'gbk'
    end = 159
    object =
    b'-----------------------------9699301019407\r\n...-----------------------------9699301019407--\r\n'
    reason = 'illegal multibyte sequence'
    start = 157
    with_traceback = <built-in method with_traceback of
    UnicodeDecodeError object at 0x00DB7BF0>

    l:\tmp_dir\tmpxyeojf.html contains the description of this error.

    -------------------------------------------
    Hi,
    I am newbie for python under the windows.
    I find that the cgi module always work wrong for the binary files to upload.
    I find that it cannot auto to discern the files' mode and alway use the
    default mode : 'TEXT'.
    So I want to change the sys.stdin 's mode to BINARY to support the
    binary files.
    I got this way:
    import msvcrt,os
    msvcrt.setmode(0,os.O_BINARY) # for stdin
    msvcrt.setmode(1,os.O_BINARY) # for stdout
    but it isn't work,too.
    I know use C progam language can use this function:
    freopen("somefilename","mode","stdin or stdout") to redirect the file
    flow.
    Can every one help me ?

    Best Regards
    oopos

    I

    @oopos oopos mannequin added performance Performance or resource usage topic-unicode labels Jan 15, 2009
    @amauryfa
    Copy link
    Member

    Does it work if you change your script like this:
    opsform = cgi.FieldStorage(open(sys.stdin.fileno(), 'rb'))

    @oopos
    Copy link
    Mannequin Author

    oopos mannequin commented Jan 15, 2009

    To Amaury Forgeot d'Arc :

    Thank you.

    That error have sloved with your way:
    [quote]Does it work if you change your script like this:
    opsform = cgi.FieldStorage(open(sys.stdin.fileno(), 'rb'))[/quote]

    Now,The new problem come out:
    [code] 97 """Push some new data into this object."""

    98 # Handle any previous leftovers

    99 data, self._partial = self._partial + data, ''

    100 # Crack into lines, but preserve the newlines on the end
    of each

    101 parts = NLCRE_crack.split(data)

    data = b'-----------------------------7d91f41a302f4
    \nCo...\x0e\x0f\x0c\x10\x17\x14\x18\x18\x17\x14\x16\x16', self =
    <email.feedparser.BufferedSubFile object at 0x00DD5270>, self._partial
    = ''

    TypeError: Can't convert 'bytes' object to str implicitly
    [/code]

    I find that the CGI LIB didn't use bytes flow, it always use string
    flow.

    More info in the attch file:

    @amauryfa
    Copy link
    Member

    OK, another try. Please replace the previous version with these 3 lines:

      encoding = os.environ.get('HTTP_TRANSFER_ENCODING')
      stdin = open(sys.stdin.fileno(), 'r', encoding=encoding)
      opsform = cgi.FieldStorage(stdin)

    @oopos
    Copy link
    Mannequin Author

    oopos mannequin commented Jan 15, 2009

    Thank you for time.
    Now,I try with you saied.Bu it is taken wrong as before.

    See the files:

    @amauryfa
    Copy link
    Member

    Thanks for the test case. I reproduced it easily.
    There is indeed a real problem in CGI streams.

    The first thing to do is to start python with the -u option (add it to
    the end of the first #! line), so that stdin yields bytes instead of
    unicode chars, and \r\n are not translated on Windows.

    Even then, I noticed that in the multipart/form-data section, text
    fields are utf-8 encoded, but the file content is raw binary.
    (FWIW, I use Firefox and Apache on Windows)
    No encoding seems to be specified, neither in the content, nor in the
    environment (no HTTP_TRANSFER_ENCODING)

    And of course, the email.parser.FeedParser object used to parse it
    accepts only unicode, not bytes.
    Help needed.

    @ggenellina
    Copy link
    Mannequin

    ggenellina mannequin commented Jan 16, 2009

    An attempt to more accurately describe the issue, to attract more
    knowledgeable people, I hope...

    @ggenellina ggenellina mannequin added the stdlib Python modules in the Lib dir label Jan 16, 2009
    @ggenellina ggenellina mannequin changed the title Cannot upload binary file from form ? cgi module cannot handle POST with multipart/form-data in 3.0 Jan 16, 2009
    @ggenellina ggenellina mannequin added type-bug An unexpected behavior, bug, or error and removed performance Performance or resource usage labels Jan 16, 2009
    @oopos
    Copy link
    Mannequin Author

    oopos mannequin commented Jan 16, 2009

    Hehe.
    I only want to use python to slove the upload files instead of PHP.but
    it is hard to me or I have no much time to leran it.
    Now,I learn Perl quickly and use it upload files, it works ok.

    Thank you . I will take more time to learn Python language well.
    Best Regards!

    This is my code: (Perl Language)

    @ggenellina
    Copy link
    Mannequin

    ggenellina mannequin commented Jan 17, 2009

    You should stick to Python 2.6 (or even 2.5) for web programming - 3.0
    is not mature enough. I thought this was a feasibility study on porting
    an existing application to Python 3.0 -- not your first steps in the
    language.

    @tercero12
    Copy link
    Mannequin

    tercero12 mannequin commented Jun 5, 2009

    I'm working on a web framework for Python 3. Naturally this is a
    blocker for me. I was kinda expecting this to be addressed in 3.1 but
    now that rc1 is out and I don't see anything about it, I'm wondering
    about the status of this bug. Can we get a status update?

    @bitdancer
    Copy link
    Member

    Can you provide a test case that clearly demonstrates the problem
    (preferably a unit test, but anything easily reproducible will do)? I'm
    not sure what to do with the code attached to the case.

    @tercero12
    Copy link
    Mannequin

    tercero12 mannequin commented Jun 8, 2009

    I've attached unittest.zip. Simply unzip it to a directory and run it.
    I've included a Python2.x version of the unittest for the sake of
    clarity. The 2.x version was developed on 2.4. The 3.x version was
    developed on 3.0.1 and 3.1rc1 (with identical results).

    It seems that there are several issues with cgi.FieldStorage and
    multi-part form data.

    • Does Formstation read in a Bytes or String?
      -- It seems to expect a String but this yields invalid results for
      uploading files.
      -- A stream of Bytes would make more sense but loses it Pythonic
      "Batteries included" nature if the user has to decode the encoding
      manually for each form field.

    @tobias
    Copy link
    Mannequin

    tobias mannequin commented Aug 10, 2009

    Actually, I think this whole issue is more complex. For example,
    consider a (fictious) CGI script where users can upload an image and a
    description and the script sends a success/error message in return.

    In this case, one has to:

    • read the HTTP request header from stdin as US-ASCII
    • read the image from stdin as raw binary data
    • read the description from stdin as a string in some encoding
    • write the HTTP response header to stdout as US-ASCII
    • write the response message to stdout in some (other) encoding
    • write error messages to server log via stderr as US-ASCII
      Also, there are cases when a cgi script should return binary data
      instead (e.g., images or archive files) or apply a transfer encoding
      (e.g., gzip).

    Although FieldStorage only cares about reading, it still has to cope
    with intermixed textual and binary data. So the only practical way in my
    opinion is to use raw binary data and have FieldStorage decode strings
    on demand, since only the programmer knows whether a field should
    contain text or binary data. FieldStorage should offer two methods for
    this purpose: one for reading binary data and another for reading and
    decoding strings on-the-fly (possibly using a default encoding passed to
    its constructor).

    @tercero12
    Copy link
    Mannequin

    tercero12 mannequin commented Aug 10, 2009

    I think you hit the nail on the head. Now we just need (someone) to
    code it.

    @tercero12
    Copy link
    Mannequin

    tercero12 mannequin commented Aug 18, 2009

    I thought I'd take a crack at this today. I soon figured out the real
    issue. It is the email.parser module that handles the decoding of
    Multipart/form-data things...and it is also still quote broken w.r.t.
    handling Bytes. So this issue is dependent on
    http://bugs.python.org/issue4661 before it can be fixed.

    @warsaw
    Copy link
    Member

    warsaw commented Aug 19, 2009

    Please, please, please contact the email-sig and help pitch in. For
    many reasons I simply haven't had the cycles to work on this and I don't
    see that happening any time soon. There are folks willing to work on
    the package in the email-sig and I will add my $0.02 with design
    suggestions, but we really need manpower and motivation to get the email
    package into shape.

    @tcourbon
    Copy link
    Mannequin

    tcourbon mannequin commented Nov 15, 2009

    *bump*

    Hi there Pythoners !

    As Timothy Farrell I'm currently working (or rather, toying since it
    just for fun) on a Python3 web framework. I just started but when it
    come to file upload I experience issues which I believe are connected to
    that issue.

    Just want if their was any progress since last messages or I should jump
    in and try writing my own multipart parser (as I doubt I have the
    required skill to contribute to the email parser).

    @tercero12
    Copy link
    Mannequin

    tercero12 mannequin commented Nov 15, 2009

    Perhaps this update should go in the linked email bug. The email team
    has a goal of fixing the email module in time for the 3.2 release. I
    also, feel as though I lack the skill to fix the email module, but it
    goes beyond that since they're potentially having to change some
    interfaces. See this document:
    http://wiki.python.org/moin/Email%20SIG/DesignThoughts

    Once the email module is fixed, the cgi module will be trivial to fix.
    I'm confident enough to handle it. I don't think anyone can give you a
    date so you'll have to make the custom solution decision based on your
    timeframe and patience.

    @tcourbon
    Copy link
    Mannequin

    tcourbon mannequin commented Nov 16, 2009

    It seems that there wasn't work on that issue (which look complicated by
    the way). I'll wait, there is so much other aspects of a web framework
    to play with :)
    Thank anyway for the pointer.

    @gvanrossum
    Copy link
    Member

    Could this be related to bpo-8077?

    @tercero12
    Copy link
    Mannequin

    tercero12 mannequin commented Jun 20, 2010

    Yes, they are related but not quite the same. The solution to this problem will likely include a fix to the problem described in bpo-8077.

    @vpython
    Copy link
    Mannequin

    vpython mannequin commented Nov 21, 2010

    Regarding http://bugs.python.org/issue4953#msg91444 POST with multipart/form-data encoding can use UTF-8, other stuff is restricted to ASCII!

    From http://www.w3.org/TR/html401/interact/forms.html:
    Note. The "get" method restricts form data set values to ASCII characters. Only the "post" method (with enctype="multipart/form-data") is specified to cover the entire [ISO10646] character set.

    Hence cgi formdata can safely decode text fields using UTF-8 decoding (experimentally, that is the encoding used by Firefox to support the entire ISO10646 character set).

    @PierreQuentel
    Copy link
    Mannequin

    PierreQuentel mannequin commented Jan 2, 2011

    Hi,

    I have started working on the port of a simplified version of Karrigell (a web framework) to Python3. I experienced the same problem as the other posters : in the current version, file upload doesn't work. So I've been working on the cgi module for a few days and now have a version which correctly manages file uploads in the tests I made

    The problem in the current version (3.2b2) is that all data is read from sys.stdin, which reads strings, not bytes. This obviously can't work properly to upload binary files. In the proposed version, for multipart/form-data type, all data is read as bytes from sys.stdin.buffer ; in the CGI script, the Python interpreter must be launched with the -u option, as suggested by Amaury, otherwise sys.stdin.buffer.read() only returns the beginning of the data stream

    The headers inside the multipart/form-data are decoded to a string using sys.stdin.encoding and passed to a FeedParser (which requires strings) ; then the data is read from sys.stdin.buffer (bytes) until a boundary is found

    If the field is a file, the file object in self.file stores bytes, and the attribute "value" is a byte string. If it is not a file, the value is decoded to a string, always using sys.stdin.encoding, as for all other fields for other types of forms

    Other cosmetic changes :

    • replaced "while 1" by "while True"
    • replaced "if type(value) == type([])" by "if isintance(value,list)"

    Attached file : zip with cgi_new.py and tests in a folder called "http"
    Tested with Python 3.2b2 (r32b2:87398, Dec 19 2010, 22:51:00) [MSC v.1500 32 bit (Intel)] on win32 ; files and CGI scripts served by Apache 2.2

    @bitdancer
    Copy link
    Member

    Thank you very much for working on this! I'll try to take a look at the patch soon. A couple quick comments based on your posting: first, the email module now has a BytesFeedparser that will accept a byte stream, which I hope might simplify your patch. Second, it would be very helpful if you could upload your patch as an 'svn diff' against the current py3k trunk (see python.org/dev for details on how to do that). That will make review and application of the patch much much simpler. (This would be true even if more of the code in cgi.py has changed than not.) If you don't want to set up an svn checkout, then a context diff against the copy of cgi.py you started with would be second best. Please post any files individually as .patch or .diff or .txt files...these are preferred in the tracker over .zip files because they can be viewed without downloading.

    @PierreQuentel
    Copy link
    Mannequin

    PierreQuentel mannequin commented Jan 13, 2011

    Comment ça, no up to date patch ? cgi_32.patch is up to date, the API changes are documented, the unittests work, what else do you want ?

    @pitrou
    Copy link
    Member

    pitrou commented Jan 13, 2011

    Comment ça, no up to date patch ? cgi_32.patch is up to date, the API
    changes are documented, the unittests work, what else do you want ?

    The O_BINARY stuff looks obsolete to me.

    @vpython
    Copy link
    Mannequin

    vpython mannequin commented Jan 13, 2011

    The O_BINARY stuff was probably necessary because bpo-10841 is not yet in the build Pierre was using? I agree it in not necessary with the fix for that issue, but neither does it hurt.

    It could be stripped out, if you think that is best, Antoine.

    But there is a working patch.

    @merwok
    Copy link
    Member

    merwok commented Jan 13, 2011

    Can one person please

    1. Sum up the discussion and its outcome briefly

    2. Remove all patches and replace them with one diff with docs, tests and code (even if you have new files, you don’t have to put them in a zip, use svn add and they will show up in the svn diff, which is really easier to review)

    @erob
    Copy link
    Mannequin

    erob mannequin commented Jan 13, 2011

    +1

    @erob erob mannequin changed the title cgi module cannot handle POST with multipart/form-data in 3.x cgi module cannot handle POST with multipart/form-data in 3.x Jan 13, 2011
    @vstinner
    Copy link
    Member

    I tested cgi_32.patch on Windows with Apache:

    • a test with a binary file works: I get a binary file instead of a text file
    • a test with a non-ASCII character (a\xe9b) works: the text is correctly decoded

    I used the test script from full_source_and_error.zip.

    Comments on cgi_32.patch:

     - I don't understand why FieldStorage changes sys.stdout and sys.stderr (see remarks about IOMix above): please remove the charset argument (it is also confusing to have two encoding arguments). it should be done somewhere else
     - please remove the O_BINARY hack: the patch is for Python 3.2 and I closed issue python/issues-test-cpython#10841. If you would like a backport, another patch should be written later
     - "encoding = 'latin-1' # ?": write a real comment or remove it
     - 'self.fp.read(...) # bytes': you should add a test on the type if you are not sure that fp.read() gives bytes
     - "file: the file(-like) object from which you can read the data *as bytes*": you should mention that TextIOWrapper is also tolerated (accepted?)
     - you may set fp directly to sys.stdin.buffer (instead of sys.stdin) if fp is None (it will be easier after removing the O_BINARY thing)
     - the patch adds a tab in an empty line, please don't do that :-)
     - you should add a (private?) attribute to FieldStorage to decide if it works on bytes or unicode, instead of using "self.filename is not None" test (eg. self._use_bytes = (self.filename is not None)
     - i don't like the idea of having a generic self.__write() method supporting bytes and unicode. it would prefer two methods, eg. self.__write_text() and self.__write_binary() (they can share a third private method)
     - i don't like "stream_encoding" name: what is the "stream" here? do you process a "file", a "string" or a "stream"? why not just "self.encoding"?
     - "import email.parser,email.feedparser" one import is useless here. I prefer "from email.feedparser import FeedParser" because you get directly a ImportError if the symbol is missing. And it's already faster to get FeedParser instead of email.feedparser.FeedParser in a loop (dummy micro-optimization)
     - even I like the following change, please do it in a separated patch:
    -            if type(value) is type([]):
    +            if isinstance(value,list):

    I really don't like the IOMix thing:

    • sys.stdout.write() should not accept bytes
    • FieldStorage should not replace sys.stdout and sys.stderr: if you want to set the encoding of these files, set PYTHONIOENCODING environment variable before running your program (it changes also the encoding of sys.stdio)
    • IOMix should not accept bytes *and* unicode. It's better to have an explicit API like stdout.write('unicode') and stdout.buffer.write(b'bytes)

    Most parts of the patch are correct and fix real bugs. Since cgi is broken currently (eg. it doesn't handle binary files correctly), anything making the situation better would be nice.

    I vote +0 to commit the patch now (if the release manager agrees), and +1 if all of my remarks are fixed.

    @vpython
    Copy link
    Mannequin

    vpython mannequin commented Jan 14, 2011

    Victor, thanks for your comments, and interest in this bug. Other than the existence of the charset parameter, and whether or not to include IOMix, I think all of the others could be fixed later, and do not hurt at present. So I will just comment on those two comments.

    I would prefer to see FieldStorage not have the charset attribute also, but I don't have the practice to produce an alternate patch, and I can see that it would be a convenience for some CGI scripts to specify that parameter, and have one API call do all the work necessary to adjust the IO streams, and read all the parameters, and then the rest of the logic of the web app can follow. Personally, I adjust the stdout/stderr streams earlier in my scripts, and only optionally call FieldStorage, if I determine the request needs such.

    I've been using IOMix for some months (I have a version for both Python 2 and 3), and it solves a real problem in generating web page data streams... the data stream should be bytes, but a lot of the data is manipulated using str, which would then need to be decoded. The default encoding of stdout is usually wrong, so must somehow be changed. And when you have chunks of bytes (in my experience usually from a database or file) to copy to the output stream, if your prior write was str, and then you write bytes to sys.stdout.binary, you have to also remember to flush the TextIOBuffer first. IOMix provides a convenient solution to all these problems, doing the flushing for you automatically, and just taking what comes and doing the right thing. If I hadn't already invented IOMix to help write web pages, I would want to :)

    @grahamd
    Copy link
    Mannequin

    grahamd mannequin commented Jan 14, 2011

    FWIW, keep in mind that cgi.FieldStorage is also quite often used in WSGI scripts in arbitrary WSGI servers which have got nothing to do with CGI. Having cgi.FieldStorage muck around with stdout/stderr under WSGI, even where using a CGI/WSGI bridge, would potentially be a bad thing to do, especially in embedded systems like mod_wsgi where sys.stdout and sys.stderr are replaced with file like objects that map onto Apache error logging. Even in non embedded systems, you could very well screw up any application logging done via stdout/stderr and break the application.

    So, the default or common code paths should never play with sys.stdout or sys.stderr. It is already a PITA that the implementation falls back to using sys.argv when QUERY_STRING isn't defined which also could produce strange results under a WSGI server. In other words, please don't go adding any more code which makes the wrong assumption that this is only used in CGI scripts.

    @vpython
    Copy link
    Mannequin

    vpython mannequin commented Jan 14, 2011

    Graham, Thanks for your comments. Fortunately, if the new charset parameter is not supplied, no mucking with stdout or stderr is done, which is the only reason I cannot argue strongly against the feature, which I would have implemented as a separate API... it doesn't get in the way if you don't use it.

    I would be happy to see the argv code removed, but it has been there longer than I have been a Python user, so I just live with it ... and don't pass arguments to my CGI scripts anyway. I've assumed that is some sort of a debug feature, but I also saw some code in the HTTPCGIServer and http.server that apparently, on some platforms, actually do pass parameters to CGI on the command lines. I would be happy to see that code removed too, but it also predates my Python experience. And no signs of "if debug:" by either of them!

    @PierreQuentel
    Copy link
    Mannequin

    PierreQuentel mannequin commented Jan 14, 2011

    @victor

    Thanks for the comments

    "- I don't understand why FieldStorage changes sys.stdout and sys.stderr (see remarks about IOMix above): please remove the charset argument (it is also confusing to have two encoding arguments). it should be done somewhere else"

    done

    "please remove the O_BINARY hack: the patch is for Python 3.2 and I closed issue bpo-10841. If you would like a backport, another patch should be written later"

    done

    ""encoding = 'latin-1' # ?": write a real comment or remove it"

    I removed this part

    "'self.fp.read(...) # bytes': you should add a test on the type if you are not sure that fp.read() gives bytes"

    added tests in read_urlencoded(), read_multi() and read_binary()

    "file: the file(-like) object from which you can read the data *as bytes*": you should mention that TextIOWrapper is also tolerated (accepted?)"

    not done : here "file" is not the argument passed to the FieldStorage constructor, but the attribute of values returned from calls to FieldStorage. In the new implementation, its read() method always returns bytes

    "you may set fp directly to sys.stdin.buffer (instead of sys.stdin) if fp is None (it will be easier after removing the O_BINARY thing)"

    done

    "the patch adds a tab in an empty line, please don't do that :-)"

    done (hopefully :-)

    "you should add a (private?) attribute to FieldStorage to decide if it works on bytes or unicode, instead of using "self.filename is not None" test (eg. self._use_bytes = (self.filename is not None)"

    done

    "i don't like the idea of having a generic self.__write() method supporting bytes and unicode. it would prefer two methods, eg. self.__write_text() and self.__write_binary() (they can share a third private method)"

    not done, the argument of __write is always bytes

    "i don't like "stream_encoding" name: what is the "stream" here? do you process a "file", a "string" or a "stream"? why not just "self.encoding"?"

    done

    • "import email.parser,email.feedparser" one import is useless here. I prefer "from email.feedparser import FeedParser" because you get directly a ImportError if the symbol is missing. And it's already faster to get FeedParser instead of email.feedparser.FeedParser in a loop (dummy micro-optimization)

    done

    " even I like the following change, please do it in a separated patch:

    •        if type(value) is type([]):
      

    + if isinstance(value,list):"

    not done

    "I really don't like the IOMix thing:"

    removed

    "I vote +0 to commit the patch now (if the release manager agrees), and +1 if all of my remarks are fixed."

    should be close to +0.8 now ;-)

    @erob
    Copy link
    Mannequin

    erob mannequin commented Jan 14, 2011

    +1

    thanks for this input. I agree for the most part. However if the io
    semantics in python 3 is radically different than on python 2, I could
    have expected that WSGI scripts would similarly depend on a newer
    type of file descriptor access using the sys module.

    cheers!

    @vpython
    Copy link
    Mannequin

    vpython mannequin commented Jan 14, 2011

    Pierre, Thank you for the new patch, with the philosophy of "it's broke, so let's produce something the committers like to get it fixed".

    I see you overlooked removing the second use of O_BINARY. Locally, I removed that also, and tested your newest patch, and it still functions great for me.

    @PierreQuentel
    Copy link
    Mannequin

    PierreQuentel mannequin commented Jan 14, 2011

    Glenn, you read my mind ;-)

    Thanks for mentioning the O_BINARY thing. New (last !) patch attached

    @vstinner
    Copy link
    Member

    r87996+r87997 adds encoding and errors argument to parse_qs() and parse_qsl() of urllib.parse. It is needed to decoded correctly %XX syntax in cgi.

    r87998 is the patch on the cgi module.

    Changes with cgi_32.patch:

    • Use TextIOWrapper instead of TextIOBase, because TextIOBase has no buffer
      attribute
    • typo in a docstring: "it must must" => "must"
    • (docstring) default: sys.stdin => default: sys.stdin.buffer
    • PEP-8: hasattr(a,b) => hasattr(a, b) (same for isinstance) and
      "encoding = 'utf-8'" => "encoding='utf-8'" (in the argument list)
    • "xxx.decode(...) # str": remove useless # str comment. decode() always give
      unicode in Python 3 (same change for ".encode() # bytes")
    • Rename "next" variables to "next_boundary" because next is a builtin
      function in Python 3 (unrelated change).
    • FieldStorage.innerboundary and FieldStorage.outerboundary are bytes objects:
      encode innerboundary in the constructor, and raise an error if outerboundary
      is not a bytes object
    • Rename _use_bytes to _binary_file
    • isinstance(bytes) test: write the type, not the value, in the error message
    • Replace line[:2] == b'--' by line.startswith(b'--'), and then replace
      line.strip() by line.rstrip()
    • test_fieldstorage_multipart() uses ASCII (and specifiy the encoding to FieldStorage)
    • add FieldStorage.errors attribute: pass it to parse_qsl()
    • add errors attribute to FieldStorage: same default value than urllib.parse.unquote(): 'replace'
    • parse(): pass encoding argument to parse_qs()
    • FieldStorage: pass encoding and errors arguments to parse_qsl()

    Because the patch on TextIOBase, it patched the docstring:
    ---
    fp : file pointer; default: sys.stdin.buffer
    (not used when the request method is GET)
    Can be :
    1. an instance of (a subclass of) TextIOWrapper, in this case it
    must provide an attribute "buffer" = the binary layer that returns
    bytes from its read() method, and preferably an attribute
    "encoding" (defaults to latin-1)
    2. an object whose read() and readline() methods return bytes
    ---
    becomes
    ---
    fp : file pointer; default: sys.stdin.buffer
    (not used when the request method is GET)
    Can be :
    1. a TextIOWrapper object
    2. an object whose read() and readline() methods return bytes
    ---

    Replace "type(value) is type([])" is done in another commit: r87999.

    I consider that the work on this issue is done, and so I close it. If I am wrong, explain why and repoen the issue.

    Please test the cgi as much as possible before Python 3.2 final: reopen the issue if it doesn't work.

    @vstinner
    Copy link
    Member

    Oh, I forgot to credit the author(s): who wrote the patch?

    @PierreQuentel
    Copy link
    Mannequin

    PierreQuentel mannequin commented Jan 14, 2011

    Thanks a lot Victor !

    I wrote the patch : Pierre Quentel (pierre.quentel@gmail.com) with many
    inputs by Glenn Linderman

    2011/1/14 STINNER Victor <report@bugs.python.org>

    STINNER Victor <victor.stinner@haypocalc.com> added the comment:

    Oh, I forgot to credit the author(s): who wrote the patch?

    ----------


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue4953\>


    @PierreQuentel PierreQuentel mannequin changed the title cgi module cannot handle POST with multipart/form-data in 3.x cgi module cannot handle POST with multipart/form-data in 3.x Jan 14, 2011
    @vstinner
    Copy link
    Member

    TODO: Add more tests to test_cgi. What is the latest patch for test_cgi?

    @PierreQuentel
    Copy link
    Mannequin

    PierreQuentel mannequin commented Jan 14, 2011

    My latest patch for test_cgi is in cgi_32.patch

    I will try to add more tests later

    @vstinner
    Copy link
    Member

    haypo>> What is the latest patch for test_cgi?
    quentel> My latest patch for test_cgi is in cgi_32.patch

    Ok, but cgi_32.patch doesn't add any test. I only adapt existing tests for your other changes.

    I remove cgi_32.patch because it was commited.

    @vstinner
    Copy link
    Member

    Remove cgi_plus_tests.diff: it looks to be an old version of cgi_32.patch.

    @r.david.murray: Did you write cgi_plus_tests.diff, or is it based on the work on Pierre Quentel?

    @vstinner
    Copy link
    Member

    Remove tmpy44zj7.html and tmpav1vve.html: a similar file is included in full_source_and_error.zip.

    @bitdancer
    Copy link
    Member

    Victor: we normally leave the patch file that was committed attached to the issue for future reference.

    The _plus_tests file was just the original patch plus the existing cgi tests adjusted to pass in bytes instead of strings to cgi, if I recall correctly. Nor original code of mine in either part :)

    @vpython
    Copy link
    Mannequin

    vpython mannequin commented Jan 14, 2011

    Thanks to Pierre for producing patch after patch and testing testing testing, and to Victor for committing it, as well as others that contributed in smaller ways, as I tried to. I look forward to 3.2 rc1 so I can discard all my temporary patched copies of cgi.py

    @vstinner
    Copy link
    Member

    Because I'm unable to read the whole history and analyze each file attached to this issue, I opened bpo-10911 to ask to write more tests for the cgi module.

    @vstinner
    Copy link
    Member

    Le vendredi 14 janvier 2011 à 19:11 +0000, R. David Murray a écrit :

    Victor: we normally leave the patch file that was committed attached
    to the issue for future reference.

    Sorry, but there were too much files. I was trying to figure out if
    there is something useful in the files.

    The _plus_tests file was just the original patch plus the existing cgi
    tests adjusted to pass in bytes instead of strings to cgi, if I recall
    correctly. Nor original code of mine in either part :)

    Ok.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir topic-unicode type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants