Message 125885 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	v+python
Recipients	amaury.forgeotdarc, barry, eric.araujo, erob, flox, ggenellina, gvanrossum, oopos, pebbe, pitrou, quentel, r.david.murray, tcourbon, tercero12, tobias, v+python, vstinner
Date	2011-01-10.08:52:11
SpamBayes Score	4.440892e-16
Marked as misclassified	No
Message-id	<1294649539.28.0.477289044307.issue4953@psf.upfronthosting.co.za>
In-reply-to

Content
This looks much simpler than the previous patch. However, I think it can be further simplified. This is my first reading of this code, however, so I might be totally missing something(s). Pierre said: Besides FieldStorage, I modified the parse() function at module level, but not parse_multipart (should it be kept at all ?) I say: Since none of this stuff works correctly in 3.x, and since there are comments in the code about "folding" the parse* functions into FieldStorage, then I think they could be deprecated, and not fixed. If people are still using them, by writing code to work around their deficiencies, that code would continue to work for 3.2, but then not in 3.3 when that code is removed? That seems reasonable to me. In this scenario, the few parse* functions that are used by FieldStorage should be copied into FieldStorage as methods (possibly private methods), and fixed there, instead of being fixed in place. That was all the parse* functions could be deprecated, and the use of them would be unchanged for 3.2. Since RFC 2616 says that the HTTP protocol uses ISO-8859-1 (latin-1), I think that should be required here, instead of deferring to fp.encoding, which would eliminate 3 lines. Also, the use of FeedParser could be replaced by BytesFeedParser, thus eliminating the need to decode header lines in that loop. And, since this patch will be applied only to Python 3.2+, the mscvrt code can be removed (you might want a personal copy with it for earlier version of Python 3.x, of course). I wonder if the 'ascii' reference should also be 'latin-1'? In truly reading and trying to understand this code to do a review, I notice a deficiency in _parseparam and parse_header: should I file new issues for them? (perhaps these are unimportant in practice; I haven't seen \ escapes used in HTTP headers). RFC 2616 allows for "" which are handled in _parseparam. And for \c inside "", which is handled in parse_header. But: _parseparam counts " without concern for \", and parse_header allows for \\ and \" but not \f or \j or \ followed by other characters, even though they are permitted (but probably not needed for much). In make_file, shouldn't the encoding and newline parameters be preserved when opening text files? On the other hand, it seems like perhaps we should leverage the power of IO to do our encoding/decoding... open the file with the TextIOBase layer set to the encoding for the MIME part, but then just read binary without decoding it, write it to the .buffer of the TextIOBase, and when the end is reached, flush it, and seek(0). Then the data can be read back from the TextIOBase layer, and it will be appropriate decoded. Decoding errors might be deferred, but will still occur. This technique would save two data operations: the explicit decode in the cgi code, and the implicit encode in the IO layers, so resources would be saved. Additionally, if there is a CONTENT-LENGTH specified for non-binary data, the read_binary method should be used for it also, because it is much more efficient than readlines... less scanning of the data, and fewer outer iterations. This goes well with the technique of leaving that data in binary until read from the file. It seems that in addition to fixing this bug, you are also trying to limit the bytes read by FieldStorage to some maximum (CONTENT_LENGTH). This is good, I guess. But skip_lines() has a readline potentially as long as 32KB, that isn't limited by the maximum. Similar in read_lines_to_outer_boundary, and read_lines_to_eof (although that may not get called in the cases that need to be limited). If a limit is to be checked for, I think it should be a true, exact limit, not an approximate limit. See also issue 10879.

This looks much simpler than the previous patch. However, I think it can be further simplified. This is my first reading of this code, however, so I might be totally missing something(s).

Pierre said:
Besides FieldStorage, I modified the parse() function at module level, but not parse_multipart (should it be kept at all ?)

I say:
Since none of this stuff works correctly in 3.x, and since there are comments in the code about "folding" the parse* functions into FieldStorage, then I think they could be deprecated, and not fixed. If people are still using them, by writing code to work around their deficiencies, that code would continue to work for 3.2, but then not in 3.3 when that code is removed? That seems reasonable to me. In this scenario, the few parse* functions that are used by FieldStorage should be copied into FieldStorage as methods (possibly private methods), and fixed there, instead of being fixed in place. That was all the parse* functions could be deprecated, and the use of them would be unchanged for 3.2.

Since RFC 2616 says that the HTTP protocol uses ISO-8859-1 (latin-1), I think that should be required here, instead of deferring to fp.encoding, which would eliminate 3 lines.

Also, the use of FeedParser could be replaced by BytesFeedParser, thus eliminating the need to decode header lines in that loop.

And, since this patch will be applied only to Python 3.2+, the mscvrt code can be removed (you might want a personal copy with it for earlier version of Python 3.x, of course).

I wonder if the 'ascii' reference should also be 'latin-1'?

In truly reading and trying to understand this code to do a review, I notice a deficiency in _parseparam and parse_header: should I file new issues for them? (perhaps these are unimportant in practice; I haven't seen \ escapes used in HTTP headers). RFC 2616 allows for "" which are handled in _parseparam. And for \c inside "", which is handled in parse_header. But: _parseparam counts " without concern for \", and parse_header allows for \\ and \" but not \f or \j or \ followed by other characters, even though they are permitted (but probably not needed for much).

In make_file, shouldn't the encoding and newline parameters be preserved when opening text files? On the other hand, it seems like perhaps we should leverage the power of IO to do our encoding/decoding... open the file with the TextIOBase layer set to the encoding for the MIME part, but then just read binary without decoding it, write it to the .buffer of the TextIOBase, and when the end is reached, flush it, and seek(0). Then the data can be read back from the TextIOBase layer, and it will be appropriate decoded. Decoding errors might be deferred, but will still occur. This technique would save two data operations: the explicit decode in the cgi code, and the implicit encode in the IO layers, so resources would be saved. Additionally, if there is a CONTENT-LENGTH specified for non-binary data, the read_binary method should be used for it also, because it is much more efficient than readlines... less scanning of the data, and fewer outer iterations. This goes well with the technique of leaving that data in binary until read from the file.

It seems that in addition to fixing this bug, you are also trying to limit the bytes read by FieldStorage to some maximum (CONTENT_LENGTH). This is good, I guess. But skip_lines() has a readline potentially as long as 32KB, that isn't limited by the maximum. Similar in read_lines_to_outer_boundary, and read_lines_to_eof (although that may not get called in the cases that need to be limited). If a limit is to be checked for, I think it should be a true, exact limit, not an approximate limit.

History
Date	User	Action	Args
2011-01-10 08:52:19	v+python	set	recipients: + v+python, gvanrossum, barry, amaury.forgeotdarc, ggenellina, pitrou, vstinner, eric.araujo, r.david.murray, oopos, tercero12, tcourbon, tobias, flox, pebbe, quentel, erob
2011-01-10 08:52:19	v+python	set	messageid: <1294649539.28.0.477289044307.issue4953@psf.upfronthosting.co.za>
2011-01-10 08:52:12	v+python	link	issue4953 messages
2011-01-10 08:52:11	v+python	create