New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Content-length is incorrect when request body is a list or tuple #67539
Comments
Rather than summing the value of each element of a list or tuple to use as the value of the content-length header, the length of the list or tuple is used. |
Updated patch based on review. |
[Edit Error: 'utf8' codec can't decode byte 0xe2 in position 207: invalid continuation byte] The documentation currently says “Content-Length header should be explicitly provided when the body is an iterable”. See Lib/urllib/request.py:1133 for how it is done for urlopen(), using memoryview(), which is probabaly more correct. |
On 2015-01-29 9:51 PM, Martin Panter wrote:
Sure, entirely disabling computing the content length if the body is an The current implementation /should/ be correct whether elements are Is there something that I'm missing? I could possibly see an argument |
Sorry my comment was a bit rushed. I wasn’t saying this feature shouldn’t be added. I guess I was pointing out two things:
|
Thanks for the clarification Martin. After giving this some further thought, I think that the best way to go is to /only/ calculate and add the Content-Length header if each element in the list or tuple is pre-encoded. If it's mixed or only strings, then there are one of three options:
The attached patch uses option 1 in order to not add any CPU or memory cost to the operation, but still fix the Content-Length issue as reported. I've also updated the docs to indicate as much. |
The length of an encoded Latin-1 string should equal the length of the unencoded text string, since it is a one-to-one character-to-byte encoding. So encoding should not actually be needed to determine the Latin-1 encoded length. Though I’m not particularly excited by silently Latin-1 encoding text bodies in the first place. |
Thanks for the review Martin, I've addressed your comments.
I'm not sure why, but the auto-encoding of the raw string input object |
New patch looks good I think. Making the encoding code more consistent is nice. |
There are two issues. The one is that calculated Content-Length is not correct for lists, tuples, and other types (such as deque or array.array). The right solution is to calculate size using a technique used in urllib. Content-Length shouldn't be calculated for lists, tuples, and other non-bytes-compatible sequences. This is a bug, and the patch should be applied to all maintained releases. The second issue is feature request. Allow calculating Content-Length for lists and tuples. |
Technically I don’t think there is a bug. The documentation says [the] “Content-Length header should be explicitly provided”, so if you don’t set it you could argue that you’re using the library wrong. For this issue I think Demian was trying to add support (i.e. new feature) for implicit Content-Length with tuples and lists of bytes (or strings). He has also added support for iterables of Latin-1 encodable text strings. What you are suggesting Serhiy sounds like a separate new feature to support bodies of arbitrary bytes-like objects (or lists or tuples of them). According to the documentation, only byte and Latin-1 text strings, file objects supporting stat(), and iterables are currently supported. It does not say, but before this patch I think the iterables had to be of bytes-like objects. |
I just updated the docs to what I think is the current reality. See bpo-23740 for what I think are problems with the current implementation, aside from any enhancement of computing a length for tuple or list. Since the latter cannot be done reliably unless we know the list or tuple is all bytes, I propose that we don't do it at all (since I'd like to see iterables of text strings supported). |
Well, the current reality not counting the bug reported in this issue. So, I documented it as if the fix here is to not set the length when body is an iterator. |
David: Calculating the length of a list or tuple of Latin-1 text strings should actually be straight-forward and reliable, because it is a single-byte encoding. However I am starting to think adding a new special case for lists and tuples is a bad idea. There are already way too many special cases. |
I'd agree with this if it wasn't relatively trivial to calculate. There's no reason that I can think of to exclude the auto-generated Content-Length header for data types for which the size is known.
The bug is that the Content-Length header is currently added for bodies that are lists and tuples and the value is incorrect. For example: con.request('POST', '/', ['aaa', 'bbb', 'ccc']) results in Host: example.com
The patch here adds support for iterables of text strings (as well as iterables comprised of both bytes and strings). Content-Length can be computed reliably as the size of a latin1-encoded string will be identical to the original string. |
If bpo-12319 is accepted, the implementation for Content-Length should also likely be migrated to this issue to be applied to maintenance branches as a bug fix. |
See also bpo-12327 for length issue using StingIO. |
Another issue should be addressed by patch... When trying to guess the content-length, here is the code you find...
The call to So, the code should be replaced by (in both python 2.7 and 3):
|
Vincent: That sounds more like a case of bpo-15267, or have you found a way to trigger the AttributeError in Python 3 as well? |
Vincent: The logic to determine content length is undergoing a bit of an overhaul as part of bpo-12319, which I'm hoping to wrap up in the next week or so. |
Martin: You're right, it's the same issue, and only related to python2's old style classes. Sorry for the useless noise. Demian: My problem is |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: