ValueError: Content-Length should be specified #55291
Comments
I found this bug when I started trying Python 3.2 release candidate 1. When using urllib.request.urlopen to handle HTTP POST, I got the error message: ValueError: Content-Length should be specified for iterable data of type <class 'str'> 'foo=bar'. I'll attach the patch and test case. |
Senthil, could this be a regression of the recent urllib transfer-encoding changes? |
The POST data should be bytes. So in the attached test case, instead of
request = urllib.request.Request('http://does.not.matter', 'foo=bar')
it should be:
request = urllib.request.Request('http://does.not.matter', b'foo=bar')
And the Content-Length will be calculated using this logic:
mv = memoryview(data)
Content-length = len(mv) * mv.itemsize
Should we emphasize further that data should be bytes? I think error |
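To make the quoted logic concrete, here is a minimal sketch of the same calculation as a standalone function. The name content_length is made up for illustration and is not the function used inside urllib; the sketch only shows why bytes work and str does not.

```python
from array import array

def content_length(data):
    """Byte size of a buffer-like object (hypothetical helper, not urllib's own)."""
    mv = memoryview(data)         # raises TypeError for str: it has no buffer interface
    return len(mv) * mv.itemsize  # number of elements times bytes per element

print(content_length(b'foo=bar'))             # 7
print(content_length(array('H', [1, 2, 3])))  # 3 times the item size of 'H'
```

Passing 'foo=bar' as a str fails at the memoryview() call, which matches the point that POST data needs to be bytes.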
Since r70638, the http client encodes unicode to ISO-8859-1: |
If the POST data should be bytes, which I also think is reasonable, should urllib.parse.urlencode return bytes instead of str?
>>> urllib.parse.urlencode({'foo': 'bar'})
'foo=bar'
>>> urllib.parse.urlencode({b'foo': b'bar'})
'foo=bar' |
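Given that urlencode returns a str for both inputs above, the caller has to encode the result before using it as POST data. A minimal sketch, assuming the default quoting (which percent-encodes non-ASCII characters, so an ASCII encode is enough here):

```python
import urllib.parse

query = urllib.parse.urlencode({'foo': 'bar'})  # str, as in the session above
data = query.encode('ascii')                    # bytes, usable as POST data

print(type(query).__name__, type(data).__name__)
```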
That would seem correct to me. |
It is also a question whether to disallow str explicitly, instead of letting it go through the Iterable check. |
So, what's the decision to be taken? I'm willing to provide patches (if I need to), but I need to know *the reasonable behaviors*. :) |
For this particular issue, I think it is a good idea to disallow str |
Then let us do that. Senthil, what about urlencode of bytes values returning a str? |
On Sat, Feb 5, 2011 at 12:33 AM, Georg Brandl <report@bugs.python.org> wrote:
Sorry for the late response. I needed some time to look at this one and I
The resultant str is certainly useful when we want to construct a URL
(Well, I also think that urlencode could be modified to the effect
So, for this issue. A patch would prevent str data from being posted
Shall we go ahead with this, Georg? I shall post a patch immediately. |
Here is a patch to resolve this issue. Let me know if it is okay to commit. (Or feel free to commit it too.) |
Georg Brandl: can this fix go into Python 3.2? It changes the API. I like any patch rejecting unicode where unicode is irrelevant (where we don't know how to choose the right encoding). About the patch: you should maybe add a test to ensure that str is rejected with a TypeError. |
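A test along the lines suggested here could look like the sketch below. It is not the test from the patch. It assumes current CPython, where the str check runs in the HTTP handler's request pre-processing step (HTTPHandler.http_request), so no connection is attempted; calling that method directly is a shortcut for illustration. Both TypeError and ValueError are caught because the thread mentions both.

```python
import urllib.request

def str_post_is_rejected():
    """Return True if a str POST body is refused during request pre-processing."""
    request = urllib.request.Request('http://does.not.matter', 'foo=bar')
    handler = urllib.request.HTTPHandler()
    try:
        handler.http_request(request)  # pre-processing only, no network I/O
    except (TypeError, ValueError):
        return True
    return False

print(str_post_is_rejected())
```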
Victor: it does not change the API. It only replaces the confusing ValueError message ("for iterable data of type <class 'str'>") with one that makes clear that POST data should not be a str. |
Victor: I don't see an API change here. A ValueError is raised in both cases, and error messages are not part of the API. (And BTW, you may call me Georg.) Senthil: I'd like the documentation change to be a bit more explicit that a string is always returned. Since the string must be encoded, it would also be helpful to mention which encoding to use. The new exception message looks slightly incorrect to me: the argument can also be an iterable of bytes. |
Here is the patch addressing the comments.
|
Patch with the docs :ref: pointing to the proper section. |
I still find "user-specified encoding" unclear. This can be addressed at a different time though. Your exception message is missing a space between "bytes" and "or"; otherwise this is ok to commit. |
Thanks. Committed in r88394. Regarding the encoding information, an explanation of this sort might be helpful: the string can be encoded using ISO-8859-1, which is the default encoding for POST data, or the user can encode it with a custom encoding, in which case the Content-Type header should specify the encoding in addition to 'application/x-www-form-urlencoded', which is specified for POST data. For example, if one has to use utf-8
|
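The utf-8 case described in the previous comment might be sketched as follows. This is an illustration, not the example from the committed docs: the host example.invalid is a placeholder, and the request is only built, never sent.

```python
import urllib.parse
import urllib.request

# urlencode returns a str; encode it to bytes before using it as POST data.
data = urllib.parse.urlencode({'foo': 'bar'}).encode('utf-8')

# example.invalid is a placeholder host; nothing is sent over the network.
request = urllib.request.Request('http://example.invalid/form', data)

# Declare the encoding through the charset parameter of Content-Type.
request.add_header('Content-Type',
                   'application/x-www-form-urlencoded;charset=utf-8')

print(request.get_method(), request.get_header('Content-type'))
```

Request stores header names with only the first letter capitalized, which is why the lookup uses 'Content-type'.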
Lowering priority and making a doc issue now that the code change has been made. |
BTW, there was no entry in Misc/NEWS, so this was not in whatsnew/3.2.rst. |
Here is the docs patch which will help us close the issue. Addressing Eric's last comment: I believe the What's New and NEWS entries for this issue were added with the feature; this one was only an exception message change. |
I think the information in the patch should go in the urlopen doc, not in urlencode. Adding a note to urlencode that says that the result must be encoded is fine, but all the details about the default encoding and the fact that an extra Content-Type header is necessary when the encoding is not iso-8859-1 belong to the urlopen doc IMHO. (Is the header actually necessary? I think I always sent utf-8 without header, and the example you added in r88394 also uses utf-8 with no extra headers.) |
New changeset 057cf78ed576 by Senthil Kumaran in branch '3.2': New changeset 90e35b91756d by Senthil Kumaran in branch 'default': |
I have rewritten some parts of the documentation, explaining the use of the charset parameter with the Content-Type header. I used the suggestions from the previous patch's review comments too. I see that other parts of the documentation can be improved further; I shall do that separately. I am closing this issue. Thanks! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.