Issue11898
Created on 2011-04-21 13:42 by bero, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | | |
python-2.7.1-fix-httplib-UnicodeDecodeError.patch | bero, 2011-04-21 13:42 | Proposed fix | ||
data | Jiri.Horky, 2011-05-15 18:29 | binary data that triggers the problem |
Messages (17) | |||
---|---|---|---|
msg134211 - (view) | Author: Bernhard Rosenkraenzer (bero) | Date: 2011-04-21 13:42 | |
Sending e.g. a JPEG file with a httplib POST request (e.g. through mechanize) can result in an error like this:

    File "/usr/lib64/python2.7/httplib.py", line 947, in request
      self._send_request(method, url, body, headers)
    File "/usr/lib64/python2.7/httplib.py", line 988, in _send_request
      self.endheaders(body)
    File "/usr/lib64/python2.7/httplib.py", line 941, in endheaders
      self._send_output(message_body)
    File "/usr/lib64/python2.7/httplib.py", line 802, in _send_output
      msg += message_body
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 2566: invalid start byte

The code triggering this is the attempt to merge the msg and message_body into a single request in httplib.py lines 791+

The patch I'm attaching treats an invalid string of unknown encoding (e.g. binary data wrapped as string) like something that isn't a string. Works for me with the patch. |
|||
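The traceback above comes down to one expression: Python 2 adding a byte string to a unicode object, which makes the interpreter decode the byte string as ASCII first. The sketch below spells out that decode step explicitly, in Python 3 syntax so it can be run today. `py2_style_concat` is a hypothetical helper written for illustration (httplib has no such function), and the header text and JPEG bytes are made-up sample values.

```python
def py2_style_concat(msg, message_body):
    """Mimic Python 2's `unicode + str`: the byte string is decoded as ASCII first.

    Hypothetical helper for illustration only; httplib has no such function.
    """
    if isinstance(msg, str) and isinstance(message_body, bytes):
        # Python 2 performs this decode implicitly, using the default 'ascii' codec.
        return msg + message_body.decode("ascii")
    return msg + message_body


headers = "POST /upload HTTP/1.1\r\nHost: localhost\r\n\r\n"
jpeg_start = b"\xff\xd8\xff\xe0"  # made-up sample: leading bytes of a typical JPEG

error = None
try:
    py2_style_concat(headers, jpeg_start)
except UnicodeDecodeError as exc:
    error = exc
    print("decode failed:", exc.reason)

# When both sides are byte strings, nothing is decoded and the bytes pass through.
combined = py2_style_concat(headers.encode("ascii"), jpeg_start)
```

The second call suggests why the type of `msg` matters more than the content of the body: the binary payload is only a problem once the header block has become text.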
msg134824 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2011-04-30 00:11 | |
Did you run the httplib test with your patch? Interactively

    >>> from test.test_httplib import test_main as f; f()

(verbose mode, over 40 tests)

In 3.x, the patch would be to http/client.py, line 802 in 3.2 release

    if isinstance(message_body, str)  # becomes
    if isinstance(message_body, bytes)

Will this be an issue in 3.x? |
|||
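The type check discussed above decides whether the body is appended to the header block and sent in a single call. A minimal sketch of that idea in Python 3 terms, assuming the header lines are already bytes; `build_output` is a hypothetical stand-in for illustration and not the actual `_send_output` implementation:

```python
def build_output(header_lines, message_body=None):
    """Return the list of chunks to send, merging the body into the headers if safe.

    Hypothetical stand-in for illustration; not the real http.client code.
    """
    # The header block ends with an empty line, hence the two trailing empty entries.
    msg = b"\r\n".join(list(header_lines) + [b"", b""])
    if isinstance(message_body, bytes):
        # Same type on both sides: concatenation cannot trigger any decoding.
        return [msg + message_body]
    if message_body is None:
        return [msg]
    # Anything else (e.g. a file-like object) is kept as a separate chunk.
    return [msg, message_body]


chunks = build_output([b"POST / HTTP/1.1", b"Host: localhost"], b"\xff\xd8")
```

Keeping a non-bytes body as a separate chunk avoids the mixed-type concatenation entirely, at the cost of a second send.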
msg134840 - (view) | Author: Bernhard Rosenkraenzer (bero) | Date: 2011-04-30 06:57 | |
Not sure how to get it into verbose mode (I presume you don't mean "python -v"), but normal mode (22 tests) works fine:

    Python 2.7.1 (r271:86832, Apr 22 2011, 13:40:40)
    [GCC 4.6.0] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from test.test_httplib import test_main as f
    >>> f()
    test_auto_headers (test.test_httplib.HeaderTests) ... ok
    test_ipv6host_header (test.test_httplib.HeaderTests) ... ok
    test_putheader (test.test_httplib.HeaderTests) ... ok
    test_responses (test.test_httplib.OfflineTest) ... ok
    test_bad_status_repr (test.test_httplib.BasicTest) ... ok
    test_chunked (test.test_httplib.BasicTest) ... ok
    test_chunked_head (test.test_httplib.BasicTest) ... ok
    test_epipe (test.test_httplib.BasicTest) ... ok
    test_filenoattr (test.test_httplib.BasicTest) ... ok
    test_host_port (test.test_httplib.BasicTest) ... ok
    test_incomplete_read (test.test_httplib.BasicTest) ... ok
    test_negative_content_length (test.test_httplib.BasicTest) ... ok
    test_partial_reads (test.test_httplib.BasicTest) ... ok
    test_read_head (test.test_httplib.BasicTest) ... ok
    test_response_headers (test.test_httplib.BasicTest) ... ok
    test_send (test.test_httplib.BasicTest) ... ok
    test_send_file (test.test_httplib.BasicTest) ... ok
    test_status_lines (test.test_httplib.BasicTest) ... ok
    testTimeoutAttribute (test.test_httplib.TimeoutTest)
    This will prove that the timeout gets through ... ok
    test_attributes (test.test_httplib.HTTPSTimeoutTest) ... ok
    testHTTPConnectionSourceAddress (test.test_httplib.SourceAddressTest) ... ok
    testHTTPSConnectionSourceAddress (test.test_httplib.SourceAddressTest) ... ok
    ----------------------------------------------------------------------
    Ran 22 tests in 0.004s

    OK

Not sure if this is an issue with 3.x - I haven't used 3.x so far. |
|||
msg135290 - (view) | Author: Senthil Kumaran (orsenthil) * | Date: 2011-05-06 13:08 | |
Hello Bernhard, I tried a POST of a JPEG file through urllib2 (which internally uses httplib); it goes through the code that you pointed out, and I don't face any problem. I am able to POST binaries using httplib. I am also surprised at the UnicodeDecodeError being raised. The POST data is a string (8-bit string) in Python 2.7, and that portion of the code will have no problem creating the content. You will get a UnicodeDecodeError only if you explicitly pass a unicode object as data, never when you pass a string or binary string. Perhaps mechanize is doing something wrong here and sending a unicode object. So, this really does not look like a bug to me. (Also, a note on the patch: it tries to silence the error, which is the wrong thing to do.) If you can provide a simple snippet to reproduce this error, feel free to reopen this. I am closing this as 'works for me'. Thanks. |
|||
msg136043 - (view) | Author: Jiri Horky (Jiri.Horky) | Date: 2011-05-15 18:29 | |
I have the same problem as the original submitter. The reason it previously worked for you was probably because you didn't utilize a "right" unicode string in the urllib2.request. The following code will raise the exception (I enclose the data file for completeness, but it fails with basically any binary data). It works fine with Python 2.6.6, but fails with Python 2.7.1.

{{{
import urllib2
f = open("data", "r")
mydata = f.read()
f.close()
#this fails
url=unicode('http://localhost/test')
#this works
#url=str('http://localhost/test')
#this also works
#url=unicode('http://localhost')
req = urllib2.Request(url, data=mydata)
urllib2.urlopen(req)
}}} |
|||
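For comparison, here is a sketch of the same request construction in Python 3, where the URL is text and the body has to be bytes, so the two are never concatenated implicitly. Nothing is sent over the network; the URL is the one from the snippet above and the payload is a made-up stand-in for the attached data file.

```python
import urllib.request

# Made-up stand-in for the attached binary file.
payload = b"\xff\xd8\xff\xe0 made-up binary payload"

# Building the Request object does not open a connection.
req = urllib.request.Request("http://localhost/test", data=payload)

# Supplying data turns the request into a POST.
print(req.get_method(), req.full_url, type(req.data).__name__)
```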
msg136060 - (view) | Author: Senthil Kumaran (orsenthil) * | Date: 2011-05-16 01:51 | |
The bug was about sending binary "data" via httplib. In the example you wrote, you are sending a unicode "url" and experiencing a failure for certain examples. In 2.7, URLs should be of str type; we don't have a function to deal with unicode URLs separately, and sending a unicode URL is an error. |
|||
msg138056 - (view) | Author: Ion Scerbatiuc (cyrus) | Date: 2011-06-10 08:48 | |
Hello, I would like to subscribe to the issue. The problem does indeed seem to exist in Python 2.7. What I'm doing is proxying HTTP requests (using Django); the PUT / POST requests work fine on Python 2.6 but fail on 2.7 with the error already presented in bero's first message. I'm using httplib2 and the code looks like

{{
http = httplib2.Http(timeout=5)
try:
    resp, content = http.request(
        request_url, method, body=body, headers=headers)
except (AttributeError, httplib.ResponseNotReady), e:
    # ...
}}

Body is the result of Django's request.read(), which in fact contains the binary data from the PUT / POST request. The full stack trace is:

{{
Traceback:
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  111. response = callback(request, *callback_args, **callback_kwargs)
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/auth.py" in _decorated_view
  33. return view(request, *args, **kwargs)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
  39. resp = view_func(*args, **kwargs)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
  52. return view_func(*args, **kwargs)
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/views.py" in dispatch
  55. original=request.build_absolute_uri())
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/handlers/its.py" in proxy
  51. body=body, headers=headers)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in request
  1129. (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in _request
  901. (response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in _conn_request
  862. conn.request(method, request_uri, body, headers)
File "/usr/local/lib/python2.7/httplib.py" in request
  941. self._send_request(method, url, body, headers)
File "/usr/local/lib/python2.7/httplib.py" in _send_request
  975. self.endheaders(body)
File "/usr/local/lib/python2.7/httplib.py" in endheaders
  937. self._send_output(message_body)
File "/usr/local/lib/python2.7/httplib.py" in _send_output
  795. msg += message_body
}} |
|||
msg138059 - (view) | Author: Ion Scerbatiuc (cyrus) | Date: 2011-06-10 09:06 | |
Hello again, After some digging I found that the "real" problem was that the provided URL was a unicode string, so the concatenation was failing. Maybe this is not a big deal, but I think we should at least do a proper assertion on the provided URL, or some other check, because the error encountered is confusing, to say the least. |
|||
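One way to get the clearer error asked for above is to convert the URL to a byte string before it reaches the HTTP library, so that a bad URL fails early with an explicit message. The sketch uses Python 3 names, where `str` corresponds to Python 2's `unicode` and `bytes` to Python 2's `str`. `url_to_bytes` is a hypothetical helper, not an httplib or httplib2 API.

```python
def url_to_bytes(url):
    """Return url as an ASCII byte string, or raise ValueError with a clear message.

    Hypothetical helper for illustration; not part of httplib or httplib2.
    """
    if isinstance(url, bytes):
        return url
    try:
        return url.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError("URL contains non-ASCII characters: %r" % (url,))


request_url = url_to_bytes("http://localhost/test")
```

On Python 2.7 the equivalent caller-side workaround would be along the lines of `url.encode('ascii')` on the unicode URL before handing it to the library.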
msg138128 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2011-06-10 18:00 | |
Ion, as you perhaps noticed, posting a message 'subscribes' you (puts you on the nosy list). One can also add oneself as nosy with the little button under it without saying anything. This should be reopened because we do not change error classes in bugfix releases (ie, future 2.7.x releases) because that can break code -- unless the error class is contrary to the doc and we decide the doc is right. Even as a new feature, a change is dubious and carefully to be considered. |
|||
msg138142 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2011-06-10 23:45 | |
should *not* be reopened. Sorry for omission of 'not'. |
|||
msg138908 - (view) | Author: Sorin Sbarnea (ssbarnea) * | Date: 2011-06-24 11:00 | |
Can we get more info regarding the resolution of this bug? Due to this bug, httplib can no longer be used to send binary data. This bug breaks other modules, one example being PyAMF (which communicates only using binary data). |
|||
msg138914 - (view) | Author: Sorin Sbarnea (ssbarnea) * | Date: 2011-06-24 11:25 | |
There is another factor that makes the problem even more critical: OS X 10.7 includes Python 2.7.1 as the *default* interpreter. So we'll need both a fix for the future and a workaround. BTW, the hack with sys.setdefaultencoding cannot be used if you really send binary data. |
|||
msg138952 - (view) | Author: Senthil Kumaran (orsenthil) * | Date: 2011-06-24 15:27 | |
Sorin, can you please open another report with more details on how some condition in httplib breaks PyAMF? We will see to it that it is fixed. Commenting on an invalid closed issue is confusing. |
|||
msg138954 - (view) | Author: Sorin Sbarnea (ssbarnea) * | Date: 2011-06-24 15:40 | |
Added as bug http://bugs.python.org/issue12398 |
|||
msg138972 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2011-06-24 18:22 | |
Sorin, this is an issue that claimed a bug, not a bug. The resolution is that the claim appears false because the problem arose from using a unicode rather than a bytes URL. The error message may be confusing, but the error class cannot be changed. Senthil says that he *did* send non-ASCII bytes with no problem. |
|||
msg139110 - (view) | Author: Sorin Sbarnea (ssbarnea) * | Date: 2011-06-25 19:54 | |
I have to add some details here. First, this bug has nothing to do with the URL; it reproduces with normal URLs. Still, the problem with the line "msg += message_body" is quite complex when combined with Python 2.7: type(msg) is unicode and type(message_body) is str, even if I try to manually force Python to use bytes. It seems that in 2.7 bytes is an alias for str. Because of this, the code fails only on 2.7, since it tries to convert binary data to a unicode string. If I am not mistaken, the code will work with Python 3.x, because there bytes is not str. |
|||
msg139116 - (view) | Author: Senthil Kumaran (orsenthil) * | Date: 2011-06-25 20:22 | |
Hi Sorin,

On Sat, Jun 25, 2011 at 07:54:24PM +0000, sorin wrote:
> type(message_body) is str ... even if I tried to manually force
> Python for use bytes. It seams that in 2.7 bytes are alias to str.
> Due to this the code will fail to run only on 2.7 because it will
> try to convert binary data to unicode string.

Bit confused here. You encode the string to bytes and decode it back to str. One does not force bytes to str. And if you use str or bytes consistently in Python 2.7, you won't face the problem. |
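The alias mentioned in this exchange can be checked directly. The small sketch below is written to run on both major versions: on Python 2.7 `bytes` is another name for `str` and mixing it with a unicode object triggers an implicit ASCII decode, while on Python 3 the two are distinct types and mixing them is rejected outright. The sample values are made up.

```python
import sys

# True on Python 2.6/2.7, where bytes is an alias for str; False on Python 3.
same_type = bytes is str

try:
    mixed = u"header" + b"\xff"
    failure = None
except (TypeError, UnicodeDecodeError) as exc:
    # Python 2 raises UnicodeDecodeError (implicit ASCII decode of the byte string);
    # Python 3 raises TypeError (text and bytes cannot be concatenated).
    failure = type(exc).__name__

print(sys.version_info[0], same_type, failure)
```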
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:16 | admin | set | github: 56107 |
2011-07-04 16:16:35 | eric.araujo | set | messages: - msg134878 |
2011-06-25 20:22:42 | orsenthil | set | messages: + msg139116 |
2011-06-25 19:54:23 | ssbarnea | set | messages: + msg139110 |
2011-06-24 18:22:47 | terry.reedy | set | messages: + msg138972 |
2011-06-24 15:40:23 | ssbarnea | set | messages: + msg138954 |
2011-06-24 15:27:10 | orsenthil | set | messages: + msg138952 |
2011-06-24 11:25:05 | ssbarnea | set | messages: + msg138914 |
2011-06-24 11:00:41 | ssbarnea | set | nosy: + ssbarnea messages: + msg138908 |
2011-06-10 23:45:50 | terry.reedy | set | messages: + msg138142 |
2011-06-10 18:00:15 | terry.reedy | set | messages: + msg138128 |
2011-06-10 09:06:29 | cyrus | set | messages: + msg138059 |
2011-06-10 08:48:13 | cyrus | set | nosy: + cyrus messages: + msg138056 |
2011-05-16 01:51:57 | orsenthil | set | messages: + msg136060 |
2011-05-15 18:29:59 | Jiri.Horky | set | files: + data nosy: + Jiri.Horky messages: + msg136043 |
2011-05-06 13:08:19 | orsenthil | set | status: open -> closed messages: + msg135290 assignee: orsenthil resolution: works for me stage: test needed -> resolved |
2011-04-30 16:30:08 | eric.araujo | set | nosy: + eric.araujo messages: + msg134878 |
2011-04-30 06:57:04 | bero | set | messages: + msg134840 |
2011-04-30 00:11:31 | terry.reedy | set | nosy: + terry.reedy messages: + msg134824 stage: test needed |
2011-04-21 17:37:57 | santoso.wijaya | set | nosy: + santoso.wijaya |
2011-04-21 13:44:13 | ezio.melotti | set | nosy: + orsenthil, ezio.melotti |
2011-04-21 13:42:33 | bero | create |