classification
Title: Sending binary data with a POST request in httplib can cause Unicode exceptions
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Jiri.Horky, bero, cyrus, eric.araujo, ezio.melotti, orsenthil, santoso.wijaya, ssbarnea, terry.reedy
Priority: normal Keywords: patch

Created on 2011-04-21 13:42 by bero, last changed 2011-07-04 16:16 by eric.araujo. This issue is now closed.

Files
File name Uploaded Description
python-2.7.1-fix-httplib-UnicodeDecodeError.patch bero, 2011-04-21 13:42 Proposed fix
data Jiri.Horky, 2011-05-15 18:29 binary data that triggers the problem
Messages (17)
msg134211 - (view) Author: Bernhard Rosenkraenzer (bero) Date: 2011-04-21 13:42
Sending e.g. a JPEG file with a httplib POST request (e.g. through mechanize) can result in an error like this:

  File "/usr/lib64/python2.7/httplib.py", line 947, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 988, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 941, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 802, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 2566: invalid start byte


The code triggering this is the attempt to merge msg and message_body into a single request in httplib.py, lines 791 and following.

The patch I'm attaching treats an invalid string of unknown encoding (e.g. binary data wrapped as string) like something that isn't a string.

Works for me with the patch.
msg134824 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-04-30 00:11
Did you run the httplib test with your patch? Interactively
>>> from test.test_httplib import test_main as f; f()
(verbose mode, over 40 tests)

In 3.x, the patch would be to http/client.py, line 802 in 3.2 release
if isinstance(message_body, str) # becomes
if isinstance(message_body, bytes)

Will this be an issue in 3.x?
msg134840 - (view) Author: Bernhard Rosenkraenzer (bero) Date: 2011-04-30 06:57
Not sure how to get it into verbose mode (I presume you don't mean "python -v"), but normal mode (22 tests) works fine:


Python 2.7.1 (r271:86832, Apr 22 2011, 13:40:40)
[GCC 4.6.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from test.test_httplib import test_main as f
>>> f()
test_auto_headers (test.test_httplib.HeaderTests) ... ok
test_ipv6host_header (test.test_httplib.HeaderTests) ... ok
test_putheader (test.test_httplib.HeaderTests) ... ok
test_responses (test.test_httplib.OfflineTest) ... ok
test_bad_status_repr (test.test_httplib.BasicTest) ... ok
test_chunked (test.test_httplib.BasicTest) ... ok
test_chunked_head (test.test_httplib.BasicTest) ... ok
test_epipe (test.test_httplib.BasicTest) ... ok
test_filenoattr (test.test_httplib.BasicTest) ... ok
test_host_port (test.test_httplib.BasicTest) ... ok
test_incomplete_read (test.test_httplib.BasicTest) ... ok
test_negative_content_length (test.test_httplib.BasicTest) ... ok
test_partial_reads (test.test_httplib.BasicTest) ... ok
test_read_head (test.test_httplib.BasicTest) ... ok
test_response_headers (test.test_httplib.BasicTest) ... ok
test_send (test.test_httplib.BasicTest) ... ok
test_send_file (test.test_httplib.BasicTest) ... ok
test_status_lines (test.test_httplib.BasicTest) ... ok
testTimeoutAttribute (test.test_httplib.TimeoutTest)
This will prove that the timeout gets through ... ok
test_attributes (test.test_httplib.HTTPSTimeoutTest) ... ok
testHTTPConnectionSourceAddress (test.test_httplib.SourceAddressTest) ... ok
testHTTPSConnectionSourceAddress (test.test_httplib.SourceAddressTest) ... ok

----------------------------------------------------------------------
Ran 22 tests in 0.004s

OK


Not sure if this is an issue with 3.x - I haven't used 3.x so far.
msg135290 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-05-06 13:08
Hello Bernhard, 

I tried to POST a JPEG file through urllib2 (which uses httplib internally and goes through the code that you pointed out), and I don't face any problem. I am able to POST binaries using httplib.

I am also surprised by the UnicodeDecodeError being raised. The POST data is a string (8-bit string) in Python 2.7, and that portion of the code will have no problem creating the content.

You will get a UnicodeDecodeError only if you explicitly pass a unicode object as data, never when you pass a str or binary string.
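
To illustrate the mechanism (a minimal sketch, not the httplib code itself): when Python 2 concatenates a unicode object with an 8-bit str, it first decodes the str with the default ASCII codec. The snippet below performs that decode step explicitly, so it behaves the same on Python 2 and 3. The names `header_bytes`, `body` and `implicit_concat` are made up for the example.

```python
# Sketch of the implicit step behind `msg += message_body` in Python 2
# when msg is unicode and message_body is an 8-bit str.
header_bytes = b"POST /test HTTP/1.1\r\n\r\n"  # pure ASCII, decodes fine
body = b"\xff\xd8\xff\xe0"                      # first bytes of a JPEG file


def implicit_concat(text, raw):
    """Mimic Python 2's unicode + str: decode raw as ASCII, then join."""
    return text + raw.decode("ascii")


msg = header_bytes.decode("ascii")  # stands in for a unicode msg buffer

try:
    implicit_concat(msg, body)
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)

# Joining byte strings with byte strings never triggers a decode.
joined = header_bytes + body
```

So the binary body is only a problem once something earlier has turned the header buffer into unicode.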

Perhaps mechanize is doing something wrong here and sending a Unicode object.

So, this really does not look like a bug to me.

(Also a note on the patch: it tries to silence the error, which is the wrong thing to do.)

If you can provide a simple snippet to reproduce this error, feel free to reopen this. I am closing this as 'works for me'.

Thanks.
msg136043 - (view) Author: Jiri Horky (Jiri.Horky) Date: 2011-05-15 18:29
I have the same problem as the original submitter.

The reason it previously worked for you was probably that you didn't use a "right" unicode string in the urllib2.Request. The following code will raise the exception (I enclose the data file for completeness, but it fails with basically any binary data).

It works fine with Python 2.6.6, but fails with Python 2.7.1.

{{{
import urllib2

# Read the payload as raw bytes.
with open("data", "rb") as f:
    mydata = f.read()

# this fails
url = unicode('http://localhost/test')

# this works
# url = str('http://localhost/test')

# this also works
# url = unicode('http://localhost')

req = urllib2.Request(url, data=mydata)
urllib2.urlopen(req)
}}}
msg136060 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-05-16 01:51
The bug was about sending Binary "data" via httplib. In the example you
wrote, you are sending a unicode "url" and experiencing a failure for
certain examples.

In 2.7, URLs should be of str type. We don't have a function to
deal with unicode URLs separately, and sending a unicode URL is an
error.
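
A minimal sketch of normalizing the URL in calling code before it reaches urllib2 on 2.7. `as_native_str` is a hypothetical helper, not part of the standard library, and it assumes the URL is already plain ASCII (a non-ASCII URL would need percent-encoding first).

```python
import sys

PY2 = sys.version_info[0] == 2


def as_native_str(url):
    """Return url as the native str type of the running interpreter.

    On Python 2 a unicode URL is encoded to an 8-bit str; on Python 3
    a bytes URL is decoded to str. Assumes an ASCII-only URL.
    """
    if isinstance(url, str):
        return url
    if PY2:
        return url.encode("ascii")  # unicode -> 8-bit str
    return url.decode("ascii")      # bytes -> str


url = as_native_str(u"http://localhost/test")
```

Building the request as `urllib2.Request(as_native_str(url), data=mydata)` should then keep every piece of the request a str on 2.7.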
msg138056 - (view) Author: Ion Scerbatiuc (cyrus) Date: 2011-06-10 08:48
Hello,

I would like to subscribe to the issue. The problem seems to indeed exist in Python 2.7. 

I am proxying HTTP requests (using Django). The PUT / POST requests work fine on Python 2.6 but fail on 2.7 with the error already presented in bero's first message.

I'm using httplib2 and the code looks like

{{
http = httplib2.Http(timeout=5)
try:
    resp, content = http.request(
        request_url, method,
        body=body, headers=headers)
except (AttributeError, httplib.ResponseNotReady) as e:
    # ...
}}

Body is the result of Django's request.read(), which in fact contains the binary data from the PUT / POST request.

The full stack trace is:
{{
Traceback:
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  111.                         response = callback(request, *callback_args, **callback_kwargs)
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/auth.py" in _decorated_view
  33.         return view(request, *args, **kwargs)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
  39.         resp = view_func(*args, **kwargs)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
  52.         return view_func(*args, **kwargs)
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/views.py" in dispatch
  55.         original=request.build_absolute_uri())
File "/home/cyrus/workspace/macleod/apps/macleod/macleod/handlers/its.py" in proxy
  51.                 body=body, headers=headers)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in request
  1129.                     (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in _request
  901.         (response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/home/cyrus/workspace/macleod/ve/lib/python2.7/site-packages/httplib2/__init__.py" in _conn_request
  862.                 conn.request(method, request_uri, body, headers)
File "/usr/local/lib/python2.7/httplib.py" in request
  941.         self._send_request(method, url, body, headers)
File "/usr/local/lib/python2.7/httplib.py" in _send_request
  975.         self.endheaders(body)
File "/usr/local/lib/python2.7/httplib.py" in endheaders
  937.         self._send_output(message_body)
File "/usr/local/lib/python2.7/httplib.py" in _send_output
  795.             msg += message_body
}}
msg138059 - (view) Author: Ion Scerbatiuc (cyrus) Date: 2011-06-10 09:06
Hello again,

After some digging I found that the "real" problem was that the provided URL was a unicode string, so the concatenation was failing. Maybe this is not a big deal, but I think we should at least do a proper assertion on the provided URL or some other check, because the error encountered is confusing, to say the least.
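
A sketch of the kind of early check meant here, written as a guard in calling code rather than as a change to httplib. `require_byte_str_url` is a hypothetical name, and raising TypeError is an assumption of this sketch, not something httplib does. It targets 2.7, where `bytes` is the 8-bit str type.

```python
def require_byte_str_url(url):
    """Fail early, with a clear message, if url is not a byte string.

    On Python 2.6+ `bytes` is the 8-bit str type, so a plain str URL
    passes and a unicode URL is rejected before any request is built.
    """
    if not isinstance(url, bytes):
        raise TypeError(
            "url must be a byte string, got %s" % type(url).__name__)
    return url


checked = require_byte_str_url(b"/test")

try:
    require_byte_str_url(u"/test")
    rejected = False
except TypeError:
    rejected = True
```

The point of the guard is only that the failure shows up at the call site and names the URL, instead of surfacing later as a UnicodeDecodeError about the body.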
msg138128 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-10 18:00
Ion, as you perhaps noticed, posting a message 'subscribes' you (puts you on the nosy list). One can also add oneself as nosy with the little button under it without saying anything.

This should be reopened because we do not change error classes in bugfix releases (ie, future 2.7.x releases) because that can break code -- unless the error class is contrary to the doc and we decide the doc is right. Even as a new feature, a change is dubious and carefully to be considered.
msg138142 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-10 23:45
should *not* be reopened. Sorry for omission of 'not'.
msg138908 - (view) Author: Sorin Sbarnea (ssbarnea) * Date: 2011-06-24 11:00
Can we get more info regarding the resolution of this bug? Due to this bug, httplib can no longer be used to send binary data. This breaks other modules, one example being PyAMF (which communicates using binary data only).
msg138914 - (view) Author: Sorin Sbarnea (ssbarnea) * Date: 2011-06-24 11:25
There is another problem that makes this even more critical: OS X 10.7 includes Python 2.7.1 as the *default* interpreter.

So we'll need both a fix for the future and a workaround.

BTW, the hack with sys.setdefaultencoding cannot be used if you really send binary data.
msg138952 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-06-24 15:27
Sorin, can you please open another report with more details on how some condition in httplib breaks PyAMF? We will see to it that it is fixed. Commenting on an invalid, closed issue is confusing.
msg138954 - (view) Author: Sorin Sbarnea (ssbarnea) * Date: 2011-06-24 15:40
Added as bug http://bugs.python.org/issue12398
msg138972 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-24 18:22
Sorin, this is an issue that claimed a bug, not a bug. The resolution is that the claim appears false because the problem arose from using a unicode rather than a bytes url. The error message may be confusing, but the error class cannot be changed. Senthil says that he *did* send non-ascii bytes with no problem.
msg139110 - (view) Author: Sorin Sbarnea (ssbarnea) * Date: 2011-06-25 19:54
I have to add some details here. First, this bug has nothing to do with the URL, it does reproduce for normal urls.


Still, the problem with the line "msg += message_body" is quite complex when combined with Python 2.7:

type(msg) is unicode
type(message_body) is str ... even when I tried to manually force Python to use bytes. It seems that in 2.7 bytes is an alias for str. Because of this, the code fails only on 2.7, where it tries to convert binary data to a unicode string.

If I am not mistaken, the code will work with Python 3.x, because there bytes is not str.
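
A quick way to check the aliasing on a given interpreter, as a small sketch that runs on either major version (the variable names are made up for the example):

```python
import sys

# On Python 2.6+ `bytes` is just another name for the 8-bit `str` type,
# so "forcing" a value to bytes there changes nothing.
# On Python 3 `bytes` and `str` are distinct types.
bytes_is_str = bytes is str
payload = bytes(b"\xff\xd8")

print(sys.version_info[0], bytes_is_str, type(payload).__name__)
```

On 2.7 this means message_body is already the right type; the open question is why msg is unicode by the time the body is appended.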
msg139116 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-06-25 20:22
Hi Sorin,
On Sat, Jun 25, 2011 at 07:54:24PM +0000, sorin wrote:
> type(message_body) is str ... even if I tried to manually force
> Python for use bytes. It seams that in 2.7 bytes are alias to str.
> Due to this the code will fail to run only on 2.7 because it will
> try to convert  binary data to unicode string.

I am a bit confused here. You encode a string to bytes and decode it back
to str; one does not force bytes to str. And if you use str or bytes
consistently in Python 2.7, you won't face the problem.
History
Date User Action Args
2011-07-04 16:16:35 eric.araujo set messages: - msg134878
2011-06-25 20:22:42 orsenthil set messages: + msg139116
2011-06-25 19:54:23 ssbarnea set messages: + msg139110
2011-06-24 18:22:47 terry.reedy set messages: + msg138972
2011-06-24 15:40:23 ssbarnea set messages: + msg138954
2011-06-24 15:27:10 orsenthil set messages: + msg138952
2011-06-24 11:25:05 ssbarnea set messages: + msg138914
2011-06-24 11:00:41 ssbarnea set nosy: + ssbarnea; messages: + msg138908
2011-06-10 23:45:50 terry.reedy set messages: + msg138142
2011-06-10 18:00:15 terry.reedy set messages: + msg138128
2011-06-10 09:06:29 cyrus set messages: + msg138059
2011-06-10 08:48:13 cyrus set nosy: + cyrus; messages: + msg138056
2011-05-16 01:51:57 orsenthil set messages: + msg136060
2011-05-15 18:29:59 Jiri.Horky set files: + data; nosy: + Jiri.Horky; messages: + msg136043
2011-05-06 13:08:19 orsenthil set status: open -> closed; messages: + msg135290; assignee: orsenthil; resolution: works for me; stage: test needed -> resolved
2011-04-30 16:30:08 eric.araujo set nosy: + eric.araujo; messages: + msg134878
2011-04-30 06:57:04 bero set messages: + msg134840
2011-04-30 00:11:31 terry.reedy set nosy: + terry.reedy; messages: + msg134824; stage: test needed
2011-04-21 17:37:57 santoso.wijaya set nosy: + santoso.wijaya
2011-04-21 13:44:13 ezio.melotti set nosy: + orsenthil, ezio.melotti
2011-04-21 13:42:33 bero create