Created on 2011-06-24 15:39 by sorin, last changed 2012-11-18 15:23 by eric.araujo.
|urllib2.patch||haypo, 2011-09-22 23:54|
|msg138953 - (view)||Author: sorin (sorin)||Date: 2011-06-24 15:39|
It looks like the Python 2.7 changes introduced some important bugs into httplib due to implicit str-unicode encoding/decoding. One clear example is that the PyAMF library doesn't work with Python 2.7 because it is not able to generate binary data POST responses. Please check http://dev.pyamf.org/ticket/823
(partial traceback, full in above bug)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 937, in endheaders
    self._send_output(message_body)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 795, in _send_output
    msg += message_body
|msg138971 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2011-06-24 18:18|
If this worked in 2.6 and fails in 2.7, it would probably be helpful if we can determine what change broke it. I believe hg has some sort of 'bisect' support that might make this not too onerous to do. Senthil (or someone) will eventually either figure out the problem or do the bisect, but if you want to speed things along you could do the bisect.
|msg138975 - (view)||Author: Terry J. Reedy (terry.reedy) *||Date: 2011-06-24 19:07|
A crash is a segfault or equivalent. Python 2.6 only gets security fixes. PyAMF does not run on Python 3. Hence a problem with PyAMF is no evidence of a problem with 3.x. Separate tests/examples would be needed.

Changes are not bugs unless they introduce a discrepancy between code and doc. Please post a self-contained example that exhibits the behavior that you consider a problem. It should not just be a repeat of #11898. Then quote the section of the docs that says (or suggests) that the behavior should be different from what it is.

The PyAMF site says "PyAMF requires Python 2.4 or newer. Python 3.0 isn’t supported yet." Since 3.0 was deprecated 2 years ago with the release of 3.1, I strongly suspect that the statement was written before 2.7 was released a year ago. Library developers should not make open ended promises like 'or newer' -- certainly not without testing and revising as necessary with each new Python version.

If PyAMF was broken by planned, announced, and documented changes in 2.7, that is too bad, but it is a year too late to change 2.7. Like all new versions, it had public beta and release candidate phases when people could test their packages and make comments. I believe what David is getting at is finding out for sure whether the change was intended or not.

The quote from the link you provide
>msg += message_body
appears to be the programming error, already explained in #11898, where msg is unicode and message_body is bytes with non-ascii bytes.
>>> u'a'+'\xf0'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
This is exactly the same error message that followed in the link, except for the position of the non-ascii byte. The fix is to not do the above.
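For readers following along on Python 3, the failure above can be approximated by making the implicit step explicit. This is only a sketch: `implicit_concat` is a hypothetical helper, and it assumes (as the error message indicates) that Python 2 decodes the byte string with the ASCII codec before concatenating it to a unicode string.

```python
def implicit_concat(text, raw):
    """Mimic Python 2's `unicode + str`: decode the bytes as ASCII, then join.

    Hypothetical helper for illustration; Python 2 performs this decode
    implicitly inside the `+` operator.
    """
    return text + raw.decode("ascii")


# Pure-ASCII bytes decode cleanly, so the concatenation succeeds.
print(implicit_concat(u"a", b"b"))

# A non-ASCII byte makes the implicit decode fail.
try:
    implicit_concat(u"a", b"\xf0")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc)
```

Whether the concatenation works therefore depends on the content of the byte string, which is why binary POST bodies only fail some of the time.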
|msg138977 - (view)||Author: Terry J. Reedy (terry.reedy) *||Date: 2011-06-24 19:47|
Did things like "u'a'+'\xf0'" work in 2.6- (with implicit latin-1 decoding)? (I do not have 2.6 loaded.) The doc for seq+seq (concatenation) in the language reference section 5.6. Binary arithmetic operations says that both sequences must be the same type. In the Library manual, 5.6. Sequence Types, the footnote for seq+seq makes no mention of a special exception for (some) mixed unicode/byte concatenations. I think footnote 6 about string+string should both note the exception and its limitation (and if the limitation was changed in 2.7, say so). (In any case, the exception was removed in Py3, so *this* is not a Py3 issue.)
|msg138989 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2011-06-24 21:41|
Many applications and libraries say "Python X.Y or newer", and it is one of the strengths of Python that this will often be true. That's what our backward compatibility policy is about, and that's why the fact that it isn't true for 2.x->3.x is such a big deal. As far as I can see there was no deprecation involved here, so "announced" is not a factor, I think. We won't be sure until we know what changed. All that said, it is quite possible (even likely, given #11898) that the pyamf code contains a bug and only worked by accident, and is now failing because some other bug in Python was fixed. Again, we won't know until we have a complete diagnosis of the cause of the change in behavior.
|msg139103 - (view)||Author: sorin (sorin)||Date: 2011-06-25 17:10|
You are right, I debugged the problem a little more and discovered at least one bug in PyAMF. Still, I was surprised to find something very strange: it looks like BytesIO.getvalue() returns `str` even though the documentation says it returns `bytes`. Should I file another bug?
Python 2.7.1 (r271:86832, Jun 13 2011, 14:28:51)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import io
>>> a = io.BytesIO()
>>> a
<_io.BytesIO object at 0x10f9453b0>
>>> a.getvalue()
''
>>> print type(a.getvalue())
<type 'str'>
>>>
|msg139107 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2011-06-25 18:26|
No, that's correct. In python 2.x the 'bytes' stuff is just a portability aid. In 2.x, bytes and string are the same type. In Python 3 they aren't, so by using the 'fake' classes in python2 you can often make your code work correctly on both python2 and python3. So, can this issue be closed, or do you think there still might be a valid backward compatibility issue?
|msg139108 - (view)||Author: Terry J. Reedy (terry.reedy) *||Date: 2011-06-25 18:31|
In 2.7, bytes is an alias for str to aid porting to 3.x.
>>> bytes is str
True
>>> type(bytes())
<type 'str'>
I suspect the doc uses 'bytes' rather than 'str' because it was backported from 3.x. Perhaps it should be changed but I do not know the policy on using the alias in 2.6/7 docs. I presume in 2.7 io.BytesIO is similar, if not equivalent to io.StringIO, but it is not an alias. Again, it was added so 2.7 code could use a bytes memory buffer that would remain bytes in 3.x and not become unicode text, like StringIO does.
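A short sketch of the portability point, using only the standard `io` module; `read_back` is a hypothetical helper name. Because `bytes` is an alias for `str` in 2.7, the check `type(data) is bytes` should hold on both 2.7 and 3.x, even though the type is displayed as `str` on 2.7.

```python
import io


def read_back(payload):
    """Write payload into an in-memory bytes buffer and return its contents."""
    buf = io.BytesIO()
    buf.write(payload)
    return buf.getvalue()


data = read_back(b"\xff\x00payload")

# The buffer hands back the same binary data, typed as `bytes`
# (which is the same object as `str` on Python 2.7).
print(type(data) is bytes)
print(data == b"\xff\x00payload")
```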
|msg139265 - (view)||Author: sorin (sorin)||Date: 2011-06-27 12:59|
Here is a test file that will replicate the problem, I added it as a gist so it could support contributions ;)
Py <2.7 works
Py ==2.7 fails
Py >=3.0 works after minor changes required by py3k
https://gist.github.com/1047551
|msg139268 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2011-06-27 13:37|
rdmurray>python2.6 py27-str-unicode-bytes.py
type(b)=<type 'str'>
Traceback (most recent call last):
  File "py27-str-unicode-bytes.py", line 17, in <module>
    unicode_str += b # this line will throw UnicodeDecodeError on Python 2.7
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 4: ordinal not in range(128)

And of course it doesn't work earlier than 2.6 since the b'' notation isn't supported before 2.6.
|msg139269 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2011-06-27 13:41|
To clarify: if I convert your program to using strings pre2.6, it still fails with a UnicodeDecodeError, as one would expect. bytes are strings in 2.x.
|msg139271 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2011-06-27 13:48|
And finally, your program does *not* succeed on Python3, except in the trivial sense that on python3 you never attempt to add the string and bytes data. It is exactly this kind of programming error that Python3 is designed to avoid: instead of sometimes getting a UnicodeDecodeError depending on what is in the "bytes" string, you *always* get a "Can't convert 'bytes' object to str implicitly" error when you attempt to add string and bytes.
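The Python 3 behaviour described here can be checked with a few lines. This sketch is Python 3 only, and `try_concat` is a hypothetical helper. The exact wording of the TypeError message differs between 3.x releases, so only the exception type is reported.

```python
def try_concat(text, raw):
    """Attempt text + raw and report the outcome as a string (Python 3)."""
    try:
        return text + raw
    except TypeError as exc:
        return "TypeError: %s" % exc


# On Python 3 both calls fail the same way, regardless of whether the
# bytes happen to be ASCII-only or not.
print(try_concat("abc", b"def"))
print(try_concat("abc", b"\xff"))
```

On Python 2 the first call would succeed and the second would raise UnicodeDecodeError, which is the content-dependent behaviour this issue is about.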
|msg139272 - (view)||Author: sorin (sorin)||Date: 2011-06-27 13:53|
Right, so you have some binary data and you want to send it to `httplib`. This worked in the past when `msg` was a non-unicode string, but starting with Python 2.7 this became a unicode string, so when you try to append the `message` it will fail because it will try to decode it.
|msg139283 - (view)||Author: R. David Murray (r.david.murray) *||Date: 2011-06-27 14:36|
But senthil already demonstrated in the previous issue that it does not become a unicode string unless you use unicode input. You also claimed that your test program here succeeded in python2.6, but it does not. This casts a little bit of doubt on your claim that there is a regression. Can you produce a minimal example of using httplib that demonstrates the regression?
|msg139304 - (view)||Author: sorin (sorin)||Date: 2011-06-27 15:54|
I updated the gist and made a minimal test https://gist.github.com/1047551
|msg144427 - (view)||Author: Adam Cohen (Adam.Cohen)||Date: 2011-09-22 22:11|
I encountered this issue as well. "params" is simply a bytestring, with no encoding. Workaround/proper solution is to cast the string as a bytearray with bytearray(params).
|msg144433 - (view)||Author: STINNER Victor (haypo) *||Date: 2011-09-22 23:54|
Here is a patch for httplib encoding HTTP headers to ISO-8859-1, as done in Python 3 (see HTTPConnection.putheader() from http.client). urllib is not affected by this issue because it already encodes Unicode, but it encodes to ASCII instead of ISO-8859-1.
Related commit in Python 3:
changeset:   67720:b3cadf5cf742
user:        Armin Ronacher <firstname.lastname@example.org>
date:        Sat Jan 22 13:44:22 2011 +0000
files:       Lib/http/client.py Lib/test/test_httpservers.py Misc/NEWS
description:
To match the behaviour of HTTP server, the HTTP client library now also encodes headers with iso-8859-1 (latin1) encoding. It was already doing that for incoming headers which makes this behaviour now consistent in both incoming and outgoing direction.
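As a rough illustration of the idea (not the contents of urllib2.patch, which is not reproduced here), a caller could encode text header values before handing them to the HTTP library. `encode_header_value` is a hypothetical helper; the only assumption taken from the message above is the ISO-8859-1 encoding.

```python
def encode_header_value(value, encoding="iso-8859-1"):
    """Return a header value as bytes.

    Byte strings pass through unchanged; text is encoded. On Python 2.7
    `bytes` is `str`, so plain str values are left alone and only unicode
    values are encoded.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


# A text Host value becomes a byte string, so it can no longer turn the
# rest of the request buffer into unicode.
print(encode_header_value(u"example.org:8080"))

# Values that are already bytes are returned as-is.
print(encode_header_value(b"example.org:8080"))
```

Characters outside ISO-8859-1 would still raise UnicodeEncodeError in this sketch; how such values should be handled is left open.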
|msg175727 - (view)||Author: Gregory P. Smith (gregory.p.smith) *||Date: 2012-11-17 08:09|
I'm running into this on 2.7.3 with code that worked fine on 2.6.5. The problem appears to be caused by a 'Host' http header that has a unicode type for the hostname:port value. Encoding header values makes sense though I haven't yet examined the patch in detail.
messages: + msg175727
keywords: + patch
messages: + msg144433
messages: + msg144427
|2011-08-07 06:07:40||orsenthil||set||assignee: orsenthil|
nosy: + orsenthil
|2011-06-27 15:54:00||sorin||set||messages: + msg139304|
|2011-06-27 14:36:25||r.david.murray||set||messages: + msg139283|
|2011-06-27 13:53:51||sorin||set||messages: + msg139272|
|2011-06-27 13:48:16||r.david.murray||set||messages: + msg139271|
|2011-06-27 13:41:29||r.david.murray||set||messages: + msg139269|
|2011-06-27 13:37:24||r.david.murray||set||messages: + msg139268|
|2011-06-27 12:59:56||sorin||set||messages: + msg139265|
|2011-06-25 18:31:39||terry.reedy||set||messages: + msg139108|
|2011-06-25 18:26:37||r.david.murray||set||messages: + msg139107|
|2011-06-25 17:10:24||sorin||set||messages: + msg139103|
|2011-06-24 21:41:59||r.david.murray||set||messages: + msg138989|
|2011-06-24 19:47:35||terry.reedy||set||messages: + msg138977|
|2011-06-24 19:07:58||terry.reedy||set||stage: test needed|
type: crash -> behavior
versions: - Python 3.1, Python 3.2, Python 3.3, Python 3.4
messages: + msg138975
messages: + msg138971