Message 61263 - Python tracker


Author	edemaine
Date	2006-11-19.19:47:30

Content
Currently, urllib.urlopen(...).read() returns a string, not a unicode object.  Ditto for urllib2.  No attempt is made to decode the data using the charset specified in the Content-Type header (....info()['Content-Type']).

Is it fair to assume that, in Python 3K, urllib....read() will return (Unicode) strings instead of bytes, automatically decoding according to the charset?

Do you think we could expose this futuristic functionality in Python 2?  I doubt we could change read() without breaking a lot of existing code that already does this decoding (e.g., http://zesty.ca/python/scrape.py), but perhaps a 'uread()' method could return a unicode object instead of a string.
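
To make the idea concrete, here is a minimal sketch of what such a decoding helper might look like. It is written as a free function in Python 3 syntax rather than as a method on the response object, and it is only an illustration: the name uread() is borrowed from the proposal above, while charset_from_content_type() and the ISO-8859-1 fallback are assumptions of this sketch, not part of urllib.

```python
from email.message import Message


def charset_from_content_type(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper. The fallback charset is an assumption of this
    sketch; real servers often omit the parameter entirely.
    """
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


def uread(raw, content_type, default="iso-8859-1"):
    """Decode raw response bytes using the charset from the header.

    Hypothetical stand-in for the proposed uread() method: `raw` is what
    read() returned, `content_type` is what info()['Content-Type'] returned.
    """
    return raw.decode(charset_from_content_type(content_type, default))


if __name__ == "__main__":
    body = "caf\u00e9".encode("utf-8")
    print(uread(body, "text/html; charset=UTF-8"))
```

Keeping the decoding in a separate function leaves read() untouched, so existing code that already decodes the bytes itself would keep working.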

History
Date	User	Action	Args
2008-01-20 09:59:51	admin	link	issue1599329 messages
2008-01-20 09:59:51	admin	create