Author lemburg
Recipients ebfe, lemburg, pitrou, vstinner
Date 2008-12-27.13:13:13
Message-id <495629E8.1080408@egenix.com>
In-reply-to <1230382698.86.0.846541744822.issue4757@psf.upfronthosting.co.za>
Content
On 2008-12-27 13:58, STINNER Victor wrote:
> Python 2.x allows encoding any byte string (str) and any ASCII-only
> unicode string (unicode):
> 
> $ python
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
> >>> import zlib
> >>> zlib.compress('abc')
> "x\x9cKLJ\x06\x00\x02M\x01'"
> >>> zlib.compress(u'abc')
> "x\x9cKLJ\x06\x00\x02M\x01'"
> >>> zlib.compress(u'abc\xe9')
> ...
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' ...
> 
> I'm not sure that this behaviour was really wanted, because the
> decompress operation is not symmetric (the result type is always a
> byte string):
> 
> $ python
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
> >>> import zlib
> >>> zlib.decompress("x\x9cKLJ\x06\x00\x02M\x01'")
> 'abc'
> 

I don't see a problem with this. The fact that Python 2.x also
accepts ASCII-only Unicode strings where byte strings (str) are
normally expected is intended to help with the migration to Unicode,
so the above is expected.
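
The coercion described above can be approximated in Python 3 terms. This is only a rough sketch, not the actual Python 2 implementation: the helper name `py2_style_compress` is hypothetical, and it assumes the implicit conversion behaves like encoding with the default ASCII codec.

```python
import zlib


def py2_style_compress(data):
    """Hypothetical helper mimicking the Python 2 coercion for zlib.compress.

    Assumption: text input is converted with the ASCII codec before
    compression; byte strings pass through untouched.
    """
    if isinstance(data, str):
        # ASCII-only text encodes cleanly; anything else raises UnicodeEncodeError.
        data = data.encode("ascii")
    return zlib.compress(data)


# ASCII text and the equivalent bytes produce the same compressed output.
same = py2_style_compress("abc") == py2_style_compress(b"abc")

# Non-ASCII text fails, as in the quoted session.
try:
    py2_style_compress("abc\xe9")
    failed = False
except UnicodeEncodeError:
    failed = True
```

Under that assumption, `same` and `failed` both end up true, which matches the three calls in the quoted session.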

zlib itself doesn't care about whether the data to be encoded
is text or bytes.

In Python 3.x, it's probably better to use bytes throughout the
API.
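
A minimal sketch of what a bytes-only API looks like from the caller's side, based on how zlib behaves in current Python 3 releases. The choice of UTF-8 is an assumption made for the example; the point is that the caller picks the encoding, not zlib.

```python
import zlib

text = "abc\xe9"

# The caller chooses the encoding explicitly; zlib only ever sees bytes.
payload = text.encode("utf-8")
compressed = zlib.compress(payload)

# decompress() returns bytes, so the round trip is symmetric at the bytes level.
restored = zlib.decompress(compressed)
roundtrip_text = restored.decode("utf-8")

# Passing str directly is rejected instead of being silently encoded.
try:
    zlib.compress(text)
    rejected = False
except TypeError:
    rejected = True
```

With bytes on both sides, the asymmetry raised in the quoted message goes away: what goes into `compress` has the same type as what comes out of `decompress`.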
History
Date                 User     Action  Args
2008-12-27 13:13:14  lemburg  set     recipients: + lemburg, pitrou, vstinner, ebfe
2008-12-27 13:13:13  lemburg  link    issue4757 messages
2008-12-27 13:13:13  lemburg  create