Message 78360 - Python tracker

Author	lemburg
Recipients	ebfe, lemburg, pitrou, vstinner
Date	2008-12-27.13:13:13
Message-id	<495629E8.1080408@egenix.com>
In-reply-to	<1230382698.86.0.846541744822.issue4757@psf.upfronthosting.co.za>

Content
On 2008-12-27 13:58, STINNER Victor wrote:
> Python 2.x allows compressing any byte string (str) and any
> ASCII-only unicode string (unicode):
> 
> $ python
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
> >>> import zlib
> >>> zlib.compress('abc')
> "x\x9cKLJ\x06\x00\x02M\x01'"
> >>> zlib.compress(u'abc')
> "x\x9cKLJ\x06\x00\x02M\x01'"
> >>> zlib.compress(u'abc\xe9')
> ...
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' ...
> 
> I'm not sure that this behaviour was really wanted, because the
> decompress operation is not symmetric (the result type is always a
> byte string):
> 
> $ python
> Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
> >>> import zlib
> >>> zlib.decompress("x\x9cKLJ\x06\x00\x02M\x01'")
> 'abc'
> 

I don't see a problem with this. Python 2.x deliberately accepts
ASCII-only Unicode strings where byte strings (str) are normally
expected; this is intended to help with the migration to Unicode,
so the above is expected.
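
To make the coercion concrete, here is a rough Python 3 sketch of what
Python 2.x does implicitly when a unicode object is passed where a byte
string is expected. The helper name compress_like_py2 is hypothetical and
exists only for illustration; it is not part of zlib.

```python
import zlib


def compress_like_py2(data):
    """Hypothetical helper: accept bytes, or text that is pure ASCII."""
    if isinstance(data, str):
        # Python 2.x performed this step implicitly with the default
        # (ASCII) codec; here it is spelled out explicitly.
        data = data.encode("ascii")
    return zlib.compress(data)


# ASCII-only text and the equivalent bytes compress to the same result.
assert compress_like_py2("abc") == compress_like_py2(b"abc")

# Non-ASCII text is rejected before zlib ever sees it.
try:
    compress_like_py2("abc\xe9")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```

The error in the quoted session therefore comes from the ASCII encoding
step, not from zlib.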

zlib itself doesn't care whether the data to be compressed
is text or bytes.

In Python 3.x, it's probably better to use bytes throughout the
API.
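
A minimal sketch of what "bytes throughout" could look like for a caller
in Python 3.x, assuming the caller picks the text encoding (UTF-8 here is
just an example choice, not something zlib prescribes):

```python
import zlib

text = "abc\xe9"

# The caller encodes text to bytes explicitly before compressing.
payload = text.encode("utf-8")
compressed = zlib.compress(payload)

# Both directions deal in bytes, so the round trip is symmetric.
restored = zlib.decompress(compressed)
assert isinstance(compressed, bytes)
assert restored == payload
assert restored.decode("utf-8") == text

# Passing text directly is a type error in Python 3.x.
try:
    zlib.compress(text)
except TypeError:
    print("str is not accepted; encode it first")
```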

History
Date	User	Action	Args
2008-12-27 13:13:14	lemburg	set	recipients: + lemburg, pitrou, vstinner, ebfe
2008-12-27 13:13:13	lemburg	link	issue4757 messages
2008-12-27 13:13:13	lemburg	create