Author vstinner
Recipients dabeaz, mark.dickinson, r.david.murray, rhettinger, vstinner
Date 2010-12-28.01:13:36
Message-id <1293498819.06.0.567356166964.issue10783@psf.upfronthosting.co.za>
In-reply-to
Content
This "feature" was introduced in a big commit from Guido van Rossum (made before Python 3.0): r55500. The changelog is strange because it starts with "Make test_zipfile pass. The zipfile module now does all I/O in binary mode using bytes." but ends with "The _struct needed a patch to support bytes, str8 and str for the 's' and 'p' formats.". Why was _struct patched at the same time?

Implicit conversion between bytes and str is a very bad idea: it is the root of all confusion related to Unicode. The experience with Python 2 demonstrated that it should be changed, and it was changed in Python 3.0. But "Python 3.0" is a big project with many modules. Some modules were completely broken in Python 3.0; they work better in 3.1, and we hope they will be even better in 3.2.

The attached patch removes the implicit conversion for the 'c', 's' and 'p' formats. I made a similar change in ctypes 5 months ago: issue #8966.

If a program written for Python 3.1 fails because of the patch, it can use explicit conversion to stay compatible with 3.1 and 3.2 (patched). I think that it's better to use explicit conversion.
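
As a minimal sketch of what the explicit conversion could look like for the three affected formats: the str is encoded before it reaches struct.pack(), so struct only ever sees bytes. The choice of UTF-8 and the variable names are assumptions for illustration; a real program should use whatever encoding its data format requires.

```python
import struct

ENCODING = 'utf-8'  # assumption: pick the codec your binary format requires

# 's' format: fixed-size byte string
packed_s = struct.pack('2s', 'ha'.encode(ENCODING))

# 'p' format: Pascal string, the first byte stores the data length
packed_p = struct.pack('3p', 'ab'.encode(ENCODING))

# 'c' format: exactly one byte
packed_c = struct.pack('c', 'x'.encode(ENCODING))
```

Because the arguments are already bytes, this code does not rely on the implicit conversion at all.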

Implicit conversion for the 'c' format is really weird, and it was not documented correctly: note (1) is attached to the "b" format, not to the "c" format. Example:

   >>> struct.pack('c', 'é')
   struct.error: char format requires bytes or string of length 1
   >>> len('é')
   1
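
A short sketch of why the error message is misleading, assuming the implicit conversion encodes to UTF-8 (which is what the 's' outputs in this message suggest): the string has one character, but its encoded form has two bytes, and 'c' needs exactly one byte. The Latin-1 line is only an illustration of an encoding where this character fits in one byte.

```python
import struct

text = '\u00e9'  # 'é', written as an escape to avoid source-encoding issues

char_count = len(text)                      # length in characters
utf8_byte_count = len(text.encode('utf-8')) # length in bytes after UTF-8 encoding

# With an explicit single-byte encoding, the 'c' format is unambiguous.
packed = struct.pack('c', text.encode('latin-1'))
```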

There is also a length issue with the 's' format: struct.pack() truncates a unicode string to a length in bytes, not in characters, which is confusing.

   >>> struct.pack('2s', 'ha')
   b'ha'
   >>> struct.pack('2s', 'hé')
   b'h\xc3'
   >>> struct.pack('3s', 'hé')
   b'h\xc3\xa9'
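
With explicit encoding the byte length is visible before packing, so the caller can size the field on purpose instead of getting half a character. This is only a sketch; UTF-8 and the variable names are assumptions.

```python
import struct

text = 'h\u00e9'             # 'hé': 2 characters
data = text.encode('utf-8')  # 3 bytes once encoded

# Size the field from the encoded length, not from len(text).
fmt = '%ds' % len(data)
packed = struct.pack(fmt, data)

# A field that is too small still truncates, but now visibly at byte level.
truncated = struct.pack('2s', data)
```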

Finally, I don't like implicit conversion from unicode to bytes on pack, because it's not symmetrical.

   >>> struct.pack('3s', 'hé')
   b'h\xc3\xa9'
   >>> struct.unpack('3s', b'h\xc3\xa9')
   (b'h\xc3\xa9',)

(str -> pack() -> unpack() -> bytes)
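
For comparison, a sketch of a symmetrical round trip where both conversions are explicit (str -> encode() -> pack() -> unpack() -> decode() -> str). UTF-8 is again an assumed encoding.

```python
import struct

original = 'h\u00e9'  # 'hé'

# Encode explicitly before packing.
packed = struct.pack('3s', original.encode('utf-8'))

# unpack() returns bytes; decode explicitly to get the str back.
(raw,) = struct.unpack('3s', packed)
restored = raw.decode('utf-8')
```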