classification
Title: struct.pack fails first time with unicode fmt
Type: enhancement Stage: resolved
Components: Unicode Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: ezio.melotti, mark.dickinson, meador.inge, miwa, pitrou, python-dev, serhiy.storchaka, vajrasky
Priority: normal Keywords: patch

Created on 2013-09-27 08:28 by miwa, last changed 2013-12-15 09:31 by vajrasky. This issue is now closed.

Files
File name Uploaded Description Edit
handle_ascii_range_unicode_in_struct_pack.patch vajrasky, 2013-09-30 15:29 review
handle_ascii_range_unicode_in_struct_pack_v2.patch vajrasky, 2013-10-01 03:52 review
struct_unicode_fmt.patch serhiy.storchaka, 2013-10-01 08:27 review
fix_error_message_struct_Struct.patch vajrasky, 2013-12-14 04:13 review
Messages (15)
msg198464 - (view) Author: Musashi Tamura (miwa) Date: 2013-09-27 08:28
C:\>python
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import struct
>>> struct.pack(u'B',1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Struct() argument 1 must be string, not unicode
>>> struct.pack(u'B',1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Struct() argument 1 must be string, not unicode
>>> struct.pack('B',1) # this is ok
'\x01'
>>> struct.pack(u'B',1)
'\x01'
msg198708 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-30 15:29
Here's the preliminary patch. I am assuming that we should accept unicode argument not reject it straight away.

Python3 does that.
>>> import struct
>>> struct.pack('b', 3)
b'\x03'
>>> struct.pack(b'b', 3)
b'\x03'
>>> struct.pack(b'\xff', 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: bad char in struct format
msg198714 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-30 18:23
Struct constructor accepts only str and not unicode. But struct.pack() uses caching and it found Struct('B') in the cache (because u'B' and 'B' are equal and have same hash).

I doubt we should fix this. Adding support of Unicode argument is new feature.
msg198716 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-09-30 18:40
Either way, the runtime inconsistency is a bug. Since we shouldn't break existing code, I would vote for always allowing unicode format strings, rather than always disallowing them.

Another argument is that str and unicode are generally substituible in 2.x when they are pure ASCII (which they are here).
msg198747 - (view) Author: Musashi Tamura (miwa) Date: 2013-09-30 23:29
Thanks for feedback.

I think it should be fixed with allowing unicode. 
"from __future__ import unicode_literals" may break existing code.
msg198749 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-10-01 03:52
Refactor test to clear the cache before using unicode format.
msg198754 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-01 08:27
struct.Struct() should be changed instead of struct.pack(). Here is a patch.
msg198758 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-10-01 08:57
Serhiy, you don't want to clear the cache in the test to simulate the bug?

struct._clearcache()

Other than that, should we raise struct.error instead of ValueError? Right now, the current behaviour in python 2.7 is:

>>> struct.pack('\x80', 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: bad char in struct format

But you are right. Struct() should have been changed instead of struct.pack.
msg198759 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-10-01 09:03
Nevermind about my comment about clearing cache. It only happens if we use struct.pack not struct.Struct.
msg198769 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-01 12:55
> Other than that, should we raise struct.error instead of ValueError?

Python 3 raises UnicodeEncodeError. And Python 2 raises UnicodeEncodeError when coerce non-ASCII unicode to str.
msg203063 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-16 16:38
Any comments?
msg205571 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-12-08 15:45
New changeset 42d3afd29460 by Serhiy Storchaka in branch '2.7':
Issue #19099: The struct module now supports Unicode format strings.
http://hg.python.org/cpython/rev/42d3afd29460
msg205572 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-08 15:48
Fixed. Thank you Musashi for your report.
msg206165 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-12-14 04:13
Okay, I think the error message can be improved because in Python 2.7 we differentiate very clearly the string from the unicode.

>>> import struct
>>> struct.Struct(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Struct() argument 1 must be string, not int

But you can give unicode, right?

>>> struct.Struct(u'b')
<Struct object at 0x1f484b8>

This is consistent with other example:

>>> " cutecat ".strip(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: strip arg must be None, str or unicode

What do you say, Serhiy?

Here is the patch.
msg206220 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-12-15 09:31
Nevermind, I already created this issue. http://bugs.python.org/issue19985
History
Date User Action Args
2013-12-15 09:31:39vajraskysetmessages: + msg206220
2013-12-14 04:13:50vajraskysetfiles: + fix_error_message_struct_Struct.patch

messages: + msg206165
2013-12-08 15:48:19serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg205572

stage: patch review -> resolved
2013-12-08 15:45:38python-devsetnosy: + python-dev
messages: + msg205571
2013-11-16 16:38:52serhiy.storchakasetassignee: serhiy.storchaka
messages: + msg203063
2013-10-01 12:55:19serhiy.storchakasetmessages: + msg198769
2013-10-01 09:03:56vajraskysetmessages: + msg198759
2013-10-01 08:57:48vajraskysetmessages: + msg198758
2013-10-01 08:27:30serhiy.storchakasetfiles: + struct_unicode_fmt.patch
type: behavior -> enhancement
messages: + msg198754

stage: patch review
2013-10-01 03:52:42vajraskysetfiles: + handle_ascii_range_unicode_in_struct_pack_v2.patch

messages: + msg198749
2013-09-30 23:29:09miwasetmessages: + msg198747
2013-09-30 18:40:39pitrousetmessages: + msg198716
2013-09-30 18:23:39serhiy.storchakasetnosy: + pitrou, mark.dickinson, meador.inge, serhiy.storchaka
messages: + msg198714
2013-09-30 15:29:25vajraskysetfiles: + handle_ascii_range_unicode_in_struct_pack.patch

nosy: + vajrasky
messages: + msg198708

keywords: + patch
2013-09-27 08:28:07miwacreate