struct.pack fails first time with unicode fmt #63298
C:\>python
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import struct
>>> struct.pack(u'B',1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Struct() argument 1 must be string, not unicode
>>> struct.pack(u'B',1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Struct() argument 1 must be string, not unicode
>>> struct.pack('B',1) # this is ok
'\x01'
>>> struct.pack(u'B',1)
'\x01'
Here's a preliminary patch. I am assuming that we should accept a unicode argument rather than reject it outright. Python 3 does that:
>>> import struct
>>> struct.pack('b', 3)
b'\x03'
>>> struct.pack(b'b', 3)
b'\x03'
>>> struct.pack(b'\xff', 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
struct.error: bad char in struct format
The Struct constructor accepts only str, not unicode. But struct.pack() uses a cache, and it found Struct('B') in the cache because u'B' and 'B' are equal and have the same hash. I doubt we should fix this: adding support for a unicode argument is a new feature.
Either way, the runtime inconsistency is a bug. Since we shouldn't break existing code, I would vote for always allowing unicode format strings, rather than always disallowing them. Another argument is that str and unicode are generally interchangeable in 2.x when they are pure ASCII (which they are here).
Thanks for the feedback. I think it should be fixed by allowing unicode.
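To make the cache explanation concrete, here is a rough Python 3 simulation of it. This is only a sketch: `FakeUnicode`, `strict_struct`, `cached_pack` and `_cache` are hypothetical names invented for illustration, and a `str` subclass stands in for Python 2's `unicode` (it compares equal to, and hashes like, the plain string). It is not the actual `struct` module code.

```python
import struct


class FakeUnicode(str):
    """Hypothetical stand-in for Python 2's unicode: equal to str, same hash."""


_cache = {}  # format -> compiled Struct, mimicking a module-level cache


def strict_struct(fmt):
    # Mimics the old 2.7 constructor check: only the exact str type passes.
    if type(fmt) is not str:
        raise TypeError(
            "Struct() argument 1 must be string, not %s" % type(fmt).__name__
        )
    return struct.Struct(fmt)


def cached_pack(fmt, *args):
    # The lookup happens before any type check, so a key that is merely
    # equal (with the same hash) to a cached one skips the constructor.
    try:
        compiled = _cache[fmt]
    except KeyError:
        compiled = strict_struct(fmt)
        _cache[fmt] = compiled
    return compiled.pack(*args)
```

With an empty cache, `cached_pack(FakeUnicode('B'), 1)` is rejected; after one successful `cached_pack('B', 1)` the same call succeeds, which is the inconsistency reported above.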
Refactored the test to clear the cache before using a unicode format.
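A minimal sketch of what "clear the cache first" looks like, written for Python 3 where both text and bytes formats are accepted. `pack_with_cold_cache` is a hypothetical helper name; `struct._clearcache()` is the private, undocumented function mentioned later in this thread, so treat relying on it as an assumption. This is not the test from the attached patch.

```python
import struct


def pack_with_cold_cache(fmt, *values):
    # Empty the module's internal format cache so the format is compiled
    # (and type-checked) from scratch instead of being served from cache.
    struct._clearcache()  # private helper, not part of the public API
    return struct.pack(fmt, *values)


# Both format types go through the constructor on a cold cache.
text_result = pack_with_cold_cache('B', 1)
bytes_result = pack_with_cold_cache(b'B', 1)
```

Clearing the cache before each call keeps the outcome independent of whatever formats earlier tests happened to use.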
struct.Struct() should be changed instead of struct.pack(). Here is a patch.
Serhiy, don't you want to clear the cache in the test to simulate the bug?
struct._clearcache()
Other than that, should we raise struct.error instead of ValueError? Right now, the current behaviour in Python 2.7 is:
>>> struct.pack('\x80', 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
struct.error: bad char in struct format
But you are right. Struct() should have been changed instead of struct.pack.
Never mind my comment about clearing the cache. The problem only happens if we use struct.pack, not struct.Struct.
Python 3 raises UnicodeEncodeError. And Python 2 raises UnicodeEncodeError when coercing non-ASCII unicode to str.
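For anyone who wants to check the Python 3 side of this, a small sketch follows. `format_error_type` is a hypothetical helper; the expectation (UnicodeEncodeError for a non-ASCII text format, struct.error for a bad byte in a bytes format) comes from the comments in this thread, so verify it on your own interpreter.

```python
import struct


def format_error_type(fmt):
    """Return the exception class raised for a bad format, or None."""
    try:
        struct.pack(fmt, 3)
    except Exception as exc:
        return type(exc)
    return None


# Non-ASCII text format: fails while being encoded to ASCII.
text_error = format_error_type('\x80')
# Invalid byte in a bytes format: rejected by the format parser itself.
bytes_error = format_error_type(b'\xff')
```

Note that UnicodeEncodeError is a subclass of ValueError, which is why the ValueError versus struct.error question came up.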
Any comments?
New changeset 42d3afd29460 by Serhiy Storchaka in branch '2.7':
Fixed. Thank you, Musashi, for your report.
Okay, I think the error message can be improved, because Python 2.7 distinguishes very clearly between string and unicode.
>>> import struct
>>> struct.Struct(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Struct() argument 1 must be string, not int
But you can give unicode, right?
>>> struct.Struct(u'b')
<Struct object at 0x1f484b8>
Naming both types would be consistent with another example:
>>> " cutecat ".strip(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: strip arg must be None, str or unicode
What do you say, Serhiy? Here is the patch.
Never mind, I already created an issue for this: http://bugs.python.org/issue19985