array.array of UCS2 values #59240
I'm sometimes using an array.array with format character "u" as a writable backing store for buffers shared with platform APIs that access buffers of UCS2 values. This works fine in Python 3.2 and earlier with a UCS2 build of Python, but no longer works with Python 3.3 because the "u" character explicitly selects a UCS4 representation in that version. For example, I use this with PyObjC on Mac OS X:

    b = array.array('u', "hello world")
    s = CFStringCreateMutableWithExternalCharactersNoCopy(
        None, b, len(b), len(b), kCFAllocatorNull)

"s" now refers to a mutable Objective-C string that uses "b" as its backing store.

It would be nice if there were a format code that would allow me to do this with Python 3.3, for example:

    b = array.array("U", ...)

(BTW, I'm sorry if this is a duplicate; searching for "array.array" on the tracker results in a lot of hits, most of which have nothing to do with the array module.)
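A minimal sketch of the backing-store pattern described above, assuming a 2-byte 'H' array in place of 'u' and using a ctypes array as a hypothetical stand-in for the platform API (this is not the PyObjC call from the comment). Both objects share one block of memory, so a write through one is visible through the other.

```python
import array
import ctypes

# 'H' holds unsigned 16-bit values on mainstream platforms, the same width
# as CoreFoundation's UniChar. Characters outside the BMP would not fit.
backing = array.array('H', map(ord, "hello"))

# Hypothetical stand-in for a C API that keeps a pointer to the buffer:
# from_buffer() creates a ctypes array that shares memory with `backing`.
shared = (ctypes.c_uint16 * len(backing)).from_buffer(backing)

# The "platform" side writes through its view...
shared[0] = ord("j")

# ...and the change is visible in the Python array.
print("".join(map(chr, backing)))  # prints "jello"
```

While `shared` exists, the array is exporting its buffer, so attempts to resize it (for example `backing.append(0)`) raise BufferError; that is what keeps the memory address stable for the code holding the pointer.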
See also bpo-13072 and the discussion starting at: http://mail.python.org/pipermail/python-dev/2012-March/117390.html

I think the priority should be "high", since the current behavior
Hmm, actually the discussion starts here: http://mail.python.org/pipermail/python-dev/2012-March/117376.html
This one should be fixed by bpo-13072. Could you check again?
As Stefan noted, as long as Py_UNICODE is 16 bits in the Mac OS X builds, this should now be back to the 3.2 behaviour.
It's not back to the 3.2 behavior. In 3.3, Py_UNICODE is always equal to wchar_t, which is a 4-byte type on Darwin. However, CFString is based on UniChar, which is a 2-byte type. That this worked in 3.2 was by accident: it would work only in "narrow" builds. Python's configure in 3.2 and earlier wouldn't default to using wchar_t on Darwin because it didn't consider wchar_t "usable", which in turn happened because wchar_t is signed on Darwin, while Py_UNICODE was understood to be unsigned. Since it's too late to add a 'U' code to 3.3, as a workaround you would have to use an 'H' array and initialize it with map(ord, the_string). Chances are good that a proper UCS-2 array code gets added to 3.4.
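A sketch of the 'H' workaround, assuming the platform API expects UTF-16 code units in native byte order (as UniChar buffers do). Initializing with map(ord, the_string) works for BMP-only text, but a character such as U+1F600 makes array('H') raise OverflowError. Encoding to UTF-16 first stores it as a surrogate pair instead. The helper names `ucs2_array` and `ucs2_to_str` are hypothetical.

```python
import array
import sys

# UTF-16 codec matching the machine's native byte order, which is the
# byte order array.array('H') uses for its items.
_NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def ucs2_array(text):
    """Return an array.array('H') holding `text` as UTF-16 code units.

    Unlike array.array('H', map(ord, text)), characters outside the BMP
    become surrogate pairs instead of raising OverflowError.
    """
    units = array.array('H')
    units.frombytes(text.encode(_NATIVE_UTF16))  # no BOM with -le/-be codecs
    return units


def ucs2_to_str(units):
    """Decode an array.array('H') of UTF-16 code units back to str."""
    return units.tobytes().decode(_NATIVE_UTF16)


b = ucs2_array("hello world")
print(len(b), b.itemsize)  # prints "11 2"
print(len(ucs2_array("a\U0001F600")))  # prints "3" (1 unit + surrogate pair)
```

Strictly speaking this is UTF-16 rather than UCS-2; for text limited to the BMP the two are identical, and the result is the same as map(ord, the_string).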
Py_UNICODE is a typedef for wchar_t, and that type is 4 bytes long:

    >>> a.tobytes()
    b'h\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00 \x00\x00\x00w\x00\x00\x00o\x00\x00\x00r\x00\x00\x00l\x00\x00\x00d\x00\x00\x00'
    >>> a = array.array('u', 'bar')
    >>> a.tobytes()
    b'b\x00\x00\x00a\x00\x00\x00r\x00\x00\x00'
    >>> len(a.tobytes())
    12

This is with a checkout that was created yesterday. The issue is not resolved: there is now no way to easily create a UCS2 buffer, whereas there was one in earlier releases of Python (with the default narrow build).
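The 4-byte items above follow from the 'u' item size being sizeof(wchar_t) in 3.3 and later. A small sketch to check this on a given build, assuming ctypes.c_wchar as the reference for the platform's wchar_t. The 'u' code is deprecated in recent Python releases, so the DeprecationWarning is suppressed for this illustration only.

```python
import array
import ctypes
import warnings


def u_array_bytes(text):
    """Return the raw bytes of array.array('u', text)."""
    with warnings.catch_warnings():
        # 'u' is deprecated in newer Python versions; silence it here only.
        warnings.simplefilter("ignore", DeprecationWarning)
        return array.array('u', text).tobytes()


# Item size of 'u' equals sizeof(wchar_t): typically 4 on macOS and
# Linux, 2 on Windows.
wchar_size = ctypes.sizeof(ctypes.c_wchar)
raw = u_array_bytes("bar")
print(wchar_size, len(raw))  # 3 items of wchar_size bytes each
```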
Note: this issue was migrated from bugs.python.org; the thread reflects its state at the time of migration, which might not be its current state.