classification
Title: array.array of UCS2 values
Type: behavior Stage:
Components: Extension Modules Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, christian.heimes, inada.naoki, loewis, ncoghlan, ronaldoussoren, skrah
Priority: high Keywords:

Created on 2012-06-08 09:22 by ronaldoussoren, last changed 2019-04-13 11:46 by inada.naoki.

Messages (7)
msg162520 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2012-06-08 09:22
I sometimes use an array.array with format character "u" as a writable backing store for buffers shared with platform APIs that access buffers of UCS2 values. This works fine in Python 3.2 and earlier with a UCS2 build of Python, but no longer works with Python 3.3 because the "u" character explicitly selects a UCS4 representation in that version.

One example of how I use this is with PyObjC on Mac OS X:

b = array.array('u', "hello world")
s = CFStringCreateMutableWithExternalCharactersNoCopy(
        None, b, len(b), len(b), kCFAllocatorNull)

"s" now refers to a mutable Objective-C string that uses "b" as its backing store.
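
The backing-store idea can be sketched without PyObjC, under some assumptions: a ctypes array stands in for the platform API, typecode 'H' replaces "u" so that each item is 2 bytes on common platforms regardless of the Python build, and the names `backing` and `shared` are purely illustrative. This is not the PyObjC call above, only an illustration of sharing one writable buffer.

```python
import array
import ctypes

# 'H' is an unsigned 16-bit item on common platforms, so each element
# can hold one UCS2 code unit.
backing = array.array('H', map(ord, "hello world"))

# Hypothetical stand-in for a platform API: a ctypes array that shares
# memory with `backing` through the buffer protocol (no copy is made).
shared = (ctypes.c_uint16 * len(backing)).from_buffer(backing)

# A write through the shared view is visible in the original array.
shared[0] = ord("J")
text = "".join(map(chr, backing))
print(text)
```

While `shared` is alive the array cannot be resized, which is the usual price of exporting a buffer.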

It would be nice if there were a format code that would allow me to do this with Python 3.3, for example b = array.array("U", ...).


(BTW, I'm sorry if this is a duplicate; searching for "array.array" on the tracker returns a lot of hits, most of which have nothing to do with the array module.)
msg162521 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-06-08 09:46
See also #13072 and the discussion starting at:

http://mail.python.org/pipermail/python-dev/2012-March/117390.html

I think the priority should be "high", since the current behavior
doesn't preserve the status quo. Also, PEP-3118 suggests 'u' for
UCS2 and 'w' for UCS4.
msg162522 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-06-08 09:48
Hmm, obviously the discussion starts here:

http://mail.python.org/pipermail/python-dev/2012-March/117376.html
msg168374 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-08-16 11:47
This one should be fixed by #13072. Could you check again?
msg168376 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-16 11:54
As Stefan noted, so long as Py_UNICODE is 16 bits in the Mac OS X builds, then this should now be back to the 3.2 behaviour.
msg168378 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-08-16 12:07
It's not back to the 3.2 behavior. In 3.3, Py_UNICODE is always equal to wchar_t, which is a 4-byte type on Darwin. However, CFString is based on UniChar, which is a 2-byte type.
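
The size mismatch described here can be checked on a given platform with a small sketch. It assumes `ctypes.c_uint16` is an acceptable stand-in for UniChar (described above only as a 2-byte type); the variable names are illustrative.

```python
import ctypes

# Size of the C wchar_t type on this platform
# (4 on Darwin and most Linux builds, 2 on Windows).
wchar_size = ctypes.sizeof(ctypes.c_wchar)

# UniChar is described above as a 2-byte type; c_uint16 is a stand-in.
unichar_size = ctypes.sizeof(ctypes.c_uint16)

print("wchar_t:", wchar_size, "bytes; UniChar stand-in:", unichar_size, "bytes")
print("layouts compatible:", wchar_size == unichar_size)
```

Where the two sizes differ, a wchar_t-based array cannot be handed to an API that expects 2-byte code units.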

That this worked in 3.2 was by accident - it would work only in "narrow" builds. Python's configure in 3.2 and before wouldn't default to using wchar_t on Darwin since it didn't consider wchar_t "usable", which in turn happened because wchar_t is signed on Darwin, but Py_UNICODE was understood to be unsigned.

Since it's too late to add a 'U' code to 3.3, as a workaround you would have to use an 'H' array and initialize it with map(ord, the_string).
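
A minimal sketch of that workaround follows. The second variant, which goes through a UTF-16 encode, is an assumption added for illustration and is not proposed in the thread; it differs only in that characters outside the BMP become surrogate pairs instead of raising OverflowError.

```python
import array
import sys

the_string = "hello world"

# Workaround from the message: one 16-bit item per character.
# This only works when every character is in the BMP (ord(c) <= 0xFFFF).
buf = array.array('H', map(ord, the_string))

# Alternative (an assumption, not from the thread): encode as UTF-16 in
# native byte order, so non-BMP characters become surrogate pairs.
codec = 'utf-16-le' if sys.byteorder == 'little' else 'utf-16-be'
buf2 = array.array('H')
buf2.frombytes(the_string.encode(codec))

print(buf.itemsize, len(buf.tobytes()))
```

For BMP-only text the two arrays hold the same values, with two bytes per character in either case.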

Chances are good that a proper UCS-2 array code gets added to 3.4.
msg168379 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2012-08-16 12:09
Py_UNICODE is a typedef for wchar_t, and that type is 4 bytes long:

>>> import array
>>> a = array.array('u', 'hello world')
>>> a.tobytes()
b'h\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00 \x00\x00\x00w\x00\x00\x00o\x00\x00\x00r\x00\x00\x00l\x00\x00\x00d\x00\x00\x00'
>>> a = array.array('u', 'bar')
>>> a.tobytes()
b'b\x00\x00\x00a\x00\x00\x00r\x00\x00\x00'
>>> len(a.tobytes())
12
>>> 

This is with a checkout that was created yesterday.

The issue is not resolved: there is now no easy way to create a UCS2 buffer, whereas there was in earlier releases of Python (with the default narrow build).
History
Date                 User              Action  Args
2019-04-13 11:46:41  inada.naoki       set     nosy: + inada.naoki
2013-07-07 16:07:20  christian.heimes  set     nosy: + christian.heimes; versions: + Python 3.4, - Python 3.3
2012-08-16 19:44:37  Arfrever          set     nosy: + Arfrever
2012-08-16 12:09:00  ronaldoussoren    set     messages: + msg168379
2012-08-16 12:07:14  loewis            set     nosy: + loewis; messages: + msg168378
2012-08-16 11:54:36  ncoghlan          set     priority: low -> high; nosy: + ncoghlan; messages: + msg168376
2012-08-16 11:47:11  skrah             set     messages: + msg168374
2012-06-08 09:48:01  skrah             set     messages: + msg162522
2012-06-08 09:46:30  skrah             set     nosy: + skrah; messages: + msg162521
2012-06-08 09:22:50  ronaldoussoren    create