array.array of UCS2 values #59240

Open
ronaldoussoren opened this issue Jun 8, 2012 · 7 comments
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@ronaldoussoren
Contributor

BPO 15035
Nosy @loewis, @ronaldoussoren, @ncoghlan, @tiran, @methane, @skrah

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2012-06-08.09:22:50.343>
labels = ['extension-modules', 'type-bug']
title = 'array.array of UCS2 values'
updated_at = <Date 2019-04-13.11:46:41.362>
user = 'https://github.com/ronaldoussoren'

bugs.python.org fields:

activity = <Date 2019-04-13.11:46:41.362>
actor = 'methane'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules']
creation = <Date 2012-06-08.09:22:50.343>
creator = 'ronaldoussoren'
dependencies = []
files = []
hgrepos = []
issue_num = 15035
keywords = []
message_count = 7.0
messages = ['162520', '162521', '162522', '168374', '168376', '168378', '168379']
nosy_count = 7.0
nosy_names = ['loewis', 'ronaldoussoren', 'ncoghlan', 'christian.heimes', 'Arfrever', 'methane', 'skrah']
pr_nums = []
priority = 'high'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue15035'
versions = ['Python 3.4']

@ronaldoussoren
Contributor Author

I'm sometimes using an array.array with format character "u" as a writable backing store for buffers shared with platform APIs that access buffers of UCS2 values. This works fine in python 3.2 and earlier with a ucs2 build of python, but no longer works with python 3.3 because the "u" character explicitly selects a UCS4 representation in that version.

One example of how I use this is with PyObjC on Mac OS X:

import array
b = array.array('u', "hello world")
s = CFStringCreateMutableWithExternalCharactersNoCopy(
        None, b, len(b), len(b), kCFAllocatorNull)

"s" now refers to a mutable Objective-C string that uses "b" as its backing store.

It would be nice if there were a format code that would allow me to do this with Python 3.3, for example b = array.array("U", ...)

(BTW, I'm sorry if this is a duplicate; searching for "array.array" on the tracker results in a lot of hits, most of which have nothing to do with the array module.)

@ronaldoussoren ronaldoussoren added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Jun 8, 2012
@skrah
Mannequin

skrah mannequin commented Jun 8, 2012

See also bpo-13072 and the discussion starting at:

http://mail.python.org/pipermail/python-dev/2012-March/117390.html

I think the priority should be "high", since the current behavior
doesn't preserve the status quo. Also, PEP-3118 suggests 'u' for
UCS2 and 'w' for UCS4.

@skrah
Mannequin

skrah mannequin commented Jun 8, 2012

Hmm, obviously the discussion starts here:

http://mail.python.org/pipermail/python-dev/2012-March/117376.html

@skrah
Mannequin

skrah mannequin commented Aug 16, 2012

This one should be fixed by bpo-13072. Could you check again?

@ncoghlan
Contributor

As Stefan noted, so long as Py_UNICODE is 16 bits in the Mac OS X builds, then this should now be back to the 3.2 behaviour.

@loewis
Mannequin

loewis mannequin commented Aug 16, 2012

It's not back to the 3.2 behavior. In 3.3, Py_UNICODE is always equal to wchar_t, which is a 4-byte type on Darwin. However, CFString is based on UniChar, which is a 2-byte type.

That this worked in 3.2 was by accident - it would work only in "narrow" builds. Python's configure in 3.2 and before wouldn't default to using wchar_t on Darwin since it didn't consider wchar_t "usable", which in turn happened because wchar_t is signed on Darwin, but Py_UNICODE was understood to be unsigned.

Since it's too late to add a 'U' code to 3.3, as a work-around you would have to use an 'H' array and initialize it with map(ord, the_string).

Chances are good that a proper UCS-2 array code gets added to 3.4.
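
The 'H' work-around described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name ucs2_array is made up here, and the check for characters above U+FFFF is an added assumption, since such characters have no single 16-bit code unit and would overflow an 'H' item anyway.

```python
import array


def ucs2_array(text):
    """Hypothetical helper: build a writable array of 16-bit code units.

    Only characters in the Basic Multilingual Plane (<= U+FFFF) are
    accepted, because UCS-2 cannot represent anything above that.
    """
    code_points = [ord(ch) for ch in text]
    if any(cp > 0xFFFF for cp in code_points):
        raise ValueError("text contains characters outside the BMP")
    # 'H' is C unsigned short, which is 2 bytes on the usual platforms.
    return array.array('H', code_points)


buf = ucs2_array("hello world")
print(buf.itemsize, len(buf))
```

The resulting array stores its items in native byte order, so whether a platform API can consume the buffer directly still depends on that API expecting native-endian 16-bit units.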

@ronaldoussoren
Contributor Author

Py_UNICODE is a typedef for wchar_t and that type is 4 bytes long:

>>> import array
>>> a = array.array('u', 'hello world')
>>> a.tobytes()
b'h\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00 \x00\x00\x00w\x00\x00\x00o\x00\x00\x00r\x00\x00\x00l\x00\x00\x00d\x00\x00\x00'
>>> a = array.array('u', 'bar')
>>> a.tobytes()
b'b\x00\x00\x00a\x00\x00\x00r\x00\x00\x00'
>>> len(a.tobytes())
12
>>> 

This is with a checkout that was created yesterday.

The issue is not resolved: there is now no easy way to create a UCS2 buffer, whereas there was in earlier releases of Python (with the default narrow build).
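
For comparison with the session above, here is a small sketch of how one might inspect the platform's wchar_t width and the layout of an 'H' array holding the same text. It is an assumption-laden illustration: it uses ctypes.c_wchar as a stand-in for what the 'u' code maps to, and it assumes the 'H' items are meant to be read as native-endian UTF-16 code units.

```python
import array
import ctypes
import sys

text = "bar"

# Width of the C wchar_t type on this platform
# (typically 4 on Darwin and Linux, 2 on Windows).
wchar_size = ctypes.sizeof(ctypes.c_wchar)

# The 'H' array stores one unsigned short per character of the text.
units = array.array('H', map(ord, text))

# Array items use native byte order, so pick the matching UTF-16 codec
# to turn the raw bytes back into a str.
native_utf16 = 'utf-16-le' if sys.byteorder == 'little' else 'utf-16-be'
round_trip = units.tobytes().decode(native_utf16)

print(wchar_size, units.itemsize, len(units.tobytes()), round_trip)
```

On a platform where unsigned short is 2 bytes, the 'H' buffer takes half the space of the 4-byte-per-character output shown in the session.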

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022