Title: array: Deprecate 'u' type in array module
Type: Stage:
Components: Library (Lib) Versions: Python 3.8
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: inada.naoki, ncoghlan, serhiy.storchaka, skrah, terry.reedy
Priority: normal Keywords: patch

Created on 2019-03-15 05:50 by inada.naoki, last changed 2020-04-23 00:47 by inada.naoki.

Pull Requests
URL Status Linked Edit
PR 12497 closed inada.naoki, 2019-03-22 10:43
Messages (12)
msg337967 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-03-15 05:50
The doc says:

> 'u' will be removed together with the rest of the Py_UNICODE API.
> Deprecated since version 3.3, will be removed in version 4.0.

But DeprecationWarning is not raised yet.  Let's raise it.

* 3.8 -- PendingDeprecationWarning
* 3.9 -- DeprecationWarning
* 4.0 or 3.10 -- Remove it.
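
A minimal sketch of the schedule proposed above, written in pure Python for illustration only. The helper `check_typecode`, the `WARNING_SCHEDULE` table and the `REMOVED_IN` constant are hypothetical names; the real change would live in the C implementation of the array module.

```python
import warnings

# Hypothetical schedule, copied from the proposal in this message.
WARNING_SCHEDULE = {
    (3, 8): PendingDeprecationWarning,
    (3, 9): DeprecationWarning,
}
REMOVED_IN = (3, 10)  # assumption: the "4.0 or 3.10" step lands in 3.10


def check_typecode(typecode, version):
    """Warn or fail for the 'u' typecode according to the schedule.

    Returns the warning category that was emitted, or None.
    """
    if typecode != "u":
        return None
    if version >= REMOVED_IN:
        raise ValueError("bad typecode: 'u' has been removed")
    category = WARNING_SCHEDULE.get(version)
    if category is not None:
        warnings.warn(
            "array typecode 'u' is deprecated",
            category,
            stacklevel=2,
        )
    return category
```

Passing the version explicitly keeps the sketch testable on any interpreter; a real implementation would not need that parameter.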
msg338031 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-03-15 20:45
'4.0' is a stand-in for 'sometime after 2.7 end-of-life', scheduled for Jan 2020.  A Pending... for 3.8.0, scheduled for Oct 2019, seems reasonable to me.  Perhaps we should have a pydev discussion for the general issue of post 2.7 removals of already deprecated items.
msg338595 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-03-22 09:13

We may be able to convert 'u' from wchar_t to int32_t and un-deprecate it.
msg338598 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-03-22 10:49
I found that converting Py_UNICODE to Py_UCS4 had already happened once, and was reverted.
msg338607 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-03-22 14:44
I think the problem is still whether to use 'u' == UCS2 and 'w' == UCS4 like in PEP-3118.

For the project I'm currently working on I'd need these for buffer exports:

>>> from xnd import *
>>> x = xnd(["abc", "xyz"], dtype="fixed_string(10, 'utf16')")
>>> y = xnd(["abc", "xyz"], dtype="fixed_string(10, 'utf32')")
>>> memoryview(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: type is not supported by the buffer protocol

The use case is not an array that represents a single utf16 string, but
an array *of* fixed strings with different encodings.

So x would be exported with format 'u' and y with format 'w'.
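
To make the size difference between the two formats concrete, here is a small sketch that uses only the standard library. It does not reproduce xnd's actual memory layout; the helper `fixed_string` and the NUL padding are assumptions made for illustration.

```python
# Bytes per code unit for the two encodings discussed above.
CODE_UNIT_SIZE = {"utf-16-le": 2, "utf-32-le": 4}


def fixed_string(text, width, encoding):
    """Encode text into a fixed-size field of `width` code units.

    Hypothetical helper: pads with NUL bytes, which is an assumption,
    not a statement about how xnd stores fixed strings.
    """
    unit_size = CODE_UNIT_SIZE[encoding]
    raw = text.encode(encoding)  # the -le codecs do not emit a BOM
    capacity = width * unit_size
    if len(raw) > capacity:
        raise ValueError("text does not fit in the fixed-width field")
    return raw.ljust(capacity, b"\x00")


utf16_item = fixed_string("abc", 10, "utf-16-le")  # 10 two-byte units
utf32_item = fixed_string("abc", 10, "utf-32-le")  # 10 four-byte units
```

Under these assumptions each item of x occupies 20 bytes and each item of y occupies 40 bytes, which is why the two arrays need distinct format characters.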
msg338608 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-03-22 15:01
Just to demonstrate what the format would look like, this is working
for an array of fixed bytes:

>>> x = xnd([b"123", b"23456"], dtype="fixed_bytes(size=10)")
>>> memoryview(x).format

So the formats in the previous message would be '10u' and '10w'.
msg338609 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-03-22 15:03
array('u') is not tied to the legacy Unicode C API. It is possible to use the modern wchar_t-based Unicode C API for it. See issue36346.

There are benefits from getting rid of the legacy Unicode C API, but not from array('u').
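
The relationship between array('u') and wchar_t can be checked from Python. This is a sketch, assuming a CPython build that ships ctypes; the helper `u_itemsize` is a hypothetical name, and it returns None on interpreters where the 'u' typecode no longer exists.

```python
import array
import ctypes
import warnings


def u_itemsize():
    """Return the item size of array('u'), or None if 'u' is unavailable."""
    if "u" not in array.typecodes:
        return None
    with warnings.catch_warnings():
        # Some Python versions warn when 'u' is used; silence that here.
        warnings.simplefilter("ignore")
        return array.array("u").itemsize


# Size of the platform's wchar_t as seen by ctypes (commonly 2 or 4 bytes).
wchar_size = ctypes.sizeof(ctypes.c_wchar)
```

Comparing `u_itemsize()` with `wchar_size` shows that the width of 'u' items follows the platform's wchar_t rather than a fixed size.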
msg338610 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-03-22 15:10
array() uses struct module characters except for 'u'. PEP-3118 was 
supposed to be implemented in the struct module.

If array() continues to use 'u', the only sensible thing would be
to remove (or rename) 'a', 'u' and 'w' from PEP-3118.
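
The mismatch between the two modules can be observed directly. The sketch below asks the struct module about every typecode that the array module advertises; `split_typecodes` is a hypothetical helper name.

```python
import array
import struct


def split_typecodes():
    """Split array typecodes by whether struct accepts them as formats."""
    supported, unsupported = [], []
    for typecode in array.typecodes:
        try:
            struct.calcsize(typecode)
        except struct.error:
            # struct has no format character with this name.
            unsupported.append(typecode)
        else:
            supported.append(typecode)
    return supported, unsupported


supported, unsupported = split_typecodes()
```

On interpreters that still provide 'u', it is expected to appear in `unsupported`, since struct defines no such format character.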
msg338611 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019-03-22 15:25
The funny thing is that array() already knows this:

>>> import array
>>> a = array.array("u", "123")
>>> memoryview(a).format
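
A script form of the same check, as a sketch. The helper `exported_format` is a hypothetical name. No expected output is given for 'u' because the reported format can differ between platforms and Python versions.

```python
import array
import warnings


def exported_format(typecode, initializer):
    """Return the buffer-protocol format string an array exports."""
    with warnings.catch_warnings():
        # Some Python versions warn when 'u' is used; silence that here.
        warnings.simplefilter("ignore")
        arr = array.array(typecode, initializer)
    with memoryview(arr) as view:
        return view.format


int_format = exported_format("i", [1, 2, 3])
if "u" in array.typecodes:
    # Compare this value with the array's own typecode, 'u'.
    unicode_format = exported_format("u", "123")
```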
msg367000 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2020-04-22 13:16
I closed GH-12497 (Py_UNICODE -> Py_UCS4).
I created GH-19653 (Py_UNICODE -> wchar_t) instead.
msg367044 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-04-22 19:15
Should this issue be closed, possibly as superseded by #36346, the issue for the new PR-19653?
msg367065 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2020-04-23 00:47
While array('u') doesn't use deprecated API with GH-19653, I still don't like 'u' because:

* I don't have any reason to use platform-dependent wchar_t. [1]
* It is not consistent with PEP-3118.


How about this plan?

* Add 'w' for Py_UCS4.
* Deprecate 'u', and remove it in the future.
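
For code that wants fixed 4-byte code points today without relying on 'u', one possible approach is an unsigned integer array. This is only a sketch: `ucs4_typecode`, `str_to_ucs4` and `ucs4_to_str` are hypothetical helpers, and the typecode lookup assumes the platform has a 4-byte 'I' or 'L'.

```python
import array


def ucs4_typecode():
    """Pick an unsigned integer typecode whose items are 4 bytes wide."""
    for typecode in "IL":
        if array.array(typecode).itemsize == 4:
            return typecode
    raise RuntimeError("no 4-byte unsigned integer typecode found")


def str_to_ucs4(text):
    """Store one code point per 4-byte item, independent of wchar_t."""
    return array.array(ucs4_typecode(), (ord(ch) for ch in text))


def ucs4_to_str(code_points):
    """Rebuild the string from an array of code points."""
    return "".join(map(chr, code_points))


sample = str_to_ucs4("a\U0001F600")  # one BMP and one non-BMP character
```

Unlike 'u', the item width here does not depend on the platform's wchar_t, which is the property the proposed 'w' typecode is meant to provide.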
Date                 User              Action  Args
2020-04-23 00:47:43  inada.naoki       set     messages: + msg367065
2020-04-22 19:15:06  terry.reedy       set     messages: + msg367044
2020-04-22 13:16:26  inada.naoki       set     messages: + msg367000
2019-03-22 15:56:45  vstinner          set     nosy: - vstinner
2019-03-22 15:25:27  skrah             set     messages: + msg338611
2019-03-22 15:10:07  skrah             set     messages: + msg338610
2019-03-22 15:03:02  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg338609
2019-03-22 15:01:08  skrah             set     messages: + msg338608
2019-03-22 14:44:21  skrah             set     messages: + msg338607
2019-03-22 11:26:51  inada.naoki       set     nosy: + ncoghlan, vstinner, skrah
                                               stage: patch review ->
                                               title: Deprecate 'u' type in array module -> array: Deprecate 'u' type in array module
2019-03-22 10:49:09  inada.naoki       set     messages: + msg338598
2019-03-22 10:43:34  inada.naoki       set     keywords: + patch
                                               stage: patch review
                                               pull_requests: + pull_request12447
2019-03-22 09:13:15  inada.naoki       set     messages: + msg338595
2019-03-15 20:45:47  terry.reedy       set     nosy: + terry.reedy
                                               messages: + msg338031
2019-03-15 05:50:02  inada.naoki       create