classification
Title: create_unicode_buffer() fails on non-BMP strings on Windows
Type: behavior
Stage: resolved
Components: ctypes
Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Leonard de Ruijter, ZackerySpytz, amaury.forgeotdarc, belopolsky, eryksun, gergely.erdelyi, meador.inge, miss-islington, vstinner
Priority: normal
Keywords: patch

Created on 2013-12-02 19:36 by gergely.erdelyi, last changed 2019-06-14 16:54 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 14081 merged ZackerySpytz, 2019-06-14 14:33
PR 14087 merged miss-islington, 2019-06-14 15:55
PR 14088 merged miss-islington, 2019-06-14 15:55
Messages (9)
msg205045 - Author: Gergely Erdélyi (gergely.erdelyi) Date: 2013-12-02 19:36
create_unicode_buffer() fails on Windows if the initializer string contains Unicode code points outside the Basic Multilingual Plane and no explicit length is specified.

The problem appears to be rooted in the fact that, since PEP 393, len() returns the number of code points, which does not always match the number of 16-bit wchar_t units needed to encode the string as UTF-16 on Windows: each non-BMP code point takes a surrogate pair, i.e. two units. As a result, the preallocated c_wchar buffer is too short for the UTF-16 string.

The following small snippet demonstrates the problem:

from ctypes import create_unicode_buffer
b = create_unicode_buffer("\U00028318\U00028319")
print(b)

  File "c:\Python33\lib\ctypes\__init__.py", line 294, in create_unicode_buffer
    buf.value = init
ValueError: string too long
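
To make the mismatch concrete, the sketch below compares the code-point count with the UTF-16 code-unit count for the same string, then passes an explicit size, which is the workaround implied by the report (the failure only occurs when no length is given). The variable names are illustrative and not part of ctypes, and the size computation assumes c_wchar is either 2 or 4 bytes wide.

```python
from ctypes import c_wchar, create_unicode_buffer, sizeof

s = "\U00028318\U00028319"

# len() counts code points (PEP 393), not UTF-16 code units.
code_points = len(s)                           # 2
# Each non-BMP code point encodes to a surrogate pair: 4 bytes, 2 units.
utf16_units = len(s.encode("utf-16-le")) // 2  # 4

# Number of c_wchar elements needed, including the terminating NUL.
if sizeof(c_wchar) == 2:
    needed = utf16_units + 1   # 16-bit wchar_t (e.g. Windows)
else:
    needed = code_points + 1   # assumed 32-bit wchar_t

# Passing the size explicitly sidesteps the len(init) + 1 default.
buf = create_unicode_buffer(s, needed)
print(code_points, utf16_units, len(buf), buf.value == s)
```

On a platform with a 4-byte wchar_t the explicit size equals len(s) + 1, so the same code runs unchanged there.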
msg228405 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-10-03 22:43
I can confirm that this problem still exists. Could someone take a look, please? Thanks.
msg228424 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2014-10-04 02:24
When sizeof(c_wchar) == 2, create_unicode_buffer() can just count the number of non-BMP ordinals in the string and add that count to the buffer size. Another approach would be to use size = pythonapi.PyUnicode_AsWideChar(init, None, 0), but then the whole function may as well be implemented in the _ctypes extension module.
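
Both sizing strategies mentioned above can be sketched as follows. The helper names wchar_count_by_scan and wchar_count_by_capi are hypothetical. The second relies on the documented behavior of PyUnicode_AsWideChar, which returns the required size, including the terminating null, when the buffer argument is NULL. This illustrates the two ideas and is not necessarily the code that was merged.

```python
import ctypes
from ctypes import c_wchar, sizeof


def wchar_count_by_scan(init):
    """c_wchar elements needed for init plus a terminating NUL (hypothetical helper)."""
    size = len(init) + 1
    if sizeof(c_wchar) == 2:
        # With 16-bit wchar_t, every non-BMP code point needs one extra unit
        # because it is stored as a surrogate pair.
        size += sum(1 for ch in init if ord(ch) > 0xFFFF)
    return size


def wchar_count_by_capi(init):
    """Same quantity, asked of the C API (hypothetical helper, CPython only)."""
    func = ctypes.pythonapi.PyUnicode_AsWideChar
    func.restype = ctypes.c_ssize_t
    func.argtypes = [ctypes.py_object, ctypes.c_wchar_p, ctypes.c_ssize_t]
    # A NULL buffer asks for the required size, including the trailing null.
    return func(init, None, 0)


s = "a\U00028318"
print(wchar_count_by_scan(s), wchar_count_by_capi(s))
```

The scan stays in pure Python, while the C API call delegates the counting to the interpreter at the cost of depending on ctypes.pythonapi.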
msg330583 - Author: Leonard de Ruijter (Leonard de Ruijter) Date: 2018-11-28 09:18
I'm still able to reproduce this issue with ctypes under Python 3.7.0.
msg345596 - Author: Zackery Spytz (ZackerySpytz) * (Python triager) Date: 2019-06-14 14:36
I have created a pull request for this issue. Please take a look.
msg345601 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-14 15:54
New changeset 9765efcb39fc03d5b1abec3924388974470a8bd5 by Victor Stinner (Zackery Spytz) in branch 'master':
bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081)
https://github.com/python/cpython/commit/9765efcb39fc03d5b1abec3924388974470a8bd5
msg345609 - Author: miss-islington (miss-islington) Date: 2019-06-14 16:30
New changeset 0b592d513b073cd3a4ba7632907c25b8282f15ce by Miss Islington (bot) in branch '3.7':
bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081)
https://github.com/python/cpython/commit/0b592d513b073cd3a4ba7632907c25b8282f15ce
msg345610 - Author: miss-islington (miss-islington) Date: 2019-06-14 16:43
New changeset b0f6fa8d7d4c6d8263094124df9ef9cf816bbed6 by Miss Islington (bot) in branch '3.8':
bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081)
https://github.com/python/cpython/commit/b0f6fa8d7d4c6d8263094124df9ef9cf816bbed6
msg345611 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-14 16:54
Thanks Zackery Spytz for the fix. Thanks Gergely Erdélyi for the bug report! Sorry for the long delay.
History
Date User Action Args
2019-07-10 05:13:07 eryksun link issue37536 superseder
2019-06-14 16:54:21 vstinner set status: open -> closed; resolution: fixed; messages: + msg345611; stage: patch review -> resolved
2019-06-14 16:43:26 miss-islington set messages: + msg345610
2019-06-14 16:30:30 miss-islington set nosy: + miss-islington; messages: + msg345609
2019-06-14 15:55:09 miss-islington set pull_requests: + pull_request13944
2019-06-14 15:55:01 miss-islington set pull_requests: + pull_request13943
2019-06-14 15:54:04 vstinner set messages: + msg345601
2019-06-14 14:36:53 ZackerySpytz set nosy: + ZackerySpytz; messages: + msg345596; versions: + Python 3.9, - Python 3.4, Python 3.5, Python 3.6
2019-06-14 14:33:31 ZackerySpytz set keywords: + patch; stage: patch review; pull_requests: + pull_request13940
2018-11-28 13:20:06 josh.r set keywords: - 3.2regression
2018-11-28 13:18:55 josh.r set keywords: + 3.2regression; versions: + Python 3.6, Python 3.7, Python 3.8
2018-11-28 12:01:58 BreamoreBoy set nosy: - BreamoreBoy
2018-11-28 09:18:48 Leonard de Ruijter set nosy: + Leonard de Ruijter; messages: + msg330583
2014-10-04 02:24:52 eryksun set nosy: + eryksun; messages: + msg228424
2014-10-03 22:43:19 BreamoreBoy set nosy: + BreamoreBoy; messages: + msg228405; versions: + Python 3.5, - Python 3.3
2013-12-02 22:37:50 pitrou set nosy: + amaury.forgeotdarc, belopolsky, vstinner, meador.inge
2013-12-02 20:14:41 serhiy.storchaka set type: crash -> behavior
2013-12-02 19:36:45 gergely.erdelyi create