
Author stutzbach
Recipients stutzbach
Date 2010-05-21.12:43:57
Message-id <1274445840.82.0.641304229797.issue8781@psf.upfronthosting.co.za>
Content
If ./configure detects that the system's wchar_t type is compatible with Py_UNICODE, it emits "#define PY_UNICODE_TYPE wchar_t" and enables certain optimizations when converting between Py_UNICODE and wchar_t (i.e., the conversion can just be a memcpy).
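
To make the memcpy optimization concrete, here is a minimal sketch of the two conversion paths. The names demo_unicode, demo_unicode16, copy_fast and copy_slow are hypothetical and are not taken from the CPython source; the sketch only illustrates why identical types allow a bulk copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for Py_UNICODE when wchar_t is usable
   (in CPython the real type is selected by ./configure). */
typedef wchar_t demo_unicode;

/* Hypothetical stand-in for a 16-bit Py_UNICODE that is NOT wchar_t. */
typedef unsigned short demo_unicode16;

/* Fast path: both buffers have the same element type, so the whole
   buffer can be copied in one call. */
static void copy_fast(wchar_t *dst, const demo_unicode *src, size_t n)
{
    assert(sizeof(demo_unicode) == sizeof(wchar_t));
    memcpy(dst, src, n * sizeof(wchar_t));
}

/* Slow path: the element types differ, so each unit is converted
   individually. */
static void copy_slow(wchar_t *dst, const demo_unicode16 *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (wchar_t)src[i];
}
```

The fast path is only valid when the two types have the same width and interpretation, which is what the ./configure check is meant to establish.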

Right now, ./configure considers wchar_t to be compatible if it is the same bit-width as Py_UNICODE and if wchar_t is unsigned.  In practice, that means Python only uses wchar_t on Windows, which uses an unsigned 16-bit wchar_t.  On Linux, wchar_t is 32-bit and signed.

In the original Unicode implementation for Python, Py_UNICODE was always 16-bit.  I believe the "unsigned" requirement harks back to that time.  A 32-bit wchar_t gives us plenty of space to hold the maximum Unicode code point of 0x10FFFF, regardless of whether wchar_t is signed or unsigned.
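
The range argument can be checked with a little arithmetic. The sketch below uses a hypothetical helper, can_hold, which is not part of Python or ./configure. It assumes a signed type loses one bit to the sign, and that a 16-bit build needs the full 0..0xFFFF range for its code units.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CODE_POINT 0x10FFFFu  /* highest Unicode code point */
#define MAX_16BIT_UNIT 0xFFFFu    /* highest 16-bit code unit */

/* Hypothetical helper: returns 1 if every value in [0, max_value] fits
   in an integer type with the given width and signedness, else 0. */
static int can_hold(unsigned bits, int is_signed, uint32_t max_value)
{
    assert(bits >= 2 && bits <= 32);
    /* A signed type has one fewer bit available for non-negative values. */
    unsigned value_bits = is_signed ? bits - 1 : bits;
    uint64_t type_max = ((uint64_t)1 << value_bits) - 1;
    return max_value <= type_max;
}
```

Under these assumptions, can_hold(32, 1, MAX_CODE_POINT) is 1 because 0x10FFFF is well below 0x7FFFFFFF, while can_hold(16, 1, MAX_16BIT_UNIT) is 0 because a signed 16-bit type tops out at 0x7FFF.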

I believe the condition could be relaxed to the following:
- wchar_t must be the same bit-width as Py_UNICODE, and
- if wchar_t is 16-bit, it must be unsigned

That would allow a UCS4 Python to use wchar_t on Linux.
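
As a rough sketch, the current rule and the proposed rule can be written as two predicates. The function names and parameters are hypothetical; the real test lives in the ./configure script, not in C code like this.

```c
#include <assert.h>  /* used only when checking the predicates */
#include <stddef.h>

/* Current rule as described above: same width AND unsigned. */
static int compatible_current(size_t unicode_bits, size_t wchar_bits,
                              int wchar_is_unsigned)
{
    return wchar_bits == unicode_bits && wchar_is_unsigned;
}

/* Proposed rule: same width, and unsigned only required at 16 bits. */
static int compatible_relaxed(size_t unicode_bits, size_t wchar_bits,
                              int wchar_is_unsigned)
{
    if (wchar_bits != unicode_bits)
        return 0;
    if (wchar_bits == 16)
        return wchar_is_unsigned != 0;
    return 1;
}
```

For the Linux case (32-bit Py_UNICODE, signed 32-bit wchar_t), compatible_current(32, 32, 0) returns 0 while compatible_relaxed(32, 32, 0) returns 1. The Windows case (16-bit, unsigned) is accepted by both.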

I experimented by manually tweaking my pyconfig.h to treat Linux's signed 32-bit wchar_t as compatible.  The unit test suite encountered no problems.

However, it's quite possible that I'm missing some important detail here.  Someone familiar with the guts of Python's Unicode implementation will presumably have a much better idea of whether I have this right or not. ;-)