Message 80018 - Python tracker



Author	lemburg
Recipients	lemburg, mark.dickinson, rpetrov, vstinner
Date	2009-01-17.15:19:14
SpamBayes Score	3.666578e-07
Marked as misclassified	No
Message-id	<4971F6F1.4080001@egenix.com>
In-reply-to	<200901171359.55129.victor.stinner@haypocalc.com>

Content

On 2009-01-17 14:00, STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
>> Looks pretty good at first glance, except that it seems that the UTF-32 to
>> UTF-16 translation is skipped if HAVE_USABLE_WCHAR_T is defined.  Is that
>> deliberate?
> 
> #ifdef HAVE_USABLE_WCHAR_T
>     memcpy(unicode->str, w, size * sizeof(wchar_t));
> #else
>     ...
> #endif
> 
> I understand this code as: sizeof(wchar_t) == sizeof(Py_UNICODE). If I 
> misunderstood the code, it's a heap overflow :-) So there is no 
> conversion from UTF-32 to UTF-16 using memcpy if HAVE_USABLE_WCHAR_T is 
> defined, right?

If HAVE_USABLE_WCHAR_T is defined, Py_UNICODE is defined as wchar_t,
so both types have the same size by construction and a memcpy can be
used. Note that this says nothing about sizeof(wchar_t) itself: with
glibc, wchar_t is 4 bytes, while the MS C lib defines it as 2 bytes.

In other words, if Py_UNICODE is the same type as wchar_t, no conversion
is necessary, which is why the function simply copies the data over.
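To make the distinction concrete, here is a minimal C sketch, not CPython's actual code. The function names `utf32_to_utf16` and `copy_same_width` are hypothetical, and `uint32_t`/`uint16_t` stand in for a 4-byte wchar_t and a 2-byte (UCS-2 build) Py_UNICODE. When the element widths match, memcpy is enough; when 4-byte units must go into 2-byte units, code points above U+FFFF have to be split into surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a UCS-2 build's Py_UNICODE: 16-bit code units. */
typedef uint16_t ucs2_unit;

/* Convert `size` UTF-32 code points in `src` to UTF-16 in `dst`.
   Returns the number of 16-bit units written. `dst` must hold at least
   2 * size units, since each code point above U+FFFF needs two units. */
size_t utf32_to_utf16(const uint32_t *src, size_t size, ucs2_unit *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < size; i++) {
        uint32_t ch = src[i];
        /* Code points above U+10FFFF have no UTF-16 encoding. */
        assert(ch <= 0x10FFFF);
        if (ch > 0xFFFF) {
            /* Split into a high and a low surrogate. */
            ch -= 0x10000;
            dst[out++] = (ucs2_unit)(0xD800 + (ch >> 10));
            dst[out++] = (ucs2_unit)(0xDC00 + (ch & 0x3FF));
        } else {
            /* BMP code point: one unit, copied as-is. */
            dst[out++] = (ucs2_unit)ch;
        }
    }
    return out;
}

/* Same-width case (the HAVE_USABLE_WCHAR_T situation): source and
   destination share one element type, so a plain memcpy suffices. */
size_t copy_same_width(const uint32_t *src, size_t size, uint32_t *dst)
{
    memcpy(dst, src, size * sizeof(uint32_t));
    return size;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, so the converted output can be longer than the input. That is why the converting path, unlike the memcpy path, must size its destination for up to two units per code point.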

History
Date	User	Action	Args
2009-01-17 15:19:16	lemburg	set	recipients: + lemburg, mark.dickinson, vstinner, rpetrov
2009-01-17 15:19:15	lemburg	link	issue4474 messages
2009-01-17 15:19:14	lemburg	create