
Author lemburg
Recipients lemburg, loewis, pitrou, stutzbach
Date 2010-05-26.11:21:12
Message-id <4BFD0426.8090805@egenix.com>
In-reply-to <1274447945.54.0.456108066249.issue8781@psf.upfronthosting.co.za>
Content
Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
> The problem with a signed Py_UNICODE is implicit sign extension (rather than zero extension) in some conversions, for example from "char" or "unsigned char" to "Py_UNICODE". The effects could go anywhere from incorrect results to plain crashes. Not only in our code, but in C extensions relying on the unsignedness of Py_UNICODE.

Right.

The Unicode code was written with an unsigned data type in mind (range
checks, conversions, etc.). We'd have to do some serious code review to
allow switching to a signed data type.
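
To illustrate the sign extension problem Antoine describes, here is a
minimal sketch (the Signed_Py_UNICODE/Unsigned_Py_UNICODE typedefs are
hypothetical, purely for demonstration):

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical typedefs for illustration only; the real Py_UNICODE
       is expected to be unsigned. */
    typedef int16_t  Signed_Py_UNICODE;
    typedef uint16_t Unsigned_Py_UNICODE;

    int main(void)
    {
        /* 0xE9 is 'e with acute' in Latin-1; on platforms where plain
           char is signed, this value is negative. */
        char c = (char)0xE9;

        Signed_Py_UNICODE   s = c;                 /* sign extension: -23    */
        Unsigned_Py_UNICODE u = (unsigned char)c;  /* zero extension: 0x00E9 */

        /* Range checks and table lookups that assume a non-negative
           code point now see -23 instead of 233, which can give wrong
           results or crash (e.g. a negative array index). */
        printf("signed:   %d\n", (int)s);
        printf("unsigned: %u\n", (unsigned)u);
        return 0;
    }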

> Is there a way to enable those optimizations while keeping an unsigned Py_UNICODE type? It seems Py_UNICODE doesn't have to be typedef'ed to wchar_t, it can be defined to be an unsigned integer of the same width. Or would it break some part of the C standard?

The memcpy optimizations only rely on the width of wchar_t, not on its
signedness, so they would work just as well with an unsigned Py_UNICODE
of the same size.
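
As a rough sketch of how Py_UNICODE could be defined as an unsigned
integer of the same width as wchar_t while keeping a single-memcpy
copy (the names are made up; this assumes a 16-bit wchar_t, as on
Windows):

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>
    #include <wchar.h>

    /* Hypothetical stand-in for Py_UNICODE: an unsigned integer with
       the same width as wchar_t (16-bit wchar_t assumed here). */
    typedef uint16_t My_Py_UNICODE;

    /* Bulk copy from a wchar_t buffer into a Py_UNICODE buffer.  Only
       the element width matters for memcpy; signedness is irrelevant. */
    static void copy_wchars(My_Py_UNICODE *dst, const wchar_t *src, size_t n)
    {
        assert(sizeof(My_Py_UNICODE) == sizeof(wchar_t));
        memcpy(dst, src, n * sizeof(wchar_t));
    }

    int main(void)
    {
        wchar_t src[4] = L"abc";
        My_Py_UNICODE dst[4];
        copy_wchars(dst, src, 4);
        return dst[0] == (My_Py_UNICODE)L'a' ? 0 : 1;
    }
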
History
Date                 User     Action  Args
2010-05-26 11:21:15  lemburg  set     recipients: + lemburg, loewis, pitrou, stutzbach
2010-05-26 11:21:13  lemburg  link    issue8781 messages
2010-05-26 11:21:12  lemburg  create