Message 105312 - Python tracker


Author	lemburg
Recipients	gvanrossum, lemburg, loewis, r.david.murray, scoder, stutzbach, vstinner, zooko
Date	2010-05-08.16:35:34
Message-id	<4BE592D4.9030807@egenix.com>
In-reply-to	<m2reae285401005080848r8c64cb57m2b74b3001cbf1f06@mail.gmail.com>

Content

Daniel Stutzbach wrote:
> 
> Daniel Stutzbach <daniel@stutzbachenterprises.com> added the comment:
> 
> On Sat, May 8, 2010 at 10:16 AM, Marc-Andre Lemburg
> <report@bugs.python.org> wrote:
>> Are you sure this doesn't get optimized away in practice?
> 
> I'm sure it doesn't get optimized away by gcc 4.3, where I tested it. :)
> 
>> Sure, though, I don't see how this relates to C code relying
>> on these details, e.g. a C extension will probably use different
>> conversion code depending on whether UCS2 or UCS4 is compatible
>> with some external library, etc.
> 
> Can you give an example?
> 
> All of the examples I can think of either:
> - poke into PyUnicodeObject's internals,
> - call a Python function that exposes Py_UNICODE or PyUnicodeObject
> 
> I'm explicitly trying to protect those two cases.  It's quite possible
> that I'm missing something, but I can't think of any other unsafe way
> for a C extension to convert a Python Unicode object to a byte string.

One of the more important cases you are missing is the
argument parser in Python:

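/* Py_ssize_t for "#" requires PY_SSIZE_T_CLEAN to be defined
   before including Python.h; otherwise the length is an int. */
Py_UNICODE *x;
Py_ssize_t y;
PyArg_ParseTuple(args, "u#", &x, &y);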

This uses the native Py_UNICODE type, but doesn't rely on any
Unicode APIs.

Same for the tuple builder:

args = Py_BuildValue("(u#)", x, y);
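
To illustrate why these two calls matter, here is a minimal standalone sketch, not code from Python itself. It assumes the concern is an extension compiled for one Py_UNICODE width receiving a raw buffer from an interpreter built with the other width. The names ucs2_unit, ucs4_unit and matching_units_when_misread are hypothetical stand-ins invented for the illustration; nothing here calls the Python C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the two possible Py_UNICODE widths. */
typedef uint16_t ucs2_unit;   /* narrow (UCS2) build */
typedef uint32_t ucs4_unit;   /* wide (UCS4) build */

/* Read the first n code units of a UCS4 buffer as if they were UCS2
 * units, which is roughly what code compiled for a UCS2 build would do
 * with a (pointer, length) pair such as the one "u#" hands back.
 * Returns how many of the n units still equal the original text. */
static size_t matching_units_when_misread(const ucs4_unit *buf, size_t n)
{
    size_t matches = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        ucs2_unit seen;
        /* Step through the bytes at 2-byte strides; memcpy sidesteps
         * aliasing and alignment issues. */
        memcpy(&seen, (const unsigned char *)buf + i * sizeof(ucs2_unit),
               sizeof seen);
        if (seen == (ucs2_unit)buf[i])
            matches++;
    }
    return matches;
}
```

Called on the three-unit buffer {0x61, 0x62, 0x63} ("abc" in 4-byte units), the function reports fewer than three matching units on either byte order, even though no Unicode API was involved. The pointer and length alone are enough to go wrong.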

History
Date	User	Action	Args
2010-05-08 16:35:36	lemburg	set	recipients: + lemburg, gvanrossum, loewis, zooko, scoder, vstinner, stutzbach, r.david.murray
2010-05-08 16:35:34	lemburg	link	issue8654 messages
2010-05-08 16:35:34	lemburg	create