Author stutzbach
Recipients gvanrossum, scoder, stutzbach, zooko
Date 2010-05-07.21:54:28
Message-id <1273269271.17.0.191357629461.issue8654@psf.upfronthosting.co.za>
Content
I've been thinking about this a bit more.  There are three types of symbols in unicodeobject.h:

1. Functions that are always safe to use
2. Functions that are only safe if the module is compiled with the same Unicode settings as Python
3. Structures and macros that are only safe if the module is compiled with the same Unicode settings as Python

The functions in #2 already generate a link error when the Unicode settings are mismatched, but only if the module actually uses them.
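
To illustrate why the functions in #2 fail at link time, here is a minimal, self-contained sketch of the renaming trick, in the spirit of how unicodeobject.h maps PyUnicode_* names onto PyUnicodeUCS2_* or PyUnicodeUCS4_* symbols. All Demo* names and DEMO_UNICODE_SIZE are hypothetical stand-ins, not part of CPython.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the interpreter's build setting
   (CPython's own macro is Py_UNICODE_SIZE). */
#ifndef DEMO_UNICODE_SIZE
#define DEMO_UNICODE_SIZE 4
#endif

/* The header renames the public function to a width-specific symbol. */
#if DEMO_UNICODE_SIZE == 4
#define Demo_CharWidth DemoUCS4_CharWidth
#else
#define Demo_CharWidth DemoUCS2_CharWidth
#endif

/* "Interpreter" side: only the variant matching its own build is defined.
   A module compiled with the other width would reference the other symbol
   name, which does not exist, so linking or loading would fail. */
int Demo_CharWidth(void)
{
    assert(DEMO_UNICODE_SIZE == 2 || DEMO_UNICODE_SIZE == 4);
    return DEMO_UNICODE_SIZE;
}

/* Two-step stringification so the macro is expanded before being quoted. */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2(x)

/* Returns nonzero if `symbol` is the name the linker sees for a call to
   Demo_CharWidth() in this translation unit. */
int demo_links_against(const char *symbol)
{
    return strcmp(symbol, DEMO_STR(Demo_CharWidth)) == 0;
}
```

The reference to the width-specific symbol only exists if the module calls the function, which is why a module that never calls a #2 function links cleanly even when the settings differ.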

We can add some symbols next to the structures and macros (#3), such that there will always be a link error if the Unicode settings are mismatched.  However, we can't tell if the module actually uses the structure or not.
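
A rough sketch of what such a guard symbol could look like, assuming a header that declares a width-specific external symbol next to the structure and takes its address. The Demo* names and DEMO_UNICODE_SIZE are made up for illustration; for the sake of a single runnable file, the "interpreter" definition of the guard lives in the same translation unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the build setting. */
#ifndef DEMO_UNICODE_SIZE
#define DEMO_UNICODE_SIZE 4
#endif

#if DEMO_UNICODE_SIZE == 4
#define DemoUnicode_Guard DemoUnicodeUCS4_Guard
typedef uint32_t Demo_UNICODE;
#else
#define DemoUnicode_Guard DemoUnicodeUCS2_Guard
typedef uint16_t Demo_UNICODE;
#endif

/* Width-dependent structure: its layout is only meaningful if the module
   and the interpreter agree on Demo_UNICODE. */
typedef struct {
    size_t length;
    Demo_UNICODE *str;
} DemoUnicodeObject;

/* Guard declared next to the structure. Its name encodes the width. */
extern const int DemoUnicode_Guard;

/* Taking the address creates a symbol reference the linker must resolve.
   In a real header the reference would be otherwise unused, so it would
   need a compiler-specific marker (e.g. GCC's __attribute__((used))) to
   survive optimization. Here it is kept alive by demo_guard_width(). */
static const int *const demo_guard_ref = &DemoUnicode_Guard;

/* "Interpreter" side: defines only the guard matching its own build. */
const int DemoUnicode_Guard = DEMO_UNICODE_SIZE;

/* Reads the width through the guard reference. */
int demo_guard_width(void)
{
    assert(demo_guard_ref != NULL);
    return *demo_guard_ref;
}
```

Note that the reference is created merely by including the header, whether or not DemoUnicodeObject is ever touched, which is exactly the "can't tell if the module actually uses the structure" limitation.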

The hard question is: what should be declared by default?

Option 1: Make Unicode-agnosticism the default and force anyone who cares about the Unicode setting to include a separate header.  If they don't include that header, they can only call safe functions and can't poke at PyUnicodeObject's internals.  If they include the header, their module will always generate a link failure if the Unicode settings are mismatched.  (Guido proposed this solution in the python-ideas thread)

Option 2: Make Unicode-dependence the default.  If the compilation settings don't match, force a link failure.  Allow extension authors to define a flag (Py_UNICODE_AGNOSTIC) before including Python.h to avoid defining any unsafe functions, structures, or macros.  In practice, this is what Python 3 does today, except there's currently no way to declare Unicode-agnosticism.
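
As a sketch of how the opt-out flag in Option 2 might behave: Py_UNICODE_AGNOSTIC is only the flag proposed here, not an existing CPython macro, and the Demo* names and the header layout are hypothetical. The header, the extension, and the interpreter side are folded into one file so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* The extension author opts in before including the header
   (in the proposal, before #include <Python.h>). */
#define Py_UNICODE_AGNOSTIC

/* ---- simulated header (hypothetical names) ---- */
typedef struct DemoUnicodeObject DemoUnicodeObject;  /* opaque, always visible */
size_t Demo_GetLength(const DemoUnicodeObject *s);   /* always safe */

#ifndef Py_UNICODE_AGNOSTIC
/* Width-dependent pieces, hidden from agnostic modules. */
struct DemoUnicodeObject { size_t length; unsigned int *str; };
#define DEMO_AS_UNICODE(op) ((op)->str)
#define DEMO_STRUCT_VISIBLE 1
#else
#define DEMO_STRUCT_VISIBLE 0
#endif

/* ---- extension code ---- */
size_t demo_ext_length(const DemoUnicodeObject *s)
{
    assert(s != NULL);
    /* s->length would not compile here: the type is incomplete, so the
       module cannot depend on the structure's layout. */
    return Demo_GetLength(s);
}

/* ---- simulated interpreter side (a separate translation unit in practice) ---- */
#if !DEMO_STRUCT_VISIBLE
struct DemoUnicodeObject { size_t length; unsigned int *str; };
#endif

size_t Demo_GetLength(const DemoUnicodeObject *s)
{
    return s->length;
}
```

Option 1 would be the same arrangement with the default inverted: the width-dependent section would move into a separate header that a module includes only when it needs it.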

Option 3: Go for a middle ground.  Modules are Unicode agnostic by default, unless they call a non-agnostic function (which will cause a link error if there's a mismatch).  If they want to poke directly into PyUnicodeObject, they still need to include a separate header.  I fear that this is the worst of both worlds, though.

The more I think about it, the more I like the first option.

Maybe I should bring this up on capi-sig and try to gather a consensus?