Message 105297 - Python tracker


Author	lemburg
Recipients	gvanrossum, lemburg, loewis, r.david.murray, scoder, stutzbach, vstinner, zooko
Date	2010-05-08.15:16:06
Message-id	<4BE58035.9040604@egenix.com>
In-reply-to	<r2zeae285401005080801kfb7fe0d4t2dc7e71660cc8fb3@mail.gmail.com>

Content
Daniel Stutzbach wrote:
> 
> Daniel Stutzbach <daniel@stutzbachenterprises.com> added the comment:
> 
> On Sat, May 8, 2010 at 5:03 AM, Marc-Andre Lemburg
> <report@bugs.python.org> wrote:
>> If you can propose a different method of reliably protecting against
>> mixed Unicode build module loads, that would be great. We could then
>> get rid of the wrapping altogether.
> 
> The following code will cause a link error 100% of the time iff
> there's a mismatch:
> 
> #ifndef Py_UNICODE_WIDE
> #define _Py_Unicode_Build_Symbol _Py_UCS2_Build_Symbol
> #else
> #define _Py_Unicode_Build_Symbol _Py_UCS4_Build_Symbol
> #endif
> extern int _Py_Unicode_Build_Symbol; /* Defined in unicodeobject.c */
> static int *_Py_Unicode_Build_Symbol_Check = &_Py_Unicode_Build_Symbol;
> 
> In practice, I'd surrounded it with a bunch of #ifdefs to disable the
> "defined but not used" warning in gcc and MSVC.

Are you sure this doesn't get optimized away in practice?
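
To make the optimization concern concrete, here is a minimal sketch of one way to keep such a reference alive. Everything prefixed `demo_` or `DEMO_` is hypothetical and only stands in for the names in the quoted snippet; the symbol definition that would live in the interpreter is placed in the same file so the example compiles on its own. It assumes GCC or Clang for `__attribute__((used))`; on other compilers the sketch relies on the `volatile` read alone, which may or may not be enough.

```c
/* Stand-in for the interpreter side: a real build would define exactly
   one such symbol (UCS2 or UCS4) in its own object file. */
int demo_ucs4_build_symbol = 0;

/* Stand-in for Py_UNICODE_WIDE as seen when compiling the extension. */
#define DEMO_UNICODE_WIDE 1

#ifdef DEMO_UNICODE_WIDE
#define demo_build_symbol demo_ucs4_build_symbol
#else
#define demo_build_symbol demo_ucs2_build_symbol
#endif

extern int demo_build_symbol;

/* Ask GCC/Clang to emit the variable even if nothing references it. */
#if defined(__GNUC__)
#define DEMO_KEEP __attribute__((used))
#else
#define DEMO_KEEP
#endif

/* The pointer's initializer forces a relocation against the build symbol;
   'volatile' keeps the compiler from folding away reads of the pointer. */
DEMO_KEEP static int *volatile demo_build_symbol_check = &demo_build_symbol;

/* Reading the pointer at run time gives the reference an observable use. */
int demo_build_check_is_linked(void)
{
    return demo_build_symbol_check == &demo_ucs4_build_symbol;
}
```

If the extension were compiled without `DEMO_UNICODE_WIDE`, the reference would resolve to `demo_ucs2_build_symbol`, which nothing defines here, so linking would fail.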

FWIW: I still like the import logic solution better, since that
will make it possible for the user to see a truly useful
error message rather than just some missing symbol linker
error.
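
The import logic itself is not spelled out in this message, so the following is only a hypothetical sketch of what a load-time check with a readable error could look like. The function name, its parameters, and the message wording are all assumptions made for illustration; the widths are the size in bytes of one Unicode code unit (2 or 4).

```c
#include <stddef.h>
#include <stdio.h>

/* Compare the code-unit width an extension was compiled for with the width
   of the running interpreter. Returns 0 on a match; otherwise returns -1
   and writes a human-readable explanation into errbuf. */
int demo_check_unicode_build(const char *module_name,
                             int module_width, int interp_width,
                             char *errbuf, size_t errbuf_size)
{
    if (module_width == interp_width) {
        if (errbuf_size > 0)
            errbuf[0] = '\0';   /* no error to report */
        return 0;
    }
    snprintf(errbuf, errbuf_size,
             "module '%s' was built for a UCS%d Python, "
             "but this interpreter is a UCS%d build",
             module_name, module_width, interp_width);
    return -1;
}
```

A loader could call such a check before running the module's init function and raise the message as an import error, instead of leaving the user with an unresolved-symbol message from the linker.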

>> Please note that UCS2 and UCS4 builds of Python are different in
>> more ways than just the underlying Py_UNICODE type. E.g. UCS2 builds
>> use surrogates when converting between Unicode and bytes which
>> UCS4 builds don't, sys.maxunicode is different, range checks use different
>> bounds, unichr() behaves differently, etc. etc.
> 
> That's true, but those differences are visible from pure-Python code
> as well, aren't they?

Sure, though I don't see how this relates to C code relying
on these details, e.g. a C extension will probably use different
conversion code depending on whether UCS2 or UCS4 is compatible
with some external library, etc.
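
As an illustration of conversion code that depends on the build, the sketch below stores one code point either as a single 4-byte unit or, for a 2-byte build, as a surrogate pair when the code point lies above U+FFFF. The function name and the run-time `unit_width` parameter are hypothetical; a real extension would more likely branch at compile time on `Py_UNICODE_WIDE`.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp into out[], which has room for two units.
   unit_width is 2 (UCS2-style, surrogate pairs above U+FFFF) or
   4 (UCS4-style, always one unit). Returns the number of units written,
   or 0 if cp or unit_width is out of range. */
size_t demo_encode_code_point(uint32_t cp, int unit_width, uint32_t out[2])
{
    if (cp > 0x10FFFF || (unit_width != 2 && unit_width != 4))
        return 0;
    if (unit_width == 4 || cp < 0x10000) {
        out[0] = cp;                    /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = 0xD800 + (cp >> 10);       /* high (lead) surrogate */
    out[1] = 0xDC00 + (cp & 0x3FF);     /* low (trail) surrogate */
    return 2;
}
```

The same code point therefore occupies one unit in a wide build and two in a narrow build, which is why a buffer handed to an external library has to be produced by the matching branch.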

History
Date	User	Action	Args
2010-05-08 15:16:09	lemburg	set	recipients: + lemburg, gvanrossum, loewis, zooko, scoder, vstinner, stutzbach, r.david.murray
2010-05-08 15:16:07	lemburg	link	issue8654 messages
2010-05-08 15:16:07	lemburg	create