Message 88830 - Python tracker


Author	gvanrossum
Recipients	ajaksu2, amaury.forgeotdarc, collinwinter, eric.smith, ezio.melotti, gvanrossum, jafo, jimjjewett, lemburg, orivej, pitrou, rhettinger
Date	2009-06-03.21:31:24
Message-id	<ca471dc20906031431h79b8e6ia9ac8db280d2d079@mail.gmail.com>
In-reply-to	<1244061829.5505.23.camel@localhost>

Content
On Wed, Jun 3, 2009 at 1:41 PM, Antoine Pitrou <report@bugs.python.org> wrote:
> Apart from the example Marc-André just posted (and which is a 0.0.1
> proof of concept he apparently just wrote), the number of users is,
> AFAICT, zero.

IIUC Marc-André extracted that from a larger code base (MX) which he
owns and has been maintaining for a decade or so.

> Unless there's some closed source extension which happens to extend
> unicode as a C subtype.

I believe part of MX is closed source.

> Now, as for easing the subclassing of unicode in C, there are probably
> several possibilities which range from devising a clever set of macros
> to abusing the ob_size field for a tagged pointer. People who really
> care should do a concrete proposal (and I don't know who these people
> are, apart from Marc-André).

Not really if the core code uses a macro that depends on the layout of
the object (i.e. the data immediately following the header, like old
8-bit strings), unless you change the core (or the macro) to only use
this if the type matches exactly, and for subtypes use a more
expensive API. But that would needlessly slow down subclasses
written in Python (of which there are plenty).
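
To make the trade-off concrete, here is a toy sketch in C. It is not
CPython's actual object layout or macros: every name in it (toy_header,
INLINE_STR_DATA, str_data, the three type objects) is invented for
illustration. It assumes a fast macro that reads the data immediately
after the header, guarded by an exact type check, with everything else
going through an indirect call. A subclass that shares the inline
layout (standing in for one written in Python) still ends up on the
slow path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: these are NOT CPython's PyObject or unicode structs. */
typedef struct toy_header toy_header;

typedef struct toy_type {
    const char *name;
    /* The "more expensive API": an indirect call through the type. */
    const char *(*get_data)(const toy_header *op);
} toy_type;

struct toy_header {
    const toy_type *type;
    size_t length;
};

/* Layout A: character data sits immediately after the header. */
typedef struct {
    toy_header head;
    char data[];
} inline_str;

/* Layout B: a C subtype that keeps its data in a separate buffer. */
typedef struct {
    toy_header head;
    char *buf;
} external_str;

/* Fast macro: only correct when the object really uses layout A. */
#define INLINE_STR_DATA(op) (((const inline_str *)(op))->data)

static const char *inline_get_data(const toy_header *op)
{
    return INLINE_STR_DATA(op);
}

static const char *external_get_data(const toy_header *op)
{
    return ((const external_str *)op)->buf;
}

static const toy_type ExactStr_Type = {"str", inline_get_data};
/* Same layout as the exact type, but a different type object. */
static const toy_type PySubclass_Type = {"python_subclass", inline_get_data};
static const toy_type CSubtype_Type = {"c_subtype", external_get_data};

/* Counts how often the exact-type check failed. */
static int slow_path_calls = 0;

/* Use the macro only on an exact type match; otherwise pay for the call. */
static const char *str_data(const toy_header *op)
{
    if (op->type == &ExactStr_Type)
        return INLINE_STR_DATA(op);
    slow_path_calls++;
    return op->type->get_data(op);
}

/* Allocate header and characters in one contiguous block (layout A). */
static toy_header *new_inline(const toy_type *type, const char *text)
{
    size_t n = strlen(text);
    inline_str *s = malloc(sizeof(inline_str) + n + 1);
    assert(s != NULL);
    s->head.type = type;
    s->head.length = n;
    memcpy(s->data, text, n + 1);
    return &s->head;
}

/* Allocate the header and the character buffer separately (layout B). */
static toy_header *new_external(const char *text)
{
    size_t n = strlen(text);
    external_str *s = malloc(sizeof(external_str));
    assert(s != NULL);
    s->buf = malloc(n + 1);
    assert(s->buf != NULL);
    s->head.type = &CSubtype_Type;
    s->head.length = n;
    memcpy(s->buf, text, n + 1);
    return &s->head;
}
```

In this sketch, str_data() on an object made with
new_inline(&PySubclass_Type, ...) bumps slow_path_calls even though the
macro would have been safe for it, which is the unnecessary slowdown
described above.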

But I would like to point out that few people if any have ever
complained about the contiguous allocation for 8-bit strings in Python
[0-2].x. And we certainly wouldn't have given in. Now that Unicode is
no longer some fancy-schmancy advanced concept but the basis for *all*
Python string processing I think we should apply the same policy.

History
Date	User	Action	Args
2009-06-03 21:31:26	gvanrossum	set	recipients: + gvanrossum, lemburg, collinwinter, rhettinger, jafo, jimjjewett, amaury.forgeotdarc, pitrou, eric.smith, ajaksu2, orivej, ezio.melotti
2009-06-03 21:31:25	gvanrossum	link	issue1943 messages
2009-06-03 21:31:24	gvanrossum	create