Message 97553 - Python tracker


Author	lemburg
Recipients	Rhamphoryncus, ajaksu2, amaury.forgeotdarc, benjamin.peterson, collinwinter, eric.smith, ezio.melotti, gvanrossum, jafo, jimjjewett, lemburg, orivej, pitrou, rhettinger, terry.reedy
Date	2010-01-10.21:59:25
Message-id	<4B4A4DBC.2090009@egenix.com>
In-reply-to	<1263153135.09.0.0405786287299.issue1943@psf.upfronthosting.co.za>

Content
Adam Olsen wrote:
> 
> Adam Olsen <rhamph@gmail.com> added the comment:
> 
> Points against the subclassing argument:
> 
> * We have a null-termination invariant.  For byte strings this was part of the public API, and I'm not sure that's changed for unicode strings; aren't you arguing that we should maximize how much of our implementation is a public API?  This prevents lazy slicing.

Base type Unicode buffers end with a terminating null Py_UNICODE,
but AFAIK nothing relies on it. We could probably remove that
overallocation at some point.

There's no such thing as a null-termination invariant for Unicode.

> * subclassing unicode so you can change the meaning of the fields (ie allocating your own buffer) is a gross hack.  It relies far too much on fine details of the implementation and is fragile (what if you miss the dummy byte needed by fastsearch?)  Most of the possible options could be, if they function correctly, applied directly to the basetype as a patch, so it's moot.

Actually, Unicode objects were designed to be subclassable right
from the start, and so was adjusting the buffer to point, e.g., into
some other already allocated string. I removed this feature from
Fredrik's type implementation with the intent to re-add it later on
as a subclass.

See uniref, the prototype implementation of such a subclass that I've
written to show how easy it is to add a subclass which can be used
to slice large Unicode objects without having to allocate new
buffers all the time.
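
To illustrate the idea only (this is not the uniref code, which works
at the C level on the PyUnicodeObject buffer pointer), here is a
rough Python-level sketch of a slice that shares its parent's data.
The class name SliceRef and its attributes are made up for this
example. Note that the characters of such a slice are not followed by
a terminator inside the parent buffer, which is why a null-termination
invariant would rule this approach out.

```python
class SliceRef:
    """Hypothetical lazy slice: stores a reference to the base string
    plus an offset and a length instead of copying characters."""

    __slots__ = ("base", "start", "length")

    def __init__(self, base, start=0, stop=None):
        # Clamp the bounds the same way ordinary str slicing does.
        start, stop, _ = slice(start, stop).indices(len(base))
        self.base = base              # keeps the parent data alive
        self.start = start
        self.length = max(0, stop - start)

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing a slice only adjusts offsets; nothing is copied.
            start, stop, step = index.indices(self.length)
            if step != 1:
                raise ValueError("this sketch only supports step 1")
            return SliceRef(self.base, self.start + start, self.start + stop)
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("SliceRef index out of range")
        return self.base[self.start + index]

    def __str__(self):
        # A copy is made only when a real str is requested.
        return self.base[self.start:self.start + self.length]


big = "x" * 1000 + "hello world"
ref = SliceRef(big, 1000, 1011)   # no characters copied here
sub = ref[6:]                     # still points into big
print(len(ref), str(sub))
```

The trade-off is the usual one for reference slices: a small slice
keeps the whole parent object alive for as long as the slice exists.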

BTW, I'm not aware of any changes to the PyUnicodeObject made by a
fastsearch implementation. Could you point me to this?

History
Date	User	Action	Args
2010-01-10 21:59:27	lemburg	set	recipients: + lemburg, gvanrossum, collinwinter, rhettinger, terry.reedy, jafo, jimjjewett, amaury.forgeotdarc, Rhamphoryncus, pitrou, eric.smith, ajaksu2, benjamin.peterson, orivej, ezio.melotti
2010-01-10 21:59:26	lemburg	link	issue1943 messages
2010-01-10 21:59:26	lemburg	create