
Author lemburg
Recipients Rhamphoryncus, ajaksu2, amaury.forgeotdarc, benjamin.peterson, collinwinter, eric.smith, ezio.melotti, gvanrossum, jafo, jimjjewett, lemburg, orivej, pitrou, rhettinger, terry.reedy
Date 2010-01-10.21:59:25
Message-id <4B4A4DBC.2090009@egenix.com>
In-reply-to <1263153135.09.0.0405786287299.issue1943@psf.upfronthosting.co.za>
Content
Adam Olsen wrote:
> 
> Adam Olsen <rhamph@gmail.com> added the comment:
> 
> Points against the subclassing argument:
> 
> * We have a null-termination invariant.  For byte strings this was part of the public API, and I'm not sure that's changed for unicode strings; aren't you arguing that we should maximize how much of our implementation is a public API?  This prevents lazy slicing.

The base type's Unicode buffers end with a null Py_UNICODE
terminator, but AFAIK nothing relies on it. We could probably
remove that overallocation at some point.

There's no such thing as a null-termination invariant for Unicode.

> * subclassing unicode so you can change the meaning of the fields (ie allocating your own buffer) is a gross hack.  It relies far too much on fine details of the implementation and is fragile (what if you miss the dummy byte needed by fastsearch?)  Most of the possible options could be, if they function correctly, applied directly to the basetype as a patch, so it's moot.

Actually, Unicode objects were designed to be subclassable right
from the start, and so was adjusting the buffer to point e.g. into
some other already allocated string. I removed this feature from
Fredrik's type implementation with the intent to re-add it later
as a subclass.

See the prototype implementation of such a subclass, uniref, which
I wrote to show how easy it is to add a subclass that can be used
to slice large Unicode objects without having to allocate new
buffers all the time.
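
To illustrate the lazy-slicing idea only (this is not the uniref prototype, which works at the C level on the Unicode object's buffer), here is a minimal pure-Python sketch. The class name `StrView` and its fields are hypothetical, and it assumes a read-only view with step-1 slices is enough:

```python
class StrView:
    """Hypothetical read-only view onto part of an existing str.

    Illustration only: slicing a view yields another view that shares
    the same base string, so no characters are copied until str() is called.
    """

    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start=0, stop=None):
        n = len(base)
        stop = n if stop is None else stop
        self._base = base
        # Clamp the bounds into [0, len(base)] and keep start <= stop.
        self._start = min(max(start, 0), n)
        self._stop = min(max(stop, self._start), n)

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise ValueError("only step 1 is supported in this sketch")
            # Only the offsets change; the base string is shared, not copied.
            return StrView(self._base,
                           self._start + start,
                           self._start + max(stop, start))
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError("index out of range")
        return self._base[self._start + key]

    def __str__(self):
        # The only place where a new string is actually allocated.
        return self._base[self._start:self._stop]
```

With `big = "x" * 1000 + "hello world"`, the expression `StrView(big)[1000:1005]` gives a view of length 5 that still refers to `big`; a copy is made only when `str()` is applied to it.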

BTW, I'm not aware of any changes to PyUnicodeObject made by a
fastsearch implementation. Could you point me to them?
History
Date                 User     Action  Args
2010-01-10 21:59:27  lemburg  set     recipients: + lemburg, gvanrossum, collinwinter, rhettinger, terry.reedy, jafo, jimjjewett, amaury.forgeotdarc, Rhamphoryncus, pitrou, eric.smith, ajaksu2, benjamin.peterson, orivej, ezio.melotti
2010-01-10 21:59:26  lemburg  link    issue1943 messages
2010-01-10 21:59:26  lemburg  create