Author gvanrossum
Recipients Arfrever, ezio.melotti, gvanrossum, jkloth, lemburg, mrabarnett, pitrou, r.david.murray, tchrist, terry.reedy, v+python, vstinner
Date 2011-08-26.20:58:43
Message-id <1314392325.36.0.245807311379.issue12729@psf.upfronthosting.co.za>
Content
Wow.  A very educational discussion.  We will be referencing this issue for many years to come.

As long as the buck stops with me, I feel strongly that *today* changing indexing from O(1) to O(log N) is a bad idea, partly for technical reasons, partly because the Python culture isn't ready.  In 5 or 10 years we need to revisit this, and it wouldn't hurt if in the meantime we started seriously thinking about how to change our APIs so that O(1) indexing is not relied upon so much.  This may include rewriting tutorials to nudge users in the direction of using different idioms for text processing.

In the meantime, I think our best option is to switch CPython to the PEP 393 string implementation.  Despite its disadvantages (I understand the "spoiler" issue) it is generally no worse than a wide build, and there is working code today that we can optimize before 3.3 is released.
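
As a rough illustration of the trade-off (a sketch that assumes a CPython build with PEP 393 strings, i.e. 3.3 or later; the variable names are made up for the example, and the exact byte counts are an implementation detail, so only relative sizes are compared):

```python
import sys

n = 1000
ascii_text = "a" * n            # every code point fits in 1 byte
bmp_text = "\u20ac" * n         # EURO SIGN, needs 2 bytes per code point
astral_text = "\U0001D11E" * n  # MUSICAL SYMBOL G CLEF, needs 4 bytes

# The "spoiler" effect: a single non-BMP character forces the whole
# string into the widest representation.
spoiled = "a" * n + "\U0001D11E"

for name, s in [("ascii", ascii_text), ("bmp", bmp_text),
                ("astral", astral_text), ("spoiled", spoiled)]:
    print(name, len(s), sys.getsizeof(s))

# Indexing still addresses whole code points, whatever the storage width.
print(astral_text[5] == "\U0001D11E")
```

The storage per character grows with the widest code point in the string, while len() and indexing keep counting code points.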

For Python implementations where this is not an option (I'm thinking Jython and IronPython, both of which are closely tied to a system string type that behaves like UTF-16) I hope that at least the regular expression behavior can be fixed so that "." matches a surrogate pair.  (Possibly they already behave that way, if they use a native regex library.)
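
To make the "." issue concrete, here is a small sketch. It assumes a build where strings hold whole code points (wide builds, or PEP 393), and it only simulates the UTF-16 view by counting code units; it does not reproduce what Jython or IronPython actually do.

```python
import re

ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

# With code-point strings this is a single character, and "." matches
# all of it.
one_code_point = len(ch) == 1
dot_matches_whole_char = re.fullmatch(".", ch) is not None

# A UTF-16 based string type stores the same character as two code
# units (a surrogate pair), which is what "." would need to consume
# as one unit there.
utf16_units = len(ch.encode("utf-16-le")) // 2

print(one_code_point, dot_matches_whole_char, utf16_units)
```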

In all cases, for future Python versions, we should tighten the codecs to reject data that the Unicode standard considers invalid (and we should offer separate non-strict codecs for situations where such invalid data needs to be processed).
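
A sketch of the strict versus non-strict distinction, using behavior that Python 3's UTF-8 codec already has. The "surrogatepass" error handler is an existing opt-in mechanism, shown here only as an analogy for the separate non-strict codecs proposed above; decodes_strictly is a hypothetical helper written for this example.

```python
# Bytes that would encode the lone surrogate U+D800 if UTF-8 allowed it.
raw = b"\xed\xa0\x80"


def decodes_strictly(data):
    """Return True if data is accepted by the default (strict) UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The strict codec rejects the invalid sequence.
strict_ok = decodes_strictly(raw)

# Opting in to lenient handling lets the invalid data through for
# callers that really need to process it.
lenient = raw.decode("utf-8", errors="surrogatepass")

print(strict_ok, ascii(lenient))
```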

I wish we could fix the codecs and the regex "." issue on narrow builds for Python versions before 3.3 (esp. 3.2 and 2.7), but I fear that this is considered too backwards incompatible (though for each specific fix we should consider this carefully).
History
Date                 User        Action  Args
2011-08-26 20:58:45  gvanrossum  set     recipients: + gvanrossum, lemburg, terry.reedy, pitrou, vstinner, jkloth, ezio.melotti, mrabarnett, Arfrever, v+python, r.david.murray, tchrist
2011-08-26 20:58:45  gvanrossum  set     messageid: <1314392325.36.0.245807311379.issue12729@psf.upfronthosting.co.za>
2011-08-26 20:58:44  gvanrossum  link    issue12729 messages
2011-08-26 20:58:43  gvanrossum  create