Author ezio.melotti
Recipients Rhamphoryncus, amaury.forgeotdarc, belopolsky, eric.smith, ezio.melotti, lemburg, loewis, pitrou, rhettinger, vstinner
Date 2010-11-28.00:46:47
Message-id <1290905211.52.0.313719022267.issue10542@psf.upfronthosting.co.za>
In-reply-to
Content
AFAIU the macro returns lone surrogates as they are. This means that:
  1) if the string contains only surrogate pairs, Py_UNICODE_NEXT will iterate over scalar values[0];
  2) if the string contains only lone surrogates, it will iterate over code points[1];
  3) if it contains both, the result is mixed (i.e. scalar values where the surrogates are paired, falling back to code points where they aren't);
(for strings without surrogates, iterating over scalar values or code points is the same).
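
To make the three cases concrete, here is a minimal Python sketch that works on a list of UTF-16 code units. The helper `iter_lenient` is a hypothetical name and is not the Py_UNICODE_NEXT macro itself; it only models the behavior as I understand it (valid pairs are combined, lone surrogates are passed through unchanged).

```python
def iter_lenient(units):
    """Yield code points from a sequence of UTF-16 code units.

    Assumed model of the behavior described above: a high surrogate
    immediately followed by a low surrogate is combined into a single
    scalar value; any other surrogate is yielded as it is.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        i += 1
        # High surrogate followed by a low surrogate: combine the pair.
        if 0xD800 <= u <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            u = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        yield u


only_pairs = [0xD83D, 0xDE00]          # case 1: one valid pair (U+1F600)
only_lone = [0xD83D, 0x0041]           # case 2: a lone high surrogate
mixed = [0xD83D, 0xDE00, 0xDC00]       # case 3: a pair, then a lone low surrogate

print([hex(cp) for cp in iter_lenient(only_pairs)])
print([hex(cp) for cp in iter_lenient(only_lone)])
print([hex(cp) for cp in iter_lenient(mixed)])
```

In the third case the output contains both a scalar value and a surrogate code point, which is the "mixed" result of point 3.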

Are these semantics correct for all (or at least most) of the places where the macro will be used?
Would a stricter version (one that rejects lone surrogates and iterates over scalar values only) be useful in addition to, or as an alternative to, Py_UNICODE_NEXT?
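
As a rough illustration of what "stricter" could mean, the following Python sketch iterates over UTF-16 code units and raises an error on any lone surrogate. The name `iter_strict` and the choice of `ValueError` are assumptions made for the example, not a proposed API.

```python
def iter_strict(units):
    """Yield scalar values only; reject lone surrogates.

    Hypothetical strict counterpart: valid surrogate pairs are combined,
    and any unpaired surrogate raises ValueError instead of being yielded.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        i += 1
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i < n and 0xDC00 <= units[i] <= 0xDFFF:
                u = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
                i += 1
            else:
                raise ValueError("lone high surrogate at index %d" % (i - 1))
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate that was not consumed as part of a pair.
            raise ValueError("lone low surrogate at index %d" % (i - 1))
        yield u


print([hex(cp) for cp in iter_strict([0x0041, 0xD83D, 0xDE00])])
try:
    list(iter_strict([0xD83D, 0x0041]))
except ValueError as exc:
    print("rejected:", exc)
```

With this variant the caller always gets scalar values or an error, never a mix of the two.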

[0]: http://unicode.org/glossary/#unicode_scalar_value
[1]: http://unicode.org/glossary/#code_point