Message 122594 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	ezio.melotti
Recipients	Rhamphoryncus, amaury.forgeotdarc, belopolsky, eric.smith, ezio.melotti, lemburg, loewis, pitrou, rhettinger, vstinner
Date	2010-11-28.00:46:47
SpamBayes Score	6.80613e-05
Marked as misclassified	No
Message-id	<1290905211.52.0.313719022267.issue10542@psf.upfronthosting.co.za>
In-reply-to

Content

AFAIU the macro returns lone surrogates as they are, which means that:
  1) if the string contains only surrogate pairs, Py_UNICODE_NEXT will iterate over scalar values[0];
  2) if the string contains only lone surrogates, it will iterate over code points[1];
  3) if it contains both, the result is mixed (i.e. it yields scalar values where the surrogates form a pair, and falls back on code points where they don't).
(For strings without surrogates, iterating over scalar values and iterating over code points give the same result.)
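A minimal sketch of the lenient behaviour described above follows. It assumes a narrow (UTF-16) build in which each Py_UNICODE is a 16-bit code unit. The names next_lenient and count_lenient are hypothetical and are not the actual Py_UNICODE_NEXT macro. The sketch combines a high surrogate followed by a low surrogate into one scalar value and passes any lone surrogate through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT CPython's Py_UNICODE_NEXT: reads one value from
 * a UTF-16 buffer at index *i and advances *i past what it consumed.
 * A high surrogate immediately followed by a low surrogate is combined
 * into a single scalar value; every other unit, including a lone
 * surrogate, is returned unchanged (the lenient semantics). */
static uint32_t next_lenient(const uint16_t *s, size_t len, size_t *i)
{
    assert(*i < len);                   /* caller must not read past the end */
    uint32_t ch = s[*i];
    (*i)++;
    if (ch >= 0xD800 && ch <= 0xDBFF && *i < len) {
        uint32_t lo = s[*i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            (*i)++;                     /* consume the low surrogate too */
            return 0x10000 + ((ch - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return ch;                          /* BMP char or lone surrogate */
}

/* Counts how many values the lenient iteration yields for a buffer. */
static size_t count_lenient(const uint16_t *s, size_t len)
{
    size_t i = 0, n = 0;
    while (i < len) {
        next_lenient(s, len, &i);
        n++;
    }
    return n;
}
```

Feeding the three units {0xD83D, 0xDE00, 0xDC00} to this sketch yields two values: U+1F600 from the pair and the lone U+DC00. This is the "mixed" case 3 above.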

Are these semantics correct for all (or at least most) of the places where the macro will be used?
Would a stricter version (one that rejects lone surrogates and iterates over scalar values only) be useful in addition to, or as an alternative to, Py_UNICODE_NEXT?
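A stricter variant could look roughly like the following sketch. It makes the same narrow-build (UTF-16) assumption as before, and next_strict is a hypothetical name rather than an existing CPython API. The function returns -1 on any lone surrogate and leaves the index untouched so the caller can report where the error occurred. Otherwise it returns a Unicode scalar value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strict variant (not an existing CPython API): returns the
 * next Unicode scalar value and advances *i, or returns -1 without
 * advancing *i if the unit at *i is a lone surrogate. */
static int32_t next_strict(const uint16_t *s, size_t len, size_t *i)
{
    assert(*i < len);                   /* caller must not read past the end */
    uint32_t ch = s[*i];
    if (ch >= 0xDC00 && ch <= 0xDFFF)
        return -1;                      /* low surrogate without a high one */
    if (ch >= 0xD800 && ch <= 0xDBFF) {
        if (*i + 1 >= len)
            return -1;                  /* high surrogate at end of buffer */
        uint32_t lo = s[*i + 1];
        if (lo < 0xDC00 || lo > 0xDFFF)
            return -1;                  /* high surrogate not followed by low */
        *i += 2;
        return (int32_t)(0x10000 + ((ch - 0xD800) << 10) + (lo - 0xDC00));
    }
    (*i)++;
    return (int32_t)ch;                 /* BMP character, not a surrogate */
}
```

Returning an error instead of the raw unit is the design choice that distinguishes this variant. Callers that need well-formed text (for example, encoders to UTF-8) would fail loudly on malformed input. Callers that only need to walk the string would keep using the lenient form.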

[0]: http://unicode.org/glossary/#unicode_scalar_value
[1]: http://unicode.org/glossary/#code_point

History
Date	User	Action	Args
2010-11-28 00:46:51	ezio.melotti	set	recipients: + ezio.melotti, lemburg, loewis, rhettinger, amaury.forgeotdarc, belopolsky, Rhamphoryncus, pitrou, vstinner, eric.smith
2010-11-28 00:46:51	ezio.melotti	set	messageid: <1290905211.52.0.313719022267.issue10542@psf.upfronthosting.co.za>
2010-11-28 00:46:47	ezio.melotti	link	issue10542 messages
2010-11-28 00:46:47	ezio.melotti	create