Message 140799 - Python tracker


Author	lemburg
Recipients	belopolsky, eric.araujo, ezio.melotti, lemburg, py.user, r.david.murray
Date	2011-07-21.09:02:34
Message-id	<4E27EB23.9050700@egenix.com>
In-reply-to	<1311238375.93.0.206051053538.issue12266@psf.upfronthosting.co.za>

Content

Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti@gmail.com> added the comment:
> 
> Do you mean  "if (!Py_UNICODE_ISLOWER(*s)) {"  (with the '!')?

Sorry, here's the correct version:

    if (!Py_UNICODE_ISUPPER(*s)) {
        *s = Py_UNICODE_TOUPPER(*s);
        status = 1;
    }
    s++;
    while (--len > 0) {
        if (!Py_UNICODE_ISLOWER(*s)) {
            *s = Py_UNICODE_TOLOWER(*s);
            status = 1;
        }
        s++;
    }
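For readers who want to try the control flow outside CPython, here is a minimal standalone sketch of the same loop. It is not the CPython implementation: the function name `capitalize_ascii`, the `len == 0` guard, and the use of the `<ctype.h>` functions as ASCII-only stand-ins for the `Py_UNICODE_*` macros are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: capitalize an ASCII buffer in place.
 * isupper/toupper/islower/tolower stand in for the Py_UNICODE_* macros.
 * Returns 1 if any character went through a conversion, 0 otherwise. */
int capitalize_ascii(char *s, size_t len)
{
    int status = 0;

    /* Assumed guard: the loop below expects at least one character. */
    if (len == 0)
        return 0;

    /* First character: convert unless it is already upper case. */
    if (!isupper((unsigned char)*s)) {
        *s = (char)toupper((unsigned char)*s);
        status = 1;
    }
    s++;

    /* Remaining characters: convert unless already lower case. */
    while (--len > 0) {
        if (!islower((unsigned char)*s)) {
            *s = (char)tolower((unsigned char)*s);
            status = 1;
        }
        s++;
    }
    return status;
}
```

In this sketch, `"hELLO"` becomes `"Hello"` with a return value of 1, and `"Hello"` is left alone with a return value of 0. An uncased character such as a digit also passes through the conversion call and sets `status`, even though its value does not change, which is the behavior discussed in the quoted comment that follows.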

> This sounds fine to me, but with this approach all the uncased characters will go through a Py_UNICODE_TO* macro, whereas with the current code only the cased ones are converted.  I'm not sure this matters too much though.
> 
> OTOH if the non-lowercase cased chars are always either upper or titlecased, checking for both should be equivalent.

AFAIK, there are characters that don't have a case mapping at all.
It may also be the case that a non-cased character still has a
lower/upper case mapping, e.g. for typographical reasons.

Someone would have to check this against the current Unicode database.

History
Date	User	Action	Args
2011-07-21 09:02:35	lemburg	set	recipients: + lemburg, belopolsky, ezio.melotti, eric.araujo, r.david.murray, py.user
2011-07-21 09:02:34	lemburg	link	issue12266 messages
2011-07-21 09:02:34	lemburg	create