Message151644
> http://hg.python.org/cpython/rev/0b5ce36a7a24
> changeset: 74515:0b5ce36a7a24
> + Casefolding is similar to lowercasing but more aggressive because it is
> + intended to remove all case distinctions in a string. For example, the German
> + lowercase letter ``'ß'`` is equivalent to ``"ss"``. Since it is already
> + lowercase, :meth:`lower` would do nothing to ``'ß'``; :meth:`casefold`
> + converts it to ``"ss"``.
Perhaps add a recommendation to normalize (canonicalize) the string as well.
A complete, but possibly too long, attempt is below:
Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter ``'ß'`` is equivalent to ``"ss"``. Since it is already lowercase, :meth:`lower` would do nothing to ``'ß'``; :meth:`casefold` converts it to ``"ss"``. Note that most case-insensitive matches should also match compatibility-equivalent characters.
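To make the difference concrete, here is a small sketch of the ``'ß'`` case described above. The variable names are only illustrative, and it assumes a Python version that has ``str.casefold`` (3.3 or later).

```python
# Compare lower() and casefold() on the German sharp s.
sharp_s = "\u00df"  # 'ß', already a lowercase letter

lowered = sharp_s.lower()      # lower() leaves it unchanged
folded = sharp_s.casefold()    # casefold() expands it to "ss"

# Two spellings of the same word only compare equal after casefolding.
same_after_lower = "Stra\u00dfe".lower() == "STRASSE".lower()
same_after_casefold = "Stra\u00dfe".casefold() == "STRASSE".casefold()

print(repr(lowered), repr(folded), same_after_lower, same_after_casefold)
```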
The casefolding algorithm is described in section 3.13 of the Unicode Standard. Per definition D146, a compatibility caseless match can be achieved by:
from unicodedata import normalize

def caseless_compat(string):
    # D146: NFKD(toCasefold(NFKD(toCasefold(NFD(X)))))
    nfd_string = normalize("NFD", string)
    nfkd1_string = normalize("NFKD", nfd_string.casefold())
    return normalize("NFKD", nfkd1_string.casefold())
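A possible way to use the function for comparisons is sketched below. The helper name ``compat_caseless_equal`` is hypothetical (it is not part of the proposal or of the standard library), and the two sample characters are just my picks for cases where plain ``casefold()`` is not enough.

```python
from unicodedata import normalize

def caseless_compat(string):
    # Same steps as above: NFD, then two rounds of casefold + NFKD.
    nfd_string = normalize("NFD", string)
    nfkd1_string = normalize("NFKD", nfd_string.casefold())
    return normalize("NFKD", nfkd1_string.casefold())

def compat_caseless_equal(left, right):
    # Hypothetical helper: two strings match if their folded forms are equal.
    return caseless_compat(left) == caseless_compat(right)

# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A casefolds to the fullwidth small a,
# so casefold() alone does not match ASCII "a"; the NFKD step closes the gap.
fullwidth_matches = compat_caseless_equal("\uff21", "a")

# U+3392 SQUARE MHZ decomposes (NFKD) to mixed-case letters, which is why
# a second casefold is applied after the first NFKD.
square_mhz_matches = compat_caseless_equal("\u3392", "MHZ")

print(fullwidth_matches, square_mhz_matches)
```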
Date | User | Action | Args
2012-01-19 17:06:03 | Jim.Jewett | set | recipients: + Jim.Jewett, benjamin.peterson, docs@python
2012-01-19 17:06:03 | Jim.Jewett | set | messageid: <1326992763.46.0.576190503242.issue13828@psf.upfronthosting.co.za>
2012-01-19 17:06:02 | Jim.Jewett | link | issue13828 messages
2012-01-19 17:06:02 | Jim.Jewett | create |