
Author tchrist
Recipients Arfrever, ezio.melotti, gvanrossum, loewis, tchrist, terry.reedy, vstinner
Date 2011-08-26.22:00:17
Message-id <9986.1314396010@chthon>
In-reply-to <1314393417.26.0.121501061547.issue12737@psf.upfronthosting.co.za>
Content
Guido van Rossum <report@bugs.python.org> wrote
   on Fri, 26 Aug 2011 21:16:57 -0000: 

> Yeah, this should be fixed in 3.3 and probably backported to 3.2
> and 2.7.  (There is already no guarantee that len(s) ==
> len(s.title()), right?)

Well, *I* don't know of any such guarantee, 
but I don't know Python very well.

In general, Unicode makes very few guarantees about casing.  Under full
casemapping, which is the only way to do the silly Turkish stuff amongst
quite a bit else, any of the three casemappings (lowercase, titlecase, and
uppercase) can change the length of the string.
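To make the length point concrete, here is a small Python sketch.  It
assumes Python 3.3 or later, where str.lower(), str.title() and str.upper()
apply full casemapping, so one code point can map to several; the helper
length_changes is a hypothetical name made up for illustration.

```python
# Sketch assuming Python 3.3+, where str.lower(), str.title() and str.upper()
# use Unicode full casemapping (SpecialCasing.txt), not only 1:1 mappings.
# length_changes is a hypothetical helper, not part of any library.

def length_changes(s):
    """Return {mapping name: result} for each casemapping that changes len(s)."""
    changed = {}
    for name in ("lower", "title", "upper"):
        mapped = getattr(s, name)()
        if len(mapped) != len(s):
            changed[name] = mapped
    return changed

# U+00DF (sharp s), U+FB01 (fi ligature), U+0130 (capital I with dot above)
for sample in ("\u00df", "\ufb01", "\u0130"):
    print(ascii(sample), ascii(length_changes(sample)))
```

For the sharp s, both title() and upper() return two characters ("Ss" and
"SS"), which bears out the aside quoted above: len(s) == len(s.title()) does
not hold in general.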

Other things you can't rely on are round-tripping and "single paths".  For
round-tripping, just look at the two lowercase sigmas (σ and ς) and think
about how you can't get back to one of them once you have uppercased them
both to Σ.  By single paths, I mean that code that first lowercases
everything and then titlecases the first letter can produce something
different from code that titlecases just the original first letter and
then lowercases the rest.  That's because tc(x) and tc(lc(x)) can be
different.
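Both failures can be reproduced in a few lines.  Again this is only a
sketch, assuming Python 3.3+ (full casemapping plus the Final_Sigma rule in
str.lower()); uppercase_round_trip, lower_then_title_first and
title_first_then_lower are hypothetical helpers, and "İSTANBUL" is just a
convenient test word.

```python
# Sketch assuming Python 3.3+. The three helper names are hypothetical,
# chosen only to illustrate the round-tripping and single-path pitfalls.

def uppercase_round_trip(s):
    """Uppercase s, then lowercase the result."""
    return s.upper().lower()

def lower_then_title_first(s):
    """Path A: lowercase the whole string, then titlecase its first code point."""
    lowered = s.lower()
    return lowered[:1].title() + lowered[1:]

def title_first_then_lower(s):
    """Path B: titlecase the original first code point, then lowercase the rest."""
    return s[:1].title() + s[1:].lower()

# Round-tripping: σ (U+03C3) and final ς (U+03C2) both uppercase to Σ (U+03A3).
# A lone Σ lowercases to σ, because the Final_Sigma condition needs a preceding
# cased letter, so ς cannot be recovered.
for sigma in ("\u03c3", "\u03c2"):
    print(ascii(sigma), "->", ascii(sigma.upper()), "->", ascii(uppercase_round_trip(sigma)))

# Single paths: U+0130 (İ) titlecases to itself but lowercases to "i" + U+0307,
# so tc(x) and tc(lc(x)) differ and the two paths disagree.
word = "\u0130STANBUL"  # İSTANBUL
print(ascii(lower_then_title_first(word)), ascii(title_first_then_lower(word)))
```

The two title paths yield "I\u0307stanbul" (9 code points) and "İstanbul"
(8 code points).  They are canonically equivalent, so NFC normalization
would reconcile them, but a plain == comparison sees two different strings.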

--tom
History
Date                 User     Action  Args
2011-08-26 22:00:18  tchrist  set     recipients: + tchrist, gvanrossum, loewis, terry.reedy, vstinner, ezio.melotti, Arfrever
2011-08-26 22:00:17  tchrist  link    issue12737 messages
2011-08-26 22:00:17  tchrist  create