
Author lemburg
Recipients ezio.melotti, gvanrossum, lemburg, markon, nickd, nnorwitz, pitrou, r.david.murray, rhettinger, twb
Date 2009-09-29.07:57:38
Message-id <4AC1BDF1.7060105@egenix.com>
In-reply-to <1254178973.99.0.0865778223053.issue7008@psf.upfronthosting.co.za>
Content
Guido van Rossum wrote:
> What's a realistic use case for .title() anyway?

The primary use is when converting a string to be used as
title or sub-title of text - mostly inspired by the way
English treats titles.

The implementation follows the rules laid out in UTR#21:

http://unicode.org/reports/tr21/tr21-3.html

The Python version only implements the basic set of rules, i.e.
"If the preceding letter is cased, choose the lowercase mapping; otherwise choose the titlecase
mapping (in most cases, this will be the same as the uppercase, but not always)."
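The basic rule quoted above can be sketched in pure Python. This is
only an illustration, not the actual C implementation: the names
simple_title and is_cased are made up for this example, and is_cased
is an approximation built from the existing str predicates.

```python
def is_cased(ch):
    # Hypothetical helper: approximate "cased" with the str predicates.
    return ch.isupper() or ch.islower() or ch.istitle()


def simple_title(text):
    # Sketch of the basic rule: lowercase a character if the preceding
    # one is cased, otherwise apply the titlecase mapping.
    result = []
    previous_is_cased = False
    for ch in text:
        if previous_is_cased:
            result.append(ch.lower())
        else:
            result.append(ch.title())
        previous_is_cased = is_cased(ch)
    return "".join(result)


print(simple_title("hello world"))             # Hello World
print(simple_title("they're bill's friends"))  # They'Re Bill'S Friends
```

The second call shows where the apostrophe behaviour comes from: the
apostrophe is not cased, so the letter following it is treated as the
start of a new word.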

It doesn't implement the special casing rules, since these would
require locale- and language-dependent context information, which
we don't implement/use in Python.

It also doesn't implement mappings that would result in a change of
length (ligatures) or require look-ahead strategies (e.g. if the casing
depends on the code point following the converted code point).

Patches to enhance the code to support those additional rules
are welcome.

Regarding the apostrophe: the Unicode standard doesn't appear to
include any rule regarding that character and its use in titles
or upper-case versions of text. The apostrophe itself is a
non-cased code point.
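
A quick way to check this from Python, as a small sketch using only
the standard unicodedata module (the variable names are just for
illustration):

```python
import unicodedata

apostrophe = "'"

# General category "Po" (other punctuation), i.e. not a letter.
category = unicodedata.category(apostrophe)
print(category)

# No case mappings: upper(), lower() and title() leave it unchanged.
unchanged = apostrophe.upper() == apostrophe.lower() == apostrophe.title()
print(unchanged)

# As a result, .title() starts a new word after the apostrophe.
titled = "they're bill's friends".title()
print(titled)  # They'Re Bill'S Friends
```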

It's likely that the special use of the apostrophe in English
is actually a language-specific use case. For those, it's (currently)
better to implement your own versions of the conversion functions,
based on the existing methods.
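
One possible shape for such a function, as a minimal sketch: the name
english_title and the ASCII-only word pattern are assumptions made for
this example, and it only covers the simple English case of an
apostrophe inside a word.

```python
import re


def english_title(text):
    # Treat letters, optionally followed by an apostrophe and more
    # letters, as a single word and capitalize it as a whole.
    # ASCII-only pattern: an assumption for this sketch.
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda match: match.group(0).capitalize(),
        text,
    )


print(english_title("they're bill's friends"))  # They're Bill's Friends
```

This builds on the existing str.capitalize() method and leaves
str.title() itself untouched.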

Regarding the idea to add an option to define which characters to
regard as cased/non-cased: This would cause the algorithm to no longer
adhere to the Unicode standard and most probably cause more problems
than it solves.
History
Date User Action Args
2009-09-29 07:57:40 lemburg set recipients: + lemburg, gvanrossum, nnorwitz, rhettinger, pitrou, ezio.melotti, r.david.murray, markon, twb, nickd
2009-09-29 07:57:39 lemburg link issue7008 messages
2009-09-29 07:57:38 lemburg create