Message 93258 - Python tracker


Author	lemburg
Recipients	ezio.melotti, gvanrossum, lemburg, markon, nickd, nnorwitz, pitrou, r.david.murray, rhettinger, twb
Date	2009-09-29.07:57:38
Message-id	<4AC1BDF1.7060105@egenix.com>
In-reply-to	<1254178973.99.0.0865778223053.issue7008@psf.upfronthosting.co.za>

Content

Guido van Rossum wrote:
> What's a realistic use case for .title() anyway?

The primary use is converting a string for use as the title or
sub-title of a text - mostly inspired by the way English treats
titles.

The implementation follows the rules laid out in UTR#21:

http://unicode.org/reports/tr21/tr21-3.html

The Python version only implements the basic set of rules, i.e.
"If the preceding letter is cased, choose the lowercase mapping;
otherwise choose the titlecase mapping (in most cases, this will be
the same as the uppercase, but not always)."
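
As a rough illustration of that basic rule, here is a minimal
pure-Python sketch. It is not the actual C implementation; the helper
name simple_title is made up for this example, and "cased" is
approximated with the str.isupper()/islower()/istitle() checks on a
single character.

```python
def simple_title(text):
    """Sketch of the basic rule: lowercase after a cased character,
    titlecase otherwise. Hypothetical helper, not CPython's code."""
    out = []
    prev_cased = False
    for ch in text:
        if prev_cased:
            # Preceding character is cased: use the lowercase mapping.
            out.append(ch.lower())
        else:
            # Otherwise use the titlecase mapping (usually the same as
            # uppercase, but not for e.g. U+01C6).
            out.append(ch.title())
        # Approximation of "cased": upper-, lower- or titlecase letter.
        prev_cased = ch.isupper() or ch.islower() or ch.istitle()
    return "".join(out)
```

For plain ASCII input this should agree with str.title(), including
the "They'Re" result discussed in this issue. U+01C6 is one of the
"not always" cases: its titlecase mapping is U+01C5, while its
uppercase mapping is U+01C4.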

It doesn't implement the special casing rules, since these would
require locale- and language-dependent context information, which
we don't implement/use in Python.

It also doesn't implement mappings that would result in a change of
length (ligatures) or require look-ahead strategies (e.g. if the casing
depends on the code point following the converted code point).

Patches to enhance the code to support those additional rules
are welcome.

Regarding the apostrophe: the Unicode standard doesn't appear to
include any rule regarding that character and its use in titles
or upper-case versions of text. The apostrophe itself is a
non-cased code point.
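
A quick way to check this, using only the standard unicodedata
module (the variable names are just for illustration):

```python
import unicodedata

apostrophe = "'"

# U+0027 is punctuation (general category "Po"), not a letter.
category = unicodedata.category(apostrophe)

# Non-cased: none of the case mappings change it.
unchanged = (apostrophe.upper() == apostrophe.lower() == apostrophe.title())

# So .title() treats the letter after it as the start of a new word.
titled = "they're".title()
```

Here category is "Po" and titled comes out as "They'Re", which is
the behaviour reported in this issue.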

It's likely that the special use of the apostrophe in English
is actually a language-specific use case. For those, it's (currently)
better to implement your own versions of the conversion functions,
based on the existing methods.
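
One possible shape for such a helper, as a sketch only: the name
english_title and the regular expression are assumptions made for
this example, and it only handles ASCII letters with a single
embedded apostrophe.

```python
import re

# A word is a run of ASCII letters, optionally followed by an
# apostrophe and more letters (e.g. "they're", "bill's").
_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def english_title(text):
    """Capitalize each word, treating an embedded apostrophe as part of it."""
    return _WORD.sub(lambda m: m.group(0).capitalize(), text)
```

With this, english_title("they're bill's friends") gives
"They're Bill's Friends". It builds on the existing str.capitalize()
method rather than changing what .title() regards as cased.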

Regarding the idea to add an option to define which characters to
regard as cased/non-cased: This would cause the algorithm to no longer
adhere to the Unicode standard and most probably cause more problems
than it solves.

History
Date	User	Action	Args
2009-09-29 07:57:40	lemburg	set	recipients: + lemburg, gvanrossum, nnorwitz, rhettinger, pitrou, ezio.melotti, r.david.murray, markon, twb, nickd
2009-09-29 07:57:39	lemburg	link	issue7008 messages
2009-09-29 07:57:38	lemburg	create