Message 93274 - Python tracker


Author	lemburg
Recipients	christoph, ezio.melotti, gvanrossum, lemburg, markon, nickd, nnorwitz, pitrou, r.david.murray, rhettinger, twb
Date	2009-09-29.10:40:54
Message-id	<4AC1E435.9030908@egenix.com>
In-reply-to	<1254219647.05.0.244296326279.issue7008@psf.upfronthosting.co.za>

Content
Christoph Burgmer wrote:
> 
> Christoph Burgmer <cburgmer@ira.uka.de> added the comment:
> 
> I admit I don't fully understand the semantics of capwords().

string.capwords() is an old function from the days before Unicode.
The function is basically defined by its implementation.
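To make "defined by its implementation" concrete, here is a minimal sketch of what string.capwords() does: split, capitalize each piece, join again. The name capwords_sketch is made up for illustration; the standard library version may differ in detail between Python releases.

```python
import string


def capwords_sketch(s, sep=None):
    """Illustrative re-statement of string.capwords(); not the stdlib source."""
    # With sep=None, split() breaks on runs of whitespace and drops
    # leading/trailing whitespace, so the pieces are rejoined with one space.
    # With an explicit sep, that separator is used for both split and join.
    return (sep or " ").join(word.capitalize() for word in s.split(sep))


if __name__ == "__main__":
    print(capwords_sketch("hello   wORLD"))   # Hello World
    print(capwords_sketch("a-b c", "-"))      # A-B c
    # Compare against the stdlib function on the same input.
    print(capwords_sketch("  spam  eggs ") == string.capwords("  spam  eggs "))
```

Note that "words" here are simply whatever str.split() returns, which is why the function has no notion of Unicode word boundaries.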

> But from
> what I believe what it should do, this function could be happily
> replaced by the word-breaking algorithm as defined in
> http://www.unicode.org/reports/tr29/.
> 
> This algorithm should be implemented anyway, to properly solve
> issue6412.

Simple word breaking would be nice to have in Python as a new
Unicode method, e.g. .splitwords().

Note, however, that word boundaries are just as complicated as casing:
there are lots of special cases in different languages or locales
(see the notes after the word boundary rules in TR29).
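As a rough illustration of why this is harder than it looks, here is a naive sketch of a "split words" helper based on the re module's \w class. The name splitwords_sketch is hypothetical (no such str method exists), and this is explicitly not the TR29 algorithm: it only shows where a simple approach falls short.

```python
import re

# \w matches Unicode letters, digits and the underscore for str patterns
# in Python 3, so this treats every maximal run of such characters as a word.
_WORD_RE = re.compile(r"\w+")


def splitwords_sketch(text):
    """Naive word splitter; an assumption-laden stand-in, not TR29."""
    return _WORD_RE.findall(text)


if __name__ == "__main__":
    print(splitwords_sketch("Hello, world!"))  # ['Hello', 'world']
    # The apostrophe is not a \w character, so the contraction is cut in two.
    print(splitwords_sketch("can't"))          # ['can', 't']
```

The TR29 default rules, as far as I read them, would keep a contraction such as "can't" together, and they add further rules for numbers, scripts without spaces between words, and locale-specific tailoring, none of which a single regular expression like this one covers.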

History
Date	User	Action	Args
2009-09-29 10:40:56	lemburg	set	recipients: + lemburg, gvanrossum, nnorwitz, rhettinger, pitrou, christoph, ezio.melotti, r.david.murray, markon, twb, nickd
2009-09-29 10:40:54	lemburg	link	issue7008 messages
2009-09-29 10:40:54	lemburg	create