Message 62472 - Python tracker



Author	lemburg
Recipients	arnimar, lemburg, pitrou
Date	2008-02-16.22:20:14
Message-id	<1203200416.58.0.1255902925.issue1813@psf.upfronthosting.co.za>

Content
I agree that it's a bit unfortunate that the 8-bit string APIs in Python
use the locale-aware C functions by default (this should really be
reversed: there should be locale-aware .upper() and .lower() methods and
the standard ones should work just like the Unicode ones, without any
dependency on the locale, using ASCII mappings), but for historical
reasons this cannot easily be changed.
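
A minimal sketch of what an ASCII-only, locale-independent mapping for
8-bit strings could look like. It is written in Python 3 syntax, and the
helper names ascii_upper() and ascii_lower() are hypothetical, not part of
any Python API:

```python
import string

# Translation tables that remap only the 26 ASCII letters. Every other byte
# value, including 0x80-0xFF, maps to itself, so the result cannot vary with
# the process locale.
_LOWER = string.ascii_lowercase.encode("ascii")
_UPPER = string.ascii_uppercase.encode("ascii")
_TO_UPPER = bytes.maketrans(_LOWER, _UPPER)
_TO_LOWER = bytes.maketrans(_UPPER, _LOWER)


def ascii_upper(data):
    """Uppercase ASCII letters in a byte string; leave other bytes untouched."""
    return data.translate(_TO_UPPER)


def ascii_lower(data):
    """Lowercase ASCII letters in a byte string; leave other bytes untouched."""
    return data.translate(_TO_LOWER)
```

With these helpers, ascii_lower(b"UTF-8") gives b"utf-8" whatever locale is
set, and a non-ASCII byte such as 0xE9 passes through unchanged.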

.lower() and .upper() for 8-bit strings were always locale-dependent, and
before the addition of Unicode, setting the locale was the most common
way to make an application understand different character sets.

In Python 3k the problem will probably go away, since .lower() and
.upper() will then no longer depend on the locale.

Perhaps we should just convert a few of the cases you found to use
Unicode strings instead of 8-bit strings in 2.6? That would both make
the code more portable and also provide a clear statement of "this is a
text string", making porting to Py3k easier.
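
As an illustration of the Unicode route, here is a small sketch that decodes
an 8-bit string with an explicitly named encoding and then case-maps the
resulting text. The function name upper_text() and the choice of Latin-1 in
the usage note are assumptions made for the example:

```python
def upper_text(raw, encoding):
    """Decode an 8-bit string to text, then uppercase it.

    The case mapping is applied to Unicode text, so it follows the Unicode
    case tables and does not consult the process locale.
    """
    text = raw.decode(encoding)
    return text.upper()
```

For example, upper_text(b"caf\xe9", "latin-1") returns the text string
"CAF\xc9", because the encoding is stated explicitly instead of being
implied by the locale.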
