
Author vstinner
Recipients antlong, belopolsky, eric.araujo, eric.smith, jkloth, loewis, mark.dickinson, ronaldoussoren, vstinner
Date 2010-07-25.23:27:18
Message-id <1280100441.33.0.101426017348.issue9335@psf.upfronthosting.co.za>
Content
> Victor, this looks like your cup of tea.

Unicode is my cup of tea, but programs that treat bytes as characters are not.

<a byte string>.isalpha() doesn't mean anything to me :-)

This issue is more a question about the C library than about Python :-) Try the attached program "isalpha.c" if you would like to test your libc.

Results on my Linux box (Debian Sid, eglibc 2.11.2):
----------------
$ ./isalpha C
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz (52)

$ ./isalpha fr_FR.UTF-8
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz (52)

$ ./isalpha fr_FR.iso88591
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff (117)

$ ./isalpha fr_FR.iso885915@euro
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xa6\xa8\xaa\xb4\xb5\xb8\xba\xbc\xbd\xbe\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff (124)
----------------

A lone \xff byte is never valid UTF-8, so if your libc considers it a letter in a UTF-8 locale, you should switch to a better OS :-)
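The attached isalpha.c is not reproduced here. As a rough Python 3 analogue (an illustration only, not the attached program), the sketch below decodes every single byte with an explicit codec and keeps the bytes that decode to an alphabetic character. The helper names `alpha_bytes` and `show` are hypothetical. The sketch relies on Python's codecs and Unicode database rather than the C library's isalpha(). Even so, it gives the same 52, 117 and 124 counts as the glibc output above. It also shows why a UTF-8 charset cannot have a one-byte letter above \x7f.

```python
def alpha_bytes(encoding):
    """Return the single bytes that decode to one alphabetic character.

    Hypothetical helper: it asks Python's codecs and Unicode database,
    not the C library's isalpha(), so it only approximates a libc test.
    """
    found = bytearray()
    for value in range(256):
        try:
            char = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            # Not a complete character in this encoding (e.g. 0x80-0xFF in UTF-8).
            continue
        if char.isalpha():
            found.append(value)
    return bytes(found)


def show(encoding):
    letters = alpha_bytes(encoding)
    # Print ASCII letters as-is and other bytes as \xNN, like the C output above.
    text = "".join(chr(b) if b < 0x80 else "\\x%02x" % b for b in letters)
    print("%s: %s (%d)" % (encoding, text, len(letters)))


for name in ("ascii", "utf-8", "iso8859-1", "iso8859-15"):
    show(name)
```

With these codecs, UTF-8 only ever reports the 52 ASCII letters, because no byte from 0x80 to 0xFF decodes on its own.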

--

> >>> len(letters)
> 117
> ...
> >>> locale.setlocale(locale.LC_CTYPE)
> 'en_US.UTF-8'

It looks like the Mac OS X C library classifies bytes with ISO-8859-1 rules instead of UTF-8: 117 is exactly the fr_FR.iso88591 count above.

--

string.letters is built as strop.lowercase + strop.uppercase, which are themselves built using the C functions islower() and isupper(). locale.setlocale() regenerates strop.lowercase, strop.uppercase, string.lowercase, string.uppercase and string.letters when it changes the LC_CTYPE or LC_ALL category.
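Python 3 no longer has string.letters. For illustration, here is a hypothetical Python 3 helper, `letters_for`, that rebuilds a string.letters-like value for one charset using the same "lowercase + uppercase" layout. Python 2 asked the C library's islower()/isupper() about each byte under the current LC_CTYPE. This sketch is an assumption-level stand-in: it asks Python's Unicode database about the decoded byte for an explicitly named encoding instead.

```python
import string


def letters_for(encoding):
    """Hypothetical re-creation of Python 2's string.letters for one charset.

    Python 2 asked the C library's islower()/isupper() for each byte value;
    this sketch asks Python's Unicode database about the decoded byte instead.
    """
    lowercase = []
    uppercase = []
    for value in range(256):
        try:
            char = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte is not a complete character in this encoding
        if char.islower():
            lowercase.append(char)
        elif char.isupper():
            uppercase.append(char)
    # Same layout as Python 2: all lowercase letters, then all uppercase ones.
    return "".join(lowercase) + "".join(uppercase)


# Python 3 only keeps the locale-independent ASCII constants.
print(letters_for("ascii") == string.ascii_letters)  # True
print(len(letters_for("iso8859-1")))
```

Because the charset is passed explicitly, the result never depends on the process locale. That is the property string.letters lacked.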

--

You don't need to run IDLE or import Tkinter to set the locale:

   import locale; locale.setlocale(locale.LC_ALL, '')

is enough.

--

A library should not change the locale; only the application should.

$ python2.6
>>> import locale
>>> locale.getlocale()
(None, None)
>>> import Tkinter
>>> locale.getlocale()
('fr_FR', 'UTF8')

=> Tkinter is a horrible library! (The bug is in the underlying C library, not in the Python wrapper.)

Use a better one like Gtk or Qt ;-)

$ python
>>> import locale
>>> import pygtk
>>> locale.getlocale()
(None, None)
>>> import PyQt4
>>> locale.getlocale()
(None, None)

(IDLE is based on Tkinter)
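The sessions above check by hand whether an import changes the locale. The same check can be wrapped in a small Python 3 function, sketched below. `import_changes_locale` is a hypothetical helper, not part of any library. It relies on the fact that locale.setlocale(category) with no locale argument only queries the current setting. One caveat: the module must not already be imported. Otherwise the import is served from the module cache and its side effects do not run again.

```python
import importlib
import locale


def import_changes_locale(module_name):
    """Hypothetical helper: report whether importing a module changes LC_CTYPE.

    Calling locale.setlocale() without a locale argument only queries the
    current setting, so the check itself leaves the locale untouched.
    Only meaningful if the module has not been imported before.
    """
    before = locale.setlocale(locale.LC_CTYPE)
    importlib.import_module(module_name)
    after = locale.setlocale(locale.LC_CTYPE)
    return before != after, before, after


# A well-behaved library such as json should leave the locale alone.
changed, before, after = import_changes_locale("json")
print(changed, before, after)
```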

--

I don't understand why Alexander gets different results on Python 2.6 and Python 2.7.

@belopolsky: Are both programs linked to (built with?) the same C library? (Same library version?)
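One way to answer that question is to run the following Python 3 snippet under each interpreter and compare the output. platform.libc_ver() inspects the given executable and returns a (library, version) pair of strings. It relies on heuristics, though, and may return empty strings when it cannot identify the C library, which could well be the case on Mac OS X. On that platform, `otool -L` on the interpreter binary lists the shared libraries it links against.

```python
import platform
import sys

# Scan the running interpreter's binary for the C library it was linked
# against; both values may be empty strings if the heuristic finds nothing.
lib, version = platform.libc_ver(sys.executable)
print(sys.version)
print("libc: %r %r" % (lib, version))
```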