
Author belopolsky
Recipients antlong, belopolsky, jkloth, loewis, mark.dickinson, ronaldoussoren, vstinner
Date 2010-07-23.14:13:25
Message-id <AANLkTinabaoH+yQQ=mKdEFpQTwS-16A0TtSzjBrkYwhJ@mail.gmail.com>
In-reply-to <1279872225.3.0.538493653151.issue9335@psf.upfronthosting.co.za>
Content
On Fri, Jul 23, 2010 at 4:03 AM, Martin v. Löwis <report@bugs.python.org> wrote:
..
> I fail to see the bug in this report. '\xff' is a letter because the C library says it is.

This does not explain the difference between 2.6 and 2.7. With the
attached issue9335-test.py:

$ cat issue9335-test.py
import locale
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
print(chr(255).isalpha())

$ python2.7 issue9335-test.py
False
$ python2.6 issue9335-test.py
True
$ python2.5 issue9335-test.py
True

Since chr(255) = '\xff' is not a valid UTF-8 byte sequence, it makes
little sense to ask whether it is a letter or not in a locale that
uses UTF-8 encoding. Nevertheless, the behavior changed between
revisions and the change is not mentioned in "what's new in 2.7". (I
suspect it was introduced in issue5793 (r72040), but I have not verified.)
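
To make the "not a valid UTF-8 byte sequence" point concrete, here is a minimal sketch written for Python 3. The helper name is_valid_utf8 is made up for illustration and is not part of the attached test script. It only checks whether a byte string decodes as UTF-8; it says nothing about what the C library's isalpha() does.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True

# The lone byte 0xFF can never appear in well-formed UTF-8.
print(is_valid_utf8(b'\xff'))      # False
# U+00FF (LATIN SMALL LETTER Y WITH DIAERESIS) is encoded as two bytes.
print(is_valid_utf8(b'\xc3\xbf'))  # True
```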

There are two possible action items here:

1. The new behavior needs to be documented. I believe 2.7 is correct
because when isalpha is used to sanitize untrusted input, it is better
to reject in the case of uncertainty.

2. Arguably, this is a security issue and thus eligible for backporting to 2.6.
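
As an illustration of the "reject in the case of uncertainty" argument in item 1, the following is a rough sketch, written for Python 3, of a sanitizer that does not depend on the locale at all. The function name is_alpha_text and the choice of UTF-8 as the default encoding are assumptions made for this example; it is not what str.isalpha does in either 2.6 or 2.7.

```python
def is_alpha_text(raw, encoding='utf-8'):
    """Return True only if raw decodes cleanly and consists solely of letters.

    Bytes that cannot be decoded are rejected rather than guessed at.
    """
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return False           # undecodable input: reject
    return text.isalpha()      # Unicode-aware check, independent of the locale

print(is_alpha_text(b'abc'))       # True
print(is_alpha_text(b'\xff'))      # False: not valid UTF-8
print(is_alpha_text(b'\xc3\xbf'))  # True: U+00FF is a letter
```

Decoding first means the letter test runs on characters rather than on raw bytes, so an invalid byte such as '\xff' is rejected before the question "is it a letter?" is ever asked.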
Files
File name Uploaded
issue9335-test.py belopolsky, 2010-07-23.14:13:25
History
Date User Action Args
2010-07-23 14:13:28 belopolsky set recipients: + belopolsky, loewis, ronaldoussoren, mark.dickinson, vstinner, jkloth, antlong
2010-07-23 14:13:26 belopolsky link issue9335 messages
2010-07-23 14:13:26 belopolsky create