New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document the meaning of str methods #54796
Comments
On Mon, Nov 29, 2010 at 4:13 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
http://mail.python.org/pipermail/python-dev/2010-November/106155.html Some of the clarifications may actually lead to a conclusion that current behavior is wrong. For example, Unicode defines Alphabetic property as Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic http://www.unicode.org/reports/tr44/tr44-6.html#Alphabetic However, str.isalpha() is defined as just Lu + Ll + Lt + Lm + Lo. For example, >>> import unicodedata as ud
>>> ud.category('Ⅴ')
'Nl'
>>> 'Ⅴ'.isalpha()
False
>>> ud.name('Ⅴ')
'ROMAN NUMERAL FIVE' As far a I can tell, the source of Other_Alphabetic property data, |
What is the issue that you are reporting? that the status quo should be documented, or that isalpha is wrong? These are independent - don't mix them. |
On Tue, Nov 30, 2010 at 1:53 PM, Martin v. Löwis <report@bugs.python.org> wrote:
This is a documentation issue. I don't say that str.isalpha() is |
As discussed in bpo-10610, it is important to keep the gory details in one place and refer to it throughout the manual. I think the Unicode terminology is best exposed in the unicodedata module documentation. For string character-type methods, I suggest presenting an equivalent to unicodedata expression where possible. For example, x.isalpha() is equivalent to all(unicodedata.category(c) in 'Lu Ll Lt Lm Lo' for c in x) or many be just a "character c is alphabetical if unicodedata.category(c) in 'Lu Ll Lt Lm Lo' is true. Other examples: isdigit() -> unicodedata.digit(c, None) is not None I am not sure about equivalent to expressions for isidentifier() and isspace(). |
I am attaching a patch that expands the documentation of isalnum, isalpha, isdecimal, isdigit, isnumeric, islower, isupper, and isspace. I did not change isidentifier or isprintable because their docs were already complete. I also left out istitle because I could not figure out how to deal with the confusion between Python and Unicode notions of titlecase. I would also like to note that it appears that isdigit and isdecimal imply isnumeric, so s.isalnum() is equivalent to all(c.isalpha() or c.isnumeric() for c in s). However the actual code does have redundant checks for isdecimal() and isdigit(). I think the documentation should reflect what the code does for an off-chance that someone would replace unicodedata with their own database with which these checks are not redundant. |
...
+1 for making these changes. Helps clarify meaning of these methods with |
Committed r87443 (3.2) and r87444 (3.1). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: