
Document the meaning of str methods #54796

Closed
abalkin opened this issue Nov 30, 2010 · 7 comments
Assignees
Labels
docs Documentation in the Doc dir

Comments

@abalkin
Member

abalkin commented Nov 30, 2010

BPO 10587
Nosy @malemburg, @loewis, @abalkin, @orsenthil, @vstinner, @ezio-melotti
Files
  • issue10587.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/abalkin'
    closed_at = <Date 2010-12-23.03:02:10.414>
    created_at = <Date 2010-11-30.05:46:44.467>
    labels = ['docs']
    title = 'Document the meaning of str methods'
    updated_at = <Date 2010-12-23.03:02:10.412>
    user = 'https://github.com/abalkin'

    bugs.python.org fields:

    activity = <Date 2010-12-23.03:02:10.412>
    actor = 'belopolsky'
    assignee = 'belopolsky'
    closed = True
    closed_date = <Date 2010-12-23.03:02:10.414>
    closer = 'belopolsky'
    components = ['Documentation']
    creation = <Date 2010-11-30.05:46:44.467>
    creator = 'belopolsky'
    dependencies = []
    files = ['20039']
    hgrepos = []
    issue_num = 10587
    keywords = ['patch']
    message_count = 7.0
    messages = ['122885', '122927', '122931', '123270', '123955', '124522', '124532']
    nosy_count = 7.0
    nosy_names = ['lemburg', 'loewis', 'belopolsky', 'orsenthil', 'vstinner', 'ezio.melotti', 'docs@python']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue10587'
    versions = ['Python 3.2']

    @abalkin
    Member Author

    abalkin commented Nov 30, 2010

    On Mon, Nov 29, 2010 at 4:13 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:

    > - How specific should library reference manual be in defining methods
    > affected by UCD such as str.upper()?

    It should specify what this actually does in Unicode terminology
    (probably in addition to a layman's rephrase of that)

    http://mail.python.org/pipermail/python-dev/2010-November/106155.html

    Some of the clarifications may actually lead to the conclusion that the current behavior is wrong. For example, Unicode defines the Alphabetic property as Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic

    http://www.unicode.org/reports/tr44/tr44-6.html#Alphabetic

    However, str.isalpha() is defined as just Lu + Ll + Lt + Lm + Lo. For example,

    >>> import unicodedata as ud
    >>> ud.category('\u2164')
    'Nl'
    >>> '\u2164'.isalpha()
    False
    >>> ud.name('\u2164')
    'ROMAN NUMERAL FIVE'

    As far as I can tell, the source of the Other_Alphabetic property data,
    http://unicode.org/Public/UNIDATA/PropList.txt, is not even included in the unicodedata module, and neither is SpecialCasing.txt, which is necessary for implementing a compliant case mapping algorithm.

    @abalkin abalkin added the docs Documentation in the Doc dir label Nov 30, 2010
    @loewis
    Mannequin

    loewis mannequin commented Nov 30, 2010

    What is the issue that you are reporting? that the status quo should be documented, or that isalpha is wrong? These are independent - don't mix them.

    @abalkin
    Member Author

    abalkin commented Nov 30, 2010

    On Tue, Nov 30, 2010 at 1:53 PM, Martin v. Löwis <report@bugs.python.org> wrote:
    ..

    > What is the issue that you are reporting? that the status quo should be documented, or that isalpha is wrong?
    > These are independent - don't mix them.

    This is a documentation issue. I am not saying that str.isalpha() is
    necessarily wrong. (If unicodedata had an isAlphabetic() method
    defined as Lu + Ll + Lt + Lm + Lo, I would file a bug report for
    that.) Here, I just want to mention that the proper definition of
    str.isalpha() is subject to debate, and its being defined as Lu + Ll +
    Lt + Lm + Lo may need to be marked as a CPython implementation detail.
    Note that the Unicode book (sorry, I don't have the page reference)
    advises against relying on catch-all APIs such as isAlphabetic() and
    recommends consulting the underlying properties directly. I tend to
    agree with that, because some programs may want to treat, say, Roman
    numerals as letters and others as numbers, so whether isAlphabetic()
    should include the Nl category is better left to the application.
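
To illustrate the point about leaving the Nl decision to the application, here is a minimal sketch. The helper name `is_alphabetic` and its `include_letter_numbers` flag are hypothetical and are not part of `str` or `unicodedata`; the sketch also ignores Other_Alphabetic, which the comment above notes is not available from `unicodedata`.

```python
import unicodedata

# The five letter categories that the comment above attributes to str.isalpha().
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_alphabetic(ch, include_letter_numbers=False):
    """Hypothetical helper: classify one character by its general category.

    When include_letter_numbers is true, characters in category Nl
    (letter numbers such as Roman numerals) also count as alphabetic.
    """
    category = unicodedata.category(ch)
    if category in LETTER_CATEGORIES:
        return True
    return include_letter_numbers and category == "Nl"


roman_five = "\u2164"  # ROMAN NUMERAL FIVE, category Nl
print(is_alphabetic(roman_five))
print(is_alphabetic(roman_five, include_letter_numbers=True))
```

An application that treats Roman numerals as letters would pass `include_letter_numbers=True`; one that treats them as numbers would keep the default.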

    @abalkin
    Member Author

    abalkin commented Dec 3, 2010

    As discussed in bpo-10610, it is important to keep the gory details in one place and refer to that place throughout the manual. I think the Unicode terminology is best exposed in the unicodedata module documentation. For the string character-type methods, I suggest presenting an equivalent unicodedata expression where possible. For example, x.isalpha() is equivalent to all(unicodedata.category(c) in 'Lu Ll Lt Lm Lo' for c in x), or maybe just "a character c is alphabetical if unicodedata.category(c) in 'Lu Ll Lt Lm Lo' is true".

    Other examples:

    isdigit()     -> unicodedata.digit(c, None) is not None
    isdecimal()   -> unicodedata.decimal(c, None) is not None
    isnumeric()   -> unicodedata.numeric(c, None) is not None
    isprintable() -> unicodedata.category(c) not in 'Cc Cf Cs Co Cn Zl Zp Zs'
    islower()     -> unicodedata.category(c) == 'Ll'
    isupper()     -> unicodedata.category(c) == 'Lu'
    istitle()     -> unicodedata.category(c) == 'Lt'
    isalnum()     -> isalpha() or isdecimal() or isdigit() or isnumeric()

    I am not sure about equivalent expressions for isidentifier() and isspace().
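
A proposed equivalence of this kind can be checked character by character. The sketch below does so for the `isalpha`, `isdigit`, `isdecimal`, and `isnumeric` entries only. The names `PROPOSED` and `find_mismatches` are hypothetical, and a set of category names stands in for the space-separated string used in the comment. The sketch reports disagreements rather than assuming there are none.

```python
import unicodedata

LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}

# Each entry pairs a str method name with the per-character unicodedata
# expression proposed for it in the comment above.
PROPOSED = {
    "isalpha": lambda c: unicodedata.category(c) in LETTER_CATEGORIES,
    "isdigit": lambda c: unicodedata.digit(c, None) is not None,
    "isdecimal": lambda c: unicodedata.decimal(c, None) is not None,
    "isnumeric": lambda c: unicodedata.numeric(c, None) is not None,
}


def find_mismatches(method_name, chars):
    """Hypothetical helper: return the characters for which the str method
    and the proposed unicodedata expression disagree."""
    expression = PROPOSED[method_name]
    return [c for c in chars if getattr(c, method_name)() != expression(c)]


# A small sample: ASCII letter, ASCII digit, SUPERSCRIPT TWO,
# VULGAR FRACTION ONE HALF, ROMAN NUMERAL FIVE, and a space.
sample = ["a", "5", "\u00b2", "\u00bd", "\u2164", " "]
for name in PROPOSED:
    print(name, find_mismatches(name, sample))
```

To cover every code point instead of a sample, pass `map(chr, range(sys.maxunicode + 1))` as `chars` (after `import sys`).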

    @abalkin
    Member Author

    abalkin commented Dec 14, 2010

    I am attaching a patch that expands the documentation of isalnum, isalpha, isdecimal, isdigit, isnumeric, islower, isupper, and isspace. I did not change isidentifier or isprintable because their docs were already complete. I also left out istitle because I could not figure out how to deal with the confusion between Python and Unicode notions of titlecase.

    I would also like to note that it appears that isdigit and isdecimal imply isnumeric, so s.isalnum() is equivalent to all(c.isalpha() or c.isnumeric() for c in s). However, the actual code does have redundant checks for isdecimal() and isdigit(). I think the documentation should reflect what the code does for an off-chance that someone would replace unicodedata with their own database with which these checks are not redundant.
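
The implication described above can be probed directly against whatever Unicode database the running interpreter ships. This is a minimal sketch with hypothetical helper names (`implication_counterexamples`, `isalnum_by_parts`). One caveat the simplified expression glosses over: `all()` over an empty string is true, whereas `"".isalnum()` is false, so the helper adds an explicit non-empty check.

```python
import sys


def implication_counterexamples(code_points=None):
    """Hypothetical helper: code points where isdecimal() or isdigit()
    holds but isnumeric() does not. Scans all code points by default."""
    if code_points is None:
        code_points = range(sys.maxunicode + 1)
    return [
        cp for cp in code_points
        if (chr(cp).isdecimal() or chr(cp).isdigit()) and not chr(cp).isnumeric()
    ]


def isalnum_by_parts(s):
    """Hypothetical helper: the simplified expression from the comment above,
    plus the non-empty requirement that str.isalnum() also imposes."""
    return bool(s) and all(c.isalpha() or c.isnumeric() for c in s)


# Number of code points that would break the implication (full scan).
print(len(implication_counterexamples()))

# Compare the built-in method with the simplified expression.
for text in ["abc123", "\u2164\u00bd", "a b", ""]:
    print(repr(text), text.isalnum(), isalnum_by_parts(text))
```

If the first number printed is zero, the extra isdecimal() and isdigit() checks are redundant for that database, which is the situation the comment describes.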

    @abalkin abalkin assigned abalkin and unassigned docspython Dec 14, 2010
    @orsenthil
    Member

    ...

    > redundant checks for isdecimal() and isdigit(). I think the
    > documentation should reflect what the code does for an off-chance
    > that someone would replace unicodedata with their own database with
    > which these checks are not redundant.

    +1 for making these changes. They help clarify the meaning of these
    methods with respect to Unicode strings.

    @abalkin
    Member Author

    abalkin commented Dec 23, 2010

    Committed r87443 (3.2) and r87444 (3.1).

    @abalkin abalkin closed this as completed Dec 23, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022