Rationalize isdigit / isalpha / tolower / ... uses throughout Python source #50043

Closed
mdickinson opened this issue Apr 19, 2009 · 5 comments
Labels
easy interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@mdickinson
Member

BPO 5793
Nosy @mdickinson, @ericvsmith

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/ericvsmith'
closed_at = <Date 2009-04-27.21:13:25.114>
created_at = <Date 2009-04-19.12:57:45.685>
labels = ['interpreter-core', 'easy', 'type-feature']
title = 'Rationalize isdigit / isalpha / tolower / ... uses throughout Python source'
updated_at = <Date 2009-04-27.21:13:25.112>
user = 'https://github.com/mdickinson'

bugs.python.org fields:

activity = <Date 2009-04-27.21:13:25.112>
actor = 'eric.smith'
assignee = 'eric.smith'
closed = True
closed_date = <Date 2009-04-27.21:13:25.114>
closer = 'eric.smith'
components = ['Interpreter Core']
creation = <Date 2009-04-19.12:57:45.685>
creator = 'mark.dickinson'
dependencies = []
files = []
hgrepos = []
issue_num = 5793
keywords = ['easy']
message_count = 5.0
messages = ['86170', '86173', '86293', '86668', '86698']
nosy_count = 2.0
nosy_names = ['mark.dickinson', 'eric.smith']
pr_nums = []
priority = 'normal'
resolution = 'accepted'
stage = 'needs patch'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue5793'
versions = ['Python 3.1', 'Python 2.7']

@mdickinson
Member Author

Problem: the standard C character handling functions from ctype.h
(isalpha, isdigit, isxdigit, isspace, toupper, tolower, etc.) are
locale-aware, but for almost all uses CPython needs locale-unaware
versions of these.

There are various solutions in the current source:

  • there's a file Include/bytes_methods.h which provides suitable
    ISDIGIT/ISALPHA/... macros, but also undefines the standard functions.
    As it is, it can't be included in Python.h since that would break
    3rd party code that includes Python.h and also uses isdigit.

  • some files have their own solution: Python/pystrtod.c defines
    its own (probably inefficient) ISDIGIT and ISSPACE macros.

  • in some places the standard C functions are just used directly (and
    possibly incorrectly). A gotcha here is that one has to remember to use
    Py_CHARMASK to avoid errors on some platforms. (See bpo-3633 for an
    example.)
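
The Py_CHARMASK gotcha from the last bullet can be illustrated with a minimal sketch. `SKETCH_CHARMASK` is a hypothetical stand-in for Py_CHARMASK (assumed here to reduce its argument to the range 0..255), and `count_digits` is an illustrative helper, not code from the CPython source.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for Py_CHARMASK: assumed to map any char value
   into 0..255, which is a valid argument for the <ctype.h> functions. */
#define SKETCH_CHARMASK(c) ((unsigned char)((c) & 0xff))

/* Count the decimal digits in a NUL-terminated string.
   Calling isdigit(*s) directly would be undefined behavior on platforms
   where plain char is signed and *s is negative (for example byte 0xE9). */
static size_t
count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isdigit(SKETCH_CHARMASK(*s)))
            n++;
    }
    return n;
}
```

The mask only removes the undefined behavior. The call still goes through the C library, so it does nothing for the locale-dependent functions such as isalpha or tolower.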

It would be nice to clean all this up, and have one central, efficient,
easy-to-use set of Py_ISDIGIT/Py_ISALPHA ... locale-independent macros (or
functions) that could be used safely throughout the Python source.
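
As a rough sketch of what such a locale-independent set could look like: the helpers below classify by plain range comparison and never consult the C locale. All `sk_*` names are hypothetical, an ASCII-compatible execution character set is assumed, and range checks are used here only for brevity; the thread does not say how the real helpers are implemented.

```c
/* Locale-independent classification by range comparison.
   All sk_* names are hypothetical; ASCII-compatible encoding is assumed.
   Taking unsigned char means a negative (signed) char argument is
   converted to 128..255 on the way in, so no separate mask is needed. */

static inline int
sk_isdigit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

static inline int
sk_islower(unsigned char c)
{
    return c >= 'a' && c <= 'z';
}

static inline int
sk_isupper(unsigned char c)
{
    return c >= 'A' && c <= 'Z';
}

static inline int
sk_isalpha(unsigned char c)
{
    return sk_islower(c) || sk_isupper(c);
}

/* Case conversion touches only the 26 ASCII letters; every other
   byte value is returned unchanged. */
static inline unsigned char
sk_tolower(unsigned char c)
{
    return sk_isupper(c) ? (unsigned char)(c + ('a' - 'A')) : c;
}

static inline unsigned char
sk_toupper(unsigned char c)
{
    return sk_islower(c) ? (unsigned char)(c - ('a' - 'A')) : c;
}
```

With helpers of this shape, a byte such as 0xE9 is never reported as alphabetic, whatever locale the embedding application has set.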

@mdickinson mdickinson added interpreter-core (Objects, Python, Grammar, and Parser dirs) easy type-feature A feature request or enhancement labels Apr 19, 2009
@ericvsmith
Member

I concur. I've also been bitten by forgetting Py_CHARMASK, so a single
version that took this into account (and was locale-unaware) would be
welcome.

In private mail I'd mentioned that if these are functions, they should
take int. But I now think that's incorrect, and they should take char or
unsigned char. I think the standard C functions take int because they
also allow EOF. I think the Py_ versions should allow only characters
and not allow EOF. Py_CHARMASK already enforces this anyway: an EOF
passed through it would likely give undefined results.
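
A small sketch of the EOF point, assuming Py_CHARMASK behaves like the hypothetical `SKETCH_CHARMASK` below (masking to the low eight bits) and a two's complement platform. Under those assumptions a negative int such as -1, the usual value of EOF, is folded onto an ordinary byte value, so a character-only interface loses nothing by not accepting it.

```c
/* Hypothetical stand-in for Py_CHARMASK (assumption: keeps the low 8 bits). */
#define SKETCH_CHARMASK(c) ((unsigned char)((c) & 0xff))

/* Show what the mask does to an arbitrary int argument. */
static unsigned char
sk_mask_int(int c)
{
    return SKETCH_CHARMASK(c);
}

/* Character-only classifier (hypothetical name): the parameter type is
   unsigned char, so there is no way to pass EOF as a distinct value.
   Assumes ASCII, where '\t' through '\r' are the codes 9..13. */
static int
sk_isspace(unsigned char c)
{
    return c == ' ' || (c >= '\t' && c <= '\r');
}
```

Here `sk_mask_int(-1)` yields 255, which is indistinguishable from a real 0xFF byte.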

@ericvsmith
Member

Also, see _toupper/_tolower in Objects/stringlib/stringdef.h and
Objects/stringobject.c. Those should be rationalized as well.

@ericvsmith
Member

I'll implement this by adding a pyctype.h and pyctype.c, mimicking
<ctype.h>. I'll essentially copy and rename the methods in
bytes_methods.[ch], then change bytes_methods.h to refer to the new
versions, for backward compatibility.
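
One way to read that plan, as a sketch only: the new header exports prefixed helpers, and the old unprefixed macro names become thin aliases for them. The `sketch_*` and `SKETCH_*` names are hypothetical placeholders for whatever pyctype.h ends up exporting; only the ISDIGIT-style names are mentioned in this thread.

```c
/* --- what a new, prefixed header might provide (hypothetical names) --- */

static int
sketch_isdigit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

static unsigned char
sketch_tolower(unsigned char c)
{
    /* Assumes ASCII: only 'A'..'Z' are converted. */
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* The macros apply the mask themselves, so callers cannot forget it. */
#define SKETCH_ISDIGIT(c) (sketch_isdigit((unsigned char)((c) & 0xff)))
#define SKETCH_TOLOWER(c) (sketch_tolower((unsigned char)((c) & 0xff)))

/* --- what the old header could keep for backward compatibility --- */

#define ISDIGIT(c) SKETCH_ISDIGIT(c)
#define TOLOWER(c) SKETCH_TOLOWER(c)
```

Existing code that uses the unprefixed names keeps compiling, while new code can use the prefixed ones directly.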

@ericvsmith ericvsmith self-assigned this Apr 27, 2009
@ericvsmith
Member

Checked in to trunk (r72040) and py3k (r72044).

Windows buildbots look okay, closing.
