classification
Title: Better explain re.LOCALE and re.UNICODE for \S and \W
Type: behavior Stage: resolved
Components: Documentation Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: ezio.melotti, orsenthil, python-dev
Priority: low Keywords: patch

Created on 2012-03-12 03:22 by orsenthil, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
issue14258.diff orsenthil, 2012-04-06 05:38
Messages (5)
msg155434 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-12 03:22
Opening this bug following this discussion: http://mail.python.org/pipermail/docs/2012-March/007829.html

library/re.html

\S

When the LOCALE and UNICODE flags are not specified, matches any non-whitespace character; this is equivalent to the set [^ \t\n\r\f\v]. With LOCALE, it will match any character not in this set, and not defined as space in the current locale. If UNICODE is set, this will match anything other than [ \t\n\r\f\v] and characters marked as space in the Unicode character properties database.

This is wrong. With LOCALE set, it should be [^ \t\n\r\f\v] plus any non-space character in that locale.
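
To make the flagless and Unicode cases concrete, here is a minimal sketch. It assumes Python 3 rather than the 2.7 this issue targets: there, a bytes pattern with no flags plays the role of the flagless case, a str pattern behaves as if UNICODE were set, and re.ASCII restores the ASCII-only rules. The variable names are only illustrative.

```python
import re

# Bytes patterns with no flags: \S should agree with the explicit set
# [^ \t\n\r\f\v] for every single-byte value.
non_space = re.compile(rb"\S")
explicit_set = re.compile(rb"[^ \t\n\r\f\v]")
agree_on_all_bytes = all(
    bool(non_space.match(bytes([i]))) == bool(explicit_set.match(bytes([i])))
    for i in range(256)
)

# str patterns: NO-BREAK SPACE (U+00A0) is whitespace in the Unicode database,
# so \S rejects it under Unicode rules but accepts it under ASCII-only rules.
nbsp = "\u00a0"
unicode_match = re.match(r"\S", nbsp)          # Unicode rules (Python 3 default for str)
ascii_match = re.match(r"\S", nbsp, re.ASCII)  # ASCII-only rules

print(agree_on_all_bytes, unicode_match is None, ascii_match is not None)
```

The LOCALE case is deliberately left out, since its result depends on the locale configured on the machine running the code.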
msg155435 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-03-12 03:37
New changeset 2d2a972b7523 by Senthil Kumaran in branch '2.7':
Fix closes issue14258 - added clarification to \W and \S flags
http://hg.python.org/cpython/rev/2d2a972b7523
msg155437 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-12 03:44
This clarification is specific to Python 2.7. 

For Python 3, the use of the LOCALE flag is explicitly discouraged, and
confusing references to its meaning are not present in the docs.
msg157645 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-04-06 05:38
Well, I would like to correct this further and add a clarification based on the current implementation (_sre.c).

The definition of LOCALE Space is this -

 #define SRE_LOC_IS_SPACE(ch) (!((ch) & ~255) ? isspace((ch)) : 0)

And the definition of NON_SPACE category is a negation of space. That's it.

Now, given that definition, we see that for character values higher than 255, the check is not made at all. It is simply the ASCII isspace that is considered when the LOCALE flag is set. In effect, the re.LOCALE flag has no extra effect on the matching of whitespace or non-whitespace characters.

After realizing this, I propose the following changes attached in the patch as a documentation fix.
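
As a rough illustration of the macro quoted above, here is a small Python model of it. This is a sketch, not the code in _sre.c: `c_locale_isspace` is a hypothetical stand-in for the C library's isspace() and assumes the "C" locale, and all function names are made up for this example.

```python
# Stand-in for isspace() in the "C" locale: space, \t, \n, \r, \f, \v.
C_LOCALE_SPACES = frozenset(b" \t\n\r\f\v")


def c_locale_isspace(ch):
    """Hypothetical stand-in for C's isspace(), assuming the "C" locale."""
    return 1 if ch in C_LOCALE_SPACES else 0


def sre_loc_is_space(ch, isspace=c_locale_isspace):
    """Model of SRE_LOC_IS_SPACE: (!((ch) & ~255) ? isspace((ch)) : 0)."""
    if ch & ~255:
        # A bit above the low eight is set, so isspace() is never consulted.
        return 0
    return isspace(ch)


def sre_loc_is_not_space(ch, isspace=c_locale_isspace):
    """NON_SPACE modelled as the plain negation of the space check."""
    return 0 if sre_loc_is_space(ch, isspace) else 1


# U+2003 (EM SPACE) is above 255, so the model never treats it as a space.
print(sre_loc_is_space(ord(" ")), sre_loc_is_space(0x2003), sre_loc_is_not_space(0x2003))
```

The `isspace` parameter is only there so a different classification function can be swapped in when experimenting; it does not correspond to anything in the C source.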
msg157978 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-04-10 19:23
New changeset 4d49a2415ced by Senthil Kumaran in branch '2.7':
Fix closes Issue14258  - Clarify the re.LOCALE and re.UNICODE flags for \S class
http://hg.python.org/cpython/rev/4d49a2415ced
History
Date User Action Args
2022-04-11 14:57:27 admin set github: 58466
2012-04-10 19:23:22 python-dev set status: open -> closed; resolution: fixed; messages: + msg157978
2012-04-06 05:38:26 orsenthil set status: closed -> open; files: + issue14258.diff; messages: + msg157645; keywords: + patch; resolution: fixed -> (no value)
2012-03-12 03:44:20 orsenthil set messages: + msg155437
2012-03-12 03:37:58 python-dev set status: open -> closed; nosy: + python-dev; messages: + msg155435; resolution: fixed; stage: resolved
2012-03-12 03:23:41 ezio.melotti set nosy: + ezio.melotti
2012-03-12 03:22:06 orsenthil create