classification
Title: Incoherent behavior with umlaut in regular expressions
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: cklein, eryksun, ezio.melotti, mrabarnett, r.david.murray
Priority: normal
Keywords:

Created on 2015-08-14 07:07 by cklein, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg248560 - Author: Christian Klein (cklein) Date: 2015-08-14 07:07
The Python 2.7 re module does not seem to agree with itself on what counts as a word character:

import re
s = u'f\xfc'
print re.sub('\W', '*', s, re.UNICODE)
print re.findall('\w', s, re.UNICODE)

re.sub replaces the character u'ü' with '*', which implies that it is considered a non-word character (\W).
But re.findall then returns it as a word character (\w).

Python 3.4 and Python 3.5 behave correctly, or at least coherently.
(But that's unfortunately not an option for Google App Engine)
msg248561 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-08-14 07:43
You're passing re.UNICODE (32) as the value of the count parameter, not as flags. The function signature is re.sub(pattern, repl, string, count=0, flags=0).
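
As a minimal sketch of the fix (not part of the original message), passing the flag by keyword keeps re.UNICODE out of the count slot. The variable names are illustrative, and the snippet is written on the assumption that it should run on both Python 2.7 and Python 3:

```python
from __future__ import print_function
import re

s = u'f\xfc'

# Pass flags by keyword so the value cannot be taken as count.
fixed = re.sub(r'\W', u'*', s, flags=re.UNICODE)
words = re.findall(r'\w', s, flags=re.UNICODE)

# Alternative: compile the pattern with its flags once.
# A compiled pattern's sub() has no flags parameter to confuse with count.
non_word = re.compile(r'\W', re.UNICODE)
compiled = non_word.sub(u'*', s)

print(fixed == s, compiled == s, len(words))
```

With the flag in the right place, re.sub leaves u'ü' untouched, which agrees with what re.findall reports.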
msg248562 - Author: Christian Klein (cklein) Date: 2015-08-14 07:46
Wow, that's very embarrassing. Thank you.
(I had asked for help elsewhere before, but nobody spotted that stupid mistake.)
msg248584 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-08-14 12:52
Don't be embarrassed; a report like this turns up on this tracker about every three or four months.  Unfortunately there's nothing we can do to make the situation better because of backward compatibility concerns.
History
Date                 User            Action  Args
2022-04-11 14:58:19  admin           set     github: 69051
2015-08-14 16:53:01  zach.ware       set     stage: resolved
2015-08-14 12:52:03  r.david.murray  set     nosy: + r.david.murray; messages: + msg248584
2015-08-14 07:46:46  cklein          set     messages: + msg248562
2015-08-14 07:43:07  eryksun         set     status: open -> closed; nosy: + eryksun; messages: + msg248561; resolution: not a bug
2015-08-14 07:07:42  cklein          create