classification
Title: re.escape should not escape the hyphen
Type: enhancement
Stage: resolved
Components: Regular Expressions
Versions: Python 3.4
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, jjl, mrabarnett
Priority: normal
Keywords:

Created on 2013-08-05 17:10 by jjl, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name     Uploaded               Description
rebugtest.py  jjl, 2013-08-05 17:10  Test case
Messages (6)
msg194495 - Author: James Laver (jjl) Date: 2013-08-05 17:10
Traceback (most recent call last):
  File "/Users/jlaver/retest.py", line 6, in test_escape
    self.assertEquals(re.escape('-'), '-')
AssertionError: '\\-' != '-'

The only place you can do bad things with hyphens is in a character class. I fail to see how you'd be in the situation of wanting to use escape()d data in a character class. Even if I could think of a reason to do that, it's decidedly not the usual case.

It's http://bugs.python.org/issue2650 all over again, just with a different character (in that case, underscore).

While we're at it, what else shouldn't it be escaping?
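
To make the character-class point concrete, here is a minimal sketch using only the standard `re` module. The helper name `matches` and the sample strings are made up for illustration; nothing here comes from the attached test case.

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

# Outside a character class, a bare hyphen is an ordinary literal.
print(matches(r"a-c", "a-c"))      # True

# Inside a character class, a hyphen between two characters forms a range,
# so "[a-c]" accepts "b" but rejects "-".
print(matches(r"[a-c]", "b"))      # True
print(matches(r"[a-c]", "-"))      # False

# Escaping the hyphen turns the class into the three literals a, - and c.
print(matches(r"[a\-c]", "-"))     # True
print(matches(r"[a\-c]", "b"))     # False
```

So the escaped hyphen only changes the meaning of a pattern when the escaped text ends up between square brackets.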
msg194496 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2013-08-05 17:19
The help says:

""">>> help(re.escape)
Help on function escape in module re:

escape(pattern)
    Escape all the characters in pattern except ASCII letters, numbers and '_'.
"""

The complementary approach is to escape _only_ the metacharacters.
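
As a rough sketch of what "escape only the metacharacters" could look like: the function name `escape_meta_only` and the exact character set below are assumptions for illustration, not anything `re` provides. The set covers characters that are special outside a character class, so the result is not meant to be pasted inside `[...]` or used with `re.VERBOSE` (where whitespace and `#` also matter).

```python
import re

# Assumption: characters treated as special outside a character class.
# The hyphen is deliberately left out.
METACHARACTERS = frozenset(r".^$*+?{}[]\|()")

def escape_meta_only(text):
    """Hypothetical helper: backslash-escape only regex metacharacters."""
    return "".join("\\" + ch if ch in METACHARACTERS else ch for ch in text)

print(escape_meta_only("a-b"))       # a-b
print(escape_meta_only("1+1=2?"))    # 1\+1=2\?

# The escaped text still matches the original string literally.
assert re.fullmatch(escape_meta_only("f(x)[0].y|z"), "f(x)[0].y|z")
```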
msg194497 - Author: James Laver (jjl) Date: 2013-08-05 17:35
Quite right, it does say that in the documentation. The documentation is perfectly correct, but in my opinion the behaviour is wrong and, as you suggest, we should be escaping metacharacters only.
msg194526 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-08-06 05:52
In #2650 re.escape() was updated to match Perl's behavior. I don't think there's any actual reason to change it: it brings no benefit and it might break some code (even if, admittedly, that's not very likely).
msg194544 - Author: James Laver (jjl) Date: 2013-08-06 13:48
I looked up quotemeta with perldoc and you're right, it will quote the hyphen. Given that Python's regex engine correctly deals with unnecessarily quoted characters, I suppose this is fine.
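
The point about unnecessarily quoted characters can be checked with a short round trip. This is only a sketch: `round_trips` is a made-up helper name and the sample strings are arbitrary.

```python
import re

samples = ["-", "a-b", "2013-08-05", "x.y-z"]

def round_trips(text):
    """Hypothetical helper: True if the escaped form of text matches text exactly."""
    return re.fullmatch(re.escape(text), text) is not None

for s in samples:
    # The escaped hyphen is redundant outside a character class,
    # but it does not stop the pattern from matching.
    print(repr(s), round_trips(s))
```

Each sample should report `True`: the extra backslash before the hyphen is harmless to matching, which is the behaviour the message relies on.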
msg194563 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2013-08-06 16:18
I can think of a real disadvantage with the current behaviour: it messes up Unicode graphemes.

For example:

>>> print('हिन्दी')
हिन्दी
>>> print(re.escape('हिन्दी'))
\ह\ि\न\्\द\ी

Of course, that's only a problem if you need to print it out or write it to a file.
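
The effect can also be counted rather than eyeballed. The sketch below makes no claim about which count a given interpreter prints: on the 3.4 behaviour shown above one would expect one backslash per code point, while, as far as I know, Python 3.7 and later escape only characters with special meaning, in which case the count would be 0.

```python
import re

word = "हिन्दी"
escaped = re.escape(word)

# Number of backslashes re.escape() inserted; 0 means the text came back unchanged.
added = len(escaped) - len(word)
print(added)

# Whatever was inserted, the escaped pattern still matches the original text,
# so only the printed or stored form is affected, not the matching.
assert re.fullmatch(escaped, word)
```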
History
Date                 User          Action  Args
2022-04-11 14:57:49  admin         set     github: 62862
2013-08-06 16:18:28  mrabarnett    set     messages: + msg194563
2013-08-06 15:26:13  ezio.melotti  set     stage: resolved
2013-08-06 13:48:51  jjl           set     status: open -> closed
                                           resolution: wont fix
                                           messages: + msg194544
2013-08-06 05:52:36  ezio.melotti  set     type: behavior -> enhancement
                                           messages: + msg194526
                                           versions: - Python 2.6, Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.5
2013-08-05 17:35:44  jjl           set     messages: + msg194497
2013-08-05 17:19:34  mrabarnett    set     messages: + msg194496
2013-08-05 17:10:44  jjl           create