classification
Title: re.escape() does not work with bytes()
Type: behavior
Stage:
Components: Regular Expressions
Versions: Python 3.0
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To: gvanrossum
Nosy List: andrewmcnamara, gvanrossum, pitrou
Priority: deferred blocker
Keywords:

Created on 2008-09-02 02:19 by andrewmcnamara, last changed 2008-09-10 23:24 by andrewmcnamara. This issue is now closed.

Files
File name Uploaded Description
re_escape.py andrewmcnamara, 2008-09-02 02:40 Alternate re.escape()
re_escape-patch andrewmcnamara, 2008-09-03 00:20 Patch to fix re.escape() bytes() support, plus tests.
Messages (10)
msg72309 - (view) Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2008-09-02 02:19
In Python 2, re.escape() works with either str or unicode, but in
Python 3, re.escape() no longer works correctly with the bytes type.
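
The report does not show a failing call. As a minimal sketch of the behavior being asked for, assuming a Python 3 interpreter that already includes the fix from this issue, re.escape() returns the same type it was given:

```python
import re

# str in, str out
escaped_str = re.escape("a.b")

# bytes in, bytes out
escaped_bytes = re.escape(b"a.b")

print(type(escaped_str).__name__, escaped_str)
print(type(escaped_bytes).__name__, escaped_bytes)
```
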
msg72310 - (view) Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2008-09-02 02:40
The attached "re_escape.py" is a (somewhat crappy) fix for re.escape().
msg72353 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-09-02 17:12
Mind adding a unittest?
msg72371 - (view) Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2008-09-02 23:38
Will do, although I'm slightly concerned that my "bytes" version of the
function is about 50% slower than the "str" version. I can see why, I
just can't think of a way to do it any faster. There's an inherent
asymmetry in the bytes type that didn't exist before: iterating over
bytes yields ints, so b''.join(list(b'abc')) does not work. Of course,
this does work: bytes(list(b'abc')), but the bytes constructor only
accepts a sequence of ints, not a sequence of bytes. I'd like to see
either the join method accept ints as well as bytes, or the bytes ctor
accept bytes as well as ints. Or something.
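
The asymmetry described above can be reproduced with a short sketch. The variable names are illustrative only, and the slicing workaround at the end is one possible approach, not something proposed in this thread:

```python
data = b"abc"

# Iterating over bytes yields ints, not length-1 bytes objects.
items = list(data)

# join() rejects a list of ints.
try:
    b"".join(items)
    join_accepts_ints = True
except TypeError:
    join_accepts_ints = False

# The constructor accepts the same list of ints.
rebuilt = bytes(items)

# The constructor rejects a list of bytes objects.
try:
    bytes([b"a", b"b", b"c"])
    ctor_accepts_bytes = True
except TypeError:
    ctor_accepts_bytes = False

# Slicing (rather than indexing) gives length-1 bytes that join() accepts.
pieces = [data[i:i + 1] for i in range(len(data))]
rejoined = b"".join(pieces)
```
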
msg72498 - (view) Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2008-09-04 12:36
On further testing, sometimes the str version is faster, sometimes the 
bytes version is faster. Never more than about 50% one way or the 
other, so probably not worth worrying about, although I still don't 
really like the implementation. Maybe it deserves a C implementation?
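
The thread does not include the benchmark that was used. A comparison of this kind could be sketched with timeit as follows; the input text and repeat count are arbitrary assumptions, and results will vary by interpreter and machine:

```python
import re
import timeit

# Arbitrary sample input with a mix of alphanumeric and special characters.
text_str = "price: $4.50 (approx.)" * 100
text_bytes = text_str.encode("ascii")

# Time the same number of calls for each input type.
str_seconds = timeit.timeit(lambda: re.escape(text_str), number=200)
bytes_seconds = timeit.timeit(lambda: re.escape(text_bytes), number=200)

print(f"str:   {str_seconds:.4f}s")
print(f"bytes: {bytes_seconds:.4f}s")
```
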
msg72509 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-04 14:28
I don't think there are cases where re.escape is performance critical -
are there any?
By the way, it seems to me the simplest way to write re.escape() would
be to use a regexp to do the replacement. It might or might not be the
fastest.
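
A regexp-based version along these lines might look like the sketch below. The function and pattern names are hypothetical, it is not the code from the attached patch, and it makes no attempt to special-case NUL characters:

```python
import re

# One compiled pattern per type: a str pattern cannot be applied to bytes.
_NON_ALNUM_STR = re.compile(r"([^a-zA-Z0-9])")
_NON_ALNUM_BYTES = re.compile(rb"([^a-zA-Z0-9])")


def escape_with_regex(pattern):
    """Prefix every non-alphanumeric (ASCII) character with a backslash."""
    if isinstance(pattern, bytes):
        # Template: a literal backslash followed by the matched byte.
        return _NON_ALNUM_BYTES.sub(rb"\\\1", pattern)
    return _NON_ALNUM_STR.sub(r"\\\1", pattern)


sample = "a.b*c[d]"
escaped = escape_with_regex(sample)
# The escaped pattern should match the original text literally.
matches_itself = re.fullmatch(escaped, sample) is not None
```
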
msg72759 - (view) Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2008-09-08 01:35
I don't think it's possible to say whether it's performance critical - 
I can certainly image use cases such as parser generators where its 
speed could be noticed.

I tried building a version using regular expressions, but I couldn't do 
any better than 5x slower than the existing implementations, and the 
resulting code was less readable.
msg72760 - (view) Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2008-09-08 02:12
I meant "I can certainly imagine use cases..."

In case it's not clear, I think the implementation in the patch is 
"good enough" (unless someone can suggest any obvious optimisations).

If someone can prove that re.escape() performance is causing problems 
for other modules in the standard lib (email, ctypes, warnings, 
fnmatch, _strptime use it, among others), then we might consider a C 
implementation.
msg72978 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-09-10 17:44
Looks fine, except I used frozenset for the _alphanum* variables and
reverted to double quotes like the rest of the file.  Submitted as r66366.
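
The committed code is not reproduced in this thread. As a rough sketch of the approach described, with one frozenset of alphanumerics per type and a per-character loop, something like the following would work. All names here are hypothetical and this is not the text of r66366:

```python
import string

_ALNUM = string.ascii_letters + string.digits

# Membership tests need str elements for str input and ints for bytes input.
_ALNUM_STR = frozenset(_ALNUM)
_ALNUM_BYTES = frozenset(_ALNUM.encode("ascii"))

_BACKSLASH = ord("\\")


def escape_sketch(pattern):
    """Backslash-escape every non-alphanumeric character in str or bytes."""
    if isinstance(pattern, str):
        return "".join(
            c if c in _ALNUM_STR else "\\" + c for c in pattern
        )
    out = []
    for byte in pattern:  # iterating bytes yields ints
        if byte not in _ALNUM_BYTES:
            out.append(_BACKSLASH)
        out.append(byte)
    # bytes() accepts the list of ints built above.
    return bytes(out)
```

Building a list of ints and passing it to bytes() sidesteps the join() limitation raised earlier in the thread.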
msg72996 - (view) Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2008-09-10 23:24
>Looks fine, except I used frozenset for the _alphanum* variables and
>reverted to double quotes like the rest of the file.  Submitted as r66366.

All good. Thank you.
History
Date User Action Args
2008-09-10 23:24:57 andrewmcnamara set messages: + msg72996
2008-09-10 17:44:47 gvanrossum set status: open -> closed; assignee: gvanrossum; resolution: accepted; messages: + msg72978
2008-09-08 02:12:54 andrewmcnamara set messages: + msg72760
2008-09-08 01:35:56 andrewmcnamara set messages: + msg72759
2008-09-04 14:28:19 pitrou set nosy: + pitrou; messages: + msg72509
2008-09-04 12:36:25 andrewmcnamara set messages: + msg72498
2008-09-04 02:13:42 benjamin.peterson set priority: deferred blocker
2008-09-03 00:20:38 andrewmcnamara set files: + re_escape-patch
2008-09-02 23:38:50 andrewmcnamara set messages: + msg72371
2008-09-02 17:12:56 gvanrossum set nosy: + gvanrossum; messages: + msg72353
2008-09-02 02:40:08 andrewmcnamara set files: + re_escape.py; messages: + msg72310
2008-09-02 02:19:40 andrewmcnamara create