re.escape() does not work with bytes() #48006

Closed
andrewmcnamara mannequin opened this issue Sep 2, 2008 · 10 comments
Labels
deferred-blocker topic-regex type-bug An unexpected behavior, bug, or error

Comments

@andrewmcnamara
Mannequin

andrewmcnamara mannequin commented Sep 2, 2008

BPO 3756
Nosy @gvanrossum, @pitrou
Files
  • re_escape.py: Alternate re.escape()
  • re_escape-patch: Patch to fix re.escape() bytes() support, plus tests.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/gvanrossum'
    closed_at = <Date 2008-09-10.17:44:47.818>
    created_at = <Date 2008-09-02.02:19:40.696>
    labels = ['expert-regex', 'deferred-blocker', 'type-bug']
    title = 're.escape() does not work with bytes()'
    updated_at = <Date 2008-09-10.23:24:57.182>
    user = 'https://bugs.python.org/andrewmcnamara'

    bugs.python.org fields:

    activity = <Date 2008-09-10.23:24:57.182>
    actor = 'andrewmcnamara'
    assignee = 'gvanrossum'
    closed = True
    closed_date = <Date 2008-09-10.17:44:47.818>
    closer = 'gvanrossum'
    components = ['Regular Expressions']
    creation = <Date 2008-09-02.02:19:40.696>
    creator = 'andrewmcnamara'
    dependencies = []
    files = ['11340', '11352']
    hgrepos = []
    issue_num = 3756
    keywords = []
    message_count = 10.0
    messages = ['72309', '72310', '72353', '72371', '72498', '72509', '72759', '72760', '72978', '72996']
    nosy_count = 3.0
    nosy_names = ['gvanrossum', 'andrewmcnamara', 'pitrou']
    pr_nums = []
    priority = 'deferred blocker'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue3756'
    versions = ['Python 3.0']

    @andrewmcnamara
    Mannequin Author

    andrewmcnamara mannequin commented Sep 2, 2008

    In Python 2, re.escape() works with either str or unicode, but in
    Python 3, re.escape() no longer works correctly with the bytes type.
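
For illustration only (not part of the original report): in current Python 3 releases, re.escape() accepts both str and bytes and returns the type it was given. A minimal check, with arbitrary sample inputs, might look like this:

```python
import re

# Arbitrary sample inputs; "." and "*" are regex metacharacters.
text_result = re.escape("a.b*c")
bytes_result = re.escape(b"a.b*c")

# The result type follows the argument type.
print(type(text_result).__name__, type(bytes_result).__name__)

# The escaped pattern matches the original input literally.
print(re.fullmatch(bytes_result, b"a.b*c") is not None)
```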

    @andrewmcnamara andrewmcnamara mannequin added topic-regex type-bug An unexpected behavior, bug, or error labels Sep 2, 2008
    @andrewmcnamara
    Mannequin Author

    andrewmcnamara mannequin commented Sep 2, 2008

    The attached "re_escape.py" is a (somewhat crappy) fix for re.escape()

    @gvanrossum
    Member

    Mind adding a unittest?

    @andrewmcnamara
    Mannequin Author

    andrewmcnamara mannequin commented Sep 2, 2008

    Will do, although I'm slightly concerned that my "bytes" version of the
    function is about 50% slower than the "str" version. I can see why, I
    just can't think of a way to do it any faster. There's an inherent
    asymmetry in the bytes type that didn't exist before: b''.join(list(b'abc'))
    does not work. Of course, this does work: bytes(list(b'abc')), but the
    bytes constructor only accepts ints, not bytes. I'd like to see either
    the join method accept ints as well as bytes, or the bytes ctor accept
    bytes as well as ints. Or something.
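
The asymmetry described above can be reproduced in a few lines. This is a small demonstration, not code from the attached patch, and the variable names are made up for the example:

```python
data = b"abc"

# Iterating over bytes yields ints, not length-1 bytes objects.
items = list(data)

# join() expects bytes-like items, so a list of ints is rejected.
try:
    b"".join(items)
    join_worked = True
except TypeError:
    join_worked = False

# The bytes constructor, by contrast, accepts an iterable of ints.
rebuilt = bytes(items)

# Slicing (rather than indexing) yields bytes objects, which join() accepts.
pieces = [data[i:i + 1] for i in range(len(data))]
joined = b"".join(pieces)

print(items, join_worked, rebuilt, joined)
```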

    @andrewmcnamara
    Mannequin Author

    andrewmcnamara mannequin commented Sep 4, 2008

    On further testing, sometimes the str version is faster, sometimes the
    bytes version is faster. Never more than about 50% one way or the
    other, so probably not worth worrying about, although I still don't
    really like the implementation. Maybe it deserves a C implementation?

    @pitrou
    Member

    pitrou commented Sep 4, 2008

    I don't think there are cases where re.escape is performance critical -
    are there any?
    By the way, it seems to me the simplest way to write re.escape() would
    be to use a regexp to do the replacement. It might or might not be the
    fastest.
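
A rough sketch of the regexp-based approach suggested here, assuming that "escape everything that is not an ASCII letter or digit" is the desired rule. The name escape_with_regex and the two precompiled patterns are hypothetical, and the sketch does not special-case NUL characters:

```python
import re

# Hypothetical helper patterns: one per input type, since str and bytes
# patterns cannot be mixed in the re module.
_NON_ALNUM_STR = re.compile(r"([^a-zA-Z0-9])")
_NON_ALNUM_BYTES = re.compile(rb"([^a-zA-Z0-9])")


def escape_with_regex(pattern):
    """Prefix every non-alphanumeric ASCII character with a backslash."""
    if isinstance(pattern, str):
        # In the replacement template, "\\" is a literal backslash and
        # "\1" is the matched character.
        return _NON_ALNUM_STR.sub(r"\\\1", pattern)
    return _NON_ALNUM_BYTES.sub(rb"\\\1", pattern)


print(escape_with_regex("a.b"))
print(escape_with_regex(b"a.b"))
```

Whether this is faster or slower than a character-by-character loop would need to be measured; the sketch makes no claim either way.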

    @andrewmcnamara
    Mannequin Author

    andrewmcnamara mannequin commented Sep 8, 2008

    I don't think it's possible to say whether it's performance critical -
    I can certainly image use cases such as parser generators where its
    speed could be noticed.

    I tried building a version using regular expressions, but I couldn't do
    any better than 5x slower than the existing implementations, and the
    resulting code was less readable.

    @andrewmcnamara
    Mannequin Author

    andrewmcnamara mannequin commented Sep 8, 2008

    I meant "I can certainly imagine use cases..."

    In case it's not clear, I think the implementation in the patch is
    "good enough" (unless someone can suggest any obvious optimisations).

    If someone can prove that re.escape() performance is causing problems
    for other modules in the standard lib (email, ctypes, warnings,
    fnmatch, _strptime use it, among others), then we might consider a C
    implementation.

    @gvanrossum
    Member

    Looks fine, except I used frozenset for the _alphanum* variables and
    reverted to double quotes like the rest of the file. Submitted as r66366.
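
To make the frozenset remark concrete, here is a sketch of what a loop-based escape with frozenset lookups could look like. It is an assumption-laden reconstruction for illustration, not the code committed as r66366; the names escape_sketch, _ALPHANUM_STR and _ALPHANUM_BYTES are hypothetical, and the NUL handling is a guess at one reasonable choice:

```python
# Hypothetical lookup sets. Iterating a str yields characters and
# iterating bytes yields ints, so each input type needs its own set.
_ALPHANUM_STR = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
_ALPHANUM_BYTES = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def escape_sketch(pattern):
    """Backslash-escape every non-alphanumeric ASCII character."""
    if isinstance(pattern, str):
        out = []
        for ch in pattern:
            if ch in _ALPHANUM_STR:
                out.append(ch)
            elif ch == "\000":
                out.append("\\000")   # assumption: spell NUL as octal escape
            else:
                out.append("\\" + ch)
        return "".join(out)

    # bytes branch: collect ints and build the result with bytes().
    backslash = ord(b"\\")
    out = []
    for byte in pattern:
        if byte in _ALPHANUM_BYTES:
            out.append(byte)
        elif byte == 0:
            out.extend(b"\\000")      # extends with the ints 92, 48, 48, 48
        else:
            out.append(backslash)
            out.append(byte)
    return bytes(out)


print(escape_sketch("a.b"), escape_sketch(b"a.b"))
```

The bytes branch builds a list of ints and calls bytes() once at the end, which sidesteps the join() limitation discussed earlier in the thread.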

    @gvanrossum gvanrossum self-assigned this Sep 10, 2008
    @andrewmcnamara
    Mannequin Author

    andrewmcnamara mannequin commented Sep 10, 2008

    > Looks fine, except I used frozenset for the _alphanum* variables and
    > reverted to double quotes like the rest of the file. Submitted as r66366.

    All good. Thank you.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022