Warn about octal escapes > 0o377 in re #66558

Closed
serhiy-storchaka opened this issue Sep 8, 2014 · 11 comments
Assignees
Labels
stdlib (Python modules in the Lib dir), topic-regex, type-feature (A feature request or enhancement)

Comments

@serhiy-storchaka
Member

BPO 22362
Nosy @pitrou, @vstinner, @ezio-melotti, @serhiy-storchaka
Files
  • re_octal_escape_overflow.patch
  • re_octal_escape_overflow_raise.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-09-23.20:28:00.122>
    created_at = <Date 2014-09-08.11:07:20.864>
    labels = ['expert-regex', 'type-feature', 'library']
    title = 'Warn about octal escapes > 0o377 in re'
    updated_at = <Date 2014-09-23.20:28:00.121>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2014-09-23.20:28:00.121>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-09-23.20:28:00.122>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'Regular Expressions']
    creation = <Date 2014-09-08.11:07:20.864>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['36571', '36602']
    hgrepos = []
    issue_num = 22362
    keywords = ['patch']
    message_count = 11.0
    messages = ['226570', '226798', '226801', '226809', '226826', '227036', '227039', '227040', '227238', '227386', '227387']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'vstinner', 'ezio.melotti', 'mrabarnett', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue22362'
    versions = ['Python 3.5']

    @serhiy-storchaka
    Member Author

    Currently the re module accepts octal escapes from \400 to \777 but ignores the highest bit.

    >>> re.search(r'\542', 'abc')
    <_sre.SRE_Match object; span=(1, 2), match='b'>

    This behavior looks surprising and is inconsistent with the regex module, which preserves the highest bit. Such escapes are not portable across different regular expression engines.
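
The arithmetic behind the surprising match can be sketched as follows. This is only an illustration of the masking described above; `legacy_octal_escape` is a hypothetical helper written for this note, not a function in the re module.

```python
# Illustration only: reproduce the old masking arithmetic by hand.
# `legacy_octal_escape` is a hypothetical helper, not part of the re module.

def legacy_octal_escape(digits):
    """Return the character that results when the high bit is dropped."""
    value = int(digits, 8)      # "542" -> 354, which is above 0o377 (255)
    return chr(value & 0xFF)    # keep the low 8 bits only: 354 & 255 == 98

print(legacy_octal_escape("542"))  # prints: b
print(0o542, 0o542 & 0xFF, 0o142)  # prints: 354 98 98
```

So \542 silently behaved like \142, which is why it matched 'b' in 'abc'.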

    I propose adding a warning when the octal escape value is larger than 0o377. Here is a preliminary patch that adds a UserWarning. Or maybe it would be better to emit a DeprecationWarning and then replace it with a ValueError in future releases?

    @serhiy-storchaka serhiy-storchaka added the labels stdlib (Python modules in the Lib dir), topic-regex, and type-feature (A feature request or enhancement) on Sep 8, 2014
    @pitrou
    Member

    pitrou commented Sep 11, 2014

    I think we should simply raise ValueError in 3.5. There's no reason to accept such invalid escapes.

    @serhiy-storchaka
    Member Author

    Well, here is a patch which makes re raise an exception on suspicious octals.

    @vstinner
    Member

    re_octal_escape_overflow_raise.patch: you should write a subfunction to not repeat the error message 3 times.

    + if c > 0o377:

    Hum, I never use octal. 255 instead of 0o377 would be less surprising :-p By the way, you should also check for negative numbers.

    >>> -3 & 0xff
    253

    Previously, "& 0xff" also mapped negative numbers into the range 0..255.

    @serhiy-storchaka
    Member Author

    > By the way, you should also check for negative numbers.

    Not in this case. You can't construct a negative number from three octal digits.
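
A quick way to convince yourself of this is to enumerate every three-digit octal string. This is a sketch for illustration, not code from either patch; the variable names are made up here.

```python
from itertools import product

# Every value that three octal digits can encode, "000" through "777".
values = [int("".join(digits), 8) for digits in product("01234567", repeat=3)]

lowest, highest = min(values), max(values)
out_of_range = [v for v in values if v > 0o377]

print(lowest, highest)     # prints: 0 511
print(len(out_of_range))   # prints: 256
```

The smallest value is 0 and the largest is 0o777 (511), so only the upper bound needs checking.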

    @serhiy-storchaka
    Member Author

    Warning or exception? This is a question.

    @serhiy-storchaka serhiy-storchaka self-assigned this Sep 18, 2014
    @vstinner
    Member

    > Warning or exception? This is a question.

    Using -Werror, warnings raise exceptions :-)
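
The effect of -Werror can be reproduced inside a program with the warnings module. The sketch below assumes a warning-based check like the one in the first patch; `parse_octal_escape` and its message text are hypothetical stand-ins, not the patch's actual code.

```python
import warnings

def parse_octal_escape(digits):
    """Hypothetical helper: warn (rather than raise) on values above 0o377."""
    value = int(digits, 8)
    if value > 0o377:
        warnings.warn(
            "octal escape value \\%s outside of range 0-0o377" % digits,
            UserWarning,
            stacklevel=2,
        )
    return value & 0xFF

# Equivalent in spirit to running with -Werror: warnings become exceptions.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        parse_octal_escape("542")
        raised = False
    except UserWarning:
        raised = True

print(raised)  # prints: True
```

With the filter set to "error" the warning surfaces as an exception, which is the point of the remark above; without it, the caller would still get the masked value back.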

    @pitrou
    Member

    pitrou commented Sep 18, 2014

    This is an error, so it should really be an exception. There's no use case for being lenient, IMO.

    @serhiy-storchaka
    Member Author

    If this is an error, should the patch be applied to maintained releases?

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 23, 2014

    New changeset 3b32f495fb38 by Serhiy Storchaka in branch 'default':
    Issue bpo-22362: Forbidden ambiguous octal escapes out of range 0-0o377 in
    https://hg.python.org/cpython/rev/3b32f495fb38
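
A minimal way to observe the resulting behavior, assuming Python 3.5 or later (the version this issue targets). The thread talks about ValueError, but the sketch catches re.error, which is what current CPython raises for invalid patterns; `compiles` is a hypothetical helper for this example.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(compiles(r"\377"))   # in range: accepted
print(compiles(r"\400"))   # above 0o377: rejected
print(compiles(r"\542"))   # the example from the report: rejected
print(re.search(r"\142", "abc").span())  # the in-range spelling still matches 'b'
```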

    @serhiy-storchaka
    Member Author

    Thanks Antoine and Victor for the review.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022