
Author serhiy.storchaka
Recipients lukasz.langa, serhiy.storchaka
Date 2018-04-26.14:50:37
Message-id <1524754237.39.0.682650639539.issue33338@psf.upfronthosting.co.za>
In-reply-to
Content
It seems to me that the regular expressions used in the lib2to3 version are more efficient, but also more complex.

$ ./python -m timeit -s 'import re; p = re.compile(r"0[bB](?:_?[01])+"); s = "0b"+"_0101"*16' 'p.match(s)'
100000 loops, best of 5: 2.45 usec per loop

$ ./python -m timeit -s 'import re; p = re.compile(r"0[bB]_?[01]+(?:_[01]+)*"); s = "0b"+"_0101"*16' 'p.match(s)'
200000 loops, best of 5: 1.08 usec per loop
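
As a sanity check on the two binary patterns timed above, the following sketch compares them by brute force over short inputs. The helper name agree_up_to and the length bound are illustrative choices, not part of either module; the check only covers strings up to the given length, so it is evidence of equivalence rather than a proof.

```python
import itertools
import re

# The two binary-literal patterns from the timings above.
simple = re.compile(r"0[bB](?:_?[01])+")
unrolled = re.compile(r"0[bB]_?[01]+(?:_[01]+)*")


def agree_up_to(max_len):
    """Return True if both patterns fully match the same strings.

    Every body over the alphabet "_01" of length 0..max_len is
    appended to the prefix "0b" and tried against both patterns.
    (Hypothetical helper, written only for this comparison.)
    """
    for n in range(max_len + 1):
        for body in itertools.product("_01", repeat=n):
            s = "0b" + "".join(body)
            if bool(simple.fullmatch(s)) != bool(unrolled.fullmatch(s)):
                return False
    return True


print(agree_up_to(8))
```

If the patterns ever disagreed on a short input, the function would return False, which would mean the speedup came with a change in behavior.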

$ ./python -m timeit -s 'import re; p = re.compile(r"0[xX](?:_?[0-9a-fA-F])+[lL]?"); s = "0x_0123_4567_89ab_cdef"' 'p.match(s)'
500000 loops, best of 5: 815 nsec per loop

$ ./python -m timeit -s 'import re; p = re.compile(r"0[xX]_?[\da-fA-F]+(?:_[\da-fA-F]+)*[lL]?"); s = "0x_0123_4567_89ab_cdef"' 'p.match(s)'
500000 loops, best of 5: 542 nsec per loop

Since the performance of lib2to3 is important, it is better to keep the current regexes.

However, using \d in Python 3 is a bug: in a str pattern it matches any Unicode decimal digit, not only ASCII 0-9, so it should be replaced with [0-9]. This also speeds up the regex:

$ ./python -m timeit -s 'import re; p = re.compile(r"0[xX]_?[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*[lL]?"); s = "0x_0123_4567_89ab_cdef"' 'p.match(s)'
500000 loops, best of 5: 471 nsec per loop
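
A minimal sketch of why \d is a problem in a str pattern on Python 3: it also matches non-ASCII decimal digits. The sample string using ARABIC-INDIC digits is an illustrative input chosen here, not something taken from the lib2to3 test suite.

```python
import re

# Hex-literal pattern using \d, and the same pattern using [0-9].
hex_with_d = re.compile(r"0[xX]_?[\da-fA-F]+(?:_[\da-fA-F]+)*[lL]?")
hex_ascii = re.compile(r"0[xX]_?[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*[lL]?")

# "0x" followed by ARABIC-INDIC DIGIT ONE and TWO; not a valid Python literal.
s = "0x\u0661\u0662"

# \d accepts the non-ASCII digits, so the whole string matches.
print(hex_with_d.fullmatch(s) is not None)

# [0-9] rejects them, so there is no match.
print(hex_ascii.fullmatch(s) is not None)

# Ordinary ASCII hex literals are still accepted by the [0-9] version.
print(hex_ascii.fullmatch("0x_0123_4567_89ab_cdef") is not None)
```

Compiling with the re.ASCII flag would also restrict \d to ASCII digits, but spelling out [0-9] keeps the pattern correct regardless of flags.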