Message 315802 - Python tracker

Author	serhiy.storchaka
Recipients	lukasz.langa, serhiy.storchaka
Date	2018-04-26.14:50:37
Message-id	<1524754237.39.0.682650639539.issue33338@psf.upfronthosting.co.za>

Content
It seems to me that the regular expressions used in the lib2to3 version are more efficient but more complex.

$ ./python -m timeit -s 'import re; p = re.compile(r"0[bB](?:_?[01])+"); s = "0b"+"_0101"*16' 'p.match(s)'
100000 loops, best of 5: 2.45 usec per loop

$ ./python -m timeit -s 'import re; p = re.compile(r"0[bB]_?[01]+(?:_[01]+)*"); s = "0b"+"_0101"*16' 'p.match(s)'
200000 loops, best of 5: 1.08 usec per loop
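
The two binary patterns timed above appear to accept the same set of strings, so the comparison is between equivalent regexes. A minimal sketch of a check, assuming exhaustive testing over short strings is sufficient evidence; the names simple, unrolled and first_difference are illustrative and not taken from lib2to3:

```python
import itertools
import re

# The two binary-literal patterns from the timings above.
simple = re.compile(r"0[bB](?:_?[01])+")
unrolled = re.compile(r"0[bB]_?[01]+(?:_[01]+)*")


def first_difference(max_len=8):
    """Return the first string on which the patterns disagree, or None.

    Every string of the form "0b" + tail is tried, where tail ranges over
    all combinations of '0', '1' and '_' up to max_len characters.
    fullmatch() is used so that whole strings are compared, not prefixes.
    """
    for n in range(max_len + 1):
        for tail in itertools.product("01_", repeat=n):
            s = "0b" + "".join(tail)
            if bool(simple.fullmatch(s)) != bool(unrolled.fullmatch(s)):
                return s
    return None


print(first_difference())
```

If this prints None, no disagreement was found up to the chosen length. That is evidence of equivalence, not a proof.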

$ ./python -m timeit -s 'import re; p = re.compile(r"0[xX](?:_?[0-9a-fA-F])+[lL]?"); s = "0x_0123_4567_89ab_cdef"' 'p.match(s)'
500000 loops, best of 5: 815 nsec per loop

$ ./python -m timeit -s 'import re; p = re.compile(r"0[xX]_?[\da-fA-F]+(?:_[\da-fA-F]+)*[lL]?"); s = "0x_0123_4567_89ab_cdef"' 'p.match(s)'
500000 loops, best of 5: 542 nsec per loop

Since the performance of lib2to3 is important, it is better to keep the current regexes.

But using \d in Python 3 is a bug: in str patterns it matches any Unicode decimal digit, not only ASCII digits, so it should be replaced with [0-9]. This also speeds up the regex:

$ ./python -m timeit -s 'import re; p = re.compile(r"0[xX]_?[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*[lL]?"); s = "0x_0123_4567_89ab_cdef"' 'p.match(s)'
500000 loops, best of 5: 471 nsec per loop
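
The \d problem can be reproduced with a short script. This is a sketch only: the sample string with Arabic-Indic digits and the names with_d and ascii_only are chosen here for illustration.

```python
import re

# Hex-literal pattern using \d, as in the timing above.
with_d = re.compile(r"0[xX]_?[\da-fA-F]+(?:_[\da-fA-F]+)*[lL]?")
# The same pattern with \d replaced by [0-9].
ascii_only = re.compile(r"0[xX]_?[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*[lL]?")

# "0x" followed by ARABIC-INDIC DIGIT ONE and TWO; not a valid Python literal.
s = "0x\u0661\u0662"

print(bool(with_d.fullmatch(s)))      # \d accepts the non-ASCII digits
print(bool(ascii_only.fullmatch(s)))  # [0-9] rejects them

# Both patterns still accept the ASCII literal used in the benchmark.
ok = "0x_0123_4567_89ab_cdef"
print(bool(with_d.fullmatch(ok)), bool(ascii_only.fullmatch(ok)))
```

Compiling with the re.ASCII flag would also restrict \d to [0-9], but writing [0-9] explicitly keeps the pattern correct regardless of flags.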

History
Date	User	Action	Args
2018-04-26 14:50:37	serhiy.storchaka	set	recipients: + serhiy.storchaka, lukasz.langa
2018-04-26 14:50:37	serhiy.storchaka	set	messageid: <1524754237.39.0.682650639539.issue33338@psf.upfronthosting.co.za>
2018-04-26 14:50:37	serhiy.storchaka	link	issue33338 messages
2018-04-26 14:50:37	serhiy.storchaka	create