This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author mrabarnett
Recipients ezio.melotti, mrabarnett, serhiy.storchaka
Date 2013-08-08.15:05:53
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1375974354.06.0.518590129246.issue18685@psf.upfronthosting.co.za>
In-reply-to
Content
It appears that in your tests Python 3.2 is faster with Unicode than bytestrings and that unpatched Python 3.4 is a lot slower.

I get somewhat different results (Windows XP Pro, 32-bit):

C:\Python32\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
1000 loops, best of 3: 449 usec per loop

C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
1000 loops, best of 3: 506 usec per loop

C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
1000 loops, best of 3: 506 usec per loop


C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
1000 loops, best of 3: 227 usec per loop

C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
1000 loops, best of 3: 339 usec per loop

C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
1000 loops, best of 3: 504 usec per loop

For comparison, in the regex module I don't duplicate whole sections of code, but instead have a pointer to one of 3 functions (for UCS1, UCS2 and UCS4) that gets the codepoint, except for some tight loops. Doing that might be too much of a change for re.

However, the speed appears to be a lot more consistent:

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop


C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop

C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop

C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 113 usec per loop
History
Date User Action Args
2013-08-08 15:05:54mrabarnettsetrecipients: + mrabarnett, ezio.melotti, serhiy.storchaka
2013-08-08 15:05:54mrabarnettsetmessageid: <1375974354.06.0.518590129246.issue18685@psf.upfronthosting.co.za>
2013-08-08 15:05:54mrabarnettlinkissue18685 messages
2013-08-08 15:05:53mrabarnettcreate