Author serhiy.storchaka
Recipients Patrick Maupin, ezio.melotti, mrabarnett, serhiy.storchaka
Date 2015-06-13.11:12:37
Message-id <1434193958.62.0.638871759418.issue24426@psf.upfronthosting.co.za>
In-reply-to
Content
> 1) Do you know if anybody maintains a patched version of the Python code anywhere?  I could put a package up on github/PyPI, if not.

Sorry, perhaps I misunderstood you. There are unofficial mirrors of CPython on Bitbucket [1] and GitHub [2]. They don't contain unofficial patches, but perhaps there are private clones with additional patches. Of course, different Linux distributions can ship Python with their own patches. And you can maintain a private fork of CPython with your patches for your own or your company's needs.

But if you only need optimized regular expressions, I suggest you look at the regex module [3]. It is more powerful and has better Unicode support.

Results of the same microbenchmarks for regex:

$ ./python -m timeit -s "import regex as re; p = re.compile('\n'); s = ('a'*100 + '\n')*1000" -- "p.split(s)"
1000 loops, best of 3: 544 usec per loop
$ ./python -m timeit -s "import regex as re; p = re.compile('(\n)'); s = ('a'*100 + '\n')*1000" -- "p.split(s)"
1000 loops, best of 3: 661 usec per loop
$ ./python -m timeit -s "import regex as re; p = re.compile('\n\r'); s = ('a'*100 + '\n\r')*1000" -- "p.split(s)"
1000 loops, best of 3: 521 usec per loop
$ ./python -m timeit -s "import regex as re; p = re.compile('(\n\r)'); s = ('a'*100 + '\n\r')*1000" -- "p.split(s)"
1000 loops, best of 3: 743 usec per loop

regex is slightly slower than the optimized re in these cases, but is much faster than the non-optimized re when splitting with a capturing group.
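
The four command-line measurements above can also be collected from one script, which makes it easier to rerun them against both modules. This is only a sketch: the helper names time_split and run, and the loop counts, are made up for illustration, and the third-party regex module is imported only if it happens to be installed. The numbers will differ from the ones quoted above depending on the machine and the Python build.

```python
# Hypothetical helper script; not part of the original message.
import re
import timeit

try:
    import regex  # third-party module from PyPI; optional here
except ImportError:
    regex = None

# (pattern, separator) pairs matching the four timeit commands above.
CASES = [
    ('\n', '\n'),
    ('(\n)', '\n'),
    ('\n\r', '\n\r'),
    ('(\n\r)', '\n\r'),
]


def time_split(module, pattern, sep, number=200, repeat=3):
    """Return the best per-call time in seconds of p.split(s)."""
    p = module.compile(pattern)
    s = ('a' * 100 + sep) * 1000  # same test string as in the commands above
    timer = timeit.Timer(lambda: p.split(s))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def run(number=200):
    """Time every case for re, and for regex when it is available."""
    modules = [re] + ([regex] if regex is not None else [])
    results = {}
    for module in modules:
        for pattern, sep in CASES:
            key = (module.__name__, pattern)
            results[key] = time_split(module, pattern, sep, number)
    return results


if __name__ == '__main__':
    for (name, pattern), seconds in run().items():
        print(f'{name:5} {pattern!r:10} {seconds * 1e6:8.1f} usec per loop')
```

The lambda adds a small constant call overhead to every measurement, so the results are best compared with each other rather than with the timeit command-line figures.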

> 2) Do you know if anybody has done a good writeup on the behavior of the instruction stream to the C engine?  I could try to do some work on this and put it with the package, if not, or point to it if so.

Sorry, I don't understand you. Do you mean documenting the codes of a compiled re pattern? This is an implementation detail and will change in the future.
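
For anyone who just wants to look at what the compiler produces for a given pattern, the documented re.DEBUG flag prints debug information at compile time. The sketch below captures that output as a string; the helper name dump_pattern is made up for illustration, and the exact text printed is not a stable interface and differs between Python versions.

```python
# Hypothetical helper; not part of the original message.
import contextlib
import io
import re


def dump_pattern(pattern):
    """Return the text that re.DEBUG prints while compiling `pattern`."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # Patterns compiled with re.DEBUG are not cached, so this prints
        # on every call.
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


if __name__ == '__main__':
    print(dump_pattern('(\n)'))
```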

[1] https://bitbucket.org/mirror/cpython
[2] https://github.com/python/cpython
[3] https://pypi.python.org/pypi/regex
History
Date                 User              Action  Args
2015-06-13 11:12:38  serhiy.storchaka  set     recipients: + serhiy.storchaka, ezio.melotti, mrabarnett, Patrick Maupin
2015-06-13 11:12:38  serhiy.storchaka  set     messageid: <1434193958.62.0.638871759418.issue24426@psf.upfronthosting.co.za>
2015-06-13 11:12:38  serhiy.storchaka  link    issue24426 messages
2015-06-13 11:12:37  serhiy.storchaka  create