Message 269849 - Python tracker


Author	ammar2
Recipients	ammar2, benjamin.peterson, ezio.melotti, lemburg, pitrou, vstinner
Date	2016-07-05.20:10:09
Message-id	<1467749411.27.0.801295605617.issue27458@psf.upfronthosting.co.za>
In-reply-to

Content
Currently, as far as string concatenation goes, ceval has a fast branch it can take if both operands are unicode objects. However, since this check is an Exact check, subtypes of unicode end up going through the slower code path: PyNumber_Add -> PyUnicode_Concat.
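
To make the exact-vs-subtype distinction concrete, here is a rough Python-level analogue of the two kinds of check. It is only a sketch: the real check lives in C inside ceval, and the `Markup` class and the helper names below are hypothetical stand-ins, not MarkupSafe's class or anything from the patch.

```python
class Markup(str):
    """Hypothetical stand-in for a str subclass (not MarkupSafe's Markup)."""


def is_exact_str(obj):
    # Analogue of an "Exact" check: only str itself matches, subclasses do not.
    return type(obj) is str


def is_str_or_subtype(obj):
    # Analogue of a subtype-aware check: str and any subclass of str match.
    return isinstance(obj, str)


plain = "a"
sub = Markup("b")

exact_results = (is_exact_str(plain), is_exact_str(sub))
subtype_results = (is_str_or_subtype(plain), is_str_or_subtype(sub))

# This subclass does not override __add__, so the concatenation is
# performed by str's own implementation.
concat = sub + plain
concat_type = type(concat)

print(exact_results, subtype_results, concat_type)
```

With an exact check, the subclass instance fails the test even though it behaves like a plain string here, which is why it misses the fast branch.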

This patch aims to allow subtypes to take that optimized branch without breaking any existing behavior and without any more memory copy calls than necessary.

The motivation for this change is that some templating engines (Mako/Jinja2/Cheetah) use libraries like MarkupSafe, which is implemented with a unicode subtype called `Markup`. Concatenating these custom objects (pretty common for templating engines) is fairly slow. This change modifies and reuses the existing CPython code to make it a fair bit faster.
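
One way to get a feel for the difference is a small micro-benchmark. This is a sketch under assumptions, not a measurement from the patch: `SubStr` and `time_concat` are hypothetical names, `SubStr` is a bare subclass that does not reproduce MarkupSafe's escaping behavior, and the numbers will vary by machine and interpreter version.

```python
import timeit


class SubStr(str):
    """Hypothetical bare str subclass; adds no behavior of its own."""


def time_concat(left, right, number=200_000):
    # timeit compiles "left + right" as a statement, so the addition is
    # executed by the interpreter loop rather than called as a method.
    return timeit.timeit(
        "left + right",
        globals={"left": left, "right": right},
        number=number,
    )


plain_seconds = time_concat("hello", "world")
subtype_seconds = time_concat(SubStr("hello"), SubStr("world"))

print(f"str + str:       {plain_seconds:.4f}s")
print(f"SubStr + SubStr: {subtype_seconds:.4f}s")
```

Repeating the run several times and comparing the minimum of each series gives a steadier picture than a single run.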

I think the only really "dangerous" change here is in the cast_unicode_subtype_to_base function, which uses a trick at the end to prevent deallocation of memory. I've made sure to keep it well commented, but I'd appreciate any feedback on it.

From what I can tell from running the test suite, all tests pass and there don't seem to be any new reference leaks.

History
Date	User	Action	Args
2016-07-05 20:10:12	ammar2	set	recipients: + ammar2, lemburg, pitrou, vstinner, benjamin.peterson, ezio.melotti
2016-07-05 20:10:11	ammar2	set	messageid: <1467749411.27.0.801295605617.issue27458@psf.upfronthosting.co.za>
2016-07-05 20:10:11	ammar2	link	issue27458 messages
2016-07-05 20:10:10	ammar2	create