Message 71207 - Python tracker

Author	pitrou
Recipients	ajaksu2, barry, jackdied, pitrou
Date	2008-08-16.10:31:52
Message-id	<1218882713.93.0.451820211561.issue2676@psf.upfronthosting.co.za>
In-reply-to

Content

Hi Jack,

> Antoine, I looked at your patch and I'm not sure why you applied it
> instead of applying mine (or saying +1 on me applying my patch).
> 
> Yours uses str.partition which I pointed out is sub-optimal (same big-Oh
> but with a larger constant factor) and also adds a function that returns
> two things, one of which is thrown away after having a str.strip
> performed on it.

I added that function so that the header splitting facility is
explicitly exposed as an internal API, as was the case with the regular
expression. I tried to mimic the behaviour of the regex as closely as
possible, which meant returning two things as well :-)
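
For illustration, a splitting helper of the kind described above might look like the sketch below. The name split_param and its exact return convention are assumptions made for this example, not the code from the patch; it only shows how str.partition yields the two parts in a single pass.

```python
from typing import Optional, Tuple


def split_param(param: str) -> Tuple[str, Optional[str]]:
    """Split 'value; rest' at the first semicolon (hypothetical helper)."""
    # str.partition scans the string once, so the cost is linear in its length.
    head, sep, rest = param.partition(";")
    if not sep:
        # No semicolon found: there is nothing after the main value.
        return head.strip(), None
    # Both halves are returned, even if a caller only needs the first one.
    return head.strip(), rest.strip()
```

A caller interested only in the main value would use `split_param(header)[0]` and discard the second element.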

I think the point of the issue is to remove the pathological
(exponential) behaviour when parsing some headers, not to try to squeeze
the last microseconds out of content-type parsing (which shouldn't
be, IMO, the limiting factor in email handling performance as soon as
it's not super-linear).

That said, I've timed the function against the regular expression and
the former is always faster, even for tiny strings (e.g. "a;b").
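
A comparison of this kind can be reproduced with timeit, roughly as sketched below. The pattern PARAM_RE and both function names are stand-ins assumed for this example; they are not taken from the patch or from the module under discussion.

```python
import re
import timeit

# Hypothetical stand-in for the regular expression being replaced.
PARAM_RE = re.compile(r"\s*;\s*")


def split_with_regex(param):
    # Split once on the first semicolon and any whitespace around it.
    return PARAM_RE.split(param, 1)


def split_with_partition(param):
    # Same result as above, using str.partition plus whitespace trimming.
    head, sep, rest = param.partition(";")
    return [head.rstrip(), rest.lstrip()] if sep else [head]


def time_both(sample, number=10000):
    """Return the seconds taken by each approach for `number` calls."""
    return {
        "regex": timeit.timeit(lambda: split_with_regex(sample), number=number),
        "partition": timeit.timeit(lambda: split_with_partition(sample), number=number),
    }


if __name__ == "__main__":
    for sample in ("a;b", "text/plain; charset=utf-8"):
        print(sample, time_both(sample))
```

The absolute numbers depend on the interpreter version and the machine, so only the relative ordering on a given setup is meaningful.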

Your patch was keeping the regular expression as a module-level constant
while replacing all uses of it with a function, which I found a bit
strange (I don't think people are using paramre from the outside since
it's not documented; it's an internal, not public, API IMO). I also found
it strange to devote a docstring to the discussion of a performance
detail. But I don't have any strong feeling against it either, so you
can still apply it if you think it's important performance-wise.

Regards

Antoine.

History
Date	User	Action	Args
2008-08-16 10:31:54	pitrou	set	recipients: + pitrou, barry, jackdied, ajaksu2
2008-08-16 10:31:53	pitrou	set	messageid: <1218882713.93.0.451820211561.issue2676@psf.upfronthosting.co.za>
2008-08-16 10:31:53	pitrou	link	issue2676 messages
2008-08-16 10:31:52	pitrou	create