Message182626
I did some macro-benchmarks and the proposed changes don't seem to affect the result (most likely because they are in _parse_doctype_element and _parse_doctype_attlist which should be called only once per document).
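A whole-document timing of this kind can be sketched with timeit. This is only an illustration: parse_document, benchmark and the input passed to them are hypothetical names, not the benchmark that was actually run.

```python
import timeit
from html.parser import HTMLParser


def parse_document(html_text):
    """Feed a complete document through a fresh HTMLParser instance."""
    parser = HTMLParser()
    parser.feed(html_text)
    parser.close()


def benchmark(html_text, repeat=5, number=10):
    """Return the best per-call parse time in seconds over `repeat` runs."""
    timings = timeit.repeat(
        lambda: parse_document(html_text), repeat=repeat, number=number
    )
    # The minimum is the least noisy estimate; divide by `number` for one call.
    return min(timings) / number


if __name__ == "__main__":
    # Placeholder input; a real comparison should use large, realistic pages.
    sample = "<html><body>" + "<p class='x'>text</p>" * 10000 + "</body></html>"
    print("best time per parse: %.4f s" % benchmark(sample))
```

Running it once with and once without the patch applied gives the two numbers to compare.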
I did some profiling, and this is the result:
4437196 function calls (4436748 primitive calls) in 36.582 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
92931 7.400 0.000 17.082 0.000 parser.py:320(parse_starttag)
202 6.363 0.032 36.281 0.180 parser.py:171(goahead)
673285 5.302 0.000 5.302 0.000 {method 'match' of '_sre.SRE_Pattern' objects}
369418 3.272 0.000 4.554 0.000 _markupbase.py:48(updatepos)
83243 2.698 0.000 4.639 0.000 parser.py:421(parse_endtag)
308882 2.006 0.000 2.006 0.000 {method 'group' of '_sre.SRE_Match' objects}
270074 1.521 0.000 1.521 0.000 {method 'search' of '_sre.SRE_Pattern' objects}
92931 1.150 0.000 2.643 0.000 parser.py:378(check_for_whole_start_tag)
291079 1.028 0.000 1.028 0.000 {method 'count' of 'str' objects}
295892 0.883 0.000 0.883 0.000 {method 'startswith' of 'str' objects}
387439 0.733 0.000 0.733 0.000 {method 'lower' of 'str' objects}
403922 0.642 0.000 0.642 0.000 {method 'end' of '_sre.SRE_Match' objects}
124512 0.406 0.000 1.156 0.000 parser.py:504(unescape)
186775 0.326 0.000 0.326 0.000 {method 'start' of '_sre.SRE_Match' objects}
96213 0.255 0.000 0.255 0.000 {method 'endswith' of 'str' objects}
59522 0.253 0.000 0.253 0.000 {method 'rindex' of 'str' objects}
83226 0.215 0.000 0.215 0.000 parser.py:164(clear_cdata_mode)
6428 0.194 0.000 0.337 0.000 parser.py:507(replaceEntities)
106487 0.183 0.000 0.183 0.000 parser.py:484(handle_data)
Excluding string and regex methods, the 3 slowest methods are parse_starttag, goahead, and updatepos.
The attached patch adds a couple of simple optimizations to the first two -- I couldn't think of a way to optimize updatepos.
The resulting speedup is however fairly small, so I'm not sure it's worth applying the patch.
I might try doing other benchmarks in the future (should I add them somewhere in Tools?).
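For reference, a profile in the same format as the one above can be collected with cProfile and pstats. The following is a minimal sketch under assumptions: profile_parse and the sample input are hypothetical, and the document used for the numbers above is not reproduced here.

```python
import cProfile
import io
import pstats
from html.parser import HTMLParser


def profile_parse(html_text, limit=20):
    """Parse html_text under cProfile and return the report as a string."""
    parser = HTMLParser()
    profiler = cProfile.Profile()
    profiler.enable()
    parser.feed(html_text)
    parser.close()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "tottime" sorts by time spent inside each function itself, which
    # pstats reports as "Ordered by: internal time".
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder input; substitute a real, large HTML document.
    sample = (
        "<html><body>"
        + "<p class='x'>text &amp; more</p>" * 10000
        + "</body></html>"
    )
    print(profile_parse(sample))
```

Sorting by "cumtime" instead shows which callers, such as goahead, account for the most time including the functions they call.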
Date | User | Action | Args
2013-02-22 03:46:48 | ezio.melotti | set | recipients: + ezio.melotti, terry.reedy, serhiy.storchaka, guido.reina
2013-02-22 03:46:48 | ezio.melotti | set | messageid: <1361504808.06.0.912762802933.issue17183@psf.upfronthosting.co.za>
2013-02-22 03:46:48 | ezio.melotti | link | issue17183 messages
2013-02-22 03:46:46 | ezio.melotti | create |