Author ezio.melotti
Recipients ezio.melotti, guido.reina, serhiy.storchaka, terry.reedy
Date 2013-02-22.03:46:46
Message-id <1361504808.06.0.912762802933.issue17183@psf.upfronthosting.co.za>
Content
I did some macro-benchmarks and the proposed changes don't seem to affect the results, most likely because they are in _parse_doctype_element and _parse_doctype_attlist, which should be called only once per document.
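
For reference, a macro-benchmark of this kind can be sketched with timeit. This is only a minimal sketch, not the script used for the numbers in this message: the synthetic document, the NullParser subclass, and the make_document, parse_once, and benchmark helpers are all assumptions made for illustration.

```python
import timeit
from html.parser import HTMLParser


class NullParser(HTMLParser):
    """Hypothetical parser with the default no-op handlers,
    so that only the parsing cost itself is measured."""


def make_document(n_rows=2000):
    # Synthetic HTML input (an assumption; the real benchmark input is not given).
    row = '<tr class="r"><td id="c{0}">cell {0} &amp; more</td></tr>\n'
    body = "".join(row.format(i) for i in range(n_rows))
    return "<!DOCTYPE html>\n<html><body><table>\n" + body + "</table></body></html>"


def parse_once(doc):
    # Use a fresh parser per run so no state leaks between runs.
    parser = NullParser()
    parser.feed(doc)
    parser.close()


def benchmark(doc, repeat=5, number=3):
    # Best-of-N wall-clock time for a single parse, in seconds.
    timings = timeit.repeat(lambda: parse_once(doc), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    doc = make_document()
    print("best time per parse: {:.4f} s".format(benchmark(doc)))
```

Running it before and after applying a patch gives a rough comparison. Taking the minimum of several repeats reduces noise from other processes.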

I did some profiling, and this is the result:
         4437196 function calls (4436748 primitive calls) in 36.582 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    92931    7.400    0.000   17.082    0.000 parser.py:320(parse_starttag)
      202    6.363    0.032   36.281    0.180 parser.py:171(goahead)
   673285    5.302    0.000    5.302    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
   369418    3.272    0.000    4.554    0.000 _markupbase.py:48(updatepos)
    83243    2.698    0.000    4.639    0.000 parser.py:421(parse_endtag)
   308882    2.006    0.000    2.006    0.000 {method 'group' of '_sre.SRE_Match' objects}
   270074    1.521    0.000    1.521    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
    92931    1.150    0.000    2.643    0.000 parser.py:378(check_for_whole_start_tag)
   291079    1.028    0.000    1.028    0.000 {method 'count' of 'str' objects}
   295892    0.883    0.000    0.883    0.000 {method 'startswith' of 'str' objects}
   387439    0.733    0.000    0.733    0.000 {method 'lower' of 'str' objects}
   403922    0.642    0.000    0.642    0.000 {method 'end' of '_sre.SRE_Match' objects}
   124512    0.406    0.000    1.156    0.000 parser.py:504(unescape)
   186775    0.326    0.000    0.326    0.000 {method 'start' of '_sre.SRE_Match' objects}
    96213    0.255    0.000    0.255    0.000 {method 'endswith' of 'str' objects}
    59522    0.253    0.000    0.253    0.000 {method 'rindex' of 'str' objects}
    83226    0.215    0.000    0.215    0.000 parser.py:164(clear_cdata_mode)
     6428    0.194    0.000    0.337    0.000 parser.py:507(replaceEntities)
   106487    0.183    0.000    0.183    0.000 parser.py:484(handle_data)
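
A table like the one above can be produced with cProfile and pstats, sorted by internal time. The following is a minimal sketch under assumptions: the profile_parse helper and the sample input are hypothetical, and the input that produced the numbers above is not known.

```python
import cProfile
import io
import pstats
from html.parser import HTMLParser


def profile_parse(doc, limit=20):
    """Profile one parse of `doc` and return the report as a string.

    `limit` caps the number of rows printed. Rows are ordered by
    internal time (the tottime column), as in the table above.
    """
    parser = HTMLParser()  # default handlers do nothing
    profiler = cProfile.Profile()
    profiler.enable()
    parser.feed(doc)
    parser.close()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample input, repeated to make the timings measurable.
    sample = '<div class="a"><p id="x">text &amp; more</p></div>\n' * 5000
    print(profile_parse(sample))
```

The absolute timings depend on the input and the machine, so only the relative ordering of the functions is meaningful.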

Excluding string and regex methods, the 3 slowest methods are parse_starttag, goahead, and updatepos.
The attached patch adds a couple of simple optimizations to the first two; I couldn't think of a way to optimize updatepos.
The resulting speedup is however fairly small, so I'm not sure it's worth applying the patch.
I might try doing other benchmarks in the future (should I add them somewhere in Tools?).