Message 182185 - Python tracker



Author	ezio.melotti
Recipients	ezio.melotti, guido.reina, serhiy.storchaka, terry.reedy
Date	2013-02-15.22:28:06
Message-id	<1360967287.27.0.72581029241.issue17183@psf.upfronthosting.co.za>

Content

I would still do a benchmark, for these reasons:
1) IIRC rawdata might be the whole document (or at least everything that has not been parsed yet);
2) the '>' is very likely to be found.

This situation is fairly different from the one presented in #17170, where the strings are short and the character is not present in the majority of the strings.
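
A minimal sketch of what such a benchmark could look like, assuming the comparison is between str.find() with a start offset and the in operator on a slice. The statements, the synthetic buffer, and the helper names per_call and run_benchmark are hypothetical stand-ins, not code from _markupbase:

```python
import timeit


def per_call(stmt, namespace, number):
    """Best per-call time in seconds over three repeats."""
    timer = timeit.Timer(stmt, globals=namespace)
    return min(timer.repeat(repeat=3, number=number)) / number


def run_benchmark(doc_size=200_000, number=1000):
    # Hypothetical stand-in for rawdata: one big buffer full of markup,
    # searched from an offset i in the middle, where '>' is found quickly.
    rawdata = "<!DOCTYPE html>" + "<p>some text</p>" * (doc_size // 16)
    long_ns = {"s": rawdata, "i": len(rawdata) // 2}
    # Stand-in for the #17170 situation: a short string without the character.
    short_ns = {"s": "some text", "i": 0}

    statements = {
        "find": "s.find('>', i) >= 0",
        "in": "'>' in s[i:]",  # the slice copies the tail of the buffer
    }
    results = {}
    for label, ns in (("long/found", long_ns), ("short/missing", short_ns)):
        for name, stmt in statements.items():
            results[(label, name)] = per_call(stmt, ns, number)
    return results


if __name__ == "__main__":
    for (label, name), seconds in run_benchmark().items():
        print(f"{label:14} {name:5} {seconds * 1e9:10.1f} ns/call")
```

The two cases are kept separate on purpose: a result on short strings without the character says little about a large buffer where the character is almost always present.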

Profiling and improving html.parser (and hence _markupbase) was already on my todo list (even if admittedly not anywhere near the top :), so writing a benchmark for it might be useful for further enhancements too.

(Note: HTMLParser is already fairly fast, parsing ~1.3MB/s according to http://www.crummy.com/2012/02/06/0, but I've never done anything to make it even faster, so there might still be room for improvement.)
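
For a rough throughput figure on a given machine, something along these lines might do. It is only a sketch: the synthetic document, the CountingParser subclass, and the make_document and measure_throughput helpers are all made up for illustration, and a real benchmark should use real-world pages:

```python
import time
from html.parser import HTMLParser


class CountingParser(HTMLParser):
    """Hypothetical subclass that just counts start tags."""

    def __init__(self):
        super().__init__()
        self.tags = 0

    def handle_starttag(self, tag, attrs):
        self.tags += 1


def make_document(rows):
    # Synthetic input; the doctype also exercises the declaration
    # parsing that lives in _markupbase.
    row = "<div class='row'><a href='/x'>link</a> text</div>\n"
    return "<!DOCTYPE html>\n" + row * rows


def measure_throughput(document, repeat=3):
    """Return (best MB/s over `repeat` runs, start tags seen in one run)."""
    size_mb = len(document.encode("utf-8")) / 1e6
    best = float("inf")
    for _ in range(repeat):
        parser = CountingParser()
        start = time.perf_counter()
        parser.feed(document)
        parser.close()
        best = min(best, time.perf_counter() - start)
    return size_mb / best, parser.tags


if __name__ == "__main__":
    mb_per_s, tags = measure_throughput(make_document(20_000))
    print(f"{mb_per_s:.2f} MB/s, {tags} start tags")
```

The number will depend heavily on the input and the Python version, so it is mainly useful for before/after comparisons on the same machine.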
