Issue 20623: Run test_htmlparser with unbuffered source


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/64822

classification

Title:	Run test_htmlparser with unbuffered source
Type:	enhancement	Stage:	needs patch
Components:	Tests	Versions:	Python 3.8

process

Status:	open	Resolution:
Dependencies:	15114	Superseder:
Assigned To:	ezio.melotti	Nosy List:	ezio.melotti, r.david.murray
Priority:	normal	Keywords:

Created on 2014-02-14 05:28 by ezio.melotti, last changed 2022-04-11 14:57 by admin.

Messages (1)

msg211201 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2014-02-14 05:28

Currently test_htmlparser feeds the HTML source to the parser one character at a time (except for a couple of buffering-specific tests that feed the parser chunks of text). This ensures that the parser doesn't break when the source arrives in small chunks that might end in the middle of a tag. However, #20288 revealed a bug that doesn't occur when the parser is fed char by char.

In order to avoid similar problems, all the tests should feed the source to the parser both char by char and as a single string.
So my plan is:
1) wait until #15114 is resolved and the strict mode and the strict tests are removed;
2) either change TestCaseBase._run_check() to run every test twice (possibly by using subTest), or use a subclass-based approach with a different _run_check in the two subclasses.
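
A minimal sketch of the subTest variant of step 2, not the actual test_htmlparser code: the EventCollector below is a simplified stand-in for the collector used by the real tests, and the `_run_check` signature and ExampleTest are assumptions made for illustration. Adjacent data events are merged so that the expected events do not depend on how the source was split.

```python
import unittest
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record parser callbacks as a list of event tuples (simplified stand-in)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        # Merge adjacent text events so the result does not depend on chunking.
        if self.events and self.events[-1][0] == "data":
            self.events[-1] = ("data", self.events[-1][1] + data)
        else:
            self.events.append(("data", data))


class TestCaseBase(unittest.TestCase):
    def _run_check(self, source, expected_events):
        # Hypothetical signature: run the same check once per feeding mode.
        feeds = {
            "char by char": list(source),
            "single string": [source],
        }
        for mode, pieces in feeds.items():
            with self.subTest(mode=mode):
                parser = EventCollector()
                for piece in pieces:
                    parser.feed(piece)
                parser.close()
                self.assertEqual(parser.events, expected_events)


class ExampleTest(TestCaseBase):
    def test_simple_tag(self):
        self._run_check(
            '<p class="x">hi</p>',
            [("starttag", "p", [("class", "x")]),
             ("data", "hi"),
             ("endtag", "p")],
        )
```

With subTest, a failure report names the feeding mode that broke, and the other mode still runs. The subclass-based alternative would instead define one TestCaseBase subclass per mode, each overriding `_run_check`.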

A few notes about this:
* a third kind of test that feeds the parser with chunks of arbitrary length (e.g. 5 chars) could be added as well;
* the increase in run-time shouldn't matter, since all the tests take very little time to run;
* I don't think it's necessary to backport this to 2.7/3.3/3.4 because it's a somewhat major refactoring, and if a bug is introduced by other changes, the tests in 3.5 will find it (I expect all the bug fixes and new features to land in 3.5 too).
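
The chunked feeding mentioned in the first note could look roughly like the sketch below. The names `iter_chunks`, `TagCollector` and `parse_in_chunks`, as well as the sample URL, are hypothetical and only serve the illustration. With a chunk size of 5, the first chunk ends in the middle of the start tag, which is the situation the buffering logic has to handle.

```python
from html.parser import HTMLParser


def iter_chunks(source, size):
    """Yield consecutive slices of `source`, each at most `size` chars long."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(source), size):
        yield source[start:start + size]


class TagCollector(HTMLParser):
    """Collect (tag, attrs) pairs for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


def parse_in_chunks(source, size):
    """Feed `source` to a fresh parser in pieces of `size` chars."""
    parser = TagCollector()
    for chunk in iter_chunks(source, size):
        parser.feed(chunk)
    parser.close()
    return parser.tags


source = '<a href="https://example.com/">link</a>'
chunks = list(iter_chunks(source, 5))  # first chunk is '<a hr', cut mid-tag
```

Comparing `parse_in_chunks(source, 5)` with `parse_in_chunks(source, len(source))` then checks that chunking does not change what the parser reports.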

History
Date	User	Action	Args
2022-04-11 14:57:58	admin	set	github: 64822
2018-12-12 08:30:43	serhiy.storchaka	set	type: behavior -> enhancement versions: + Python 3.8, - Python 3.5
2014-02-14 05:29:14	ezio.melotti	set	dependencies: + Deprecate strict mode of HTMLParser
2014-02-14 05:28:56	ezio.melotti	create