Currently test_htmlparser feeds the HTML source to the parser one char at a time (except for a couple of buffering-specific tests that feed the parser with chunks of text). This ensures that the parser doesn't break when the source is fed in smaller chunks (that might end in the middle of a tag). However, #20288 revealed a bug that doesn't happen when feeding the parser char by char.
In order to avoid similar problems, all the tests should feed the source to the parser both char by char and as a single string.
So my plan is:
1) wait until #15114 is resolved and the strict mode and the strict tests are removed;
2) either change TestCaseBase._run_check() to run every test twice (possibly by using subTest), or use a subclass-based approach with a different _run_check in the two subclasses.
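A rough sketch of the subTest variant of step 2, to make the idea concrete. This is not the actual test_htmlparser code: EventCollector, feed_whole and feed_by_char are hypothetical names made up for the illustration, and the real _run_check() may take different arguments.

```python
import unittest
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical collector: records parser events as tuples."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        # Merge adjacent data events so that the way the source was
        # split does not change the recorded result.
        if self.events and self.events[-1][0] == "data":
            self.events[-1] = ("data", self.events[-1][1] + data)
        else:
            self.events.append(("data", data))


def feed_whole(parser, source):
    # Feed the source as a single string.
    parser.feed(source)


def feed_by_char(parser, source):
    # Feed the source one character at a time.
    for char in source:
        parser.feed(char)


class TestCaseBase(unittest.TestCase):
    def _run_check(self, source, expected_events):
        # Run the same check once per feeding strategy; subTest reports
        # which strategy failed without stopping the other one.
        for name, feed in (("whole", feed_whole), ("char", feed_by_char)):
            with self.subTest(feed=name):
                parser = EventCollector()
                feed(parser, source)
                parser.close()
                self.assertEqual(parser.events, expected_events)
```

With this shape, existing tests keep calling self._run_check(source, expected) unchanged. The subclass-based alternative would instead override the feeding function in two subclasses.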
A few notes about this:
* a third kind of test that feeds the parser with chunks of arbitrary length (e.g. 5 chars) could be added as well;
* the increase in run-time shouldn't matter, since all the tests take very little time to run;
* I don't think it's necessary to backport this to 2.7/3.3/3.4 because it's a somewhat major refactoring, and if a bug is introduced by other changes the tests in 3.5 will find it (I expect all the bug fixes and new features to land in 3.5 too).
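
For the chunked variant mentioned in the notes, something along these lines might work. It is only a sketch: split_into_chunks, TagCollector and collect_tags are hypothetical helpers, not part of test_htmlparser, and the chunk size of 5 is just the example value from above.

```python
from html.parser import HTMLParser


def split_into_chunks(source, size=5):
    """Return source cut into consecutive pieces of at most size chars."""
    if size < 1:
        raise ValueError("size must be positive")
    return [source[i:i + size] for i in range(0, len(source), size)]


class TagCollector(HTMLParser):
    """Hypothetical helper: records only the names of start tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def collect_tags(source, size):
    # Feed the source chunk by chunk and return the start tags seen.
    parser = TagCollector()
    for chunk in split_into_chunks(source, size):
        parser.feed(chunk)
    parser.close()
    return parser.tags
```

A chunk size that is not a divisor of typical tag lengths makes it likely that some chunk ends in the middle of a tag, which is the case the char-by-char tests were meant to cover.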