I would agree if HTMLParser were compliant with the HTML 4.01 specification, but since it is more permissive and uses its own heuristics to decide what should and shouldn't be parsed, I think it's better to reuse existing heuristics (either the HTML5 ones or the ones browsers use).
In other words, I'm not trying to make it HTML5 compliant, just to make it handle whatever works in browsers.
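
To illustrate what "more permissive" means here, this is a minimal sketch showing that HTMLParser reports tags as it encounters them and does not reject improperly nested or unclosed markup. The names `EventCollector` and `collect_events` are hypothetical and only serve the example. The sample markup is made up as well.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Records start and end tag events in the order they are reported."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def collect_events(markup):
    # Feed the whole string, then close so any buffered input is processed.
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# <b> and <i> are never closed, yet no error is raised.
events = collect_events("<p>unclosed <b>bold <i>nested</p>")
print(events)
```

The parser emits the three start tags and the single end tag without complaining about the missing `</b>` and `</i>`, so any recovery from broken markup is left to its own heuristics or to the calling code.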
