Author steven.daprano
Recipients ezio.melotti, hanno, steven.daprano
Date 2018-02-19.23:02:09
Message-id <1519081329.2.0.467229070634.issue32876@psf.upfronthosting.co.za>
In-reply-to
Content
The stdlib HTML parser requires correct HTML.
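
For reference, here is a minimal sketch of how the stdlib parser is typically driven on well-formed input. The names `TagCollector` and `collect_events` are hypothetical, chosen for illustration only; they are not part of the stdlib or of this issue.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags, end tags and text in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text nodes to keep the event list short.
        if data.strip():
            self.events.append(("data", data.strip()))


def collect_events(markup):
    """Feed a complete document to the parser and return the recorded events."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# Well-formed input: every tag is closed and properly nested.
events = collect_events("<p>Hello <b>world</b></p>")
```

The subclass only records what the parser reports; it does not try to repair or rebalance tags, which is the kind of work a library like BeautifulSoup takes on.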

To parse broken HTML of the kind you find in the real world, you need a third-party library like BeautifulSoup. BeautifulSoup is much more complex than the stdlib parser (about 7-8 times as many lines of code), but it can handle nearly anything a browser can.

I doubt the stdlib will ever compete with BeautifulSoup.
History
Date                 User            Action  Args
2018-02-19 23:02:09  steven.daprano  set     recipients: + steven.daprano, ezio.melotti, hanno
2018-02-19 23:02:09  steven.daprano  set     messageid: <1519081329.2.0.467229070634.issue32876@psf.upfronthosting.co.za>
2018-02-19 23:02:09  steven.daprano  link    issue32876 messages
2018-02-19 23:02:09  steven.daprano  create