Message 237483 - Python tracker


Author	ezio.melotti
Recipients	docs@python, ezio.melotti, martin.panter, r.david.murray, serhiy.storchaka, xkjq
Date	2015-03-08.00:08:10
Message-id	<1425773290.85.0.88764030757.issue23144@psf.upfronthosting.co.za>

Content

> I still think it would be worthwhile adding close() calls to
> the examples in the documentation (Doc/library/html.parser.rst).

If I add context manager support to HTMLParser I can update the examples to use it, but otherwise I don't think it's worth changing them now.
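For illustration only (this is not part of any patch on this issue): until HTMLParser has context manager support of its own, contextlib.closing() can wrap any object that has a close() method, so it gives a similar effect today. TextCollector below is a hypothetical helper written for this sketch, not something in the documentation.

```python
from contextlib import closing
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects the text passed to handle_data()."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# closing() calls parser.close() when the block exits, even on an error,
# so any text still held in the parser's buffer gets flushed.
with closing(TextCollector()) as parser:
    parser.feed("<p>Hello &")       # the charref "&amp;" is split here
    parser.feed("amp; world</p>")

text = "".join(parser.chunks)
print(text)
```

The two feed() calls deliberately split a charref across chunks; the parser holds the trailing text back until it can see the rest of it.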

> BTW I haven’t tested this, and maybe it is not a concern, but even with
> this patch it looks like the parser will buffer unlimited data and
> output nothing until close() if each string it is fed ends with an 
> ampersand (and otherwise contains only plain text, no tags etc).

This is true, but I don't think it's a realistic case.
For this to be a problem you would need:
1) Someone feeding the parser arbitrary chunks.  Text files are usually fed to the parser whole, or line by line -- arbitrary chunks are uncommon.
2) A file that contains a lot of entities.  In most documents charrefs are not very common, so the chance that a chunk boundary splits one in the middle is low.  The chance that several consecutive charrefs are each split in the middle is even lower.
3) A file that is very big.  Even if the whole file is buffered until close() is called, it shouldn't be a concern, since most files are relatively small.  It is true that this has quadratic complexity, but I would expect parsing to complete in a reasonable time for files of average size.
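
A minimal sketch of the scenario described above, assuming Python 3.5+ with convert_charrefs=True: every chunk is plain text ending in an ampersand. CountingParser is a hypothetical helper for this sketch. How many handle_data() calls happen before close() depends on the parser version, so the script prints that count instead of assuming it; only the final joined text is relied upon.

```python
from html.parser import HTMLParser


class CountingParser(HTMLParser):
    """Hypothetical helper: records every string passed to handle_data()."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)


parser = CountingParser()
chunks = ["one &", " two &", " three &"]   # each chunk ends with "&"

for chunk in chunks:
    parser.feed(chunk)
    # Shows whether the parser has emitted anything yet or is still buffering.
    print("data callbacks so far:", len(parser.pieces))

parser.close()                              # forces the remaining text out
result = "".join(parser.pieces)
print(repr(result))
```

Since the parser cannot tell whether a trailing "&" starts a charref that continues in the next chunk, it keeps the text in its buffer, and each feed() rescans that growing buffer, which is where the quadratic cost mentioned above comes from.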
