Message 252150 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	Chenyun Yang
Recipients	Chenyun Yang, ezio.melotti, josh.r, martin.panter, xiang.zhang
Date	2015-10-02.19:18:00
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<CAJZGQB471E26pn7sx5UV9QLZYCZ=KSctugdrdn80gfW_KrmfAQ@mail.gmail.com>
In-reply-to	<1443665115.49.0.126579485236.issue25258@psf.upfronthosting.co.za>

Content
the example you give for <li> is a different case. <img>, <link> are void elements which are allowed to have no close tag; <li> without </li> is a browser implementation detail, most browser autocompletes </li>. Without the parser calls the handle_endtag(), the client code which uses HTMLParser won't be able to know whether the a traversal is finished. Do you have a strong reason why we should include the knowledge of void elements into the HTMLParser at this line? https://github.com/python/cpython/blob/bdfb14c688b873567d179881fc5bb67363a6074c/Lib/html/parser.py#L341 if end.endswith('/>') or (end.endswith('>') and tag in VOID_ELEMENTS) On Wed, Sep 30, 2015 at 7:05 PM, Martin Panter <report@bugs.python.org> wrote: > > Martin Panter added the comment: > > My thinking is that the knowledge that <img> does not have a closing tag > is at a higher level than the current HTMLParser class. It is similar to > knowing where the following HTML implicitly closes the <li> elements: > > <ul><li>Item A<li>Item B</ul> > > In both cases I would not expect the HTMLParser to report “virtual” empty > or closing tags. I don’t think it should report an empty <img/> or closing > </img> tag just because that is easy to do, because it would be > inconsistent with other implied HTML tags. But maybe see what other people > say. > > I don’t know your particular use case, but I would suggest if you need to > parse non-XML HTML <img> tags, use the handle_starttag() method and don’t > rely on the end tag :) > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue25258> > _______________________________________ >

the example you give for <li> is a different case.

<img>, <link> are void elements which are allowed to have no close tag;
<li> without </li> is a browser implementation detail, most browser
autocompletes </li>.

Without the parser calls the handle_endtag(), the client code which uses
HTMLParser won't be able to know whether the a traversal is finished.

Do you have a strong reason why we should include the knowledge of  void
elements into the HTMLParser at this line?

https://github.com/python/cpython/blob/bdfb14c688b873567d179881fc5bb67363a6074c/Lib/html/parser.py#L341

if end.endswith('/>') or (end.endswith('>') and tag in VOID_ELEMENTS)

On Wed, Sep 30, 2015 at 7:05 PM, Martin Panter <report@bugs.python.org>
wrote:

>
> Martin Panter added the comment:
>
> My thinking is that the knowledge that <img> does not have a closing tag
> is at a higher level than the current HTMLParser class. It is similar to
> knowing where the following HTML implicitly closes the <li> elements:
>
> <ul><li>Item A<li>Item B</ul>
>
> In both cases I would not expect the HTMLParser to report “virtual” empty
> or closing tags. I don’t think it should report an empty <img/> or closing
> </img> tag just because that is easy to do, because it would be
> inconsistent with other implied HTML tags. But maybe see what other people
> say.
>
> I don’t know your particular use case, but I would suggest if you need to
> parse non-XML HTML <img> tags, use the handle_starttag() method and don’t
> rely on the end tag :)
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue25258>
> _______________________________________
>

History
Date	User	Action	Args
2015-10-02 19:18:01	Chenyun Yang	set	recipients: + Chenyun Yang, ezio.melotti, martin.panter, josh.r, xiang.zhang
2015-10-02 19:18:01	Chenyun Yang	link	issue25258 messages
2015-10-02 19:18:00	Chenyun Yang	create