Message 376607 - Python tracker


Author	nowasky.jr
Recipients	nowasky.jr
Date	2020-09-08.21:59:29
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1599602370.01.0.842812753139.issue41748@roundup.psfhosted.org>
In-reply-to

Content

HTML tags that have an attribute name starting with a comma character aren't parsed, and they break subsequent calls to feed().

The problem occurs when such an attribute is the second or a later attribute in the HTML tag. It doesn't seem to occur when it's the first attribute.

#POC:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)

parser = MyHTMLParser()

# This is OK: the comma is at the end of the attribute name
parser.feed('<yyy id="poc" a,="">')

# This breaks: the comma starts the name of the second attribute
parser.feed('<zzz id="poc" ,a="">')

# Subsequent calls to feed() will not work
parser.feed('<img id="poc" src=x>')
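To make the failure easier to check than by reading printed output, here is a minimal sketch of the same POC that records which start tags reach the handler. The names TagRecorder and tags_seen are hypothetical helpers introduced only for this illustration, and the result depends on the Python version in use: on an affected interpreter only the first tag is expected to show up.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper: collects the name of every start tag handled."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)


def tags_seen(chunks):
    """Feed each chunk to a single parser and return the start tags it reported."""
    parser = TagRecorder()
    for chunk in chunks:
        parser.feed(chunk)
    return parser.seen


# Same three inputs as the POC above, fed in the same order.
chunks = [
    '<yyy id="poc" a,="">',   # comma at the end of the attribute name
    '<zzz id="poc" ,a="">',   # comma at the start of the second attribute name
    '<img id="poc" src=x>',   # an ordinary tag fed afterwards
]

seen = tags_seen(chunks)
print("start tags handled:", seen)
print("img handled after the comma tag:", "img" in seen)
```

If the last line prints False, the parser is still holding the second tag in its buffer and the issue reproduces on that interpreter.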

History
Date	User	Action	Args
2020-09-08 21:59:30	nowasky.jr	set	recipients: + nowasky.jr
2020-09-08 21:59:30	nowasky.jr	set	messageid: <1599602370.01.0.842812753139.issue41748@roundup.psfhosted.org>
2020-09-08 21:59:30	nowasky.jr	link	issue41748 messages
2020-09-08 21:59:29	nowasky.jr	create