Message 251657 - Python tracker



Author	ezio.melotti
Recipients	ezio.melotti, frogcoder
Date	2015-09-26.17:06:11
Message-id	<1443287172.45.0.594852727462.issue25239@psf.upfronthosting.co.za>

Content

This does indeed seem to be a bug. The relevant passage is at http://www.w3.org/TR/html5/syntax.html#consume-a-character-reference :

"""
If the character reference is being consumed as part of an attribute, and the last character matched is not a ";" (U+003B) character, and the next character is either a "=" (U+003D) character or an alphanumeric ASCII character, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned. However, if this next character is in fact a "=" (U+003D) character, then this is a parse error, because some legacy user agents will misinterpret the markup in those cases.
"""

Off the top of my head, this paragraph is not implemented in HTMLParser (and it should be).
Also note that <a href="go?t=buy&currency=usd">foo</a> is not valid HTML; the & should have been escaped as &amp;.
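
To make the quoted rule concrete, here is a minimal sketch of how attribute values could be unescaped under it. It is not HTMLParser's actual code: the helper name unescape_attribute is hypothetical, it handles only named references (numeric ones are left untouched), and it relies on the html5 table in html.entities, which lists legacy names both with and without the trailing ";". For comparison, html.unescape() applies the text-content rules, so it turns "&curren" into "¤" even when more letters follow.

```python
import re
import string
from html import unescape
from html.entities import html5

# Hypothetical helper, for illustration only; not part of the stdlib.
_ASCII_ALNUM = set(string.ascii_letters + string.digits)
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*;?)")


def unescape_attribute(value):
    """Unescape named character references using the attribute-value rule."""

    def replace(match):
        text = match.group(1)  # the name, plus ";" if present
        # Try the longest known entity name first, as the spec does.
        for end in range(len(text), 0, -1):
            prefix = text[:end]
            if prefix not in html5:
                continue
            # Character that follows the matched name ("" at end of value).
            following = (text[end:] + value[match.end():])[:1]
            if not prefix.endswith(";") and (
                following == "=" or following in _ASCII_ALNUM
            ):
                # Historical exception: leave the "&..." text as written.
                return match.group(0)
            return html5[prefix] + text[end:]
        return match.group(0)  # unknown name: leave as-is

    return _NAMED_REF.sub(replace, value)


raw = "go?t=buy&currency=usd"
print(unescape(raw))                                    # text rules: "&curren" is replaced
print(unescape_attribute(raw))                          # attribute rule: left untouched
print(unescape_attribute("go?t=buy&amp;currency=usd"))  # properly escaped input
```

With the properly escaped form (&amp;), both approaches agree, which is why escaping the ampersand in the markup sidesteps the problem entirely.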
