Issue 42821: HTMLParser: subsequent duplicate attributes should be ignored

This issue tracker has been migrated to GitHub and is now read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/86987

classification

Title:	HTMLParser: subsequent duplicate attributes should be ignored
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.10

process

Created on 2021-01-04 08:00 by karlcow, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg384308	Author: karl (karlcow)	Date: 2021-01-04 08:00
This came up while working on issue 41748.

Browser input:
    data:text/html,<!doctype html><div class="bar" class="foo">text</div>

Browser output:
    <div class="bar">text</div>

Actual HTMLParser output (see https://github.com/python/cpython/pull/24072#discussion_r551158342):
    ('starttag', 'div', [('class', 'bar'), ('class', 'foo')])

Expected HTMLParser output:
    ('starttag', 'div', [('class', 'bar')])
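
The reported output can be reproduced with a small subclass that records start-tag events. This is only a sketch: `EventCollector` is a hypothetical name chosen for illustration, not a class from the standard library or from the linked pull request.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records each start tag with its attribute list."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order;
        # HTMLParser does not merge or drop repeated names.
        self.events.append(('starttag', tag, attrs))


collector = EventCollector()
collector.feed('<!doctype html><div class="bar" class="foo">text</div>')
collector.close()
print(collector.events)
```

Both `class` pairs appear in the recorded event, in the order they were written in the markup.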
msg384475	Author: Ezio Melotti (ezio.melotti)	Date: 2021-01-06 06:46
If we follow the behavior of the browser, we will have to pick one of the two values and discard the other, making the discarded value inaccessible. If we provide both, scripts and libraries that use HTMLParser will have access to both and can decide what to do. For example, BeautifulSoup already does the right thing:

>>> bs4.BeautifulSoup('<!doctype html><div class="bar" class="foo">text</div>')
<!DOCTYPE html>
<html><body><div class="bar">text</div></body></html>

Changing this might also break code that relies on the current behavior. I'm therefore going to close this as "not a bug".
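
Code that wants the browser's first-value-wins result can filter the attribute list itself, since HTMLParser hands over every pair. The following is a minimal sketch under that assumption; `first_wins` and `FirstWinsParser` are hypothetical names, not part of `html.parser`.

```python
from html.parser import HTMLParser


def first_wins(attrs):
    """Keep only the first (name, value) pair for each attribute name."""
    seen = {}
    for name, value in attrs:
        # setdefault leaves an existing entry alone, so later duplicates are ignored.
        seen.setdefault(name, value)
    # dicts preserve insertion order, so the source order of names is kept.
    return list(seen.items())


class FirstWinsParser(HTMLParser):
    """Hypothetical subclass that drops duplicate attributes before recording."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('starttag', tag, first_wins(attrs)))


parser = FirstWinsParser()
parser.feed('<div class="bar" class="foo">text</div>')
parser.close()
print(parser.events)
```

Keeping the filtering in the caller leaves the raw list available to code that needs to see every occurrence, which is the compatibility concern raised in this message.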

History
Date	User	Action	Args
2022-04-11 14:59:39	admin	set	github: 86987
2021-01-06 06:46:27	ezio.melotti	set	status: open -> closed type: behavior assignee: ezio.melotti nosy: + ezio.melotti messages: + msg384475 resolution: not a bug stage: resolved
2021-01-04 08:00:54	karlcow	create