classification
Title: HTMLParser lukewarm on bogus bare attribute chars
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7
process
Status: closed
Resolution: accepted
Dependencies:
Superseder: HTMLParser : A auto-tolerant parsing mode (#1486713)
Assigned To:
Nosy List: Neil Muller, ajaksu2, mkc, nnseva, r.david.murray
Priority: normal
Keywords:

Created on 2004-06-18 19:33 by mkc, last changed 2010-12-03 04:14 by r.david.murray. This issue is now closed.

Messages (5)
msg60515 - Author: Mike Coleman (mkc) Date: 2004-06-18 19:33
I tripped over the same problem mentioned in bug
#921657 (HTMLParser.py), except that my bogus attribute
char is '|' instead of '@'.

May I suggest that HTMLParser either require strict
compliance with the HTML spec, or alternatively that it
accept everything reasonable?  The latter approach
would be much more useful, and it would also be
valuable to have this decision documented.

In particular, 'attrfind' needs to be changed to accept
(following the '=\s*') something like the subpattern
given for 'locatestarttagend' (see the "bare value" line).
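
To illustrate the suggestion above, here is a minimal sketch comparing two ways of matching a bare attribute value. Both patterns are simplified and hypothetical; they are not copied from HTMLParser.py, and the names `attrfind_whitelist`, `attrfind_permissive` and `bare_value` are made up for this example. The first assumes a bare value restricted to a fixed set of characters that does not include '|'; the second accepts any run of characters other than quotes, '>' and whitespace, in the spirit of the "bare value" subpattern mentioned above.

```python
import re

# Hypothetical, simplified patterns for illustration only;
# they are not the actual definitions in HTMLParser.py.
ATTR_NAME = r"[a-zA-Z_][-.:a-zA-Z0-9_]*"

# Bare values limited to a fixed set of characters ('|' is not in the set).
attrfind_whitelist = re.compile(
    r"\s*(" + ATTR_NAME + r")(?:\s*=\s*"
    r"('[^']*'|\"[^\"]*\"|[-a-zA-Z0-9./,:;+*%?!&$()_#=~@]*))?"
)

# Bare values are any run of characters except quotes, '>' and whitespace.
attrfind_permissive = re.compile(
    r"\s*(" + ATTR_NAME + r")(?:\s*=\s*"
    r"('[^']*'|\"[^\"]*\"|[^'\">\s]+))?"
)


def bare_value(pattern, text):
    """Return the attribute value the pattern captures at the start of text."""
    match = pattern.match(text)
    return match.group(2)


sample = "href=foo|bar"
whitelist_value = bare_value(attrfind_whitelist, sample)    # stops before '|'
permissive_value = bare_value(attrfind_permissive, sample)  # takes the whole value
print(whitelist_value, permissive_value)
```

Under these assumptions the restricted pattern captures only `foo`, silently dropping the rest of the value, while the permissive pattern captures `foo|bar`.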
msg60516 - Author: Vsevolod Novikov (nnseva) Date: 2004-10-13 10:15
see request #1046092 to fix it
msg81438 - Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-02-09 06:12
Per #921657, looks like the current behavior is correct.
msg121676 - Author: Neil Muller (Neil Muller) Date: 2010-11-20 16:30
This should probably be solved as part of #1486713.
msg123175 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-03 04:14
The new strict=False mode from #1486713 handles this case.
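
As a rough check of the tolerant behaviour, the sketch below feeds a start tag with a bare '|' in an attribute value to the parser. It assumes a current Python 3, where the module is `html.parser` and tolerant parsing is the only mode, so no `strict` argument is passed; on the 2.7 and early 3.x versions discussed in this issue the import path and constructor arguments differ. The class name `AttrCollector` is made up for this example.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record every start tag together with its parsed attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        self.seen.append((tag, attrs))


collector = AttrCollector()
collector.feed("<a href=foo|bar class=x>link</a>")
collector.close()
print(collector.seen)
```

The bare value `foo|bar` should arrive intact in `handle_starttag`, with no parse error raised.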
History
Date                 User            Action  Args
2010-12-03 04:14:13  r.david.murray  set     status: open -> closed; superseder: HTMLParser : A auto-tolerant parsing mode; nosy: + r.david.murray; messages: + msg123175; resolution: accepted; stage: resolved
2010-11-20 16:30:56  Neil Muller     set     nosy: + Neil Muller; messages: + msg121676
2009-02-09 06:12:41  ajaksu2         set     nosy: + ajaksu2; type: enhancement; messages: + msg81438; versions: + Python 2.7, - Python 2.3
2004-06-18 19:33:18  mkc             create