Issue 1752919: Exception in HTMLParser for special JavaScript code


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/45188

classification

Title:	Exception in HTMLParser for special JavaScript code
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.1, Python 2.6

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:	HTMLParser.py - more robust SCRIPT tag parsing View: 670664
Assigned To:		Nosy List:	BreamoreBoy, ajaksu2, eugine_kosenko, ezio.melotti, r.david.murray
Priority:	normal	Keywords:	easy

Created on 2007-07-12 19:28 by eugine_kosenko, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg32487 - (view)	Author: Eugine Kosenko (eugine_kosenko)	Date: 2007-07-12 19:28
import HTMLParser
p = HTMLParser.HTMLParser()
p.feed("""
<script>
<!--
bmD.write('</sc'+'ript>');
//-->
</script>
""")

Traceback (most recent call last):
  File "<stdin>", line 4, in ?
  File "/usr/lib/python2.4/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.4/HTMLParser.py", line 314, in parse_endtag
    self.error("bad end tag: %r" % (rawdata[i:j],))
  File "/usr/lib/python2.4/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: bad end tag: "</sc'+'ript>", at line 4, column 12

The JavaScript code is protected by an HTML comment, so HTMLParser must skip it entirely and parsing must succeed. Instead, the JavaScript code is parsed as part of the HTML page, and an incorrect end tag is detected.

If one moves the actual end tag </script> up to just after the start tag <script>, the code is parsed without errors.

Although the code seems artificial, it is often used in real site counters to prevent them from being blocked.
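For readers on Python 3, the report above can be re-checked with a minimal sketch like the following. It assumes the `html.parser` module (the Python 3 name for `HTMLParser`) and uses a hypothetical `CollectingParser` subclass that is not part of the original report. On recent Python 3 releases the body of a `<script>` element should be passed through as plain data, so the split `</sc'+'ript>` string is not expected to be treated as an end tag. Exact behavior may differ between versions.

```python
from html.parser import HTMLParser

# The markup from the original report, unchanged apart from layout.
PAGE = """
<script>
<!--
bmD.write('</sc'+'ript>');
//-->
</script>
"""


class CollectingParser(HTMLParser):
    """Hypothetical helper: records the tags and text the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))

    def handle_data(self, data):
        self.data.append(data)


parser = CollectingParser()
parser.feed(PAGE)
parser.close()  # flush any buffered input

text = "".join(parser.data)
print(parser.tags)
print(repr(text))
```

If the script body is handled as raw text, the only tags reported should be the opening and closing `script` tags, with the comment-wrapped JavaScript appearing in the collected data.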
msg85681 - (view)	Author: Daniel Diniz (ajaksu2) *	Date: 2009-04-07 04:03
Confirmed in trunk, py3k.
msg98403 - (view)	Author: Thomas Holmes (thomas.holmes)	Date: 2010-01-27 03:37
I agree, I do not feel that the precise changes to the tests are completely ideal. I feel that this problem stems from the fact that the nameCheck function as originally written doesn't seem to completely serve its originally intended purpose. The original issues that caused the modifications of the tests were as follows:

* After adding "test_abs_path()", namecheck would pass the test when it should actually fail, due to the original assert performing os.path.abspath() on both paths. The obvious solution seemed to be to take the abspath of the user-supplied path (the variable dir in this function), relative or otherwise, and compare it to ndir, which is derived from the passed-in "name" variable, populated by the mkdtemp call.

* The second issue is a somewhat complex case of those asserts passing the atypical calls of nameCheck from the "test__RandomNameSequence" suite. Without absolute pathing of both parameters in the new first assert, this test began to fail. This is because the original call was self.namecheck(<file name>, '', '', ''). These empty parameters, when passed to os.path.abspath(), would return valid paths and these asserts would succeed. As a result, the abspath(join()) call built a proper path into the "name" parameter, allowing the tests to pass in this case.

* For the sake of cleanliness I decided it made more sense to split the two apart: make one assert that specifically verifies that the path the file/folder is being placed in is both absolute and matching, and one that will pass the file name. Perhaps the old first assert (or second assert in the patch) can actually be removed, but I think it may be getting properly tested in one of the other test classes.

I am quite confident there is a much better way to accomplish this, but I did not wish to change _too_ much of the test on my first stab at this.

I appreciate your feedback very much. I will work on setting up the 2.7 environment for working on issues that span the 2x/3x gap.
msg98404 - (view)	Author: Thomas Holmes (thomas.holmes)	Date: 2010-01-27 03:38
Please disregard, I commented on the wrong issue.
msg100254 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-03-01 12:32
This is a duplicate of issue 670664, which has a proposed patch.
msg116704 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2010-09-17 18:25
Can't see much sense in keeping a duplicate open.

History
Date	User	Action	Args
2022-04-11 14:56:25	admin	set	github: 45188
2010-09-17 18:25:29	BreamoreBoy	set	status: open -> closed nosy: + BreamoreBoy messages: + msg116704
2010-03-01 12:32:13	r.david.murray	set	nosy: + r.david.murray messages: + msg100254 resolution: duplicate superseder: HTMLParser.py - more robust SCRIPT tag parsing stage: test needed -> resolved
2010-01-27 03:38:50	thomas.holmes	set	nosy: - thomas.holmes
2010-01-27 03:38:40	thomas.holmes	set	messages: + msg98404
2010-01-27 03:37:10	thomas.holmes	set	nosy: + thomas.holmes messages: + msg98403
2009-11-09 21:53:16	ezio.melotti	set	nosy: + ezio.melotti
2009-04-22 05:07:32	ajaksu2	set	keywords: + easy
2009-04-07 04:03:20	ajaksu2	set	versions: + Python 2.6, Python 3.1, - Python 2.4 nosy: + ajaksu2 messages: + msg85681 type: behavior stage: test needed
2007-07-12 19:28:02	eugine_kosenko	create