Message 32487 - Python tracker


Author	eugine_kosenko
Date	2007-07-12.19:28:02

Content

import HTMLParser

p = HTMLParser.HTMLParser()
p.feed("""
<script>
<!--
bmD.write('</sc'+'ript>');
//-->
</script>
""")

Traceback (most recent call last):
  File "<stdin>", line 4, in ?
  File "/usr/lib/python2.4/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.4/HTMLParser.py", line 314, in parse_endtag
    self.error("bad end tag: %r" % (rawdata[i:j],))
  File "/usr/lib/python2.4/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: bad end tag: "</sc'+'ript>", at line 4, column 12

The JavaScript code is wrapped in an HTML comment, so HTMLParser should skip it entirely and parsing should succeed.

Instead, the JavaScript code is parsed as part of the HTML page, and the string </sc'+'ript> is reported as a bad end tag. If the actual end tag </script> is moved up so that it immediately follows the start tag <script>, the code is parsed without errors.

Although the code may look artificial, this construct is often used in real site counters to keep them from being blocked.
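
The behaviour expected above can be sketched as a small check. This is only an illustration under stated assumptions: it targets Python 3, where the module is named html.parser rather than HTMLParser, and where, as far as I know, the body of a <script> element is passed through as raw text. ScriptRecorder and parse_counter_snippet are hypothetical names introduced here and are not part of the original report.

```python
from html.parser import HTMLParser  # Python 3 name of the Python 2 HTMLParser module

# The same input as in the report above.
SNIPPET = """
<script>
<!--
bmD.write('</sc'+'ript>');
//-->
</script>
"""


class ScriptRecorder(HTMLParser):
    """Hypothetical helper: records tag events and the raw text seen inside <script>."""

    def __init__(self):
        super().__init__()
        self.events = []          # ("start" | "end", tag) pairs, in document order
        self.script_chunks = []   # text delivered while a <script> element is open
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        self.events.append(("end", tag))
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Whitespace outside the script element is ignored on purpose.
        if self._in_script:
            self.script_chunks.append(data)


def parse_counter_snippet(source=SNIPPET):
    """Parse the snippet and return (tag events, joined script text)."""
    parser = ScriptRecorder()
    parser.feed(source)
    parser.close()
    return parser.events, "".join(parser.script_chunks)


if __name__ == "__main__":
    events, script_text = parse_counter_snippet()
    print(events)
    print(script_text)
```

If the parser behaves as requested, only one start tag and one end tag are reported, and the split string </sc'+'ript> shows up inside the script text instead of raising a parse error.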

History
Date	User	Action	Args
2007-08-23 14:58:32	admin	link	issue1752919 messages
2007-08-23 14:58:32	admin	create