Author yotam
Recipients Hunanyan, cpalmer, ezio.melotti, fantoozler, fdrake, georg.brandl, gsf, momat, yotam
Date 2010-09-30.21:50:03
HTMLParser fails when it is inside
  <script> ... </script>
because it can be fooled by JavaScript containing less-than '<' conditional expressions.
In the attached example:

 $ tar tvzf lt-in-script-example.tgz | cut -c24-
     796 2010-09-30 16:52
   23678 2010-09-30 16:39 t.html

here's what happens:

 $ python t.html /tmp/t.txt
 HTMLParser: /home/yotam/src/wog/HTMLParser.bug/
 Traceback (most recent call last):
   File "", line 31, in <module>
     text = html2text(
   File "", line 23, in html2text
     te = TextExtractor(html)
   File "", line 15, in __init__
   File "/home/yotam/src/wog/HTMLParser.bug/", line 108, in feed
   File "/home/yotam/src/wog/HTMLParser.bug/", line 148, in goahead
     k = self.parse_starttag(i)
   File "/home/yotam/src/wog/HTMLParser.bug/", line 229, in parse_starttag
     endpos = self.check_for_whole_start_tag(i)
   File "/home/yotam/src/wog/HTMLParser.bug/", line 304, in check_for_whole_start_tag
     self.error("malformed start tag")
   File "/home/yotam/src/wog/HTMLParser.bug/", line 115, in error
     raise HTMLParseError(message, self.getpos())
 HTMLParser.HTMLParseError: malformed start tag, at line 396, column 332
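
The kind of input described above can be sketched in a few lines. This is only an illustration, not the attached script: it assumes Python 3's html.parser module rather than the 2010 HTMLParser module from the traceback, the TextExtractor name is borrowed from the traceback, and SAMPLE and extract_text are hypothetical names invented here. The '<' in "a<b" is what a parser that does not treat script content as raw text would mistake for the start of a tag.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect document text, skipping anything inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as plain data; drop them.
        if not self.in_script:
            self.chunks.append(data)


# Hypothetical reduced input: a less-than comparison inside a script block.
SAMPLE = (
    "<html><body>"
    "<script>if (a<b) { f(); }</script>"
    "<p>hello</p>"
    "</body></html>"
)


def extract_text(html):
    """Hypothetical helper: feed the markup and join the collected text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(extract_text(SAMPLE))
```

Current Python 3 releases treat script content as raw text, so this input should parse cleanly there. The module version reported here instead stopped with "malformed start tag".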

I have a suggested patch that fixes this problem and will attach it soon.

-- yotam
Date                 User   Action  Args
2010-09-30 21:50:07  yotam  set     recipients: + yotam, fdrake, georg.brandl, fantoozler, gsf, cpalmer, ezio.melotti, momat, Hunanyan
2010-09-30 21:50:06  yotam  set     messageid: <>
2010-09-30 21:50:04  yotam  link    issue670664 messages
2010-09-30 21:50:03  yotam  create