
Author wplappert
Recipients wplappert
Date 2010-04-05.18:08:54
Message-id <1270490935.7.0.673554223896.issue8319@psf.upfronthosting.co.za>
Content
When parsing HTML that contains an empty element such as <td></td>, handle_data is not called between handle_starttag and handle_endtag; it is only called afterwards. The problem is in HTMLParser.goahead, where the positions i and j are calculated. The code reads

    if i < j: self.handle_data(rawdata[i:j])

but it should be

    if i <= j: self.handle_data(rawdata[i:j])
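To make the comparison concrete, here is a minimal sketch that models only the guard quoted above, not the real goahead loop. The helper data_calls and its inclusive flag are hypothetical; i is taken as the position just after <td> and j as the position where </td> begins, so for an empty element the two are equal and rawdata[i:j] is the empty string.

```python
# Simplified model of the data-emitting guard discussed in this report.
# Hypothetical helper: it mimics only the i < j / i <= j comparison,
# not the full HTMLParser.goahead loop.

def data_calls(rawdata, i, j, inclusive):
    """Return the handle_data payloads the guard would emit for rawdata[i:j]."""
    calls = []
    emit = (i <= j) if inclusive else (i < j)
    if emit:
        calls.append(rawdata[i:j])
    return calls


rawdata = "<td></td>"
i = rawdata.index(">") + 1  # position just after the start tag <td>
j = rawdata.index("</")     # position where the end tag </td> begins

print(i, j)                                         # 4 4
print(data_calls(rawdata, i, j, inclusive=False))   # []   (strict check skips it)
print(data_calls(rawdata, i, j, inclusive=True))    # ['']  (proposed check emits it)
```

With non-empty content such as <td>x</td> (i = 4, j = 5) both variants emit 'x', which matches the observation that the difference only shows up for empty elements.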

If there is data between <td> and </td>, everything works fine.
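One way to observe the callback order is to record every event from a parser subclass. This is a sketch written against Python 3's html.parser module (the 2.6 module in this report is named HTMLParser, and its behavior may differ in detail); the class CallbackRecorder and the helper record are illustrative names, not part of the library.

```python
from html.parser import HTMLParser


class CallbackRecorder(HTMLParser):
    """Record the order of start-tag, data and end-tag callbacks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def record(markup):
    """Feed markup through a fresh recorder and return the event list."""
    parser = CallbackRecorder()
    parser.feed(markup)
    parser.close()  # flush any buffered data
    return parser.events


print(record("<td>x</td>"))  # [('start', 'td'), ('data', 'x'), ('end', 'td')]
print(record("<td></td>"))   # [('start', 'td'), ('end', 'td')]
```

In the empty case no handle_data call at all appears between the start and end tag, which is the situation the report describes.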

I just checked the 2.6 trunk: this occurs on line 142 of Lib/HTMLParser.py (file size 13407 bytes, dated 'Feb 26 19:25').
History
Date                 User       Action  Args
2010-04-05 18:08:55  wplappert  set     recipients: + wplappert
2010-04-05 18:08:55  wplappert  set     messageid: <1270490935.7.0.673554223896.issue8319@psf.upfronthosting.co.za>
2010-04-05 18:08:54  wplappert  link    issue8319 messages
2010-04-05 18:08:54  wplappert  create