Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(6)

#670664: HTMLParser.py - more robust SCRIPT tag parsing

Can't Edit
Can't Publish+Mail
Start Review
Created:
6 years, 5 months ago by fantoozler
Modified:
6 years, 5 months ago
Reviewers:
ezio.melotti
CC:
Georg, yotam_users.sourceforge.net, orsenthil, AntoinePitrou, fantoozler_users.sourceforge.net, gsf_breaksalot.org, ezio.melotti, eric.araujo, r.david.murray, momat_man.poznan.pl, arman.hunanyan_gmail.com, b3nder_yandex.ru, devnull_psf.upfronthosting.co.za, bastawhiz_gmail.com, orsenthil, devnull_psf.upfronthosting.co.za
Visibility:
Public.

Patch Set 1 #

Total comments: 6
Unified diffs Side-by-side diffs Delta from patch set Stats Patch
Lib/HTMLParser.py View 4 chunks +16 lines, -3 lines 6 comments Download
Lib/test/test_htmlparser.py View 1 chunk +12 lines, -0 lines 0 comments Download

Messages

Total messages: 1
ezio.melotti
6 years, 5 months ago #1
http://bugs.python.org/review/670664/diff/3098/9615
File Lib/HTMLParser.py (right):

http://bugs.python.org/review/670664/diff/3098/9615#newcode124
Lib/HTMLParser.py:124: def set_cdata_mode(self, tag):
2. set_cdata_mode receives the tag and saves it in self.cdata_tag.  It also sets
the 'interesting' regex to interesting_cdata, i.e. re.compile(r'<(/|\Z)').

http://bugs.python.org/review/670664/diff/3098/9615#newcode140
Lib/HTMLParser.py:140: match = self.interesting.search(rawdata, i) # < or &
3. Shouldn't this look for the closing cdata tag instead? (i.e. set the
interesting regex to something like '</' + self.cdata_tag)

http://bugs.python.org/review/670664/diff/3098/9615#newcode276
Lib/HTMLParser.py:276: self.set_cdata_mode(tag)
1. If the starttag is either <script> or <style>, set_cdata_mode is called.

http://bugs.python.org/review/670664/diff/3098/9615#newcode321
Lib/HTMLParser.py:321: self.handle_data(rawdata[i:j])
4. Isn't rawdata[i:j] the closing tag (i.e. either </script> or </style>) here? 
Shouldn't the content between the start and end tag be passed instead?
Are there tests for this?

http://bugs.python.org/review/670664/diff/3098/9615#newcode324
Lib/HTMLParser.py:324: tag = match.group(1).strip()
5. Is the strip you added here related/necessary? Is it supposed to handle cases
like "</ script >"?
Are there tests for this?

http://bugs.python.org/review/670664/diff/3098/9615#newcode328
Lib/HTMLParser.py:328: self.handle_data(rawdata[i:j])
6. See 4.
Sign in to reply to this message.

RSS Feeds Recent Issues | This issue
This is Rietveld 894c83f36cb7