http://bugs.python.org/review/670664/diff/3098/9615 File Lib/HTMLParser.py (right): http://bugs.python.org/review/670664/diff/3098/9615#newcode124 Lib/HTMLParser.py:124: def set_cdata_mode(self, tag): 2. set_cdata_mode receives the tag and saves it in self.cdata_tag. It also sets the 'interesting' regex to interesting_cdata, i.e. re.compile(r'<(/|\Z)'). http://bugs.python.org/review/670664/diff/3098/9615#newcode140 Lib/HTMLParser.py:140: match = self.interesting.search(rawdata, i) # < or & 3. Shouldn't this look for the closing cdata tag instead? (i.e. set the interesting regex to something like '</' + self.cdata_tag) http://bugs.python.org/review/670664/diff/3098/9615#newcode276 Lib/HTMLParser.py:276: self.set_cdata_mode(tag) 1. If the starttag is either <script> or <style>, set_cdata_mode is called. http://bugs.python.org/review/670664/diff/3098/9615#newcode321 Lib/HTMLParser.py:321: self.handle_data(rawdata[i:j]) 4. Isn't rawdata[i:j] the closing tag (i.e. either </script> or </style>) here? Shouldn't the content between the start and end tag be passed instead? Are there tests for this? http://bugs.python.org/review/670664/diff/3098/9615#newcode324 Lib/HTMLParser.py:324: tag = match.group(1).strip() 5. Is the strip you added here related/necessary? Is it supposed to handle cases like "</ script >"? Are there tests for this? http://bugs.python.org/review/670664/diff/3098/9615#newcode328 Lib/HTMLParser.py:328: self.handle_data(rawdata[i:j]) 6. See 4.