
Author smroid
Date 2003-05-14.05:12:43
Content

Two troublesome input examples (the first has no whitespace
before cellspacing; the second has an empty attribute value):
<table border=0 width="100%"cellspacing=0 cellpadding=0>
<option selected value=>

Here's a fix I came up with in HTMLParser.py: replace the
definition of locatestarttagend with:

locatestarttagend = re.compile(r"""
  <[a-zA-Z][-.a-zA-Z0-9:_]*          # tag name
  \s*                                # whitespace after tag name
  (?:
    (?:[a-zA-Z_][-.:a-zA-Z0-9_]*     # attribute name
      (?:\s*=\s*                     # value indicator
        (?:'[^']*'                   # LITA-enclosed value
          |\"[^\"]*\"                # LIT-enclosed value
          |[^'\">\s]+                # bare value
         )?
       )?
     )
     \s*                             # whitespace between attrs
   )*
  \s*                                # trailing whitespace
""", re.VERBOSE)
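
A quick way to check the pattern above against the two sample inputs, as a minimal sketch: the helper name char_after_match and the samples list are made up for illustration and are not part of HTMLParser.py. The pattern is copied unchanged from the proposed fix. If the whole start tag is located, the character right after the match should be the closing ">".

```python
import re

# Pattern copied verbatim from the proposed fix.
locatestarttagend = re.compile(r"""
  <[a-zA-Z][-.a-zA-Z0-9:_]*          # tag name
  \s*                                # whitespace after tag name
  (?:
    (?:[a-zA-Z_][-.:a-zA-Z0-9_]*     # attribute name
      (?:\s*=\s*                     # value indicator
        (?:'[^']*'                   # LITA-enclosed value
          |\"[^\"]*\"                # LIT-enclosed value
          |[^'\">\s]+                # bare value
         )?
       )?
     )
     \s*                             # whitespace between attrs
   )*
  \s*                                # trailing whitespace
""", re.VERBOSE)

# The two troublesome inputs from this message.
samples = [
    '<table border=0 width="100%"cellspacing=0 cellpadding=0>',
    '<option selected value=>',
]

def char_after_match(text):
    """Hypothetical helper: return the character following the match, or None."""
    m = locatestarttagend.match(text)
    if m is None:
        return None
    return text[m.end():m.end() + 1]

for s in samples:
    # Print each sample with the character that follows the matched span.
    print(repr(s), "->", repr(char_after_match(s)))
```

This only exercises the regular expression in isolation; it does not show how the rest of the parser handles the located tag.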
History
Date                 User   Action  Args
2008-01-20 09:55:57  admin  link    issue683938 messages
2008-01-20 09:55:57  admin  create