A <tr> which includes two <a> will have a problem
Author: moonflow (moonflow) Date: 2012-11-20 14:09
If a <tr> includes two or more <a> elements, SGMLParser has a problem processing it.

For example:
    <td align="center" valign="top" nowrap>
    <script language="Javascript">
      if ( 4 == 4 ) document.write("<strong class=\"Critical small\">Critical</strong>");
      if ( 4 == 3 ) document.write("<strong class=\"High small\">High</strong>");
      if ( 4 == 2 ) document.write("<strong class=\"Medium small\">Medium</strong>");
      if ( 4 == 1 ) document.write("<strong class=\"Low small\">Low</strong>");
    <td valign="top" align="center" nowrap>
    <small><script type="text/javascript">document.write(FormatDate("%d-%b-%y", "2012", "11", "18"));</script></small>
    <td valign="top" align="center" nowrap><small>
    <a title="CPAI-2012-809" style="text-transform:uppercase" href="2012/cpai-08-nov.html">
    <td valign="top" nowrap align="center"><small>
    <a target="_blank" href="">CVE-2011-2089</a><br /></small>
    <td valign="top"><small>SCADA ICONICS WebHMI ActiveX Stack Overflow (2011-2089)</small></td>

def start_a(self, attrs):
        if self.is_td:       
            cve_href = [v for k, v in attrs if k == "target" and v == "_blank"]
            if cve_href:
                self.is_a = True
                self.is_cve = True

            #for SGMLParser maybe have a bug,a <tr> have two <a> has problem
            vul_href = [v for k, v in attrs if k == "style"]
            print vul_href
            if vul_href:
                vul_href = "".join([v for k, v in attrs if k == "href"])
                if vul_href.find("cve") == -1:
                    self.href_name = vul_href     
                self.href_name = ""

Here "print vul_href" prints nothing. Is that expected?
Author: Ezio Melotti (ezio.melotti) Date: 2012-11-20 14:12
Have you tried with HTMLParser?
sgmllib is deprecated and has been removed in Python 3.
HTMLParser is also much better at parsing (broken) HTML.
Author: moonflow (moonflow) Date: 2012-11-20 14:18
I haven't tried it. Would the problem go away with HTMLParser?
Author: Ezio Melotti (ezio.melotti) Date: 2012-11-20 14:25
If what you are trying to do is extract the link(s) that contain 'cve', you can try the attached script.
Author: Ezio Melotti (ezio.melotti) Date: 2012-11-20 14:43
Sorry, I misread your code; it looks like you want the href *without* 'cve'.
In that case, change my code to use "'cve' not in attrs['href']". Also avoid s.find('cve') == -1 and use the more readable and idiomatic 'cve' not in s.
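
For readers without the attachment, here is a minimal sketch of that idea. It is not the attached script: the class name NonCveLinkParser and the trimmed sample markup are made up for illustration, and it uses the Python 3 module name html.parser (in Python 2 the module is called HTMLParser). Skipping empty hrefs is an extra assumption on my part.

```python
from html.parser import HTMLParser


class NonCveLinkParser(HTMLParser):
    """Collect the href of every <a> whose href does not contain 'cve'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs arrives as a list of (name, value) pairs; a dict is easier to query.
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        # Assumption: empty hrefs are not interesting, so they are skipped too.
        if href and "cve" not in href:
            self.links.append(href)


# Trimmed-down version of the markup from the report (unclosed tags kept on purpose).
sample = '''
<td><small><a title="CPAI-2012-809" style="text-transform:uppercase" href="2012/cpai-08-nov.html">
<td><small><a target="_blank" href="">CVE-2011-2089</a><br /></small>
'''

parser = NonCveLinkParser()
parser.feed(sample)
parser.close()
print(parser.links)
```

Both <a> tags are seen by handle_starttag even though the first one is never closed.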

I think your original script doesn't work for two reasons:
1) you are looking for a table with class="tablesorter", but in the HTML the table doesn't have that class, so self.is_table is never set to True;
2) you are finding the href of the <a> with a "style" attribute and correctly setting it to self.href_name, but the value is then replaced by "" when the following <a> without "style" is found.
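
One way to avoid the overwriting described in point 2 is to reset the stored href only when a new row starts, and to leave it alone when an <a> without "style" is seen. The following is only a sketch under assumptions: the names RowParser, in_cve_link and rows are hypothetical, the sample row is a shortened version of the reported HTML with an explicit <tr> added, and it is written against Python 3's html.parser rather than sgmllib.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Pair the 'style' link of each row with the text of its CVE link."""

    def __init__(self):
        super().__init__()
        self.href_name = ""       # href of the <a> that carries a style attribute
        self.in_cve_link = False  # True while inside <a target="_blank">
        self.rows = []            # collected (href_name, cve_text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Reset per row, not per <a>, so a later <a> cannot wipe the value.
            self.href_name = ""
            return
        if tag != "a":
            return
        attrs = dict(attrs)
        if "style" in attrs:
            self.href_name = attrs.get("href") or ""
        elif attrs.get("target") == "_blank":
            self.in_cve_link = True
        # Any other <a> leaves href_name untouched.

    def handle_data(self, data):
        text = data.strip()
        if self.in_cve_link and text:
            self.rows.append((self.href_name, text))

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_cve_link = False


# Hypothetical, shortened row based on the HTML in the report.
sample_row = '''
<tr>
<td><small><a title="CPAI-2012-809" style="text-transform:uppercase" href="2012/cpai-08-nov.html">
<td><small><a target="_blank" href="">CVE-2011-2089</a><br /></small>
</tr>
'''

row_parser = RowParser()
row_parser.feed(sample_row)
row_parser.close()
print(row_parser.rows)
```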

That said, I still suggest that you abandon sgmllib and use HTMLParser, or possibly an external module like BeautifulSoup or lxml.
