Title: not enough information in SGMLParseError
Created on 2004-11-09 16:54 by ezust, last changed 2009-04-22 16:03 by ajaksu2. This issue is now closed.

msg23075 - (view) Author: Alan Ezust (ezust) Date: 2004-11-09 16:54
When SGMLParser encounters a badly formed webpage, it
throws sgmllib.SGMLParseError with a cryptic message:

[bin] > python
Pythonlib's error message: expected name token

I think it should give the line and offset, and maybe
even the text it had problems with, in the args of the
exception, and print them in the message.

My extra information: error at line 1 offset 2

I tried to print it out by using parser.getpos() but it
returns values which do not correspond to the error.
How do I determine this at runtime?

A test case that reproduces this problem is attached.
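
A minimal sketch of what is being requested here: an exception that carries the line, the offset, and the offending text in its attributes as well as in its message. The names PositionedParseError, position_of and make_error are hypothetical and are not part of sgmllib; the (1-based line, 0-based offset) convention is assumed to match what getpos() reports.

```python
class PositionedParseError(Exception):
    """Hypothetical exception that records where parsing failed."""

    def __init__(self, message, line, offset, text):
        Exception.__init__(
            self, "%s at line %d, offset %d: %r" % (message, line, offset, text)
        )
        self.line = line
        self.offset = offset
        self.text = text


def position_of(rawdata, index):
    """Map a character index to (line, offset): line is 1-based, offset 0-based."""
    line = rawdata.count("\n", 0, index) + 1
    last_newline = rawdata.rfind("\n", 0, index)  # -1 when index is on line 1
    offset = index - (last_newline + 1)
    return line, offset


def make_error(message, rawdata, index, width=20):
    """Build the exception, quoting up to `width` characters of the bad input."""
    line, offset = position_of(rawdata, index)
    return PositionedParseError(message, line, offset, rawdata[index:index + width])
```

A caller could then catch the exception and read ex.line, ex.offset and ex.text directly instead of querying the parser afterwards.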
msg23076 - (view) Author: Alan Ezust (ezust) Date: 2004-11-09 16:55
import sgmllib, urllib, urlparse
from sgmllib import SGMLParser

if __name__ == "__main__":
    url = ""
    parser = SGMLParser()
    data = urllib.urlopen(url).read()

    except sgmllib.SGMLParseError, ex:
        print "Pythonlib's error message: " + str(ex)
        line, offset = parser.getpos()
        lines = parser.rawdata.split("\n")
        print "My extra information: error at line %d offset
%d" % parser.getpos()
        print lines[line]
        print "%*s" % (offset, "^")
        parser = None 
msg86303 - (view) Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-04-22 16:03
Closing: the message now includes the problematic text. The
output in both 2.5 and trunk is:
Pythonlib's error message: expected name token at '<!<img src="image/at'
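
Since the message now quotes the offending text, one way to recover a line and offset at runtime is to search the raw data for that text. The sketch below assumes the message ends with "at " followed by a quoted string literal, as in the output above; locate_reported_text is a hypothetical helper, not an sgmllib API, and it reports only the first occurrence of the snippet.

```python
import ast
import re


def locate_reported_text(message, rawdata):
    """Return (line, offset, snippet) for the text quoted in message, or None."""
    # Assumed format: "... at '<quoted text>'" (a Python string literal).
    match = re.search(r"at (?P<quoted>'.*'|\".*\")$", message, re.DOTALL)
    if match is None:
        return None
    snippet = ast.literal_eval(match.group("quoted"))  # undo the repr() quoting
    index = rawdata.find(snippet)  # first occurrence only
    if index < 0:
        return None
    line = rawdata.count("\n", 0, index) + 1                # 1-based line
    offset = index - (rawdata.rfind("\n", 0, index) + 1)    # 0-based column
    return line, offset, snippet
```

If the same snippet appears more than once in the page, the reported position may not be the one that triggered the error.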
