classification
Title: not enough information in SGMLParseError
Type: feature request
Stage: committed/rejected
Components: Library (Lib)
Versions: Python 2.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: ajaksu2, ezust (2)
Priority: normal
Keywords:

Created on 2004-11-09 16:54 by ezust, last changed 2009-04-22 16:03 by ajaksu2.

Messages (3)
msg23075 - Author: Alan Ezust (ezust) Date: 2004-11-09 16:54
When SGMLParser encounters a badly formed web page, it
raises sgmllib.SGMLParseError with a cryptic message:

[bin] > python sgmlparsertest.py
Pythonlib's error message: expected name token

I think it should give the line and offset, and maybe
even the text it had problems with, in the args of the
exception, and print them in the message.

My extra information: error at line 1 offset 2
<head>
 ^
[bin] 

I tried to print it out by using parser.getpos() but it
returns values which do not correspond to the error.
How do I determine this at runtime?

testcase that reproduces this problem attached.
msg23076 - Author: Alan Ezust (ezust) Date: 2004-11-09 16:55
import sgmllib, urllib, urlparse
from sgmllib import SGMLParser


if __name__ == "__main__":
    url = "http://www.cs.uvic.ca/~gshoja/"
    parser = SGMLParser()
    data = urllib.urlopen(url).read()

    try:
        parser.feed(data)
    except sgmllib.SGMLParseError, ex:
        print "Pythonlib's error message: " + str(ex)
        line, offset = parser.getpos()
        lines = parser.rawdata.split("\n")
        print "My extra information: error at line %d offset %d" % parser.getpos()
        print lines[line]
        print "%*s" % (offset, "^")
        parser = None 
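
The caret display that the test case attempts can be sketched as a standalone helper. This is only an illustration, not part of sgmllib: the name `format_error_location` is hypothetical, and it assumes getpos()-style coordinates (a 1-based line number and a 0-based column). It does not address the separate problem that getpos() may return a position unrelated to the error.

```python
def format_error_location(text, line, offset):
    """Return the offending line with a caret under the given column.

    Assumes `line` is 1-based and `offset` is 0-based, as with the
    (lineno, offset) pair returned by getpos().
    """
    lines = text.split("\n")
    if not 1 <= line <= len(lines):
        raise ValueError("line %d is out of range" % line)
    source = lines[line - 1]  # convert the 1-based line number to a list index
    # Pad with `offset` spaces so the caret sits under column `offset`.
    return "%s\n%s^" % (source, " " * offset)
```

Called as `format_error_location(parser.rawdata, line, offset)`, this returns a two-line string that can be printed or attached to the exception's args.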
msg86303 - Author: Daniel Diniz (ajaksu2) Date: 2009-04-22 16:03
Closing: the error message now includes the problematic text. The
output in both 2.5 and trunk is:
Pythonlib's error message: expected name token at '<!<img src="image/at'
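
Since the message now carries the offending text, one way to recover a line and column at runtime is to search for that text in the raw input. The sketch below is an assumption-laden illustration, not a library feature: `locate_reported_text` is a hypothetical helper, and it assumes the message keeps the form shown above, with the text given as a quoted Python literal after " at ".

```python
import ast
import re

# Matches the quoted literal that follows " at " at the end of the message.
_SNIPPET_RE = re.compile(r" at (?P<literal>['\"].*['\"])\s*$", re.DOTALL)


def locate_reported_text(message, rawdata):
    """Return (line, column, snippet) for the text quoted in `message`.

    `line` is 1-based and `column` is 0-based. Returns None when the
    message has no quoted snippet or the snippet is not in `rawdata`.
    """
    match = _SNIPPET_RE.search(message)
    if match is None:
        return None
    try:
        # Turn the quoted literal back into the original string.
        snippet = ast.literal_eval(match.group("literal"))
    except (ValueError, SyntaxError):
        return None
    index = rawdata.find(snippet)
    if index < 0:
        return None
    line = rawdata.count("\n", 0, index) + 1
    column = index - (rawdata.rfind("\n", 0, index) + 1)
    return line, column, snippet
```

Note that `find` returns the first occurrence, so if the same text appears earlier in the page the reported position may not be the one that triggered the error.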
History
Date                 User     Action  Args
2009-04-22 16:03:37  ajaksu2  set     status: open -> closed
                                      nosy: + ajaksu2
                                      messages: + msg86303
                                      resolution: out of date
                                      stage: test needed -> committed/rejected
2009-02-14 21:57:43  ajaksu2  set     stage: test needed
                                      type: feature request
                                      versions: + Python 2.7, - Python 2.3
2004-11-09 16:54:37  ezust    create