This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: not enough information in SGMLParseError
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7
Process
Status: closed
Resolution: out of date
Nosy List: ajaksu2, ezust
Priority: normal

Created on 2004-11-09 16:54 by ezust, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg23075 - Author: Alan Ezust (ezust) Date: 2004-11-09 16:54
When SGMLParser encounters a badly formed web page, it
raises sgmllib.SGMLParseError with a cryptic message:

[bin] > python sgmlparsertest.py
Pythonlib's error message: expected name token

I think the exception should carry the line and offset,
and perhaps even the text it had problems with, in its
args, and the message should include them as well.

My extra information: error at line 1 offset 2
<head>
 ^
[bin] 

I tried to print the position using parser.getpos(), but
it returns values that do not correspond to the location
of the error. How can I determine the position at runtime?

A test case that reproduces this problem is attached.
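The request above (line, offset and offending text in the exception's args and in its message) can be sketched in Python 3 as an exception subclass. PositionedParseError and its attribute names are hypothetical and not part of sgmllib; the demo values are simply the ones from the output shown in this message.

```python
class PositionedParseError(Exception):
    """Hypothetical parse error carrying its location (not part of sgmllib)."""

    def __init__(self, msg, lineno, offset, text=None):
        # Passing every field to Exception makes them visible in ex.args.
        super().__init__(msg, lineno, offset, text)
        self.msg = msg
        self.lineno = lineno  # 1-based line number
        self.offset = offset  # 0-based column within that line
        self.text = text      # offending source text, if known

    def __str__(self):
        s = "%s at line %d, offset %d" % (self.msg, self.lineno, self.offset)
        if self.text is not None:
            s += ": %r" % (self.text,)
        return s


if __name__ == "__main__":
    try:
        raise PositionedParseError("expected name token", 1, 2, "<head>")
    except PositionedParseError as ex:
        print(ex)       # expected name token at line 1, offset 2: '<head>'
        print(ex.args)  # ('expected name token', 1, 2, '<head>')
```

Keeping the fields in args means code that only inspects ex.args still sees the position, while str(ex) gives the readable form.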
msg23076 - Author: Alan Ezust (ezust) Date: 2004-11-09 16:55

import sgmllib
import urllib


if __name__ == "__main__":
    url = "http://www.cs.uvic.ca/~gshoja/"
    parser = sgmllib.SGMLParser()
    data = urllib.urlopen(url).read()

    try:
        parser.feed(data)
    except sgmllib.SGMLParseError as ex:
        print "Pythonlib's error message: " + str(ex)
        # getpos() returns (lineno, offset): lineno is 1-based, offset 0-based.
        line, offset = parser.getpos()
        lines = parser.rawdata.split("\n")
        print "My extra information: error at line %d offset %d" % (line, offset)
        print lines[line - 1]
        # Width offset + 1 puts the caret under column `offset`.
        print "%*s" % (offset + 1, "^")
        parser = None
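Independent of sgmllib (which no longer exists in Python 3), the caret-printing part of the script can be sketched as a small Python 3 function. The name show_error_location is hypothetical; it assumes getpos()-style coordinates (1-based line, 0-based offset) and, as the report notes, it can only format whatever position it is given, which may not be where the parser actually failed.

```python
def show_error_location(rawdata, lineno, offset):
    """Return the given line of rawdata plus a caret under column `offset`.

    Assumes lineno is 1-based and offset is 0-based (getpos() convention).
    """
    lines = rawdata.split("\n")
    if not 1 <= lineno <= len(lines):
        return "(line %d is outside the buffered data)" % lineno
    line = lines[lineno - 1]
    # `offset` spaces of padding put the caret directly under line[offset].
    return line + "\n" + " " * offset + "^"


if __name__ == "__main__":
    print(show_error_location("<html>\n<head>\n", 2, 1))
```

The range check avoids an IndexError when the reported line does not exist in the buffered text.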
msg86303 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-04-22 16:03
Closing: the message now includes the problematic text.
The output in both 2.5 and trunk is:
Pythonlib's error message: expected name token at '<!<img src="image/at'
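One way to produce a message of that shape is to append the repr() of a fixed-size slice of the input, starting at the failure position. The quoted fragment in the output above is 20 characters long, so this sketch assumes a 20-character window; error_with_context is a hypothetical helper, not the code sgmllib actually uses.

```python
def error_with_context(msg, rawdata, pos, width=20):
    """Append repr() of the `width` characters starting at `pos` to `msg`.

    width=20 is an assumption matching the fragment length quoted above.
    """
    return "%s at %r" % (msg, rawdata[pos:pos + width])


if __name__ == "__main__":
    data = 'xx<!<img src="image/at.gif">'
    print(error_with_context("expected name token", data, 2))
```

Slicing past the end of the string is safe in Python, so a failure near the end of the input just yields a shorter fragment.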
History
Date                 User     Action  Args
2022-04-11 14:56:08  admin    set     github: 41156
2009-04-22 16:03:37  ajaksu2  set     status: open -> closed
                                      nosy: + ajaksu2
                                      messages: + msg86303
                                      resolution: out of date
                                      stage: test needed -> resolved
2009-02-14 21:57:43  ajaksu2  set     stage: test needed
                                      type: enhancement
                                      versions: + Python 2.7, - Python 2.3
2004-11-09 16:54:37  ezust    create