
Author once-off
Recipients once-off
Date 2009-03-17.11:19:28
Message-id <1237288775.97.0.464945533476.issue5498@psf.upfronthosting.co.za>
Content
The attached script (sgml_error.py) was designed to output XML files
unchanged, other than expanding <empty/> tags into an opening and
closing tag, such as <empty></empty>.

It seems the SGMLParser class recognizes an empty tag, but does not emit
the closing tag until the NEXT forward slash it sees. So everything after
the forward slash in <empty/> (including the closing angle bracket) up to
the next forward slash is treated as text data. See the output below.

Have I missed something here (a conscious design limitation of the
class, an error on my part, etc.), or is this really a bug in the class?

C:\Python24\Lib>python sgmllib.py H:\input.xml
start tag: <root>
data: '\n '
start tag: <tag1>
end tag: </tag1>
data: '\n '
start tag: <tag2>
data: '>\n <tag3>hello<'
end tag: </tag2>
data: 'tag3>\n'
end tag: </root>
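
The trace above looks like what a parser would produce if it read <tag2/ as the start of an SGML SHORTTAG construct of the form <name/data/, where the data runs up to the next forward slash. The following is only a minimal sketch of that idea: the SHORTTAG pattern is an assumption modeled on that syntax, not code quoted from sgmllib, and SAMPLE is the input file from this report written as a string.

```python
import re

# Assumption: a pattern modeled on the SGML SHORTTAG form <name/data/.
# It is meant to resemble the behavior seen in the trace, not to quote sgmllib.
SHORTTAG = re.compile(r"<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/")

# The input file from this report, written out as a string.
SAMPLE = "<root>\n <tag1></tag1>\n <tag2/>\n <tag3>hello</tag3>\n</root>\n"

# The first place the pattern can match is <tag2/ ; the data group then
# swallows everything up to the slash in </tag3>.
match = SHORTTAG.search(SAMPLE)
tag, data = match.group(1), match.group(2)

print("start tag: <%s>" % tag)
print("data: %r" % data)
print("end tag: </%s>" % tag)
```

Under that assumption the sketch reports the same start tag, data and end tag for tag2 as the trace, and the text left over after the match begins with tag3>, which matches the stray data event that follows.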

C:\Python24\Lib>python
ActivePython 2.4.3 Build 12 (ActiveState Software Inc.) based on
Python 2.4.3 (#69, Apr 11 2006, 15:32:42) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sgml_error

Input:
<root>
 <tag1></tag1>
 <tag2/>
 <tag3>hello</tag3>
</root>

Output:
<root>
 <tag1></tag1>
 <tag2>>
 <tag3>hello<</tag2>tag3>
</root>

Expected:
<root>
 <tag1></tag1>
 <tag2></tag2>
 <tag3>hello</tag3>
</root>
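
For comparison, here is one way the expected output could be produced with an XML parser rather than an SGML one. This is a sketch, not the attached sgml_error.py: it assumes Python 3.4 or later (for the short_empty_elements argument), and expand_empty_tags is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The input document from this report.
SOURCE = "<root>\n <tag1></tag1>\n <tag2/>\n <tag3>hello</tag3>\n</root>"


def expand_empty_tags(xml_text):
    """Hypothetical helper: re-serialize XML with <empty/> written as <empty></empty>."""
    root = ET.fromstring(xml_text)
    # short_empty_elements=False makes the serializer emit explicit
    # opening and closing tags for elements that have no content.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


expanded = expand_empty_tags(SOURCE)
print(expanded)
```

Note that this round-trips through a parsed tree, so anything outside the root element (an XML declaration, a doctype, leading comments) would not be carried over; it only reproduces the document unchanged for simple inputs like the one here.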
History
Date                 User      Action  Args
2009-03-17 11:19:36  once-off  set     recipients: + once-off
2009-03-17 11:19:35  once-off  set     messageid: <1237288775.97.0.464945533476.issue5498@psf.upfronthosting.co.za>
2009-03-17 11:19:33  once-off  link    issue5498 messages
2009-03-17 11:19:31  once-off  create