Message83667
The attached script (sgml_error.py) was designed to output XML files
unchanged, other than expanding <empty/> tags into an opening and
closing tag, such as <empty></empty>.
It seems the SGMLParser class recognizes an empty tag, but does not emit
the closing tag until the NEXT forward slash it sees. So everything after
the forward slash in <empty/> (including the closing angle bracket) up to
the next forward slash is treated as textual data. See the output below.
Have I missed something here (a conscious design limitation of the
class, an error on my part, etc.), or is this really a bug in the class?
C:\Python24\Lib>python sgmllib.py H:\input.xml
start tag: <root>
data: '\n '
start tag: <tag1>
end tag: </tag1>
data: '\n '
start tag: <tag2>
data: '>\n <tag3>hello<'
end tag: </tag2>
data: 'tag3>\n'
end tag: </root>
C:\Python24\Lib>python
ActivePython 2.4.3 Build 12 (ActiveState Software Inc.) based on
Python 2.4.3 (#69, Apr 11 2006, 15:32:42) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sgml_error
Input:
<root>
<tag1></tag1>
<tag2/>
<tag3>hello</tag3>
</root>
Output:
<root>
<tag1></tag1>
<tag2>>
<tag3>hello<</tag2>tag3>
</root>
Expected:
<root>
<tag1></tag1>
<tag2></tag2>
<tag3>hello</tag3>
</root>
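As a possible workaround, the empty tags could be expanded before the text ever reaches SGMLParser. The following is only a minimal sketch, not the attached sgml_error.py (which is not reproduced in this message): the function name expand_empty_tags and the regular expression are illustrative assumptions, and the pattern deliberately ignores comments, CDATA sections, and attribute values that contain '>'.

```python
import re

# Hypothetical helper, not part of sgmllib or of sgml_error.py.
# Matches a self-closing tag such as <tag2/> or <tag2 a="b" />.
# Simplification: assumes attribute values never contain '<' or '>'.
_EMPTY_TAG = re.compile(r"<([A-Za-z_][\w:.-]*)((?:\s[^<>]*?)?)\s*/>")


def expand_empty_tags(text):
    """Rewrite every <name .../> as <name ...></name>."""
    return _EMPTY_TAG.sub(r"<\1\2></\1>", text)


if __name__ == "__main__":
    sample = "<root>\n<tag1></tag1>\n<tag2/>\n<tag3>hello</tag3>\n</root>\n"
    print(expand_empty_tags(sample))
```

Run on the input shown above, this sketch should produce the same text as the "Expected" listing. Doing the expansion on the raw text sidesteps the question of how the parser treats the slash, at the cost of not being a real XML parse.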
Date | User | Action | Args
2009-03-17 11:19:36 | once-off | set | recipients: + once-off
2009-03-17 11:19:35 | once-off | set | messageid: <1237288775.97.0.464945533476.issue5498@psf.upfronthosting.co.za>
2009-03-17 11:19:33 | once-off | link | issue5498 messages
2009-03-17 11:19:31 | once-off | create |