Message54124
I need to parse a byte stream as an XML document and,
afterwards, access the same document as a Unicode
string. I would prefer to rely on the parser's
charset-determining logic, and the 'property_encoding'
property
("http://www.python.org/sax/properties/encoding") seems
to offer exactly this information.
However, the default Expat parser doesn't recognize this
property.
---
from sys import stdin
from xml.sax import handler, make_parser
from xml.sax.xmlreader import InputSource

p = make_parser()
# Should not fail. Should it return None, or UTF-8?
assert p.getProperty(handler.property_encoding) is None
source = InputSource()
# Byte stream: on Python 3, sys.stdin is text, so use its underlying buffer
source.setByteStream(stdin.buffer)
p.parse(source)
# Should now be the name of the actual encoding used
assert p.getProperty(handler.property_encoding) is not None
---
The first getProperty call raises SAXNotRecognizedException.
Is there another SAX parser I could use instead?
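Until a parser exposes the property, the calling code can at least probe for it without aborting. The following is a minimal sketch, not a fix: `encoding_or_none` is a hypothetical helper name (it is not part of xml.sax), and it assumes that a driver that does not know the property signals this with SAXNotRecognizedException or SAXNotSupportedException, as the Expat driver does here.

```python
from xml.sax import (
    SAXNotRecognizedException,
    SAXNotSupportedException,
    make_parser,
)
from xml.sax.handler import property_encoding


def encoding_or_none(parser):
    """Return the encoding reported by the parser, or None if unavailable.

    Hypothetical helper: it only asks the driver for the property and
    does not add encoding support to drivers that lack it.
    """
    try:
        return parser.getProperty(property_encoding)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        # The driver does not know the property, or cannot report it now.
        return None


if __name__ == "__main__":
    # With a driver that lacks the property, this prints None instead of failing.
    print(encoding_or_none(make_parser()))
```

This only turns the exception into a None result; the encoding itself still has to come from somewhere else when the driver does not report it.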
Date                | User  | Action | Args
2007-08-23 16:08:04 | admin | link   | issue923697 messages
2007-08-23 16:08:04 | admin | create |