Message 54124 - Python tracker


Author	josephw
Date	2004-03-26.06:39:58

Content

I need to parse a byte stream as an XML document and,
afterwards, access the same document as a Unicode
string. I would prefer to rely on the parser's
charset-detection logic, and the 'property_encoding'
property
("http://www.python.org/sax/properties/encoding") seems
to offer exactly this information.

However, the default Expat parser doesn't support this
property.

---
from xml.sax import make_parser, handler
from xml.sax.xmlreader import InputSource
from sys import stdin

p = make_parser()

# Should not fail. Should it return None, or UTF-8?
assert p.getProperty(handler.property_encoding) is None

source = InputSource()
source.setByteStream(stdin)

p.parse(source)

# Should now be the name of the actual encoding used
assert p.getProperty(handler.property_encoding) is not None
---

This raises SAXNotRecognizedException.

Is there another SAX parser I could use instead?
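
Until a parser supports the property, one possible stopgap is to treat the exception as "encoding unknown". The following is only a sketch: the helper name get_encoding_or_none is hypothetical and not part of xml.sax, and it assumes that returning None is an acceptable way to signal that the parser does not report an encoding.

```python
from xml.sax import (
    SAXNotRecognizedException,
    SAXNotSupportedException,
    handler,
    make_parser,
)


def get_encoding_or_none(parser):
    """Return the encoding reported by the parser, or None.

    Hypothetical helper: None means the parser does not recognize
    or does not support the encoding property.
    """
    try:
        return parser.getProperty(handler.property_encoding)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        return None


p = make_parser()
encoding = get_encoding_or_none(p)
```

This does not recover the encoding itself; it only keeps the calling code from failing when the property is unavailable.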

History
Date	User	Action	Args
2007-08-23 16:08:04	admin	link	issue923697 messages
2007-08-23 16:08:04	admin	create