classification
Title: SAX2 'property_encoding' feature not supported
Type: enhancement Stage: test needed
Components: XML Versions: Python 3.1, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, josephw
Priority: low Keywords:

Created on 2004-03-26 06:39 by josephw, last changed 2010-09-18 04:13 by josephw.

Messages (3)
msg54124 - Author: Joseph Walton (josephw) Date: 2004-03-26 06:39
I need to parse a byte stream as an XML document and,
afterwards, access the same document as a Unicode
string. I would prefer to rely on the parser's
charset-determining logic, and the 'property_encoding'
property ("http://www.python.org/sax/properties/encoding")
seems to offer exactly this information.

However, the default Expat-based parser does not support
this property.

---
import sys
from xml.sax import make_parser, handler
from xml.sax.xmlreader import InputSource

p = make_parser()

# Should not fail. Before parsing, should it return None, or UTF-8?
assert p.getProperty(handler.property_encoding) is None

source = InputSource()
# Expat needs bytes; on Python 3 the byte stream is sys.stdin.buffer
source.setByteStream(getattr(sys.stdin, 'buffer', sys.stdin))

p.parse(source)

# After parsing, this should be the name of the encoding actually used
assert p.getProperty(handler.property_encoding) is not None
---

The first getProperty() call raises SAXNotRecognizedException.

Is there another SAX parser I could use instead?
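Until the property is supported, one possible workaround is to read the encoding from the XML declaration directly with the lower-level xml.parsers.expat module, whose XmlDeclHandler callback receives the declared encoding. This is a minimal Python 3 sketch, not the parser's internal charset logic: `detect_xml_encoding` is a hypothetical helper, it only sees an in-document `encoding="..."` declaration, and when none is present it falls back to the XML 1.0 defaults (UTF-16 if a UTF-16 byte order mark is present, else UTF-8).

```python
import codecs
import xml.parsers.expat


def detect_xml_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding of an XML byte string.

    Returns the encoding named in the XML declaration, as reported by
    expat's XmlDeclHandler; otherwise falls back to UTF-16 when a UTF-16
    byte order mark is present, and to UTF-8 in all other cases.
    """
    declared = []

    def on_xml_decl(version, encoding, standalone):
        # encoding is None when the declaration has no encoding="..."
        declared.append(encoding)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # raises ExpatError on malformed input

    if declared and declared[0]:
        return declared[0]
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    return "UTF-8"


data = b'<?xml version="1.0" encoding="ISO-8859-1"?><doc>caf\xe9</doc>'
encoding = detect_xml_encoding(data)
text = data.decode(encoding)  # the same document as a Unicode string
print(encoding, ascii(text))
```

This parses the document twice (once here, once with SAX), trading speed for not reimplementing encoding detection by hand. Note that decoding a UTF-8 document that starts with a byte order mark leaves a leading U+FEFF; the "utf-8-sig" codec strips it.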
msg116563 - Author: Mark Lawrence (BreamoreBoy) Date: 2010-09-16 15:46
The URL referenced in msg54124 gives a 404. It is also the value of property_encoding in the xml.sax.handler module. Could this be fixed in 3.2, or can this issue be closed?
msg116754 - Author: Joseph Walton (josephw) Date: 2010-09-18 04:13
The behaviour is unchanged in Python 3.1 and the sample program still fails.
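Since the issue is at the "test needed" stage, a reproduction that does not depend on stdin may be easier to rerun. This is a minimal Python 3 sketch: `property_encoding_status` is a hypothetical helper name, and it feeds an in-memory byte string through InputSource.setByteStream, then reports the outcome of querying property_encoding rather than asserting it. Per the messages above, the affected versions should report "not recognized".

```python
import io
from xml.sax import (SAXNotRecognizedException, SAXNotSupportedException,
                     handler, make_parser)
from xml.sax.xmlreader import InputSource


def property_encoding_status(data: bytes) -> str:
    """Parse data with the default SAX reader, then query property_encoding."""
    parser = make_parser()
    source = InputSource()
    source.setByteStream(io.BytesIO(data))  # in-memory bytes instead of stdin
    parser.parse(source)
    try:
        value = parser.getProperty(handler.property_encoding)
    except SAXNotRecognizedException:
        return "not recognized"
    except SAXNotSupportedException:
        return "not supported"
    return "supported: %r" % (value,)


print(property_encoding_status(b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'))
```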
History
Date                 User         Action  Args
2010-09-18 04:13:26  josephw      set     messages: + msg116754
                                          versions: + Python 3.1
2010-09-16 15:46:11  BreamoreBoy  set     nosy: + BreamoreBoy
                                          messages: + msg116563
2009-02-14 11:34:04  ajaksu2      set     stage: test needed
                                          components: + XML, - None
                                          versions: + Python 2.7
2004-03-26 06:39:58  josephw      create