classification
Title: SAX2 'property_encoding' feature not supported
Type: enhancement
Stage: needs patch
Components: XML
Versions: Python 3.1, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, josephw
Priority: low
Keywords:

Created on 2004-03-26 06:39 by josephw, last changed 2022-04-11 14:56 by admin.

Messages (5)
msg54124 - Author: Joseph Walton (josephw) Date: 2004-03-26 06:39
I need to parse a byte stream as an XML document and,
afterwards, access the same document as a Unicode
string. I would prefer to rely on the parser's
charset-determining logic, and the 'property_encoding'
feature
("http://www.python.org/sax/properties/encoding") seems
to offer exactly this information.

However, the default Expat parser doesn't support this
feature.

---
from xml.sax import make_parser, handler
from xml.sax.xmlreader import InputSource
from sys import stdin

p = make_parser()

# Should not fail. Should it return None, or UTF-8?
assert p.getProperty(handler.property_encoding) is None

source = InputSource()
source.setByteStream(stdin)

p.parse(source)

# Should now be the name of the actual encoding used
assert p.getProperty(handler.property_encoding) is not None
---

This raises SAXNotRecognizedException.

Is there another SAX parser I could use instead?
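
Until the property is supported, a caller can at least guard the lookup. The following is a minimal sketch, not part of the original report: the helper name get_encoding_or_none is hypothetical, and it assumes make_parser() returns the default Expat reader, which rejects this property with SAXNotRecognizedException as described above.

```python
from xml.sax import make_parser, handler, SAXNotRecognizedException


def get_encoding_or_none(parser):
    """Return the parser's encoding property, or None if it is unsupported.

    Hypothetical helper: it only hides the exception, it does not
    discover the encoding.
    """
    try:
        return parser.getProperty(handler.property_encoding)
    except SAXNotRecognizedException:
        # The reader does not recognize the property at all.
        return None


p = make_parser()
encoding = get_encoding_or_none(p)
```

This only avoids the exception; it does not answer the question of which encoding the parser actually used.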
msg116563 - Author: Mark Lawrence (BreamoreBoy) Date: 2010-09-16 15:46
The URL referenced in msg54124 returns a 404. It is also the value of property_encoding in the sax handler module. Could this be fixed in 3.2, or can this issue be closed?
msg116754 - Author: Joseph Walton (josephw) Date: 2010-09-18 04:13
The behaviour is unchanged in Python 3.1 and the sample program still fails.
msg190048 - Author: Mark Lawrence (BreamoreBoy) Date: 2013-05-26 02:29
This is still an issue, so what URL should the sax handler module's property_encoding attribute be set to?
msg190111 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2013-05-26 20:38
Mark, "http://www.python.org/sax/properties/encoding" is not meant to be a web page. It works like an attribute name, but fully qualified so that attributes defined by different organizations don't clash.
(There may be different usages of "encoding": is it the one set by the user, or the one determined by the parser? According to the Python docs, here it's both.)

Python's default Expat parser doesn't support this feature, so the present behavior is correct.
Proper support should not be difficult to add, with an XmlDeclHandler.
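
For illustration, here is a minimal sketch of what the XmlDeclHandler callback exposes, using xml.parsers.expat directly rather than the SAX reader. It is not the proposed patch: the function name declared_encoding is hypothetical, and it reports only the encoding named in the XML declaration, which is an assumption about what the property should return. An encoding that Expat infers by other means, such as a byte order mark, is not reported by this handler.

```python
import xml.parsers.expat


def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None.

    Hypothetical helper: None means there was no declaration, or the
    declaration had no encoding attribute.
    """
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # Expat calls this once, when it sees the <?xml ...?> declaration.
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True marks this as the final chunk
    return found.get("encoding")


doc = b'<?xml version="1.0" encoding="iso-8859-1"?><root/>'
print(declared_encoding(doc))
```

A SAX-level implementation would presumably store this value inside the Expat reader and return it from getProperty, but that wiring is not shown here.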
History
Date  User  Action  Args
2022-04-11 14:56:03  admin  set  github: 40084
2014-02-03 17:10:57  BreamoreBoy  set  nosy: - BreamoreBoy
2013-05-26 20:38:14  amaury.forgeotdarc  set  nosy: + amaury.forgeotdarc; messages: + msg190111; stage: test needed -> needs patch
2013-05-26 02:29:14  BreamoreBoy  set  messages: + msg190048
2010-09-18 04:13:26  josephw  set  messages: + msg116754; versions: + Python 3.1
2010-09-16 15:46:11  BreamoreBoy  set  nosy: + BreamoreBoy; messages: + msg116563
2009-02-14 11:34:04  ajaksu2  set  stage: test needed; components: + XML, - None; versions: + Python 2.7
2004-03-26 06:39:58  josephw  create