
classification
Title: xml.sax.saxutils.escape does not escape \x00
Type: behavior Stage:
Components: XML Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: animus, loewis
Priority: normal Keywords:

Created on 2011-12-22 14:49 by animus, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg150096 - (view) Author: Alexey Gorshkov (animus) Date: 2011-12-22 14:49
The call xml.sax.saxutils.escape('\x00qweqwe<') returns '\x00qweqwe&lt;'.

\x00 is not escaped to &#0;.

Is this the correct behavior?

This affects tools like xmpppy, which sends \x00 unencoded, leading to an XMPP error.
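A minimal reproduction of the report, using only the standard library. By default escape() rewrites only '&', '<' and '>', so the NUL character is passed through untouched; the variable names here are illustrative.

```python
from xml.sax.saxutils import escape

sample = "\x00qweqwe<"
escaped = escape(sample)

# Only '&', '<' and '>' are replaced by default; the NUL character
# at the start of the string is left as-is.
print(repr(escaped))
```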
msg150097 - (view) Author: Alexey Gorshkov (animus) Date: 2011-12-22 14:55
Sorry, xmpppy uses its own escape method, but anyway... :)
msg150136 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-12-23 08:36
This is correct behavior. \x00 is not supported in XML: not in raw form, and not in escaped form. To transmit binary data in XML, use base64.
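The base64 suggestion could look roughly like the sketch below. The element name "data" and the variable names are assumptions made for illustration; nothing in this thread prescribes a particular layout.

```python
import base64
import xml.etree.ElementTree as ET

payload = b"\x00qweqwe<"  # binary data containing a NUL byte

# Base64 output consists only of ASCII letters, digits, '+', '/' and '=',
# all of which are legal XML characters.
encoded = base64.b64encode(payload).decode("ascii")

elem = ET.Element("data")  # hypothetical element name
elem.text = encoded
document = ET.tostring(elem, encoding="unicode")

# Round trip: parse the document and decode the text content again.
decoded = base64.b64decode(ET.fromstring(document).text)
```

The receiver has to know that the element content is base64-encoded, so this only works when both ends agree on the convention.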
msg163291 - (view) Author: Alexey Gorshkov (animus) Date: 2012-06-20 19:03
>This is correct behavior. \x00 is not supported in XML:
> not in raw form, and not in escaped form

The last sentence of the fourth paragraph of section 1.3 in the XML 1.1 specification says the following:
======
Due to potential problems with APIs,
#x0 is still forbidden both directly and as a character reference.
======

And the second sentence of paragraph 2 in the subsection 'Validity constraint: Notation Declared' of section 4.2.2 says the following:
======
The characters to be escaped are the control characters #x0 to #x1F and #x7F (most of which cannot appear in XML), space #x20, the delimiters '<' #x3C, '>' #x3E and '"' #x22, the unwise characters '{' #x7B, '}' #x7D, '|' #x7C, '\' #x5C, '^' #x5E and '`' #x60, as well as all characters above #x7F.
======

(xml 1.1) http://www.w3.org/TR/2006/REC-xml11-20060816/
(xml 1.0) http://www.w3.org/TR/2008/REC-xml-20081126/
msg163292 - (view) Author: Alexey Gorshkov (animus) Date: 2012-06-20 19:32
What I am trying to say is: if those characters are forbidden, then maybe they need to be escaped rather than ignored?
msg163294 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-20 19:56
The characters are forbidden both in raw form *and* in escaped form. So even if they get escaped, they *still* will lead to errors. So there is no point in escaping them.
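This point can be checked with the standard-library parser: a sketch, assuming xml.etree.ElementTree as the consumer, that feeds it both the raw and the escaped form. Since escaping cannot help, one option is to detect such characters before serializing; the helper names parses() and has_invalid_xml_chars() are hypothetical, and the character class follows the Char production of XML 1.0.

```python
import re
import xml.etree.ElementTree as ET


def parses(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


raw_ok = parses("<a>\x00</a>")   # NUL in raw form
ref_ok = parses("<a>&#0;</a>")   # NUL as a character reference

# XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything outside these ranges cannot appear in a well-formed document.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def has_invalid_xml_chars(text):
    """Return True if text contains a character XML 1.0 does not allow."""
    return _INVALID_XML_CHARS.search(text) is not None
```

A caller could use has_invalid_xml_chars() to reject or strip the offending input before calling escape(), rather than expecting escape() to produce something a parser will accept.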
History
Date User Action Args
2022-04-11 14:57:24 admin set github: 57857
2018-12-29 16:44:51 ned.deily link issue35613 superseder
2012-06-20 19:56:12 loewis set status: open -> closed; resolution: wont fix; messages: + msg163294
2012-06-20 19:32:12 animus set messages: + msg163292
2012-06-20 19:03:32 animus set status: closed -> open; messages: + msg163291
2011-12-23 08:36:30 loewis set status: open -> closed; nosy: + loewis; messages: + msg150136
2011-12-22 14:55:52 animus set messages: + msg150097
2011-12-22 14:51:20 animus set components: + XML
2011-12-22 14:49:17 animus create