This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Escaping string containing invalid characters as per XML
Type: Stage: resolved
Components: Versions: Python 3.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: xml.sax.saxutils.escape does not escapes \x00
View: 13648
Assigned To: Nosy List: Devika Sondhi, ned.deily
Priority: normal Keywords:

Created on 2018-12-29 13:33 by Devika Sondhi, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg332716 - (view) Author: Devika Sondhi (Devika Sondhi) Date: 2018-12-29 13:33
As per XML 1.0 and 1.1 specs, the null character is treated as invalid in an XML doc. (https://en.wikipedia.org/wiki/Valid_characters_in_XML)
Shouldn't invalid xml characters be omitted while escaping?
The current behavior(tested on Python 3.7) is as follows:

>>> from xml.sax.saxutils import escape
>>> escape("a\u0000\u0001\u0008\u000b\u000c\u000e\u001fb")
'a\x00\x01\x08\x0b\x0c\x0e\x1fb'
msg332725 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-12-29 16:44
This question has come up before.  See Issue13648 where it was pointed out that null characters "are forbidden both in raw form *and* in escaped form. So even if they get escaped, they *still* will lead to errors. So there is no point in escaping them."
History
Date User Action Args
2022-04-11 14:59:09adminsetgithub: 79794
2018-12-29 16:44:51ned.deilysetstatus: open -> closed

superseder: xml.sax.saxutils.escape does not escapes \x00

nosy: + ned.deily
messages: + msg332725
resolution: duplicate
stage: resolved
2018-12-29 13:33:58Devika Sondhicreate