
Author scoder
Recipients Jeffrey.Kintscher, eli.bendersky, johnburnett, scoder
Date 2019-05-12.07:38:02
Message-id <1557646682.61.0.877795076881.issue33303@roundup.psfhosted.org>
Content
I'm really sorry again, but I only now consulted the XML spec on this (and also checked how libxml2 does it), and found that XML comment text does not get escaped at all. It is not character data and, in fact, "--" is not even allowed inside comments. (Funnily enough, the HTML serialiser escapes both comments and PIs, but, well, that's HTML, I guess…)

https://www.w3.org/TR/REC-xml/#sec-comments

Sorry, Jeffrey, I should have looked that up in the spec much earlier, before you invested so much time into this.

There are two disallowed cases: "--" anywhere in the text content, and "-" at the end of the text (which would produce "--->"). The thing is, such validation is currently unprecedented in ElementTree, so I don't know whether we should start raising exceptions from the serialiser for this case and, if so, which ones. Since comments are rare, doing that would not hurt performance, but once we start, users would probably also want their text and attribute content and their tag and attribute names validated, and that would hurt.
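
For illustration only, the two disallowed cases could be checked roughly like this. This is a minimal sketch, not something ElementTree does; the helper name `check_comment_text` and the choice of `ValueError` are assumptions made up for this example, and it ignores the spec's other restrictions on legal characters.

```python
def check_comment_text(text):
    """Hypothetical helper: reject text that cannot form a well-formed XML comment.

    Covers only the two cases discussed above; it does not check that
    every character is a legal XML character.
    """
    # Case 1: "--" must not appear anywhere inside a comment.
    if "--" in text:
        raise ValueError('"--" is not allowed inside an XML comment')
    # Case 2: a trailing "-" would serialise as "--->", which is also invalid.
    if text.endswith("-"):
        raise ValueError('XML comment text must not end with "-"')
    return text


# Usage: a single hyphen in the middle is fine, the other two are rejected.
check_comment_text("a - b")
for bad in ("a -- b", "ends with -"):
    try:
        check_comment_text(bad)
    except ValueError as exc:
        print("rejected:", exc)
```

A caller who wants this guarantee today would have to run such a check before creating the comment, since the serialiser writes the text through unchanged.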

So, I will have to reject the PR and this ticket.
History
Date                 User    Action  Args
2019-05-12 07:38:02  scoder  set     recipients: + scoder, eli.bendersky, johnburnett, Jeffrey.Kintscher
2019-05-12 07:38:02  scoder  set     messageid: <1557646682.61.0.877795076881.issue33303@roundup.psfhosted.org>
2019-05-12 07:38:02  scoder  link    issue33303 messages
2019-05-12 07:38:02  scoder  create