Message324922
Hi it's been a few years now since this was reported and it's still a problem, any chance of a fix for this? The API gives the impression that if you pass python strings to the XML API then the library will generate valid XML. It takes care of the charset/encoding and entity escaping aspects of XML generation so would be logical for it to in some way take care of control characters too - especially as silently generating unparseable XML is a somewhat dangerous failure mode.
I think there's a strong case for some built-in functionality to replace/ignore the control characters (perhaps as a configurable option, in case of performance worries) rather than just throwing an exception, since it's very common to have an arbitrary string generated by some other program or user input that needs to be written into an XML file (and a lot less common to be 100% sure in all cases what characters your string might contain). For those common use cases, the current situation where every python developer needs to implement their own workaround to sanitize strings isn't ideal, especially as it's not trivial to get it right and likely a lot of the community who end up 'rolling their own' are getting in wrong in some way.
[On the other hand if you guys decide this really isn't going to be fixed, then at the very least I'd suggest that the API documentation should prominently state that it is up to the users of these libraries to implement their own sanitization of control characters, since I'm sure none of us want people using python to end up with buggy applications] |
|
Date |
User |
Action |
Args |
2018-09-10 12:36:41 | benspiller | set | recipients:
+ benspiller, effbot, ods, strangefeatures, jwilk, eli.bendersky, flox, nvetoshkin, santoso.wijaya, martin.panter |
2018-09-10 12:36:41 | benspiller | set | messageid: <1536583001.54.0.56676864532.issue5166@psf.upfronthosting.co.za> |
2018-09-10 12:36:41 | benspiller | link | issue5166 messages |
2018-09-10 12:36:41 | benspiller | create | |
|