minidom does not encode correctly when calling Document.writexml #63111
When I have Unicode data to save, it does not seem to save correctly and gives an encode error. I know this happens on 2.7, and from checking the code in xml/dom/minidom.py it looks like it does in 3.2 as well. The method call that seems to be problematic is doc.writexml(open(filename, "wb"), "", " ", "utf-8"). Currently I found this to work: doc.writexml(codecs.open(filename, "w", "utf-8"), "", " ", "utf-8"). It seems like this should be handled by the writexml method, since it already has the specified encoding. |
On Python 3 you should not only open the file in text mode with the specified encoding, but also specify the "xmlcharrefreplace" error handler.
I can suggest only one solution: explicitly document this behavior. Perhaps we should also add a special module-level function for writing a DOM tree to a binary file. The low-level writexml() should not be used directly. |
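To illustrate the advice above, here is a minimal sketch of writing through a text-mode file that uses the "xmlcharrefreplace" error handler. It assumes Python 3; the sample document, the temporary path, and the deliberately narrow "ascii" target encoding are all choices made for this example and do not come from the report.

```python
import os
import tempfile
from xml.dom import minidom

# Sample document containing characters that ASCII cannot represent.
doc = minidom.parseString("<root>caf\u00e9 \u20ac</root>")

# Hypothetical output location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "out.xml")

# Text mode, explicit encoding, and an error handler that turns
# unencodable characters into XML character references.
with open(path, "w", encoding="ascii", errors="xmlcharrefreplace") as writer:
    doc.writexml(writer, encoding="ascii")

# Read the raw bytes back to inspect what was actually stored.
with open(path, "rb") as f:
    data = f.read()
```

With a target encoding such as UTF-8, which can represent every character, the error handler should never be triggered; it only matters for narrower encodings like the one used here.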
I am trying to resolve an issue for the first time. Can anybody please tell me, or elaborate on, what a "module level function" is, specifically in this context? |
It means a function defined in the module namespace, as opposed to as a method on a class, so that 'from xml.dom.minidom import <somefunction>' will get you that function. This issue should be for documentation of the problem, since we won't add the function to 2.7. A new issue should be opened for the enhancement request of adding a module level convenience function for writing a dom out to a binary file. |
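As a rough sketch of what such a convenience function might look like, the snippet below wraps a binary file object in a text layer before calling writexml(). The name write_document and its signature are hypothetical; nothing like it exists in xml.dom.minidom, and this is only one possible shape for the proposed enhancement, assuming Python 3.

```python
import io
from xml.dom import minidom


def write_document(doc, binary_file, encoding="utf-8"):
    """Hypothetical helper, NOT part of xml.dom.minidom.

    Writes `doc` to a file object opened in binary mode by wrapping it
    in a text layer that performs the encoding.
    """
    wrapper = io.TextIOWrapper(
        binary_file,
        encoding=encoding,
        errors="xmlcharrefreplace",
        newline="\n",
    )
    try:
        doc.writexml(wrapper, encoding=encoding)
        wrapper.flush()
    finally:
        # Detach so that the caller's binary file stays open and usable.
        wrapper.detach()


# Usage: an in-memory binary buffer stands in for open(filename, "wb").
doc = minidom.parseString("<root>caf\u00e9</root>")
buf = io.BytesIO()
write_document(doc, buf)
payload = buf.getvalue()
```

Because it would live at module level, such a function could be imported directly, in the same way as 'from xml.dom.minidom import <somefunction>' described above.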
Isn't this only required in case there are non-encodable characters? The docs of writexml don't seem to specify whether the file should be opened in text or binary mode, but it seems to me that only text mode is supported. The advice of using xmlcharrefreplace could be added in a note. |
I added a PR with a note like this:

.. note::

   writer = open(
       filename, "w", encoding="utf-8",
       errors="xmlcharrefreplace")
   doc.writexml(writer, "", " ", "utf-8") |
Asking users unconditionally to use the "xmlcharrefreplace" replacement method seems wrong for UTF-8. It should not be necessary. We should, however, document explicitly that the file will receive text and not bytes, i.e. that users are themselves responsible for opening the output file with the desired encoding. We should also make it clearer that the "encoding" argument to writexml() does not change that. |
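The point about text versus bytes can be checked with in-memory streams. This is a small sketch assuming Python 3, with a sample document invented for the illustration: the "encoding" argument only shows up in the XML declaration, while the writer still receives str.

```python
import io
from xml.dom import minidom

doc = minidom.parseString("<root>caf\u00e9</root>")

# A text stream works: writexml() hands it str objects.
text_buf = io.StringIO()
doc.writexml(text_buf, encoding="utf-8")
text = text_buf.getvalue()

# A binary stream does not: passing encoding="utf-8" does not make
# writexml() produce bytes, so the write fails with TypeError.
try:
    doc.writexml(io.BytesIO(), encoding="utf-8")
    binary_ok = True
except TypeError:
    binary_ok = False
```

Here `text` is a str whose declaration names utf-8, and `binary_ok` ends up False, which is why the caller has to open the output file in text mode with the desired encoding.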