
classification
Title: ElementTree.fromstring raises undocumented UnicodeError
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, r.david.murray, terry.reedy, vfaronov
Priority: normal Keywords:

Created on 2017-03-24 14:34 by vfaronov, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg290091 - (view) Author: Vasiliy Faronov (vfaronov) Date: 2017-03-24 14:34
>>> from xml.etree import ElementTree as ET
>>> ET.fromstring(b'<\xC4/>')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1314, in XML
    parser.feed(text)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte

The documentation for xml.etree.ElementTree does not mention that it can raise UnicodeError, only ParseError.

I think that either the above error should be wrapped in a ParseError, or the documentation should be amended.

This happens at least on 3.6, 3.5 and 2.7.
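
Until one of those changes is made, a caller can do the wrapping on their own side. A minimal sketch, assuming a hypothetical helper named fromstring_strict (not part of xml.etree.ElementTree) and assuming that any UnicodeError raised during parsing should be reported as a ParseError:

```python
from xml.etree import ElementTree as ET


def fromstring_strict(data):
    """Hypothetical helper: parse XML and report decode failures as ParseError."""
    try:
        return ET.fromstring(data)
    except UnicodeError as exc:
        # UnicodeDecodeError is a subclass of UnicodeError. Chain the original
        # exception so its codec, position and reason remain available.
        raise ET.ParseError(str(exc)) from exc
```

With this helper, callers only need an `except ET.ParseError` clause, whichever of the two exceptions the underlying parser raises for malformed bytes.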
msg290308 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-03-24 22:57
I disagree.  The docs only sporadically mention specific exceptions for specific functions.  UnicodeDecodeError can occur any place bytes are decoded to unicode.  I think this should be closed.

Built-in exceptions are documented in https://docs.python.org/3/library/exceptions.html.  Module docs document additional exceptions defined in a module.  ParseError is one such: https://docs.python.org/3/library/xml.etree.elementtree.html#exceptions.  It is not specifically mentioned in the entry for fromstring or .feed.

I also disagree that the decode error should be wrapped as a parse error.  It happens before parsing, in the data preparation step, and the UnicodeDecodeError message gives three pieces of information specific to the problem.
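
Those details are also available as attributes on the exception object. A small sketch, assuming the pieces meant are the codec, the offending byte with its position, and the reason; describe_decode_error is a hypothetical helper, and it decodes the bytes directly rather than going through ElementTree:

```python
def describe_decode_error(data, encoding="utf-8"):
    """Hypothetical helper: return the details of a decode failure, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return {
            "encoding": exc.encoding,        # codec that failed
            "byte": exc.object[exc.start],   # first offending byte, as an int
            "position": exc.start,           # index of that byte in the input
            "reason": exc.reason,            # human-readable explanation
        }
    return None


info = describe_decode_error(b"\xC4/")
```

Wrapping the error in a different exception type without chaining would hide these attributes from the caller.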
msg290463 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-03-25 04:28
Agreed with Terry.  The general policy in Python is that we let errors bubble up unless there is a good reason to do something else with them.  And errors that bubble up are not, in general, documented.  (In short, Python is not Java :)
History
Date                 User            Action  Args
2022-04-11 14:58:44  admin           set     github: 74082
2017-03-25 04:28:39  r.david.murray  set     status: pending -> closed; nosy: + r.david.murray; messages: + msg290463; resolution: not a bug; stage: resolved
2017-03-24 22:57:00  terry.reedy     set     status: open -> pending; nosy: + docs@python, terry.reedy; messages: + msg290308; assignee: docs@python; components: + Documentation
2017-03-24 14:34:37  vfaronov        create