
classification
Title: expat ParseFile expects bytes, not string
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: bow, christian.heimes, iritkatriel, mdehoon, sbardeau
Priority: normal
Keywords:

Created on 2012-12-19 10:56 by mdehoon, last changed 2022-04-11 14:57 by admin.

Messages (4)
msg177733 - Author: Michiel de Hoon (mdehoon) Date: 2012-12-19 10:56
The expat parser in xml.parsers.expat has a Parse method and a ParseFile method. The Parse method accepts a string, whereas the ParseFile method requires bytes.

This is a minimal example of the Parse method:

>>> import xml.parsers.expat
>>> p = xml.parsers.expat.ParserCreate()
>>> p.Parse('<?xml version="1.0"?>')

which runs fine. Note that the argument to p.Parse is a string, not bytes.

This is the corresponding example of ParseFile:

>>> import xml.parsers.expat
>>> handle = open("test.xml")
>>> p = xml.parsers.expat.ParserCreate()
>>> p.ParseFile(handle)

where the file test.xml only contains <?xml version="1.0"?>
This gives an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: read() did not return a bytes object (type=str)

Opening the file test.xml in binary mode raises a different error:

>>> import xml.parsers.expat
>>> handle = open("test.xml", "rb")
>>> p = xml.parsers.expat.ParserCreate()
>>> p.ParseFile(handle)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
xml.parsers.expat.ExpatError: no element found: line 2, column 0

suggesting that in reality, the expat Parser needs a string, not bytes.
(the same error appears with a more meaningful XML file).

I would expect that both Parse and ParseFile accept strings, but not bytes.
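
For anyone reproducing this without a file on disk, here is a minimal sketch using in-memory streams. The helper name parse_file_outcome and the two sample documents are made up for illustration. On the CPython 3 builds I am aware of, the declaration-only document fails with ExpatError while the same declaration followed by a root element parses cleanly, which would suggest that the "no element found" error comes from the missing root element rather than from the bytes input. Treat that reading as an assumption to verify on your own version.

```python
import io
import xml.parsers.expat


def parse_file_outcome(fileobj):
    """Run ParseFile on a fresh parser (hypothetical helper).

    Returns the exception class that was raised, or None on success.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.ParseFile(fileobj)
    except (TypeError, xml.parsers.expat.ExpatError) as exc:
        return type(exc)
    return None


# Sample documents, invented for this sketch.
DECL_ONLY = '<?xml version="1.0"?>\n'
WITH_ROOT = '<?xml version="1.0"?>\n<root/>\n'

# Text stream: read() returns str, which ParseFile rejects.
text_outcome = parse_file_outcome(io.StringIO(WITH_ROOT))

# Binary stream, declaration only: there is no root element to find.
decl_outcome = parse_file_outcome(io.BytesIO(DECL_ONLY.encode("ascii")))

# Binary stream with a root element.
root_outcome = parse_file_outcome(io.BytesIO(WITH_ROOT.encode("ascii")))

print(text_outcome, decl_outcome, root_outcome)
```

The same three cases can be run against files opened with open(path) and open(path, "rb") to compare with the sessions above.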
msg222953 - Author: Mark Lawrence (BreamoreBoy) Date: 2014-07-13 16:37
@Michiel I'm sorry about the delay in replying to you. I can confirm the same behaviour in 3.4.1 and 3.5.0a0 on Windows 7.
msg229287 - Author: Sebastien Bardeau (sbardeau) Date: 2014-10-14 13:13
Same problem here with:
Python 3.4.1 (default, Jul 30 2014, 14:02:54)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux

What is unclear to me is whether this is a bug or a feature. In particular, as described here http://stackoverflow.com/questions/1179305/expat-parsing-in-python-3, one can open the XML file in binary mode to solve the issue. But is this /really/ intended? Opening standard ASCII XML files as binaries!?
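
If the file has to stay open in text mode, one possible workaround (a sketch only, not a resolution of this issue) is to skip ParseFile and feed the text to Parse in chunks, since Parse does accept str. The helper name parse_text_stream, the chunk size, and the sample document are assumptions made for illustration.

```python
import io
import xml.parsers.expat


def parse_text_stream(parser, text_stream, chunk_size=4096):
    """Feed a text-mode (str) stream to an expat parser via Parse().

    Hypothetical helper: reads str chunks and passes each one to Parse(),
    then signals the end of the document with a final empty call.
    """
    while True:
        chunk = text_stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)
    parser.Parse("", True)  # isfinal=True: tell expat the input is complete


# Collect element names to confirm the document was actually parsed.
names = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: names.append(name)

# io.StringIO stands in for a file opened with open("test.xml").
sample = io.StringIO('<?xml version="1.0"?>\n<root><child/></root>\n')
parse_text_stream(parser, sample, chunk_size=8)

print(names)
```

A small chunk size is used here only to exercise the chunked path; for real files a larger value is more sensible.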
msg407203 - Author: Irit Katriel (iritkatriel) (Python committer) Date: 2021-11-28 12:36
Reproduced on 3.11.
History
Date                 User         Action  Args
2022-04-11 14:57:39  admin        set     github: 60930
2021-11-28 12:36:08  iritkatriel  set     versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.4, Python 3.5; nosy: + iritkatriel; messages: + msg407203; components: + Library (Lib)
2019-04-26 17:47:07  BreamoreBoy  set     nosy: - BreamoreBoy
2014-10-14 13:13:31  sbardeau     set     nosy: + sbardeau; messages: + msg229287
2014-07-13 16:37:16  BreamoreBoy  set     nosy: + BreamoreBoy, christian.heimes; messages: + msg222953; versions: + Python 3.4, Python 3.5, - Python 3.3
2012-12-19 13:37:34  bow          set     nosy: + bow
2012-12-19 10:56:49  mdehoon      create