
classification
Title: ElementTree parser limitation of input string size
Type: behavior
Stage:
Components: XML
Versions: Python 3.10, Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Ananth Vijalapuram, scoder
Priority: normal
Keywords:

Created on 2020-02-21 17:59 by Ananth Vijalapuram, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg362418 - Author: Ananth Vijalapuram Date: 2020-02-21 17:59
I am trying to parse a very large XML file. Here is the output:

python3.7.4 crif_parser.py
Retrieved 3593891712 characters <- this is printed from my script
Traceback (most recent call last):
  File "crif_parser.py", line 9, in <module>
    tree = ET.fromstring(data)
  File "python3/3.7.4/lib/python3.7/xml/etree/ElementTree.py", line 1315, in XML
    parser.feed(text)
OverflowError: size does not fit in an int
msg376545 - Author: Stefan Behnel (scoder) (Python committer) Date: 2020-09-08 04:54
I'd suggest feeding the data into the parser in chunks, or letting it read from a file-like object, instead of passing the whole document as a single string.
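
A minimal sketch of the chunked approach, not a tested fix for this report. The reported input (3593891712 characters) is larger than the maximum 32-bit signed int (2147483647), which is presumably why a single feed() call fails. The function name parse_in_chunks and the chunk size are illustrative assumptions; only XMLParser.feed() and XMLParser.close() come from the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical chunk size; any value well below 2**31 - 1 bytes should do.
CHUNK_SIZE = 16 * 1024 * 1024


def parse_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Feed a file to XMLParser piece by piece and return the root element."""
    parser = ET.XMLParser()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Each call passes only a small buffer to the parser.
            parser.feed(chunk)
    # close() finishes parsing and returns the root Element.
    return parser.close()
```

ET.parse() also accepts a filename or an open file object, so it avoids building one huge string too. Both variants still construct the complete tree in memory.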

Also, you probably want to do incremental processing on the data (see the XMLPullParser and iterparse), because reading 3.5GB of XML data into an in-memory tree can easily result in 10x the memory usage. You may have 40GB of RAM on your machine, but even then, I'd still recommend processing the data incrementally.
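
A rough sketch of incremental processing with iterparse, assuming the file consists of many repeated elements. The tag name "record" and the function count_records are hypothetical, since the report does not show the document structure.

```python
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Count elements named `tag` without keeping their content in memory.

    `source` may be a filename or a binary file object.
    """
    count = 0
    # "end" events fire once an element has been read completely.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1  # real per-record processing would go here
            # Drop the element's text, attributes and children.
            elem.clear()
    return count
```

The cleared elements remain attached to the root as empty shells, so memory use still grows slowly with the number of records, but far less than with a fully populated tree.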
History
Date | User | Action | Args
2022-04-11 14:59:27 | admin | set | github: 83895
2020-09-08 04:55:41 | scoder | set | title: ElementTree limitation -> ElementTree parser limitation of input string size
2020-09-08 04:54:33 | scoder | set | nosy: + scoder; messages: + msg376545; versions: + Python 3.9, Python 3.10, - Python 3.7
2020-02-21 17:59:58 | Ananth Vijalapuram | create |