
classification
Title: Unpredictable behavior when parsing xml. (xml.etree.ElementTree.iterparse)
Type: behavior
Stage:
Components: XML
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: CyberCreator
Priority: normal
Keywords:

Created on 2020-10-04 11:03 by CyberCreator, last changed 2022-04-11 14:59 by admin.

Files
File name: app.rar
Uploaded: CyberCreator, 2020-10-04 11:06
Description: Archive with a Python file and XML files for testing
Messages (2)
msg377928 - Author: Leonid Piletsky (CyberCreator) Date: 2020-10-04 11:03
Data is lost when parsing large files with xml.etree.ElementTree.iterparse.
I have prepared five test files covering different cases.
They show that the losses are reproducible rather than random.
In example.xml, the data is lost at iteration 717 (i = 717).

From the other files, I found that the data loss occurs when the number of characters changes. It looks like some kind of buffer overflow.
example5.xml contains randomly generated data produced by generator.py.

I prepared several XML files to show that the problem is in the library itself, not in the input data.

I tried to trace the cause and concluded that the bug lies in the compiled file.

In the library file ElementTree.py, after the line
"events = self._events_queue"
the events queue is empty. This can be seen at iteration 717 in example.xml.
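
An empty event queue at that line is not necessarily a sign of loss. iterparse feeds the input to an XMLPullParser in chunks, and the queue is drained between chunks, so it is expected to be empty whenever all events for the data fed so far have already been read. The sketch below is an illustration only, not a reproduction of the files in app.rar: it feeds a tiny document in two hand-picked pieces to show that an element reported with a "start" event may not have its text yet, while the "end" event carries the complete text.

```python
import xml.etree.ElementTree as ET

# Ask for both event types so the two views of the same element can be compared.
parser = ET.XMLPullParser(events=("start", "end"))

# First piece: the <item> start tag has been seen, but its text is incomplete.
parser.feed("<root><item>hello ")
first = [(event, elem.tag, elem.text) for event, elem in parser.read_events()]

# Reading again before feeding more data finds the queue empty; nothing is lost.
drained = list(parser.read_events())

# Second piece: the rest of the text and the closing tags.
parser.feed("world</item></root>")
parser.close()
second = [(event, elem.tag, elem.text) for event, elem in parser.read_events()]

print(first)    # "start" events: text is not populated yet
print(drained)  # empty queue between chunks
print(second)   # "end" events: text is complete
```

If the test script reads elem.text on "start" events, switching to "end" events is the usual remedy, since the documentation only guarantees that attributes are set when "start" is reported.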
msg377929 - Author: Leonid Piletsky (CyberCreator) Date: 2020-10-04 11:07
The file example5.xml could not be uploaded, so run generator.py to generate it.
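
For readers without access to the archive, the following is a minimal self-contained check in the same spirit. The contents of generator.py are not known, so make_xml is a hypothetical stand-in that builds random data in memory, and texts_at is an illustrative helper. The sketch compares what iterparse reports for each element's text at "end" events and at "start" events against the generated values.

```python
import io
import random
import string
import xml.etree.ElementTree as ET


def make_xml(n_items, seed=0):
    """Hypothetical stand-in for generator.py: returns (xml_bytes, values)."""
    rng = random.Random(seed)
    values = [
        "".join(rng.choices(string.ascii_letters, k=rng.randint(1, 40)))
        for _ in range(n_items)
    ]
    body = "".join(f"<item>{value}</item>\n" for value in values)
    return f"<root>\n{body}</root>".encode("utf-8"), values


def texts_at(event, xml_bytes):
    """Collect the text of every <item> as seen when `event` is reported."""
    seen = []
    for _, elem in ET.iterparse(io.BytesIO(xml_bytes), events=(event,)):
        if elem.tag == "item":
            seen.append(elem.text)
    return seen


# Large enough that iterparse has to read the input in many chunks.
xml_bytes, values = make_xml(20000)

at_end = texts_at("end", xml_bytes)
at_start = texts_at("start", xml_bytes)

# Elements whose text was not (yet) available when the event was reported.
missing_at_end = sum(1 for seen, want in zip(at_end, values) if seen != want)
missing_at_start = sum(1 for seen, want in zip(at_start, values) if seen != want)

print("items:", len(values))
print("incomplete at 'end' events:", missing_at_end)
print("incomplete at 'start' events:", missing_at_start)
```

With "end" events every value should match. With "start" events some elements may show no text, typically those that straddle a read boundary, which would produce the kind of position-dependent loss described in this report.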
History
Date User Action Args
2022-04-11 14:59:36 admin set github: 86092
2020-10-04 11:07:41 CyberCreator set messages: + msg377929
2020-10-04 11:06:30 CyberCreator set files: + app.rar
2020-10-04 11:03:56 CyberCreator create