Title: Unpredictable behavior when parsing xml. (xml.etree.ElementTree.iterparse)
Components: XML Versions: Python 3.8
Created on 2020-10-04 11:03 by CyberCreator, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg377928 - Author: Leonid Piletsky (CyberCreator) Date: 2020-10-04 11:03
Data is lost when parsing large files.
I have prepared 5 test files for different cases.
Using them, I found that the losses are not random.
In example.xml, the data is lost on reaching iteration 717 (i = 717).

In the rest of the files, I learned that data loss occurs when the number of characters changes. It looks like some kind of buffer overflow.
In example5.xml I am using randomly generated data using a

I prepared several XML files to show that this is not an error in the input data but a problem in the library itself.
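
One way to narrow this down is to check which event type the loop reads element text on. The documentation for iterparse states that on a "start" event only the attributes are guaranteed, while text and tail are undefined at that point; they are complete on the "end" event. The sketch below is an assumption about the setup, not the reporter's script: build_sample and count_missing_text are hypothetical helpers, and the generated data stands in for the attached example files.

```python
import io
import xml.etree.ElementTree as ET


def build_sample(n_items):
    # Hypothetical stand-in for the attached example files.
    rows = "".join(f"<item><name>value {i}</name></item>" for i in range(n_items))
    return f"<root>{rows}</root>".encode("utf-8")


def count_missing_text(data, event):
    # Count <name> elements seen on the given event type and how many
    # of them have no text yet at the moment the event is delivered.
    total = 0
    missing = 0
    for _, elem in ET.iterparse(io.BytesIO(data), events=(event,)):
        if elem.tag == "name":
            total += 1
            if elem.text is None:
                missing += 1
    return total, missing


data = build_sample(5000)
end_total, end_missing = count_missing_text(data, "end")
start_total, start_missing = count_missing_text(data, "start")

print("end events:  ", end_total, "elements,", end_missing, "without text")
print("start events:", start_total, "elements,", start_missing, "without text")
```

If text only goes missing in the "start" run, the loss would be consistent with the documented behavior rather than with corrupted input. The exact positions depend on how the input is split into chunks internally, which would also explain why they move when the number of characters changes.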

I tried to trace the cause and came to the conclusion that the bug lies in the compiled file.

In the library file, the line
"events = self._events_queue"
yields an empty list. This can be seen at iteration 717 in example.xml.
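
For comparison, an empty event queue can also be observed through the public XMLPullParser API whenever the data fed so far contains no complete event. This is a minimal sketch with made-up input, assuming only "end" events are requested; it does not reproduce example.xml and says nothing definite about what happens at iteration 717.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("end",))

# First chunk stops in the middle of the text of <a>: no element has ended yet.
parser.feed("<root><a>te")
first_batch = [(event, elem.tag) for event, elem in parser.read_events()]

# Second chunk completes the document; close() signals the end of the data.
parser.feed("xt</a></root>")
parser.close()
second_batch = [(event, elem.tag, elem.text) for event, elem in parser.read_events()]

print(first_batch)
print(second_batch)
```

Here the first read returns nothing even though no data has been lost, and the text of <a> is intact once its "end" event arrives.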
msg377929 - Author: Leonid Piletsky (CyberCreator) Date: 2020-10-04 11:07
The file example5.xml could not be loaded, so to generate it, run the
