
classification
Title: iterparse does not return the full subtree on "start" events
Type: behavior
Stage: resolved
Components: Library (Lib), XML
Versions: Python 3.6

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Igor Nowicki, eli.bendersky, scoder, serhiy.storchaka
Priority: normal
Keywords:

Created on 2019-01-13 08:17 by Igor Nowicki, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name: find_records.py
Uploaded: Igor Nowicki, 2019-01-13 08:17
Description: Program with minimal example to reproduce bug
Messages (2)
msg333549 - Author: Igor Nowicki (Igor Nowicki) Date: 2019-01-13 08:17
Suppose we have a big XML file that we cannot load into memory all at once. We then use the `iterparse` function from the `xml.etree.ElementTree` module to parse it element by element.

The problem is that the XML module does not handle this smoothly: it starts outputting wrong data after loading 16 KiB (16*1024 bytes; I found this by looking at the source code). For an element with a large number of children, we are told that it has only a few.

To reproduce the problem, I created this example program. It generates simple XML files of progressively larger size and tracks how many children of the main objects are counted. For small objects we get the actual number, 100 children. For larger and larger sizes we get smaller numbers, going down to just a few.
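
For readers without access to the attachment, the following is a minimal sketch of the kind of reproduction described above. It is not the attached find_records.py; the helper names `build_xml` and `count_children`, the `<record>`/`<child>` tag names, and the chosen text sizes are all assumptions made for illustration. It prints the document size and the child count seen on the "start" and "end" events for one `<record>` element.

```python
import io
import xml.etree.ElementTree as ET


def build_xml(n_children, text_size):
    """Build <root><record><child>xxx</child>...</record></root> as bytes.

    Hypothetical layout, chosen only to mimic the report.
    """
    child = "<child>" + "x" * text_size + "</child>"
    body = "<root><record>" + child * n_children + "</record></root>"
    return body.encode("ascii")


def count_children(xml_bytes, event_name):
    """Return len(record) as seen when `event_name` fires for <record>."""
    source = io.BytesIO(xml_bytes)
    for event, elem in ET.iterparse(source, events=(event_name,)):
        if elem.tag == "record":
            return len(elem)
    return None


if __name__ == "__main__":
    # Larger children push more of the record past what has been parsed
    # when the "start" event is reported.
    for text_size in (10, 100, 1000, 10000):
        doc = build_xml(100, text_size)
        print(len(doc), count_children(doc, "start"), count_children(doc, "end"))
```

In this sketch the count taken on "end" should always be 100, while the count taken on "start" may be smaller once the document no longer fits in what the parser has read so far.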
msg333551 - Author: Stefan Behnel (scoder) (Python committer) Date: 2019-01-13 09:12
This is not a bug, it's normal, documented behaviour. The children are not guaranteed to be available during the "start" event. Only the tag itself is guaranteed to be there. The guarantee that the subtree is complete is only given for the "end" event.

See the big note in the documentation:
https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse
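
As an illustration of the point above, here is a minimal sketch that counts children only on "end" events, when the subtree is guaranteed to be complete. The function name `iter_child_counts` and the `record` tag are hypothetical, chosen for this example; calling `elem.clear()` after use is one common way to keep memory bounded on large files, not the only one.

```python
import io
import xml.etree.ElementTree as ET


def iter_child_counts(source, tag="record"):
    """Yield the number of children of each completed <tag> element.

    `source` is a filename or a binary file object, as accepted by
    ET.iterparse. The tag name is an assumption for this example.
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # On "end" the element and all of its children are complete.
            yield len(elem)
            # Drop the finished subtree so memory use stays bounded.
            elem.clear()


if __name__ == "__main__":
    child = "<child>" + "x" * 1000 + "</child>"
    record = "<record>" + child * 100 + "</record>"
    data = ("<root>" + record * 2 + "</root>").encode("ascii")
    for count in iter_child_counts(io.BytesIO(data)):
        print(count)
```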
History
Date User Action Args
2022-04-11 14:59:10 admin set github: 79910
2019-01-13 16:59:18 ned.deily set status: open -> closed; resolution: not a bug; stage: resolved
2019-01-13 09:13:48 scoder set type: performance -> behavior; title: XML.etree bug -> iterparse does not return the full subtree on "start" events
2019-01-13 09:12:39 scoder set messages: + msg333551
2019-01-13 08:43:39 serhiy.storchaka set nosy: + scoder, eli.bendersky, serhiy.storchaka; components: + XML
2019-01-13 08:17:55 Igor Nowicki create