pulldom cannot handle xml file with large external entity properly #47067

Open
hanselda mannequin opened this issue May 11, 2008 · 1 comment
Labels
3.8 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir topic-XML

Comments

@hanselda
Mannequin

hanselda mannequin commented May 11, 2008

BPO 2818
Nosy @scoder, @tiran, @websurfer5

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2008-05-11.13:32:17.541>
labels = ['expert-XML', '3.8', 'performance']
title = 'pulldom cannot handle xml file with large external entity properly'
updated_at = <Date 2019-05-28.08:47:13.739>
user = 'https://bugs.python.org/hanselda'

bugs.python.org fields:

activity = <Date 2019-05-28.08:47:13.739>
actor = 'Jeffrey.Kintscher'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['XML']
creation = <Date 2008-05-11.13:32:17.541>
creator = 'hanselda'
dependencies = []
files = []
hgrepos = []
issue_num = 2818
keywords = []
message_count = 1.0
messages = ['66628']
nosy_count = 5.0
nosy_names = ['scoder', 'christian.heimes', 'hanselda', 'mvolz', 'Jeffrey.Kintscher']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'resource usage'
url = 'https://bugs.python.org/issue2818'
versions = ['Python 2.7', 'Python 3.8']

@hanselda
Mannequin Author

hanselda mannequin commented May 11, 2008

When the xml.dom.pulldom module is used to parse a large XML file and
all of the data is stored in that one file, the module can process it
as follows without constructing the whole DOM:

import xml.dom.pulldom

events = xml.dom.pulldom.parse('file.xml')
for (event, node) in events:
    process(event, node)  # process() stands for the caller's own handler
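
For readers who want something concrete in place of process(), here is a minimal sketch of the streaming pattern described above. The sample document, the "item" element name, and the helper iter_item_texts() are assumptions made up for illustration; they are not part of the original report. Only the elements of interest are expanded into DOM subtrees, so the full tree is never built.

```python
import io
from xml.dom import pulldom

# Hypothetical sample data; the real 'file.xml' is not shown in the report.
SAMPLE = b"<body><item id='1'>a</item><item id='2'>b</item></body>"


def iter_item_texts(stream):
    """Yield (id, text) for each <item>, expanding one element at a time."""
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            # expandNode() pulls events up to the matching end tag and
            # attaches the children to this node only.
            events.expandNode(node)
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            yield node.getAttribute("id"), text


if __name__ == "__main__":
    for item_id, text in iter_item_texts(io.BytesIO(SAMPLE)):
        print(item_id, text)
```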

But if 'file.xml' contains some large external entities, for example:

<!ENTITY file_external SYSTEM "others.xml">
<body>&file_external;</body>

Then the same pulldom loop leads to enormous memory usage. I did not
run a rigorous benchmark, but in one case a 3 MB external XML file
consumed about 1 GB of memory. I suspect that in this case the whole
DOM structure is constructed.
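
Since no benchmark was attached, the following is a rough sketch of how the memory use could be measured. Everything in it is an assumption for illustration: the generated file contents, the "item" element name, the helper measure_external_entity(), and the choice of tracemalloc as the measuring tool. Note that since Python 3.7.1 the SAX parser does not process external general entities by default, so the sketch enables feature_external_ges explicitly; without that, the entity is skipped and the problem cannot be observed.

```python
import tempfile
import tracemalloc
import xml.sax
import xml.sax.handler
from pathlib import Path
from xml.dom import pulldom


def measure_external_entity(n_items):
    """Parse a document whose body comes from an external entity.

    Returns (item_count, peak_bytes), where peak_bytes is the peak
    allocation reported by tracemalloc while iterating the events.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the reporter's 'others.xml'.
        external = Path(tmp) / "others.xml"
        external.write_text(
            "".join(f"<item>{i}</item>" for i in range(n_items)),
            encoding="utf-8",
        )

        # Hypothetical stand-in for 'file.xml'. An absolute path is used
        # in the SYSTEM identifier so resolution does not depend on the
        # current working directory.
        main = Path(tmp) / "file.xml"
        main.write_text(
            '<?xml version="1.0"?>\n'
            "<!DOCTYPE body [\n"
            f'<!ENTITY file_external SYSTEM "{external.as_posix()}">\n'
            "]>\n"
            "<body>&file_external;</body>\n",
            encoding="utf-8",
        )

        # External general entities must be switched on explicitly.
        parser = xml.sax.make_parser()
        parser.setFeature(xml.sax.handler.feature_external_ges, True)

        items = 0
        tracemalloc.start()
        try:
            with open(main, "rb") as stream:
                for event, node in pulldom.parse(stream, parser=parser):
                    if event == pulldom.START_ELEMENT and node.tagName == "item":
                        items += 1
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
    return items, peak


if __name__ == "__main__":
    for n in (1_000, 10_000):
        count, peak = measure_external_entity(n)
        print(f"{count} items, peak traced memory {peak} bytes")
```

Running it with increasing n_items and comparing against a document that holds the same elements inline should show whether peak memory grows with the size of the external entity. tracemalloc only tracks allocations made through Python's allocators, so the figures are indicative rather than a measure of total process memory.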

@hanselda hanselda mannequin added topic-XML performance Performance or resource usage labels May 11, 2008
@csabella csabella added the 3.8 only security fixes label May 16, 2019
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Sep 14, 2022