
Author akuchling
Recipients akuchling
Date 2008-02-15.15:15:01
Here's a simple test to demonstrate the problem:

from xml.sax import make_parser
from xml.sax.saxutils import prepare_input_source
parser = make_parser()
inp = prepare_input_source('file:file.xhtml')
parser.parse(inp)   # parsing is what triggers the external fetches

file.xhtml contains:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="" />

If you insert a debug print into saxutils.prepare_input_source, in the
branch that uses urllib.urlopen(), you can see every input that gets
fetched: the XHTML 1.1 DTD, which is nicely modular, plus all the module
files it pulls in.
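
Rather than patching saxutils, roughly the same trace can probably be had through the public EntityResolver interface: the expat-based reader calls resolveEntity(publicId, systemId) for each external entity it is about to load, including the external DTD subset. The sketch below assumes Python 3 and substitutes a tiny local DTD (demo.dtd, hypothetical) for the XHTML 1.1 DTD so nothing touches the network; LoggingResolver and trace_fetches are illustrative names, not library API. Since Python 3.7.1 the external-general-entities feature defaults to off, so the sketch turns it on explicitly to reproduce the behaviour described here.

```python
import os
import tempfile
from xml.sax import make_parser
from xml.sax.handler import EntityResolver, feature_external_ges


class LoggingResolver(EntityResolver):
    """Hypothetical resolver that records every external entity requested."""

    def __init__(self):
        super().__init__()
        self.fetched = []

    def resolveEntity(self, publicId, systemId):
        # Log the request, then fall back to the default behaviour
        # (returning the system identifier unchanged).
        self.fetched.append((publicId, systemId))
        return super().resolveEntity(publicId, systemId)


def trace_fetches(path):
    """Parse `path` and return the (publicId, systemId) pairs requested."""
    parser = make_parser()
    # Off by default since Python 3.7.1; enable it to reproduce the fetching.
    parser.setFeature(feature_external_ges, True)
    resolver = LoggingResolver()
    parser.setEntityResolver(resolver)
    parser.parse(path)
    return resolver.fetched


# Local stand-in for the XHTML case: a document whose DOCTYPE names a DTD.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "demo.dtd"), "w", encoding="utf-8") as f:
    f.write('<!ENTITY eacute "&#233;">\n')
doc_path = os.path.join(tmpdir, "file.xml")
with open(doc_path, "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<!DOCTYPE html SYSTEM "demo.dtd">\n'
            '<html>caf&eacute;</html>\n')

print(trace_fetches(doc_path))
```

With the real XHTML 1.1 DOCTYPE, the same resolver would log the DTD and then each module it references.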

I don't see a good way to fix this without breaking backward
compatibility to some degree. The external-general-entities feature
defaults to 'on', which enables this fetching; we could change the
default to 'off', which would save the parsing effort, but would also
mean that entities like &eacute; weren't expanded.
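
The trade-off can be seen by parsing the same document with the feature on and off. A minimal sketch, again assuming Python 3 and a hypothetical local demo.dtd that defines eacute; TextCollector, CountingResolver and parse_with are illustrative helpers, not part of xml.sax.

```python
import os
import tempfile
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


class TextCollector(ContentHandler):
    """Hypothetical handler: collects character data and skipped entity names."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.skipped = []

    def characters(self, content):
        self.text.append(content)

    def skippedEntity(self, name):
        # Called for entity references the parser did not expand.
        self.skipped.append(name)


class CountingResolver(EntityResolver):
    """Hypothetical resolver: counts how often an external entity is requested."""

    def __init__(self):
        self.calls = 0

    def resolveEntity(self, publicId, systemId):
        self.calls += 1
        return systemId  # default behaviour: load from the system identifier


def parse_with(path, external_ges):
    """Parse `path`; return (text, skipped entity names, resolver call count)."""
    parser = make_parser()
    parser.setFeature(feature_external_ges, external_ges)
    handler, resolver = TextCollector(), CountingResolver()
    parser.setContentHandler(handler)
    parser.setEntityResolver(resolver)
    parser.parse(path)
    return "".join(handler.text), handler.skipped, resolver.calls


# Hypothetical local DTD defining &eacute;, standing in for the XHTML DTDs.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "demo.dtd"), "w", encoding="utf-8") as f:
    f.write('<!ENTITY eacute "&#233;">\n')
doc_path = os.path.join(tmpdir, "file.xml")
with open(doc_path, "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<!DOCTYPE html SYSTEM "demo.dtd">\n'
            '<html>caf&eacute;</html>\n')
```

With the feature on, parse_with(doc_path, True) should return the expanded text with one resolver call; with it off, the resolver should never be consulted and the reader should report eacute through skippedEntity() instead of expanding it.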

If we had catalog support, we could ship the XHTML 1.1 DTDs and any
other widely used DTDs, but we don't.
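
Without catalog support in the library, an application can approximate one with its own EntityResolver that maps well-known public identifiers to locally shipped files. This is only a sketch under stated assumptions: CatalogResolver, the public identifier "-//EXAMPLE//DTD Demo 1.0//EN" and the http://example.invalid/ URL are hypothetical stand-ins, not the XHTML 1.1 identifiers and not an implementation of the OASIS XML Catalogs format.

```python
import os
import tempfile
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


class CatalogResolver(EntityResolver):
    """Hypothetical catalog: maps public identifiers to local DTD files."""

    def __init__(self, catalog):
        self.catalog = catalog  # {publicId: local file path}
        self.hits = []

    def resolveEntity(self, publicId, systemId):
        local = self.catalog.get(publicId)
        if local is not None:
            self.hits.append(publicId)
            return local  # load the shipped copy instead of the remote URL
        return systemId  # unknown identifier: default behaviour


class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


# Hypothetical identifiers; only the local copy of the DTD is ever opened.
PUBLIC_ID = "-//EXAMPLE//DTD Demo 1.0//EN"
tmpdir = tempfile.mkdtemp()
local_dtd = os.path.join(tmpdir, "demo-local.dtd")
with open(local_dtd, "w", encoding="utf-8") as f:
    f.write('<!ENTITY eacute "&#233;">\n')
doc_path = os.path.join(tmpdir, "file.xml")
with open(doc_path, "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<!DOCTYPE html PUBLIC "{PUBLIC_ID}" '
            '"http://example.invalid/demo.dtd">\n'
            '<html>caf&eacute;</html>\n')

parser = make_parser()
parser.setFeature(feature_external_ges, True)
resolver = CatalogResolver({PUBLIC_ID: local_dtd})
collector = TextCollector()
parser.setEntityResolver(resolver)
parser.setContentHandler(collector)
parser.parse(doc_path)
print(resolver.hits)
```

Identifiers missing from the mapping fall back to the default behaviour, so unknown DTDs would still be fetched from their system identifiers.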