This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: xml.sax.expatreader.ExpatParser incorrectly silently skips external character entities in attribute values
Components: Library (Lib)

process
Status: open
Nosy List: exarkun, maxy@debian.org, terry.reedy
Priority: normal

Created on 2009-01-15 22:34 by exarkun, last changed 2022-04-11 14:56 by admin.

Files
entity-skipped-in-attribute-value.py (uploaded by exarkun, 2009-01-15 22:34)
Messages (4)
msg79920 - Author: Jean-Paul Calderone (exarkun) (Python committer) Date: 2009-01-15 22:34
The attached program demonstrates that the ContentHandler.skippedEntity
callback is not invoked for all skipped entities.  Specifically, it is
not invoked for those in attribute values.  Additionally, it
demonstrates that when parsing a document with no DOCTYPE, skippedEntity
is not called at all; instead the parser raises an exception about an
"undefined entity".
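A minimal sketch of the first behaviour described above, assuming Python 3's xml.sax backed by expat (this is not the attached program, which is not reproduced here). The DTD name hypothetical.dtd and the helpers RecordingHandler and run are illustrative. Because external general entities are disabled, the external DTD is never read, so expat cannot tell whether "ent" is declared: the reference in element content should be reported through skippedEntity, while the one in the attribute value simply disappears.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Hypothetical document: the external DTD is never fetched, so expat
# cannot know whether "ent" is declared.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "hypothetical.dtd">
<doc attr="a&ent;b">x&ent;y</doc>
"""


class RecordingHandler(ContentHandler):
    """Records the SAX events relevant to this issue (illustrative)."""

    def __init__(self):
        super().__init__()
        self.attrs = {}     # element name -> attribute dict
        self.text = []      # character data chunks
        self.skipped = []   # names passed to skippedEntity

    def startElement(self, name, attrs):
        self.attrs[name] = dict(attrs.items())

    def characters(self, content):
        self.text.append(content)

    def skippedEntity(self, name):
        self.skipped.append(name)


def run(doc):
    parser = xml.sax.make_parser()
    # Do not fetch external entities, including the external DTD subset.
    parser.setFeature(feature_external_ges, False)
    handler = RecordingHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(doc))
    return handler


handler = run(DOC)
print(handler.skipped)        # per the report: only the content reference
print(handler.attrs["doc"])   # the attribute reference is silently dropped
print("".join(handler.text))
```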
msg79960 - Author: Jean-Paul Calderone (exarkun) (Python committer) Date: 2009-01-16 16:39
After further investigation, I've learned a bit more.  External entities
are forbidden in attribute values.  Their presence constitutes a "fatal
error" according to <http://www.w3.org/TR/REC-xml/#forbidden>.  This
means that silently dropping such entity references from an attribute
value is incorrect. Instead, the fatal error hook (ErrorHandler.fatalError) must be called.
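For comparison, a sketch assuming the external entity is declared in the internal subset, where expat can see that it is external. In that case the parse should end in a fatal error (expat's "reference to external entity in attribute") rather than a silent drop; the drop reported in this issue arises when the declaration would sit in an external DTD that was never read. The entity file name and the parse_error helper are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical document: "ent" is declared as an external parsed entity
# in the internal subset and then referenced from an attribute value.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY ent SYSTEM "hypothetical-ent.xml">
]>
<doc attr="&ent;"/>
"""


def parse_error(doc):
    """Return the SAX fatal-error message for doc, or None if it parses."""
    try:
        # The default ErrorHandler re-raises fatal errors.
        xml.sax.parseString(doc, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()
    return None


print(parse_error(DOC))
```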
msg80006 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2009-01-17 08:00
Neither the title nor your two posts identify the module you think needs
to be changed.  Changing the title to include that might better get
attention from someone who is familiar with that module and could deal
with the request.
msg177408 - Author: Maximiliano Curia (maxy@debian.org) Date: 2012-12-13 15:20
Hi,

There are two issues discussed in this bug, and both come from libexpat.
The first, the inconsistent reporting of skipped entities, is due to the design of XML_Parser.

From Modules/expat/xmlparse.c, line 5036:
        else if (!entity) {
          /* Cannot report skipped entity here - see comments on
             skippedEntityHandler.
          if (skippedEntityHandler)
            skippedEntityHandler(handlerArg, name, 0);
          */
          /* Cannot call the default handler because this would be
             out of sync with the call to the startElementHandler.
          if ((pool == &tempPool) && defaultHandler)
            reportDefault(parser, enc, ptr, next);
          */
          break;
        }

This is because libexpat has to call startElementHandler before skippedEntityHandler, but this code runs while the attribute values are being processed, which happens before startElementHandler is called.
Fixing this would require a change to the libexpat API: adding the concept of futures to attribute processing, plus a way to retrieve them through an iterator.
In any case, I don't think this is a Python issue; it is a known libexpat limitation. It could be forwarded to the libexpat developers, but from Python's point of view, this issue should be closed.
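The claim that the attribute-value skip happens inside libexpat can be checked below the SAX layer. A sketch using xml.parsers.expat directly, assuming the same hypothetical document and an ExternalEntityRefHandler that declines to load the external DTD (roughly what expatreader does when external general entities are disabled). The only skipped-entity event should be the one from element content, reported after the start-element event, and the attribute value should arrive with the reference already removed. The trace helper is illustrative.

```python
from xml.parsers import expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc SYSTEM "hypothetical.dtd">
<doc attr="a&ent;b">x&ent;y</doc>
"""


def trace(doc):
    """Return start-element and skipped-entity events, in order."""
    events = []
    parser = expat.ParserCreate()
    # Request external-subset callbacks, but load nothing (return 1 means
    # "handled"), so expat cannot know whether "ent" is declared.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
    parser.ExternalEntityRefHandler = lambda context, base, sysid, pubid: 1
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name, attrs)))
    parser.SkippedEntityHandler = (
        lambda name, is_pe: events.append(("skipped", name)))
    parser.Parse(doc, True)
    return events


for event in trace(DOC):
    print(event)
```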

The second issue is not really an issue. It is the default behaviour when an entity reference is found but no DTD is specified (apart from the five predefined entities, every entity must be declared in a DTD). This part is working as intended.
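A sketch of that second behaviour, assuming Python 3's xml.sax: without any DTD, a reference to an undeclared entity is a well-formedness error that expat reports as "undefined entity", while the five predefined entities (amp, lt, gt, apos, quot) parse without any declaration. The parse_error helper is hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def parse_error(doc):
    """Return the SAX fatal-error message for doc, or None if it parses."""
    try:
        xml.sax.parseString(doc, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()
    return None


# No DOCTYPE: "ent" cannot have been declared anywhere.
print(parse_error(b"<doc>&ent;</doc>"))
# The predefined entities need no declaration.
print(parse_error(b"<doc>&amp;&lt;&gt;&apos;&quot;</doc>"))
```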
History
Date | User | Action | Args
2022-04-11 14:56:44 | admin | set | github: 49205
2012-12-13 15:20:53 | maxy@debian.org | set | messages: + msg177408
2012-12-13 14:42:52 | maxy@debian.org | set | nosy: + maxy@debian.org
2009-01-17 15:42:31 | exarkun | set | title: inconsistent, perhaps incorrect, behavior with respect to entities parsed by xml.sax -> xml.sax.expatreader.ExpatParser incorrectly silently skips external character entities in attribute values
2009-01-17 08:00:43 | terry.reedy | set | nosy: + terry.reedy; messages: + msg80006
2009-01-16 16:39:41 | exarkun | set | messages: + msg79960
2009-01-15 22:34:30 | exarkun | create |