classification
Title: expat infinite loop
Type: behavior
Stage:
Components: XML
Versions: Python 3.7

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: StyXman, sping
Priority: normal
Keywords:

Created on 2019-10-15 17:12 by StyXman, last changed 2022-04-11 14:59 by admin.

Files
File name: expat.tar.gz
Uploaded: StyXman, 2019-10-15 17:12
Description: tarball with test script and files

Messages (2)
msg354747 - Author: Marcos Dione (StyXman) - Date: 2019-10-15 17:12
I'm trying to add external entity support to xmltodict[1]. For that I extended the handler with an ExternalEntityRefHandler. After reading a couple of files, the script locks up in a tight loop.

I ran the script under gdb (!!) and found that expat thinks that two of the parsers are each other's parent. I set up a breakpoint in XML_ExternalEntityParserCreate() (yes, this is expat, I know) right after the new parser takes the old parser as its parent (xmlparse.c:1279 on my system).

Here are the backtraces and values I found:

--- >8 ---
landuse-lowzoom None styles-otm/landuse-lowzoom.xml None

#0  XML_ExternalEntityParserCreate (oldParser=0xadc4d0, context=context@entry=0x7ffff6c871e0 "landuse-lowzoom", encodingName=encodingName@entry=0x0) at ../../src/lib/xmlparse.c:1281
#1  0x000000000044ec90 in pyexpat_xmlparser_ExternalEntityParserCreate_impl (encoding=0x0, context=0x7ffff6c871e0 "landuse-lowzoom", self=0x7ffff6d556e0) at ../Modules/pyexpat.c:943
#2  pyexpat_xmlparser_ExternalEntityParserCreate (self=0x7ffff6d556e0, args=<optimized out>, nargs=<optimized out>) at ../Modules/clinic/pyexpat.c.h:137
[...]
#15 0x000000000044d80d in my_ExternalEntityRefHandler (parser=<optimized out>, context=0xae1d2c "landuse-lowzoom", base=<optimized out>, systemId=<optimized out>, publicId=<optimized out>)
    at ../Modules/pyexpat.c:659
#16 0x00007ffff7d990c8 in doContent (parser=parser@entry=0xadc4d0, startTagLevel=startTagLevel@entry=0, enc=<optimized out>,
    s=s@entry=0xae08dd "<Map background-color=\"#e0e0e0\" srs=\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs +over\" buffer-size=\"256\">\n\t<!-- style definitions "..., end=end@entry=0xae0ce6 '\001' <repeats 200 times>..., nextPtr=nextPtr@entry=0xadc500, haveMore=1 '\001') at ../../src/lib/xmlparse.c:2685
#17 0x00007ffff7d9957c in contentProcessor (parser=parser@entry=0xadc4d0,
    start=start@entry=0xae08dd "<Map background-color=\"#e0e0e0\" srs=\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs +over\" buffer-size=\"256\">\n\t<!-- style definitions "..., end=end@entry=0xae0ce6 '\001' <repeats 200 times>..., endPtr=endPtr@entry=0xadc500) at ../../src/lib/xmlparse.c:2444
#18 0x00007ffff7d96a73 in doProlog (parser=parser@entry=0xadc4d0, enc=0x7ffff7db89e0 <utf8_encoding>,
    s=0xae08dd "<Map background-color=\"#e0e0e0\" srs=\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs +over\" buffer-size=\"256\">\n\t<!-- style definitions "...,
    s@entry=0xae04e4 "text-water-lowzoom SYSTEM \"styles-otm/text-water-lowzoom.xml\">\n\t<!ENTITY text-glacier-lowzoom SYSTEM \"styles-otm/text-glacier-lowzoom.xml\">\n\t<!ENTITY text-natural-poly SYSTEM \"styles-otm/text-natural-"..., end=end@entry=0xae0ce6 '\001' <repeats 200 times>..., tok=29, next=<optimized out>, nextPtr=0xadc500, haveMore=1 '\001', allowClosingDoctype=1 '\001') at ../../src/lib/xmlparse.c:4371
#19 0x00007ffff7d97f3a in prologProcessor (parser=0xadc4d0,
    s=0xae04e4 "text-water-lowzoom SYSTEM \"styles-otm/text-water-lowzoom.xml\">\n\t<!ENTITY text-glacier-lowzoom SYSTEM \"styles-otm/text-glacier-lowzoom.xml\">\n\t<!ENTITY text-natural-poly SYSTEM \"styles-otm/text-natural-"..., end=0xae0ce6 '\001' <repeats 200 times>..., nextPtr=0xadc500) at ../../src/lib/xmlparse.c:4094
#20 0x00007ffff7d9bb1c in XML_ParseBuffer (isFinal=0, len=<optimized out>, parser=0xadc4d0) at ../../src/lib/xmlparse.c:1893
#21 XML_ParseBuffer (parser=0xadc4d0, len=len@entry=2048, isFinal=isFinal@entry=0) at ../../src/lib/xmlparse.c:1863
#22 0x000000000060886d in pyexpat_xmlparser_ParseFile (self=0x7ffff6d556e0, file=<optimized out>) at ../Modules/pyexpat.c:841

(gdb) print oldParser
$33 = (XML_Parser) 0xadc4d0
(gdb) print parser
$32 = (XML_Parser) 0xadecb0

7ffff6d556e0, 7ffff6d55750
<_io.BufferedReader name='styles-otm/landuse-lowzoom.xml'>
<_io.BufferedReader name='styles-otm/landuse-lowzoom.xml'>
landuse None styles-otm/landuse.xml None

#0  XML_ExternalEntityParserCreate (oldParser=0xadecb0, context=context@entry=0x7ffff6c88660 "landuse", encodingName=encodingName@entry=0x0) at ../../src/lib/xmlparse.c:1281
#1  0x000000000044ec90 in pyexpat_xmlparser_ExternalEntityParserCreate_impl (encoding=0x0, context=0x7ffff6c88660 "landuse", self=0x7ffff6d55750) at ../Modules/pyexpat.c:943
#2  pyexpat_xmlparser_ExternalEntityParserCreate (self=0x7ffff6d55750, args=<optimized out>, nargs=<optimized out>) at ../Modules/clinic/pyexpat.c.h:137
[...]
#15 0x000000000044d80d in my_ExternalEntityRefHandler (parser=<optimized out>, context=0xae1d2c "landuse", base=<optimized out>, systemId=<optimized out>, publicId=<optimized out>) at ../Modules/pyexpat.c:659
#16 0x00007ffff7d990c8 in doContent (parser=parser@entry=0xadc4d0, startTagLevel=startTagLevel@entry=0, enc=<optimized out>,
    s=s@entry=0xae08dd "<Map background-color=\"#e0e0e0\" srs=\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs +over\" buffer-size=\"256\">\n\t<!-- style definitions "..., end=end@entry=0xae0ce6 '\001' <repeats 200 times>..., nextPtr=nextPtr@entry=0xadc500, haveMore=1 '\001') at ../../src/lib/xmlparse.c:2685
#17 0x00007ffff7d9957c in contentProcessor (parser=parser@entry=0xadc4d0,
    start=start@entry=0xae08dd "<Map background-color=\"#e0e0e0\" srs=\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs +over\" buffer-size=\"256\">\n\t<!-- style definitions "..., end=end@entry=0xae0ce6 '\001' <repeats 200 times>..., endPtr=endPtr@entry=0xadc500) at ../../src/lib/xmlparse.c:2444
#18 0x00007ffff7d96a73 in doProlog (parser=parser@entry=0xadc4d0, enc=0x7ffff7db89e0 <utf8_encoding>,
    s=0xae08dd "<Map background-color=\"#e0e0e0\" srs=\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs +over\" buffer-size=\"256\">\n\t<!-- style definitions "...,
    s@entry=0xae04e4 "text-water-lowzoom SYSTEM \"styles-otm/text-water-lowzoom.xml\">\n\t<!ENTITY text-glacier-lowzoom SYSTEM \"styles-otm/text-glacier-lowzoom.xml\">\n\t<!ENTITY text-natural-poly SYSTEM \"styles-otm/text-natural-"..., end=end@entry=0xae0ce6 '\001' <repeats 200 times>..., tok=29, next=<optimized out>, nextPtr=0xadc500, haveMore=1 '\001', allowClosingDoctype=1 '\001') at ../../src/lib/xmlparse.c:4371
#19 0x00007ffff7d97f3a in prologProcessor (parser=0xadc4d0,
    s=0xae04e4 "text-water-lowzoom SYSTEM \"styles-otm/text-water-lowzoom.xml\">\n\t<!ENTITY text-glacier-lowzoom SYSTEM \"styles-otm/text-glacier-lowzoom.xml\">\n\t<!ENTITY text-natural-poly SYSTEM \"styles-otm/text-natural-"..., end=0xae0ce6 '\001' <repeats 200 times>..., nextPtr=0xadc500) at ../../src/lib/xmlparse.c:4094
#20 0x00007ffff7d9bb1c in XML_ParseBuffer (isFinal=0, len=<optimized out>, parser=0xadc4d0) at ../../src/lib/xmlparse.c:1893
#21 XML_ParseBuffer (parser=0xadc4d0, len=len@entry=2048, isFinal=isFinal@entry=0) at ../../src/lib/xmlparse.c:1863
#22 0x000000000060886d in pyexpat_xmlparser_ParseFile (self=0x7ffff6d556e0, file=<optimized out>) at ../Modules/pyexpat.c:841

(gdb) print oldParser
$35 = (XML_Parser) 0xadecb0
(gdb) print parser
$34 = (XML_Parser) 0xae5e00

7ffff6d55750, 7ffff6d557c0
<_io.BufferedReader name='styles-otm/landuse.xml'>
<_io.BufferedReader name='styles-otm/landuse.xml'>
landuse-over-hillshade None styles-otm/landuse-over-hillshade.xml None

#0  XML_ExternalEntityParserCreate (oldParser=0xae5e00, context=context@entry=0x7ffff6c81a60 "landuse-over-hillshade", encodingName=encodingName@entry=0x0) at ../../src/lib/xmlparse.c:1281
#1  0x000000000044ec90 in pyexpat_xmlparser_ExternalEntityParserCreate_impl (encoding=0x0, context=0x7ffff6c81a60 "landuse-over-hillshade", self=0x7ffff6d557c0) at ../Modules/pyexpat.c:943
#2  pyexpat_xmlparser_ExternalEntityParserCreate (self=0x7ffff6d557c0, args=<optimized out>, nargs=<optimized out>) at ../Modules/clinic/pyexpat.c.h:137
[...]
#15 0x000000000044d80d in my_ExternalEntityRefHandler (parser=<optimized out>, context=0xae1d2c "landuse-over-hillshade", base=<optimized out>, systemId=<optimized out>, publicId=<optimized out>)
    at ../Modules/pyexpat.c:659
#16 0x00007ffff7d990c8 in doContent (parser=parser@entry=0xadc4d0, startTagLevel=startTagLevel@entry=0, enc=<optimized out>,
    s=s@entry=0xae08dd "<Map background-color=\"#e0e0e0\" srs=\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs +over\" buffer-size=\"256\">\n\t<!-- style definitions "..., end=end@entry=0xae0ce6 '\001' <repeats 200 times>..., nextPtr=nextPtr@entry=0xadc500, haveMore=1 '\001') at ../../src/lib/xmlparse.c:2685
#17 0x00007ffff7d9957c in contentProcessor (parser=parser@entry=0xadc4d0,
    start=start@entry=0xae08dd "<Map background-color=\"#e0e0e0\" srs=\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs +over\" buffer-size=\"256\">\n\t<!-- style definitions "..., end=end@entry=0xae0ce6 '\001' <repeats 200 times>..., endPtr=endPtr@entry=0xadc500) at ../../src/lib/xmlparse.c:2444
#18 0x00007ffff7d96a73 in doProlog (parser=parser@entry=0xadc4d0, enc=0x7ffff7db89e0 <utf8_encoding>,
    s=0xae08dd "<Map background-color=\"#e0e0e0\" srs=\"+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs +over\" buffer-size=\"256\">\n\t<!-- style definitions "...,
    s@entry=0xae04e4 "text-water-lowzoom SYSTEM \"styles-otm/text-water-lowzoom.xml\">\n\t<!ENTITY text-glacier-lowzoom SYSTEM \"styles-otm/text-glacier-lowzoom.xml\">\n\t<!ENTITY text-natural-poly SYSTEM \"styles-otm/text-natural-"..., end=end@entry=0xae0ce6 '\001' <repeats 200 times>..., tok=29, next=<optimized out>, nextPtr=0xadc500, haveMore=1 '\001', allowClosingDoctype=1 '\001') at ../../src/lib/xmlparse.c:4371
#19 0x00007ffff7d97f3a in prologProcessor (parser=0xadc4d0,
    s=0xae04e4 "text-water-lowzoom SYSTEM \"styles-otm/text-water-lowzoom.xml\">\n\t<!ENTITY text-glacier-lowzoom SYSTEM \"styles-otm/text-glacier-lowzoom.xml\">\n\t<!ENTITY text-natural-poly SYSTEM \"styles-otm/text-natural-"..., end=0xae0ce6 '\001' <repeats 200 times>..., nextPtr=0xadc500) at ../../src/lib/xmlparse.c:4094
#20 0x00007ffff7d9bb1c in XML_ParseBuffer (isFinal=0, len=<optimized out>, parser=0xadc4d0) at ../../src/lib/xmlparse.c:1893
#21 XML_ParseBuffer (parser=0xadc4d0, len=len@entry=2048, isFinal=isFinal@entry=0) at ../../src/lib/xmlparse.c:1863
#22 0x000000000060886d in pyexpat_xmlparser_ParseFile (self=0x7ffff6d556e0, file=<optimized out>) at ../Modules/pyexpat.c:841

(gdb) print oldParser
$36 = (XML_Parser) 0xae5e00
(gdb) print parser
$37 = (XML_Parser) 0xadecb0
--- 8< ---

As I hope you can see, the last pair of values (parent 0xae5e00, new 0xadecb0) is the exact opposite of the previous pair (parent 0xadecb0, new 0xae5e00). Later, when get_hash_secret_salt() is called, it enters an infinite loop climbing up the parent ladder.
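
To illustrate why two objects that are each other's parent make a "climb to the root" walk spin forever, here is a minimal Python sketch. It is not expat's code: the `Node` class and `find_root` function are hypothetical stand-ins, and the addresses are reused only as labels. The walk below raises an error where an unguarded loop would never terminate.

```python
class Node:
    """Hypothetical stand-in for a parser object that has a parent pointer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def find_root(node):
    """Climb parent pointers to the root; raise if the chain contains a cycle."""
    seen = set()
    while node.parent is not None:
        if id(node) in seen:
            # An unguarded `while node.parent` loop would spin forever here.
            raise RuntimeError(f"parent cycle detected at {node.name}")
        seen.add(id(node))
        node = node.parent
    return node


def build_chain():
    """Build root <- a <- b, labelled with the addresses from the backtraces."""
    root = Node("0xadc4d0")
    a = Node("0xadecb0", parent=root)
    b = Node("0xae5e00", parent=a)
    return root, a, b


if __name__ == "__main__":
    root, a, b = build_chain()
    print(find_root(b).name)  # healthy chain: the walk ends at the root
    a.parent = b              # now a and b are each other's parent
    try:
        find_root(b)
    except RuntimeError as exc:
        print(exc)
```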

Now, this looks like an expat issue and not a pyexpat one, but given that pyexpat provides its own allocator, and that the parser addresses are returned by it, I am opening this issue here first. If it can be proven that it's an expat issue, I'll take it to their issue tracker.

-----
[1] https://github.com/martinblech/xmltodict/issues/226
msg411796 - Author: (sping) - Date: 2022-01-26 21:32
Hi StyXman,

I had a closer look at the files you shared, thanks for those, very helpful!

What I found is that expat_test.py uses a single scalar variable
(_DictSAXHandler.parser) to keep track of the related parser, while it would
need a stack to allow recursion.  In a way, the current approach is equivalent
to walking up the stack as expected but never going back down.
Once I make the code use a stack, the loop goes away.  I'm pasting the patch
inline (with two spaces indented globally) below.
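
Separately from the patch to expat_test.py, the stack idea can be sketched as a small self-contained program. This is only an illustration under stated assumptions: the `DOCS` dictionary, the `StackHandler` class and the `parse` function are hypothetical names, and in-memory byte strings stand in for the real files. The pyexpat calls used are `ParserCreate`, `ExternalEntityParserCreate`, `Parse` and the `StartElementHandler` / `ExternalEntityRefHandler` attributes.

```python
import xml.parsers.expat as expat

# Hypothetical in-memory documents standing in for files on disk.
DOCS = {
    "root.xml": (
        b'<!DOCTYPE root [\n'
        b'  <!ENTITY a SYSTEM "a.xml">\n'
        b'  <!ENTITY b SYSTEM "b.xml">\n'
        b'  <!ENTITY c SYSTEM "c.xml">\n'
        b']>\n'
        b'<root>&a;&c;</root>'
    ),
    "a.xml": b"<a>&b;</a>",  # nested reference: a.xml pulls in b.xml
    "b.xml": b"<b/>",
    "c.xml": b"<c/>",
}


class StackHandler:
    """Tracks the parser that is currently active with a stack, not a scalar."""

    def __init__(self):
        self._parsers = []
        self.elements = []

    def setup(self, parser):
        parser.StartElementHandler = self.start_element
        parser.ExternalEntityRefHandler = self.external_entity_ref

    def start_element(self, name, attrs):
        self.elements.append(name)

    def external_entity_ref(self, context, base, system_id, public_id):
        # Always derive the child from the parser that is active right now.
        child = self._parsers[-1].ExternalEntityParserCreate(context)
        self.setup(child)
        self._parsers.append(child)
        try:
            child.Parse(DOCS[system_id], True)
        finally:
            # Go back down the stack once the nested entity is done.
            self._parsers.pop()
        return 1  # non-zero tells expat the entity was handled


def parse(name):
    handler = StackHandler()
    parser = expat.ParserCreate()
    handler.setup(parser)
    handler._parsers.append(parser)
    parser.Parse(DOCS[name], True)
    return handler.elements


if __name__ == "__main__":
    print(parse("root.xml"))
```

The `&c;` reference is the interesting case: it is resolved after the nested `&a;` / `&b;` chain has finished, so the handler must be back at the top-level parser by then, which the `pop()` in the `finally` clause ensures.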

During debugging, these are the commands I used to compare internal libexpat
behavior, which may be of interest:

  EXPAT_ACCOUNTING_DEBUG=2 python expat_test.py |& sed 's,0x[0-9a-f]\+,XXX,' | tee pyexpat.txt

  EXPAT_ACCOUNTING_DEBUG=2 xmlwf -x test1.xml |& sed 's,0x[0-9a-f]\+,XXX,' | tee xmlwf.txt

  diff -u xmlwf.txt pyexpat.txt

Here's how I quick-fixed expat_test.py to make things work:

  # diff -u expat_test.py_ORIG expat_test.py
  --- expat_test.py_ORIG  2022-01-26 21:15:27.506458671 +0100
  +++ expat_test.py       2022-01-26 22:15:08.741384932 +0100
  @@ -7,11 +7,21 @@
   
       parser.ExternalEntityRefHandler = handler.externalEntityRef
   
  -    # store the parser in the handler so we can recurse
  -    handler.parser = parser
  -
   
   class _DictSAXHandler(object):
  +    def __init__(self):
  +        self._parsers = []
  +        
  +    def push_parser(self, parser):
  +        self._parsers.append(parser)
  +    
  +    def pop_parser(self):
  +        self._parsers.pop()
  +
  +    @property
  +    def parser(self):
  +        return self._parsers[-1]
  +
       def externalEntityRef(self, context, base, sysId, pubId):
           print(context, base, sysId, pubId)
           external_parser = self.parser.ExternalEntityParserCreate(context)
  @@ -19,7 +29,9 @@
           setup_parser(external_parser, self)
           f = open(sysId, 'rb')
           print(f)
  +        self.push_parser(external_parser)
           external_parser.ParseFile(f)
  +        self.pop_parser()
           print(f)
   
           # all OK
  @@ -36,12 +48,13 @@
           namespace_separator
       )
       setup_parser(parser, handler)
  +    handler.push_parser(parser)
   
       if hasattr(xml_input, 'read'):
           parser.ParseFile(xml_input)
       else:
           parser.Parse(xml_input, True)
  -    return handler.item
  +    # return handler.item  # there is no .item
   
   
   parse(open('test1.xml', 'rb'))
   
What do you think?

PS: Please note that processing external entities has security implications
    (see https://en.wikipedia.org/wiki/XML_external_entity_attack).

Best, Sebastian
History
Date | User | Action | Args
2022-04-11 14:59:21 | admin | set | github: 82668
2022-01-26 21:32:55 | sping | set | nosy: + sping; messages: + msg411796
2019-10-15 17:12:40 | StyXman | create |