Message411796
Hi StyXman,
I had a closer look at the files you shared, thanks for those, very helpful!
What I found is that expat_test.py uses a single scalar variable
(_DictSAXHandler.parser) to keep track of the related parser, while it would
need a stack to allow recursion. In a way, the current approach is equivalent
to walking up the stack as expected but never going back down.
Once I make the code use a stack, the loop goes away. I'm pasting the patch
inline below (indented globally by two spaces).
During debugging, I used these commands to compare the internal libexpat
behavior under pyexpat and under xmlwf, in case they are of interest:
EXPAT_ACCOUNTING_DEBUG=2 python expat_test.py |& sed 's,0x[0-9a-f]\+,XXX,' | tee pyexpat.txt
EXPAT_ACCOUNTING_DEBUG=2 xmlwf -x test1.xml |& sed 's,0x[0-9a-f]\+,XXX,' | tee xmlwf.txt
diff -u xmlwf.txt pyexpat.txt
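For anyone without sed/diff at hand, the same comparison can be sketched in Python. This is only a rough equivalent: the function names are made up for illustration, and like the sed expression above (which has no g flag) it masks only the first address on each line.

```python
import difflib
import re

# Same pattern as the sed expression: a lowercase hex address such as 0x55d0aa.
_ADDRESS = re.compile(r"0x[0-9a-f]+")


def mask_first_address(line):
    """Replace the first hex address on the line with XXX."""
    return _ADDRESS.sub("XXX", line, count=1)


def diff_logs(left, right, left_name="xmlwf.txt", right_name="pyexpat.txt"):
    """Return unified-diff lines for two logs after masking addresses."""
    left = [mask_first_address(line) for line in left]
    right = [mask_first_address(line) for line in right]
    return list(
        difflib.unified_diff(
            left, right, fromfile=left_name, tofile=right_name, lineterm=""
        )
    )
```

Two logs that differ only in pointer values then produce an empty diff, so whatever remains is a real difference in behavior.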
Here's how I quick-fixed expat_test.py to make things work:
  # diff -u expat_test.py_ORIG expat_test.py
  --- expat_test.py_ORIG 2022-01-26 21:15:27.506458671 +0100
  +++ expat_test.py 2022-01-26 22:15:08.741384932 +0100
  @@ -7,11 +7,21 @@
       parser.ExternalEntityRefHandler = handler.externalEntityRef
  -    # store the parser in the handler so we can recurse
  -    handler.parser = parser
  -
   class _DictSAXHandler(object):
  +    def __init__(self):
  +        self._parsers = []
  +
  +    def push_parser(self, parser):
  +        self._parsers.append(parser)
  +
  +    def pop_parser(self):
  +        self._parsers.pop()
  +
  +    @property
  +    def parser(self):
  +        return self._parsers[-1]
  +
       def externalEntityRef(self, context, base, sysId, pubId):
           print(context, base, sysId, pubId)
           external_parser = self.parser.ExternalEntityParserCreate(context)
  @@ -19,7 +29,9 @@
           setup_parser(external_parser, self)
           f = open(sysId, 'rb')
           print(f)
  +        self.push_parser(external_parser)
           external_parser.ParseFile(f)
  +        self.pop_parser()
           print(f)
           # all OK
  @@ -36,12 +48,13 @@
           namespace_separator
       )
       setup_parser(parser, handler)
  +    handler.push_parser(parser)
       if hasattr(xml_input, 'read'):
           parser.ParseFile(xml_input)
       else:
           parser.Parse(xml_input, True)
  -    return handler.item
  +    # return handler.item # there is no .item
   parse(open('test1.xml', 'rb'))
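To show the stack idea in isolation, here is a minimal self-contained sketch. It is not the code from expat_test.py: the class and function names, the in-memory documents, and the try/finally around the pop are my own assumptions for illustration. Each start tag is recorded together with the current stack depth, so you can see the handler go up for a nested entity and come back down afterwards.

```python
from xml.parsers import expat

# Hypothetical in-memory stand-ins for a main document and the external
# entities it references (entity "a" itself references entity "b").
MAIN = b"""<!DOCTYPE root [
<!ENTITY a SYSTEM "a.xml">
<!ENTITY b SYSTEM "b.xml">
]>
<root>&a;&b;</root>"""
SOURCES = {"a.xml": b"<a>&b;<tail/></a>", "b.xml": b"<b/>"}


class StackHandler:
    def __init__(self, sources):
        self._sources = sources  # maps system id to entity content
        self._parsers = []       # innermost active parser is last
        self.events = []         # (stack depth, element name) pairs

    @property
    def parser(self):
        return self._parsers[-1]

    def run(self, parser, data):
        parser.StartElementHandler = self.start_element
        parser.ExternalEntityRefHandler = self.external_entity_ref
        self._parsers.append(parser)
        try:
            parser.Parse(data, True)
        finally:
            # Pop even on error so the parent becomes current again.
            self._parsers.pop()

    def start_element(self, name, attrs):
        self.events.append((len(self._parsers), name))

    def external_entity_ref(self, context, base, system_id, public_id):
        # Create the child from whichever parser is currently active.
        child = self.parser.ExternalEntityParserCreate(context)
        self.run(child, self._sources[system_id])
        return 1  # non-zero tells expat the entity was handled


def parse_nested(main=MAIN, sources=SOURCES):
    handler = StackHandler(sources)
    handler.run(expat.ParserCreate(), main)
    return handler.events
```

The second reference to b in the root element is the interesting case: by then the stack has been popped back to the root parser, so the child parser is created from the right parent. With a single scalar attribute, that lookup would still point at an already finished child parser.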
What do you think?
PS: Please note that processing external entities has security implications
(see https://en.wikipedia.org/wiki/XML_external_entity_attack).
Best, Sebastian

Date | User | Action | Args
2022-01-26 21:32:55 | sping | set | recipients: + sping, StyXman
2022-01-26 21:32:55 | sping | set | messageid: <1643232775.76.0.358495635944.issue38487@roundup.psfhosted.org>
2022-01-26 21:32:55 | sping | link | issue38487 messages
2022-01-26 21:32:55 | sping | create |