Author r.david.murray
Recipients akuchling, maltehelmert, r.david.murray
Date 2009-03-17.12:32:59
Message-id <1237293182.79.0.314871463999.issue2170@psf.upfronthosting.co.za>
Content
I checked the speed of the proposed patch and found that it was
definitely slower than the original code.  So I took another look at the
original and refactored it in a different way: instead of moving the
sibling relinking into a second pass, I changed the code to relink
siblings only when a node is removed.  The new patch passes all tests and
is faster than the old code.  I tested the timing both against the same
small nested document I used in testNormalize2 and by running normalize
on a 37K HTML document (a copy of the xml.dom.minidom chapter from the
Library Reference):

original code:
testNormalize2: [2.5144219398498535, 2.5053589344024658, 2.5059471130371094]
example.html:   [44.641155958175659, 44.575434923171997, 44.996657133102417]

original patch:
testNormalize2: [2.7070891857147217, 2.7012341022491455, 2.7003159523010254]
example.html:   [67.908604860305786, 68.088788986206055, 67.92288613319397]

my patch:
testNormalize2: [2.4626028537750244, 2.4619381427764893, 2.4617609977722168]
example.html:   [22.780415058135986, 22.780103921890259, 22.721666097640991]
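
The harness that produced these numbers is not shown in this message.
A minimal sketch of how such a comparison might be set up with
timeit.repeat follows.  The helper names make_fragmented_doc and
time_normalize, the piece count, and the repeat/number values are all
assumptions for illustration, not the settings used for the figures
above.

```python
import timeit
from xml.dom import minidom


def make_fragmented_doc(pieces=50):
    """Build a document whose root holds many adjacent text nodes.

    Hypothetical helper: alternates a one-character text node with an
    empty one so that normalize() has both merging and removal to do.
    """
    doc = minidom.parseString("<doc/>")
    root = doc.documentElement
    for _ in range(pieces):
        root.appendChild(doc.createTextNode("x"))
        root.appendChild(doc.createTextNode(""))
    return doc


def time_normalize(repeat=3, number=20, pieces=50):
    """Return `repeat` timings, each covering `number` build+normalize runs.

    normalize() mutates the tree, so the document is rebuilt on every
    run; the build cost is therefore included in each timing.
    """
    def run():
        doc = make_fragmented_doc(pieces)
        doc.documentElement.normalize()

    return timeit.repeat(run, repeat=repeat, number=number)


if __name__ == "__main__":
    print("fragmented doc:", time_normalize())
```

Because the build cost is included, a sketch like this is only useful
for comparing two implementations of normalize under identical input,
not for measuring normalize in isolation.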

IMO my refactoring is also easier to understand than either the old code
or the proposed patch.
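
To illustrate the idea of relinking siblings only when a node is
removed, here is a minimal single-pass sketch on a toy node class.  It
is not the attached patch: ToyNode, splice_out, and normalize are
hypothetical names, and the toy tree only models text and element
nodes with prev/next sibling pointers.

```python
TEXT, ELEMENT = "text", "element"


class ToyNode:
    """Toy DOM-like node (hypothetical; not xml.dom.minidom's Node)."""

    def __init__(self, kind, data=""):
        self.kind = kind
        self.data = data
        self.children = []
        self.prev = None
        self.next = None

    def append(self, child):
        # Keep sibling pointers consistent while building the tree.
        if self.children:
            last = self.children[-1]
            last.next = child
            child.prev = last
        self.children.append(child)
        return child


def splice_out(child, kept):
    """Relink the neighbours of a child that is being dropped."""
    prev = kept[-1] if kept else None
    if prev is not None:
        prev.next = child.next
    if child.next is not None:
        child.next.prev = prev
    child.prev = child.next = None


def normalize(node):
    """Merge adjacent text nodes and drop empty ones in a single pass."""
    kept = []
    for child in node.children:
        if child.kind == TEXT and not child.data:
            # Empty text node: remove it, fixing links only here.
            splice_out(child, kept)
        elif child.kind == TEXT and kept and kept[-1].kind == TEXT:
            # Adjacent text: fold into the previous node, then remove.
            kept[-1].data += child.data
            splice_out(child, kept)
        else:
            # Node survives, so its sibling links need no work now.
            kept.append(child)
            if child.kind == ELEMENT:
                normalize(child)
    node.children[:] = kept
```

Nodes that survive never have their links rewritten unless a neighbour
is removed, which is where the saving over a separate relinking pass
would come from in this sketch.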

The patch, including a new test, is attached and has also been pushed to
bzr+ssh://bazaar.launchpad.net/~rdmurray/python/issue2170.
History
Date                 User            Action  Args
2009-03-17 12:33:03  r.david.murray  set     recipients: + r.david.murray, akuchling, maltehelmert
2009-03-17 12:33:02  r.david.murray  set     messageid: <1237293182.79.0.314871463999.issue2170@psf.upfronthosting.co.za>
2009-03-17 12:33:01  r.david.murray  link    issue2170 messages
2009-03-17 12:33:00  r.david.murray  create