This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: rewrite of minidom.Node.normalize
Type: enhancement Stage: commit review
Components: XML Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: akuchling Nosy List: akuchling, maltehelmert, r.david.murray
Priority: normal Keywords: patch

Created on 2008-02-23 19:01 by maltehelmert, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue2170.patch r.david.murray, 2009-03-19 12:46
Messages (14)
msg62794 - (view) Author: Malte Helmert (maltehelmert) Date: 2008-02-23 19:01
In the discussion of #1433694 on the #python-dev channel, it was
observed that the normalize method of minidom.Node could take some
refactoring. A patch is attached.
msg83508 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-03-13 02:29
I have applied the supplied patch to the current trunk, and it passes
the tests.  I've also attached a more extensive test case that exercises
the recursion.
msg83673 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-03-17 12:32
I checked the speed of the proposed patch, and found that it was
definitely slower than the original code.  So I took another look at the
original, and refactored it in a different way: instead of moving the
sibling relinking into a second pass, I changed to code to only relink
siblings when a node is removed.  The new patch passes all test, and is
faster than the old code.  I tested the timing both against the same
small nested document I used in testNormalize2, and by running normalize
on a 37K html document (a copy of the xml.dom.minidom chapter from the
Library Reference):

original code:
testNormalize2: [2.5144219398498535, 2.5053589344024658, 2.5059471130371094]
example.html:   [44.641155958175659, 44.575434923171997, 44.996657133102417]

original patch
testNormalize2: [2.7070891857147217, 2.7012341022491455, 2.7003159523010254]
example.html:   [67.908604860305786, 68.088788986206055, 67.92288613319397]

My patch
testNormalize2: [2.4626028537750244, 2.4619381427764893, 2.4617609977722168]
example.html:   [22.780415058135986, 22.780103921890259, 22.721666097640991]

IMO my refactoring is also easier to understand than either the old code
or the proposed patch.

Patch, including new test, is attached, and also pushed to
bzr+ssh://bazaar.launchpad.net/~rdmurray/python/issue2170.
msg83674 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-03-17 12:35
Er, I meant to say, "as easy to understand" as the proposed patch.
msg83681 - (view) Author: Malte Helmert (maltehelmert) Date: 2009-03-17 16:05
I eyeballed the new patch and wonder if the case where text nodes are
collapsed is handled correctly. Assume that self.childNodes contains
nodes [A, B, C], where A and B are non-empty text nodes and C is a
non-text node.

The algorithm would collapse A and B into A and then unlink B, and I
think C.previousSibling would still reference the unlinked B, rather
than the correct A. Am I missing something?

If this is indeed a bug in the proposed patch, I suggest adding an
additional test for this case.

The bug itself should not be too hard to fix; the "elif" case would need
the same

    if child.nextSibling:
        child.nextSibling.previousSibling = child.previousSibling

assignment as the "if" case.
msg83683 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-03-17 16:37
You are absolutely right.  I will write a test case and fix the patch.
msg83689 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-03-17 17:47
Added several more tests, to validate the sibling link code more, and
fixed the bug.
msg83707 - (view) Author: Malte Helmert (maltehelmert) Date: 2009-03-17 22:58
While we're cleaning up:
       data = child.data
       if not data:
could be
       if not child.data:
since data is not used again.

Alternatively, you could use data in place of child.data later on for a
small speed-up, but I doubt that we care about such small optimizations
here. If we do, then we should also assign a name to child.nextSibling,
though.

I found the three-line duplication between the two branches mildly
refactor-worthy, but when I eliminated it the control logic got more
involved and I don't think that's a good trade-off here.
msg83709 - (view) Author: Malte Helmert (maltehelmert) Date: 2009-03-17 23:02
I removed my original patch which has been superseded by David's patch.

David, I suggest that you also remove test_minidom.patch which is
superseded by your later patch (see http://bugs.python.org/msg77766 for
policy on this).
msg83807 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-03-19 12:46
Removed old patches, updated patch to remvoe the unnecessary local
variable assignment (also pushed to launchpad).
msg83808 - (view) Author: Malte Helmert (maltehelmert) Date: 2009-03-19 13:23
Short review: code looks good to me, patch applies cleanly to trunk,
passes tests.

@akuchling: I don't know if you remember, but this rewrite was
originally suggested by you on a bug day some time ago. I think David's
patch is ready to be applied.
msg85559 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-04-05 19:21
Andrew, any objection to my going ahead and applying this patch?
msg85762 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2009-04-08 01:15
The new version of the code looks all right, so I
think this patch can be applied.
msg85831 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-04-09 22:20
Committed in r71414.
History
Date User Action Args
2022-04-11 14:56:31adminsetgithub: 46423
2009-04-09 22:20:30r.david.murraysetstatus: open -> closed
resolution: accepted
messages: + msg85831
2009-04-09 18:46:19r.david.murraysetversions: - Python 2.6, Python 3.0
2009-04-08 01:15:28akuchlingsetmessages: + msg85762
2009-04-05 19:21:14r.david.murraysetmessages: + msg85559
2009-03-25 12:20:44r.david.murraysetstage: patch review -> commit review
2009-03-23 02:17:31brett.cannonsetstage: patch review
2009-03-19 13:23:55maltehelmertsetmessages: + msg83808
2009-03-19 12:46:27r.david.murraysetfiles: + issue2170.patch

messages: + msg83807
2009-03-19 12:45:30r.david.murraysetfiles: - issue2170.patch
2009-03-19 12:39:56r.david.murraysetfiles: - test_minidom.patch
2009-03-17 23:02:39maltehelmertsetmessages: + msg83709
2009-03-17 23:01:56maltehelmertsetfiles: - minidom.diff
2009-03-17 22:58:34maltehelmertsetmessages: + msg83707
2009-03-17 17:47:58r.david.murraysetfiles: - issue2170.patch
2009-03-17 17:47:42r.david.murraysetfiles: + issue2170.patch

messages: + msg83689
2009-03-17 16:37:21r.david.murraysetmessages: + msg83683
2009-03-17 16:05:45maltehelmertsetmessages: + msg83681
2009-03-17 12:35:28r.david.murraysetmessages: + msg83674
2009-03-17 12:33:01r.david.murraysetfiles: + issue2170.patch

messages: + msg83673
versions: + Python 3.0, Python 3.1, Python 2.7
2009-03-13 02:29:11r.david.murraysetfiles: + test_minidom.patch

nosy: + r.david.murray
messages: + msg83508

keywords: + patch
2008-02-23 19:45:41akuchlingsetassignee: akuchling
nosy: + akuchling
2008-02-23 19:01:06maltehelmertcreate