Title: XML text behaviour change if there are comments
Type: behavior Stage: resolved
Components: Library (Lib), XML Versions: Python 3.9, Python 3.8
Status: closed Resolution: fixed
Assigned To: scoder Nosy List: Dima.Tisnek, Jeffrey.Kintscher, eli.bendersky, scoder, serhiy.storchaka, xtreak
Priority: normal Keywords: 3.8regression, patch

Created on 2019-06-25 08:48 by Dima.Tisnek, last changed 2019-07-24 18:49 by scoder. This issue is now closed.

Messages (8)
msg346493 - (view) Author: Dima Tisnek (Dima.Tisnek) * Date: 2019-06-25 08:48

from xml.etree import ElementTree

XML = "<a>foo<!-- comment -->bar</a>"
a = ElementTree.fromstring(XML)
print(a.text)  # "foobar" before the change, "bar" after it

# Testing 3.7.3 vs. 3.8.0b1; macOS

… ~> python3.7
… ~> python3.8
msg346497 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-06-25 09:27
Bisecting points to commit 43851a202c (issue36673): before it, "foobar" was returned; after it, "bar" is returned.
msg346505 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-06-25 10:32
Just to add: the pure-Python implementation seems to return "foobar" when the C accelerator imports are commented out. So I guess the problem is that the C implementation in that commit behaves differently from the Python implementation.
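For anyone who wants to repeat that comparison without editing the stdlib source, here is a minimal sketch. It assumes that placing None in sys.modules for "_elementtree" makes the accelerator import fail, so xml.etree.ElementTree falls back to its pure-Python classes. The helper name root_text and the "c"/"pure" labels are made up for this example, and each variant runs in a child interpreter so the current process is left untouched.

```python
import subprocess
import sys

# Code run in a child interpreter. With the "pure" argument it blocks the
# C accelerator before xml.etree.ElementTree is imported for the first time.
CHILD = r'''
import sys
if sys.argv[1] == "pure":
    sys.modules["_elementtree"] = None  # "import _elementtree" now raises ImportError
from xml.etree import ElementTree
a = ElementTree.fromstring("<a>foo<!-- comment -->bar</a>")
print(a.text)
'''


def root_text(implementation):
    """Return a.text as reported by a child interpreter ("c" or "pure")."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, implementation],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


c_text = root_text("c")
py_text = root_text("pure")
print("C accelerator:", c_text)
print("pure Python:  ", py_text)
```

On an affected 3.8 beta the two lines should differ; on an interpreter that includes the fix they should agree.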
msg346802 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-06-28 05:39
I think it might be this call that strikes here:

treebuilder_flush_data() is not made for concatenating text; it simply replaces it. If both text parts arrive separately and the comment between them is discarded, then the last one overwrites the first one.
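The replace-versus-concatenate effect described here can be illustrated with a toy model. This is only a sketch, not the C code: ToyBuilder, its concatenate flag, and feed_events are hypothetical names invented for the illustration, and the concatenating branch is just one conceivable way to avoid the loss, not necessarily what the eventual patch does.

```python
class ToyBuilder:
    """Toy stand-in for a tree builder that buffers character data."""

    def __init__(self, concatenate):
        self.concatenate = concatenate
        self.text = None   # text of the single element being built
        self._data = []    # character data received since the last flush

    def data(self, chunk):
        self._data.append(chunk)

    def flush(self):
        if not self._data:
            return
        joined = "".join(self._data)
        self._data = []
        if self.concatenate and self.text is not None:
            self.text += joined   # keep what was flushed earlier
        else:
            self.text = joined    # overwrite what was flushed earlier

    def comment(self, _text):
        # The comment itself is discarded, but it still forces a flush.
        self.flush()

    def close(self):
        self.flush()
        return self.text


def feed_events(builder):
    """Replay the events for <a>foo<!-- comment -->bar</a>."""
    builder.data("foo")
    builder.comment(" comment ")
    builder.data("bar")
    return builder.close()


replaced = feed_events(ToyBuilder(concatenate=False))
concatenated = feed_events(ToyBuilder(concatenate=True))
print(replaced, concatenated)
```

With the replacing flush the second text part wins, which matches the "bar" result reported above.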
msg346803 - (view) Author: Dima Tisnek (Dima.Tisnek) * Date: 2019-06-28 06:17
Yes that does look suspicious!
msg346920 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-06-30 07:45
I'm working on a patch. It's not entirely trivial, so it might take a couple of days.
msg348394 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-07-24 18:08
New changeset c6cb4cdd21c0c3a09b0617dbfaa7053d3bfa6def by Stefan Behnel in branch 'master':
bpo-37399: Correctly attach tail text to the last element/comment/pi (GH-14856)
msg348399 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-07-24 18:46
New changeset bb697899aa65d90488af1950ac7cceeb3877d409 by Stefan Behnel in branch '3.8':
[3.8] bpo-37399: Correctly attach tail text to the last element/comment/pi (GH-14856) (GH-14936)
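A small check of the behaviour after the fix, assuming Python 3.8 or later (the insert_comments argument of TreeBuilder does not exist in 3.7). By default the comment is dropped and the text on both sides of it stays together. When comments are kept, the text after the comment becomes the comment's tail.

```python
from xml.etree import ElementTree as ET

XML = "<a>foo<!-- comment -->bar</a>"

# Default parser: the comment is discarded.
default_root = ET.fromstring(XML)

# Parser whose tree builder inserts comments into the tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(XML, parser=parser)
comment = root[0]

print(repr(default_root.text))
print(repr(root.text), repr(comment.text), repr(comment.tail))
```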
