classification
Title: XML text behaviour change if there are comments
Type: behavior Stage: resolved
Components: Library (Lib), XML Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: scoder Nosy List: Dima.Tisnek, Jeffrey.Kintscher, eli.bendersky, scoder, serhiy.storchaka, xtreak
Priority: normal Keywords: 3.8regression, patch

Created on 2019-06-25 08:48 by Dima.Tisnek, last changed 2019-07-24 18:49 by scoder. This issue is now closed.

Pull Requests
URL Status Linked
PR 14856 merged scoder, 2019-07-19 08:29
PR 14936 merged scoder, 2019-07-24 18:28
Messages (8)
msg346493 - Author: Dima Tisnek (Dima.Tisnek) Date: 2019-06-25 08:48
Example:

# mre.py
from xml.etree import ElementTree

XML = "<a>foo<!-- comment -->bar</a>"

a = ElementTree.fromstring(XML)
print(list(a.itertext()))

# Testing 3.7.3 vs. 3.8.0b1; macOS

… ~> python3.7 mre.py
['foobar']
… ~> python3.8 mre.py
['bar']
msg346497 - Author: Karthikeyan Singaravelan (xtreak) (Python triager) Date: 2019-06-25 09:27
Bisecting points to commit 43851a202c (issue36673): before that commit "foobar" was returned, and after it "bar" is returned.
msg346505 - Author: Karthikeyan Singaravelan (xtreak) (Python triager) Date: 2019-06-25 10:32
Just to add: the Python implementation returns "foobar" when the C accelerator import is commented out. So I guess the problem is that the C implementation in that commit behaves differently from the Python implementation.
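The comparison above can be reproduced without editing the standard library. The following is only a sketch: the helper name import_pure_python_elementtree is made up for this example, and it assumes that ElementTree.py guards its accelerator import with try/except ImportError, so a None entry in sys.modules is enough to block _elementtree.

```python
import importlib
import sys
import xml.etree
from xml.etree import ElementTree as default_ET  # normally C-accelerated


def import_pure_python_elementtree():
    """Return a fresh xml.etree.ElementTree that cannot see _elementtree.

    Hypothetical helper for this example only.
    """
    names = ("xml.etree.ElementTree", "_elementtree")
    saved = {name: sys.modules.pop(name, None) for name in names}
    saved_attr = getattr(xml.etree, "ElementTree", None)
    # A None entry makes "import _elementtree" raise ImportError, so the
    # freshly executed ElementTree.py keeps its pure-Python classes.
    sys.modules["_elementtree"] = None
    try:
        return importlib.import_module("xml.etree.ElementTree")
    finally:
        # Restore the interpreter state we found.
        for name, module in saved.items():
            if module is None:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module
        if saved_attr is None:
            if hasattr(xml.etree, "ElementTree"):
                del xml.etree.ElementTree
        else:
            xml.etree.ElementTree = saved_attr


XML = "<a>foo<!-- comment -->bar</a>"

pure_ET = import_pure_python_elementtree()
result_default = list(default_ET.fromstring(XML).itertext())
result_pure = list(pure_ET.fromstring(XML).itertext())

print("default:", result_default)
print("pure Python:", result_pure)
```

On an affected build such as 3.8.0b1 the two lists would be expected to differ as reported above. On a build that includes the fix they should agree.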
msg346802 - Author: Stefan Behnel (scoder) (Python committer) Date: 2019-06-28 05:39
I think it might be this call that strikes here:

https://github.com/python/cpython/commit/43851a202c#diff-f3b827d6e1d5c270ab42bc2c0523c1d2R2842

treebuilder_flush_data() is not made for concatenating text; it simply replaces it. If the two text parts arrive separately and the comment between them is discarded, then the last one overwrites the first one.
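The overwrite described above can be illustrated with a toy model. This is not the code in _elementtree.c: ToyTextBuffer, parse_events and the concatenate switch are invented for the illustration, and the appending branch is only there for contrast, not as a description of the eventual patch.

```python
class ToyTextBuffer:
    """Toy model of buffered character data; not CPython's actual code."""

    def __init__(self, concatenate):
        self.concatenate = concatenate  # hypothetical switch for this demo
        self.text = None                # stands in for the element's .text
        self.pending = []               # character data not yet flushed

    def data(self, chunk):
        self.pending.append(chunk)

    def comment(self, _body):
        # The comment itself is discarded, but it still forces a flush.
        self.flush()

    def flush(self):
        if not self.pending:
            return
        joined = "".join(self.pending)
        self.pending.clear()
        if self.concatenate and self.text is not None:
            self.text += joined         # keep what was flushed earlier
        else:
            self.text = joined          # overwrite: earlier text is lost


def parse_events(concatenate):
    """Replay the events for <a>foo<!-- comment -->bar</a>."""
    buf = ToyTextBuffer(concatenate)
    buf.data("foo")
    buf.comment(" comment ")
    buf.data("bar")
    buf.flush()                         # end of element
    return buf.text


replacing = parse_events(concatenate=False)
appending = parse_events(concatenate=True)
print(replacing, appending)
```

With the replacing flush the second chunk wins, which matches the ['bar'] result in the report. With the appending flush both chunks survive.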
msg346803 - Author: Dima Tisnek (Dima.Tisnek) Date: 2019-06-28 06:17
Yes that does look suspicious!
msg346920 - Author: Stefan Behnel (scoder) (Python committer) Date: 2019-06-30 07:45
I'm working on a patch. It's not entirely trivial, so it might take a couple of days.
msg348394 - Author: Stefan Behnel (scoder) (Python committer) Date: 2019-07-24 18:08
New changeset c6cb4cdd21c0c3a09b0617dbfaa7053d3bfa6def by Stefan Behnel in branch 'master':
bpo-37399: Correctly attach tail text to the last element/comment/pi (GH-14856)
https://github.com/python/cpython/commit/c6cb4cdd21c0c3a09b0617dbfaa7053d3bfa6def
msg348399 - Author: Stefan Behnel (scoder) (Python committer) Date: 2019-07-24 18:46
New changeset bb697899aa65d90488af1950ac7cceeb3877d409 by Stefan Behnel in branch '3.8':
[3.8] bpo-37399: Correctly attach tail text to the last element/comment/pi (GH-14856) (GH-14936)
https://github.com/python/cpython/commit/bb697899aa65d90488af1950ac7cceeb3877d409
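A quick way to check an interpreter for the behaviour named in the changeset title is sketched below. It assumes Python 3.8 or later, since TreeBuilder(insert_comments=True) does not exist before 3.8, and the variable names are only illustrative.

```python
from xml.etree import ElementTree as ET

XML = "<a>foo<!-- comment -->bar</a>"

# Default builder: the comment is dropped, so both text parts should
# end up together in the element's .text.
plain = ET.fromstring(XML)

# Builder that keeps comments: the text after the comment should be
# attached to the comment node as its tail.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
parser.feed(XML)
kept = parser.close()
comment = kept[0]

print(repr(plain.text), len(plain))
print(repr(kept.text), repr(comment.text), repr(comment.tail))
```

On a fixed build, plain.text should be 'foobar' with no children, while the comment-preserving tree should have 'foo' as the element text and 'bar' as the comment's tail.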
History
Date User Action Args
2019-07-24 18:49:04 scoder set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-07-24 18:46:12 scoder set messages: + msg348399
2019-07-24 18:28:10 scoder set pull_requests: + pull_request14709
2019-07-24 18:08:07 scoder set messages: + msg348394
2019-07-19 08:29:49 scoder set keywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request14646
2019-06-30 07:45:10 scoder set assignee: scoder
messages: + msg346920
2019-06-29 01:05:28 Jeffrey.Kintscher set nosy: + Jeffrey.Kintscher
2019-06-28 06:17:49 Dima.Tisnek set messages: + msg346803
2019-06-28 05:39:29 scoder set messages: + msg346802
2019-06-25 10:32:40 xtreak set messages: + msg346505
2019-06-25 09:43:37 serhiy.storchaka set components: + XML
2019-06-25 09:43:28 serhiy.storchaka set keywords: + 3.8regression
type: behavior
stage: needs patch
2019-06-25 09:27:04 xtreak set nosy: + eli.bendersky, serhiy.storchaka, xtreak, scoder
messages: + msg346497
versions: + Python 3.9
2019-06-25 08:48:22 Dima.Tisnek create