classification
Title: Modify serializer for xml.etree.ElementTree to allow forcing the use of long tag closing
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: adpoliak, eli.bendersky, eric.araujo, python-dev, serhiy.storchaka
Priority: low Keywords: patch

Created on 2012-03-21 04:46 by adpoliak, last changed 2013-01-13 22:31 by eli.bendersky. This issue is now closed.

Files
File name Uploaded Description
ElementTree_py-force_long_tags.patch adpoliak, 2012-03-21 04:46 Patch that adds a parameter to write() of an ElementTree object to allow forcing the use of long tag closing
ElementTree_py-force_long_tags-v2.patch adpoliak, 2012-03-24 00:57 Revised patch -- logic order changes and clearer path for patched file
ElementTree-force_long_tags-v3.patch adpoliak, 2012-06-13 02:42 Latest version of patch
ElementTree-force_long_tags-v3.patch serhiy.storchaka, 2012-06-14 20:57 Regenerate patch for review. Strip trailing whitespace.
etree_short_empty_elements.patch serhiy.storchaka, 2013-01-10 21:12
etree_short_empty_elements_2.patch serhiy.storchaka, 2013-01-13 12:17
Messages (20)
msg156472 - (view) Author: Ariel Poliak (adpoliak) Date: 2012-03-21 04:46
As it stands in Hg, when the write() method of an xml.etree.ElementTree object is called, and a tag within the XML tree has no child tags or defined text, the tag is written using the short notation "<tag ... />".

Whether the short notation or the long "<tag ...></tag>" notation is used should be configurable by the programmer, without having to resort to serializing the XML into a string and then doing replace() on said string.

The attached patch adds an optional parameter to the write() method that provides this choice.
If the 'use_long_xml_tags' parameter is not set (or otherwise evaluates to the boolean False), the current behavior applies.
If this parameter evaluates to the boolean True, long tags are used when producing XML output.
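
For reference, the default behavior described above can be reproduced with a short sketch. The element names ("root", "empty", "full") are made up for illustration; only the standard xml.etree.ElementTree calls are real.

```python
import xml.etree.ElementTree as ET

# Build a tiny tree: one element with no text or children, one with text.
root = ET.Element("root")
ET.SubElement(root, "empty", {"id": "1"})
ET.SubElement(root, "full").text = "text"

# Serialize with default settings; the empty element uses the short notation.
out = ET.tostring(root, encoding="unicode")
print(out)  # <root><empty id="1" /><full>text</full></root>
```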
msg156476 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-03-21 06:15
-            if long_xml or text or len(elem):
+            if text or len(elem) or long_xml:

Use alternatives in order of decreasing probability.
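
The reasoning behind this reordering is that `or` stops at the first true operand, so putting the most likely condition first avoids evaluating the rest. A minimal sketch, assuming a hypothetical `Probe` helper that records whether it was truth-tested (it is not part of the patch):

```python
class Probe:
    """Hypothetical helper: a value that records whether it was truth-tested."""

    def __init__(self, value):
        self.value = value
        self.checked = False

    def __bool__(self):
        self.checked = True
        return self.value


text = Probe(True)       # common case: the element has text
long_xml = Probe(False)  # rare case: the caller forced long tags

# `or` short-circuits: once `text` is true, `long_xml` is never evaluated.
use_long_form = bool(text or long_xml)
print(use_long_form, text.checked, long_xml.checked)  # True True False
```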
msg156648 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-03-23 12:06
Hello, thanks for the patch!

Since this is a new feature, I suggest discussing it on the python-ideas list first.

Next, as for your patch:

1. What is ElementTree_new.py?
2. Note that new features have to be added to in-development versions of Python - 3.3 in our case. In 3.3, the C and Python implementations of the ElementTree API must be compatible. Hence, all new features have to be added to both implementations.
msg156682 - (view) Author: Ariel Poliak (adpoliak) Date: 2012-03-24 00:57
To answer eli.bendersky's questions:

1. That's just the name of the file with my changes in it.
I pulled the original file from Hg, copied it to "ElementTree_new.py", made my changes, and created a patch from the two.

2. I'm not very familiar with the structure of the codebase for Python, so I'll provide some information that sounds relevant to me...

The changes I made were for the ElementTree.py file under cpython/Lib/xml/etree/ .
I used http://hg.python.org/cpython/file/54055646fd1f/Lib/xml/etree/ElementTree.py as the base to make a new patch, reflecting storchaka's recommendation on logic order and a clarification on path name for the modified file.

The source for the 'ElementC14N' module is not part of Python, so I cannot modify the code for the '_serialize_c14n' function.
It appears that this is dependent on http://bugs.python.org/issue13611 .

Looks like I may need to refactor this patch to work in a way that does not alter the signature for the _serialize_* methods.
msg162524 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-06-08 12:34
Any progress, or can this issue be closed?
msg162699 - (view) Author: Ariel Poliak (adpoliak) Date: 2012-06-13 02:42
Made a new patch.
This one contains changes for xml.etree.ElementTree for cpython, jython, and stackless.
It also contains changes to Modules/_elementtree.c for cpython and stackless.

The changes within this patch do not change the signature for the _serialize_* methods, so it can be used with any third-party library that extends ElementTree.
msg162828 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-14 21:13
I don't think that three new fields in each Element are a suitable price for this very rarely used feature.
msg162841 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-06-15 04:18
Agree with Serhiy. Why are these flags required in Element?

Also, I'm moving this to 3.4 since the patch came too late in the 3.3 process - the first beta is very soon, after which we prefer not to add new features.
msg162849 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-15 06:46
The xml.sax.saxutils.XMLGenerator constructor has a parameter short_empty_elements (False by default). For consistency, the new ElementTree.write parameter must have the same name (True by default for compatibility).
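
The existing XMLGenerator parameter referred to here can be tried out as follows. This is a small sketch; the `render` helper and the "empty" element name are illustrative only.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def render(short_empty_elements):
    """Emit a single empty element through XMLGenerator and return the text."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8",
                       short_empty_elements=short_empty_elements)
    gen.startElement("empty", AttributesImpl({}))
    gen.endElement("empty")
    return out.getvalue()


print(render(False))  # <empty></empty>  (XMLGenerator's default)
print(render(True))   # <empty/>
```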
msg162948 - (view) Author: Ariel Poliak (adpoliak) Date: 2012-06-16 02:33
Ideally, this would be taken care of by _serialize_xml(), with a parameter specified when it is called from within write().

However, the signature for the _serialize_xml() function cannot be changed, as it needs to match the signature for the rest of the _serialize_*() functions (the serializing function is chosen from a dictionary and then called with the same parameters regardless of which one was selected).

An alternative to this would be to create a single variable within the scope of ElementTree at runtime if the caller asks for full closing tags, and have the _serialize_xml() function check for the presence and value of that variable.

I initially approached the problem via the flags on Element instead due to the perceived usefulness of giving the programmer full control on how the tree is serialized into XML.

However, if I'm the only one that sees that as useful, I can certainly refactor the code to go with the above solution (or some other more elegant solution).
msg164715 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012-07-06 03:23
I see no harm in modifying the signature of the private _serialize_* functions to accept another argument or dict of options.
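
A rough sketch of what this could look like: every serializer in the dispatch dictionary accepts the same extra keyword, so the caller can pass it without knowing which function was selected. The functions below are simplified, hypothetical stand-ins and not the actual ElementTree internals.

```python
def serialize_xml_sketch(write, tag, text, short_empty_elements=True):
    """Hypothetical, simplified XML serializer for a single element."""
    if text or not short_empty_elements:
        write("<%s>%s</%s>" % (tag, text or "", tag))
    else:
        write("<%s />" % tag)


def serialize_text_sketch(write, tag, text, short_empty_elements=True):
    """Hypothetical text serializer: accepts the option but ignores it."""
    if text:
        write(text)


# Dispatch table: all entries share one signature, including the new option.
SERIALIZERS = {"xml": serialize_xml_sketch, "text": serialize_text_sketch}


def render(method, tag, text=None, **options):
    parts = []
    SERIALIZERS[method](parts.append, tag, text, **options)
    return "".join(parts)


print(render("xml", "a"))                              # <a />
print(render("xml", "a", short_empty_elements=False))  # <a></a>
```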
msg179555 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-10 15:04
Ariel, are you interested in pursuing this issue?

Serhiy, I see you assigned this to yourself - would you like to submit a patch?
msg179575 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-10 16:58
> Serhiy, I see you assigned this to yourself - would you like to submit a patch?

Not right now. This is low priority for me too. But I want to see this feature in 3.4.
msg179595 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-10 20:40
Well, here is a patch which adds a short_empty_elements flag (as for XMLGenerator) to the write(), tostring() and tostringlist() methods of ElementTree.
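
Assuming Python 3.4 or later, where this flag is available, usage looks roughly like this (the element names are made up for illustration):

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("root")
ET.SubElement(root, "empty")

# Default: the empty element is written in the short form.
short_form = ET.tostring(root, encoding="unicode")

# short_empty_elements=False forces an explicit closing tag.
long_form = ET.tostring(root, encoding="unicode", short_empty_elements=False)
parts = ET.tostringlist(root, encoding="unicode", short_empty_elements=False)

# The same flag on ElementTree.write(), here writing to an in-memory buffer.
buffer = io.StringIO()
ET.ElementTree(root).write(buffer, encoding="unicode",
                           short_empty_elements=False)

print(short_form)  # <root><empty /></root>
print(long_form)   # <root><empty></empty></root>
```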
msg179867 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-13 12:17
Patch updated (tostring() and tostringlist() refer to write() about the short_empty_elements parameter). Perhaps descriptions of the encoding and method parameters should not be repeated either?

> Why do you force short_empty_elements to be keyword only?

Because the order of parameters in XMLGenerator(), ElementTree.write() and ElementTree.tostring() is different, and this can be confusing. Also, it will be easier to deprecate or rename a keyword-only parameter in the future (in favor of a general fabric, for example). I think that all optional, non-basic and very rarely used parameters should be keyword-only.
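
To illustrate the keyword-only point: a bare `*` in the signature makes every parameter after it unreachable by position. The `serialize` function below is a hypothetical example, not the ElementTree API.

```python
def serialize(tag, encoding=None, *, short_empty_elements=True):
    """Hypothetical serializer; parameters after the bare * are keyword-only."""
    if short_empty_elements:
        return "<%s />" % tag
    return "<%s></%s>" % (tag, tag)


# Passing the flag by name works.
print(serialize("a", short_empty_elements=False))  # <a></a>

# Passing it by position is rejected, so callers cannot mix up argument order.
try:
    serialize("a", "unicode", False)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```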
msg179874 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-13 14:05
New changeset 58168d69b496 by Eli Bendersky in branch 'default':
Close #14377: Add a new parameter to ElementTree.write and some module-level
http://hg.python.org/cpython/rev/58168d69b496
msg179875 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2013-01-13 14:09
I don’t think a space before the slash should be added.  (It was common in the days of XHTML 1 because of an SGML parsing hack.)
msg179878 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-13 14:27
On Sun, Jan 13, 2013 at 6:09 AM, Éric Araujo <report@bugs.python.org> wrote:

>
> Éric Araujo added the comment:
>
> I don’t think a space before the slash should be added.  (It was common in
> the days of XHTML 1 because of an SGML parsing hack.)
>

Ok, will fix.
msg179893 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-13 20:43
I think Éric means different spaces, spaces in empty tags (<empty /> vs <empty/>). I don't know what the standard says about this. It should be a separate issue.

As for line continuations in docs, in all cases where they occur, a space is used before the backslash for readability. I have reverted this change in 50606131a987.
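
On the two spellings of an empty tag mentioned above: the XML grammar allows optional whitespace before the closing "/>", so both forms are well-formed and parse to the same element. A quick check with the standard parser (the element and attribute names are illustrative):

```python
import xml.etree.ElementTree as ET

with_space = ET.fromstring('<empty id="1" />')
without_space = ET.fromstring('<empty id="1"/>')

# Both spellings yield an element with the same tag, attributes and text.
same = (
    (with_space.tag, with_space.attrib, with_space.text)
    == (without_space.tag, without_space.attrib, without_space.text)
)
print(same)  # True

# When serializing, ElementTree itself writes the form with the space.
print(ET.tostring(without_space, encoding="unicode"))  # <empty id="1" />
```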
msg179898 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-01-13 22:31
OK, thanks.
History
Date User Action Args
2013-01-13 22:31:44 eli.bendersky set messages: + msg179898
2013-01-13 20:43:54 serhiy.storchaka set messages: + msg179893
2013-01-13 14:27:21 eli.bendersky set messages: + msg179878
2013-01-13 14:09:41 eric.araujo set nosy: + eric.araujo; messages: + msg179875
2013-01-13 14:05:09 python-dev set status: open -> closed; nosy: + python-dev; messages: + msg179874; resolution: fixed; stage: patch review -> resolved
2013-01-13 12:17:57 serhiy.storchaka set files: + etree_short_empty_elements_2.patch; messages: + msg179867
2013-01-10 21:12:07 serhiy.storchaka set files: + etree_short_empty_elements.patch
2013-01-10 20:40:55 serhiy.storchaka set messages: + msg179595; stage: needs patch -> patch review
2013-01-10 16:58:15 serhiy.storchaka set messages: + msg179575
2013-01-10 15:04:15 eli.bendersky set messages: + msg179555
2013-01-07 18:34:07 serhiy.storchaka set assignee: serhiy.storchaka
2012-11-04 17:09:32 serhiy.storchaka set stage: patch review -> needs patch
2012-07-06 03:23:19 eli.bendersky set messages: + msg164715
2012-06-16 02:33:06 adpoliak set messages: + msg162948
2012-06-15 06:46:07 serhiy.storchaka set messages: + msg162849
2012-06-15 04:18:31 eli.bendersky set priority: normal -> low; stage: patch review; messages: + msg162841; versions: + Python 3.4, - Python 3.2
2012-06-14 21:13:52 serhiy.storchaka set messages: + msg162828
2012-06-14 20:57:10 serhiy.storchaka set files: + ElementTree-force_long_tags-v3.patch
2012-06-13 02:42:25 adpoliak set files: + ElementTree-force_long_tags-v3.patch; messages: + msg162699
2012-06-08 12:34:07 eli.bendersky set messages: + msg162524
2012-03-24 00:57:31 adpoliak set files: + ElementTree_py-force_long_tags-v2.patch; messages: + msg156682
2012-03-23 12:06:03 eli.bendersky set messages: + msg156648
2012-03-21 19:24:51 ned.deily set nosy: + eli.bendersky
2012-03-21 06:15:19 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg156476
2012-03-21 04:46:26 adpoliak create