Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify serializer for xml.etree.ElementTree to allow forcing the use of long tag closing #58585

Closed
adpoliak mannequin opened this issue Mar 21, 2012 · 20 comments
Closed
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@adpoliak
Copy link
Mannequin

adpoliak mannequin commented Mar 21, 2012

BPO 14377
Nosy @merwok, @serhiy-storchaka
Files
  • ElementTree_py-force_long_tags.patch: Patch that adds a parameter to write() of an ElementTree object to allow forcing the use of long tag closing
  • ElementTree_py-force_long_tags-v2.patch: Revised patch -- logic order changes and clearer path for patched file
  • ElementTree-force_long_tags-v3.patch: Latest version of patch
  • ElementTree-force_long_tags-v3.patch: Regenerate patch for review. Strip tail whitespaces.
  • etree_short_empty_elements.patch
  • etree_short_empty_elements_2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2013-01-13.14:05:09.775>
    created_at = <Date 2012-03-21.04:46:26.240>
    labels = ['type-feature', 'library']
    title = 'Modify serializer for xml.etree.ElementTree to allow forcing the use of long tag closing'
    updated_at = <Date 2013-01-13.22:31:44.118>
    user = 'https://bugs.python.org/adpoliak'

    bugs.python.org fields:

    activity = <Date 2013-01-13.22:31:44.118>
    actor = 'eli.bendersky'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2013-01-13.14:05:09.775>
    closer = 'python-dev'
    components = ['Library (Lib)']
    creation = <Date 2012-03-21.04:46:26.240>
    creator = 'adpoliak'
    dependencies = []
    files = ['24978', '25008', '25946', '26010', '28678', '28715']
    hgrepos = []
    issue_num = 14377
    keywords = ['patch']
    message_count = 20.0
    messages = ['156472', '156476', '156648', '156682', '162524', '162699', '162828', '162841', '162849', '162948', '164715', '179555', '179575', '179595', '179867', '179874', '179875', '179878', '179893', '179898']
    nosy_count = 5.0
    nosy_names = ['eric.araujo', 'eli.bendersky', 'python-dev', 'serhiy.storchaka', 'adpoliak']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue14377'
    versions = ['Python 3.4']

    @adpoliak
    Copy link
    Mannequin Author

    adpoliak mannequin commented Mar 21, 2012

    As it stands in Hg, when the write() method of an xml.etree.ElementTree object is called, and a tag within the XML tree has no child tags or defined text, the tag is written using the short notation "<tag ... />".

    Whether or not the short notation is used instead of the long "<tag ...></tag>" notation is used should be configurable by the programmer, without having to resort to serializing the XML into a string and then doing replace() on said string.

    The attached patch adds an optional parameter to the write() method that provides this choice.
    If the 'use_long_xml_tags' parameter is not set (or otherwise evaluates to the boolean False), the current behavior applies.
    If this parameter evaluates to the boolean True, long tags are used when producing XML output.

    @adpoliak adpoliak mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Mar 21, 2012
    @serhiy-storchaka
    Copy link
    Member

    •        if long_xml or text or len(elem):
      

    + if text or len(elem) or long_xml:

    Use alternatives in order of decreasing probability.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Mar 23, 2012

    Hello, thanks for the patch!

    Since this is a new feature, I suggest discussion it on the python-ideas list first.

    Next, as for your patch:

    1. What is ElementTree_new.py?
    2. Note that new features have to be added to in-development versions of Python - 3.3 in our case. In 3.3, the C and Python implementations of the ElementTree API must be compatible. Hence, all new features have to be added to both implementations.

    @adpoliak
    Copy link
    Mannequin Author

    adpoliak mannequin commented Mar 24, 2012

    To answer eli.bendersky's questions:

    1. That's just the name of the file with my changes in it.
      I pulled the original file from Hg, copied it to "ElementTree_new.py", made my changes, and created a patch from the two.

    2. I'm not very familiar with the structure of the codebase for Python, so I'll provide some information that sounds relevant to me...

    The changes I made were for the ElementTree.py file under cpython/Lib/xml/etree/ .
    I used http://hg.python.org/cpython/file/54055646fd1f/Lib/xml/etree/ElementTree.py as the base to make a new patch, reflecting storchaka's recommendation on logic order and a clarification on path name for the modified file.

    The source for the 'ElementC14N' module is not part of Python, so I cannot modify the code for the '_serialize_c14n' function.
    It appears that this is dependent on http://bugs.python.org/issue13611 .

    Looks like I may need to refactor this patch to work in a way that does not alter the signature for the _serialize_* methods.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jun 8, 2012

    Any progress, or can this issue be closed?

    @adpoliak
    Copy link
    Mannequin Author

    adpoliak mannequin commented Jun 13, 2012

    Made a new patch.
    This one contains changes for xml.etree.ElementTree for cpython, jython, and stackless.
    It also contains changes to Modules/_elementtree.c for cpython and stackless.

    The changes within this patch do not change the signature for the _serialize_* methods, so it can be used with any third-party library that extends ElementTree.

    @serhiy-storchaka
    Copy link
    Member

    I don't think that the three new fields in each Element is a suitable price for this very rare used feature.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jun 15, 2012

    Agree with Serhiy. Why are these flags required in Element?

    Also, I'm moving this to 3.4 since the patch came too late in the 3.3 process - the first beta is very soon, after which we prefer not to add new features.

    @serhiy-storchaka
    Copy link
    Member

    xml.sax.saxutils.XMLGenerator constructor has a parameter short_empty_elements (False by default). For consistency new ElementTree.write parameter must have the same name (True by default for compatibility).

    @adpoliak
    Copy link
    Mannequin Author

    adpoliak mannequin commented Jun 16, 2012

    Ideally, this would be taken care by the _serialize_xml() with a parameter specified when called from within write().

    However, the signature for the _serialize_xml() function cannot be changed, as it needs to match the signature for the rest of the _serialize_*() functions (since which serializing function is chosen from a dictionary that then calls the specific function using the same parameters.

    An alternative to this would be to create a single variable within the scope of ElementTree at runtime if the code calls to write out the full tags closing, and have the _serialize_xml() function check for the presence and value of that variable.

    I initially approached the problem via the flags on Element instead due to the perceived usefulness of giving the programmer full control on how the tree is serialized into XML.

    However, if I'm the only one that sees that as useful, I can certainly refactor the code to go with the above solution (or some other more elegant solution).

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jul 6, 2012

    I see no harm in modifying the signature of the private _serialize_* functions to accept another argument or dict of options.

    @serhiy-storchaka serhiy-storchaka self-assigned this Jan 7, 2013
    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 10, 2013

    Ariel, are you interested in pursuing this issue?

    Serhiy, I see you assigned this to yourself - would you like to submit a patch?

    @serhiy-storchaka
    Copy link
    Member

    Serhiy, I see you assigned this to yourself - would you like to submit a patch?

    Not right now. This is low priority for me too. But I want to see this feature in 3.4.

    @serhiy-storchaka
    Copy link
    Member

    Well, here is a patch which add short_empty_elements flag (as for XMLGenerator) to write(), tostring() and tostringlist() methods of ElementTree.

    @serhiy-storchaka
    Copy link
    Member

    Patch updated (tostring() and tostringlist() refet to write() about short_empty_elements parameter). Perhaps descriptions of encoding and method parameters should not be repeated too?

    Why do you force short_empty_elements to be keyword only?

    Because sequences of parameters in XMLGenerator(), ElementTree.write(), ElementTree.tostring() are different and this can confuse. Also it will be easer to deprecate or rename keyword-only parameter in future (in favor of general fabric for example). I think that all optional, non-basic and very rarely used parameters should by keyword-only.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 13, 2013

    New changeset 58168d69b496 by Eli Bendersky in branch 'default':
    Close bpo-14377: Add a new parameter to ElementTree.write and some module-level
    http://hg.python.org/cpython/rev/58168d69b496

    @python-dev python-dev mannequin closed this as completed Jan 13, 2013
    @merwok
    Copy link
    Member

    merwok commented Jan 13, 2013

    I don’t think a space before the slash should be added. (It was common in the days of XHTML 1 because of an SGML parsing hack.)

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 13, 2013

    On Sun, Jan 13, 2013 at 6:09 AM, Éric Araujo <report@bugs.python.org> wrote:

    Éric Araujo added the comment:

    I don’t think a space before the slash should be added. (It was common in
    the days of XHTML 1 because of an SGML parsing hack.)

    Ok, will fix.

    @serhiy-storchaka
    Copy link
    Member

    I think Éric means different spaces, spaces in empty tags (<empty /> vs <empty/>). I don't know what the standard says about this. It should a separated issue.

    As for line continuations in docs, in all cases where they are occurred, a space used before a backslash for readability. I have reverted this change in 50606131a987.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 13, 2013

    OK, thanks.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants