xml.dom.minidom wrong indentation writing for CDATA section #80588

Closed
vsurjaninov mannequin opened this issue Mar 23, 2019 · 5 comments
Labels
3.8 (only security fixes), topic-XML, type-feature (a feature request or enhancement)

Comments

@vsurjaninov
Mannequin

vsurjaninov mannequin commented Mar 23, 2019

BPO 36407
Nosy @scoder, @serhiy-storchaka, @vsurjaninov
PRs
  • bpo-36407: Fix writing indentations of CDATA section (xml.dom.minidom) #12514
  • [2.7] bpo-36407: Fix writing indentations of CDATA section (xml.dom.minidom). (GH-12514) #12578
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-03-27.12:08:27.988>
    created_at = <Date 2019-03-23.15:38:49.094>
    labels = ['expert-XML', 'type-feature', '3.8']
    title = 'xml.dom.minidom wrong indentation writing for CDATA section'
    updated_at = <Date 2019-03-27.12:08:27.987>
    user = 'https://github.com/vsurjaninov'

    bugs.python.org fields:

    activity = <Date 2019-03-27.12:08:27.987>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-03-27.12:08:27.988>
    closer = 'serhiy.storchaka'
    components = ['XML']
    creation = <Date 2019-03-23.15:38:49.094>
    creator = 'vsurjaninov'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36407
    keywords = ['patch']
    message_count = 5.0
    messages = ['338681', '338701', '338936', '338939', '338943']
    nosy_count = 4.0
    nosy_names = ['scoder', 'eli.bendersky', 'serhiy.storchaka', 'vsurjaninov']
    pr_nums = ['12514', '12578']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue36407'
    versions = ['Python 3.8']

    @vsurjaninov
    Mannequin Author

    vsurjaninov mannequin commented Mar 23, 2019

    If we write XML that contains a CDATA section while passing non-empty indentation and newline parameters, the section's parent node ends up containing extra indentation whitespace, which is then parsed as text.

    Example:

    >>> from xml.dom import minidom
    >>> doc = minidom.Document()
    >>> root = doc.createElement('root')
    >>> doc.appendChild(root)
    >>> node = doc.createElement('node')
    >>> root.appendChild(node)
    >>> data = doc.createCDATASection('</data>')
    >>> node.appendChild(data)
    >>> print(doc.toprettyxml(indent=' ' * 4))
    <?xml version="1.0" ?>
    <root>
        <node>
    <![CDATA[</data>]]>    </node>
    </root>

    If we try to parse this output document, we won't get the CDATA value back correctly.

    The following code returns a string that contains only indentation characters:

    >>> xml_text = doc.toprettyxml(indent=' ' * 4)
    >>> parsed = minidom.parseString(xml_text)
    >>> parsed.getElementsByTagName('node')[0].firstChild.nodeValue

    This one returns a string with the CDATA value plus the indentation characters:

    >>> parsed.getElementsByTagName('node')[0].firstChild.wholeText
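
The two lookups above can be reproduced on any interpreter by parsing the problematic output directly instead of generating it. This is only a minimal sketch: `BUGGY_OUTPUT` is a hand-written string assumed to match what the unpatched writer emits for `indent=' ' * 4`, and `cdata_values` is a hypothetical helper, not part of minidom.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hand-written copy of the unpatched pretty-printed output (assumed layout).
BUGGY_OUTPUT = (
    '<?xml version="1.0" ?>\n'
    '<root>\n'
    '    <node>\n'
    '<![CDATA[</data>]]>    </node>\n'
    '</root>\n'
)


def cdata_values(element):
    """Return the data of every CDATA section that is a direct child."""
    return [child.data for child in element.childNodes
            if child.nodeType == Node.CDATA_SECTION_NODE]


parsed = minidom.parseString(BUGGY_OUTPUT)
node = parsed.getElementsByTagName('node')[0]
first = node.firstChild

print(repr(first.nodeValue))  # whitespace only, not the CDATA payload
print(repr(first.wholeText))  # payload surrounded by the extra whitespace
print(cdata_values(node))     # the payload on its own
```

Filtering children by `CDATA_SECTION_NODE` is one way to read such a document without relying on `firstChild`, at the cost of ignoring any ordinary text the element might also hold.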

    But we have a workaround:

    >>> data.nodeType = data.TEXT_NODE

    >>> print(doc.toprettyxml(indent=' ' * 4))
    <?xml version="1.0" ?>
    <root>
        <node><![CDATA[</data>]]></node>
    </root>

    It will be parsed correctly:

    >>> parsed = minidom.parseString(doc.toprettyxml(indent=' ' * 4))
    >>> parsed.getElementsByTagName('node')[0].firstChild.nodeValue
    '</data>'

    But I think it would be better to fix the writing function so that this becomes the default behavior.
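
Whether a given interpreter already has the requested behavior can be checked with a round trip: pretty-print a document, parse it back, and compare the first child of `node` with the original payload. The sketch below assumes the document shape from the example above; `build_doc` and `cdata_survives_pretty_print` are hypothetical helper names, and the result depends on the Python version the check runs on.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def build_doc(payload):
    """Build <root><node><![CDATA[payload]]></node></root>."""
    doc = minidom.Document()
    root = doc.createElement('root')
    doc.appendChild(root)
    node = doc.createElement('node')
    root.appendChild(node)
    node.appendChild(doc.createCDATASection(payload))
    return doc


def cdata_survives_pretty_print(payload, indent='    '):
    """Return True if the CDATA payload is the first child after a round trip."""
    xml_text = build_doc(payload).toprettyxml(indent=indent)
    reparsed = minidom.parseString(xml_text)
    first = reparsed.getElementsByTagName('node')[0].firstChild
    return (first.nodeType == Node.CDATA_SECTION_NODE
            and first.nodeValue == payload)


print(cdata_survives_pretty_print('</data>'))
```

A check like this avoids mutating `nodeType` on the node, so it can be run before deciding whether the workaround is needed at all.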

    @vsurjaninov vsurjaninov mannequin added topic-XML type-feature A feature request or enhancement labels Mar 23, 2019
    @scoder
    Contributor

    scoder commented Mar 23, 2019

    Yes, this case is incorrect. Pretty printing should not change character content inside of a simple tag.

    The PR looks good to me.

    @scoder scoder added the 3.8 only security fixes label Mar 23, 2019
    @serhiy-storchaka
    Member

    New changeset 384b81d by Serhiy Storchaka (Vladimir Surjaninov) in branch 'master':
    bpo-36407: Fix writing indentations of CDATA section (xml.dom.minidom). (GH-12514)
    384b81d

    @serhiy-storchaka
    Member

    Should we backport this change? I am not sure.

    @scoder
    Contributor

    scoder commented Mar 27, 2019

    I don't think this should be backported. Pretty-printing is not a production-relevant feature; it is more of a "debugging, diffing and help users see what they get" kind of feature. It's good to have it fixed for the future, but we shouldn't bother users with it during a point release.
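
For output that is meant to be parsed back rather than read by a person, serializing without pretty-printing sidesteps the question on unpatched versions too. As a sketch under that assumption (this is an illustration, not something proposed in the thread), `toxml()` inserts no indentation or newline strings, so the CDATA section stays the only child of `node`:

```python
from xml.dom import minidom

doc = minidom.Document()
root = doc.createElement('root')
doc.appendChild(root)
node = doc.createElement('node')
root.appendChild(node)
node.appendChild(doc.createCDATASection('</data>'))

compact = doc.toxml()  # no indent or newline strings are inserted
reparsed = minidom.parseString(compact)
value = reparsed.getElementsByTagName('node')[0].firstChild.nodeValue

print(compact)
print(value)
```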

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022