xml.sax.xmlreader.XMLReader.getProperty (xml.sax.handler.property_xml_string) returns bytes #50935

Open
cms103 mannequin opened this issue Aug 11, 2009 · 7 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes topic-XML type-bug An unexpected behavior, bug, or error

Comments

@cms103
Mannequin

cms103 mannequin commented Aug 11, 2009

BPO 6686
Nosy @loewis, @amauryfa, @scoder, @taleinat, @tiran, @jfgossage, @ukarroum
PRs
  • bpo-6686: Fix Lib.xml.sax.expatreader.GetProperty to return a string object #9715
  • bpo-35018: Sax parser provides no user access to lexical handlers. #10328
  • bpo-6686: Replaced String with Bytes in xml.sax.handler documentation #30612
Files
  • expatreader.py.patch: Patch to return xml.sax.handler.property_xml_string as a string rather than bytes.
  • expatreader.py.patch2: Patch to return xml.sax.handler.property_xml_string as a string and to provide the Locator2 interface.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2009-08-11.19:19:52.203>
    labels = ['expert-XML', 'type-bug', '3.9', '3.10', '3.11']
    title = 'xml.sax.xmlreader.XMLReader.getProperty (xml.sax.handler.property_xml_string) returns bytes'
    updated_at = <Date 2022-01-15.16:25:18.444>
    user = 'https://bugs.python.org/cms103'

    bugs.python.org fields:

    activity = <Date 2022-01-15.16:25:18.444>
    actor = 'iritkatriel'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['XML']
    creation = <Date 2009-08-11.19:19:52.203>
    creator = 'cms103'
    dependencies = []
    files = ['14701', '14702']
    hgrepos = []
    issue_num = 6686
    keywords = ['patch']
    message_count = 7.0
    messages = ['91482', '91503', '91504', '91505', '110871', '327700', '327708']
    nosy_count = 8.0
    nosy_names = ['loewis', 'amaury.forgeotdarc', 'scoder', 'taleinat', 'christian.heimes', 'cms103', 'Jonathan.Gossage', 'ukarroum']
    pr_nums = ['9715', '10328', '30612']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue6686'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @cms103
    Mannequin Author

    cms103 mannequin commented Aug 11, 2009

    The documentation for the xml.sax.handler.property_xml_string SAX
    property states that it should be "data type: String". However, when
    this value is retrieved in Python 3.1, a bytes object is returned instead.

    This makes the returned value very difficult to handle, because there is
    no way to retrieve the character encoding that the XML was
    originally encoded with.

    This is currently blocking the port of SimpleTAL to Python 3 achieving
    feature parity with Python 2.
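
A minimal reproduction sketch of the behavior reported above, assuming the default expat-based SAX driver. The class name `XMLStringProbe` and the helper `capture_xml_string` are illustrative and not part of the standard library. The property can only be read while a parse is in progress, so the probe reads it from inside `startElement`.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_xml_string


class XMLStringProbe(ContentHandler):
    """Records what getProperty(property_xml_string) returns at each start tag."""

    def __init__(self, parser):
        super().__init__()
        self._parser = parser
        self.seen = []

    def startElement(self, name, attrs):
        # The property is only available while the parser is running.
        self.seen.append(self._parser.getProperty(property_xml_string))


def capture_xml_string(data):
    """Parse `data` (bytes) and return the captured property values."""
    parser = xml.sax.make_parser()
    probe = XMLStringProbe(parser)
    parser.setContentHandler(probe)
    parser.parse(io.BytesIO(data))
    return probe.seen


if __name__ == "__main__":
    for value in capture_xml_string(b"<root/>"):
        # Prints the type name so the bytes-versus-str question is visible.
        print(type(value).__name__)
```

On interpreters affected by this issue the captured values should be `bytes` rather than `str`; the exact content depends on how much input expat has buffered.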

    @cms103 cms103 mannequin added topic-XML type-bug An unexpected behavior, bug, or error labels Aug 11, 2009
    @loewis
    Mannequin

    loewis mannequin commented Aug 12, 2009

    Would you like to contribute a patch?

    @cms103
    Mannequin Author

    cms103 mannequin commented Aug 12, 2009

    I'm not familiar with the inner workings of the expat integration with
    Python, so the attached patches need careful review.

    The first patch (expatreader.py.patch) is the minimum to resolve this
    issue. The second patch (expatreader.py.patch2) also exposes the
    version and encoding parameters via the Locator2 interface
    (http://www.saxproject.org/apidoc/org/xml/sax/ext/Locator2.html), which
    I'd recommend including.

    @cms103
    Mannequin Author

    cms103 mannequin commented Aug 12, 2009

    Adding second patch.

    @amauryfa
    Member

    A unit test (or even a sample script) showing the desired feature is needed.

    @taleinat
    Contributor

    See additional research and discussion in the comments of PR python/issues-test-cpython#9715.

    Simply changing this to return a string rather than bytes would break backwards compatibility.

    I certainly agree that this should have returned a string in the first place, especially since the Unicode decoding is otherwise completely abstracted away and the encoding used is not made available.

    Our options:

    1. Return a string starting with 3.8, document the change in What's New & fix the docs for older 3.x.
    2. Continue returning bytes, update the docs for all 3.x that this returns bytes, and that there's no good way to know the proper encoding to use for decoding it.
    3. As 2 above, but also expose the encoding used.

    Since this appears to be rarely used and option 3 requires significantly more effort than the others, I am against it.

    Option 2 seems the safest, but I'd like to hear more from those more experienced with XML.
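
Under option 2, decoding is left to the caller. One possible workaround, sketched here on the assumption that the caller still has the original document bytes, is to read the encoding from the XML declaration and fall back to UTF-8. Both helper names are hypothetical, and the heuristic ignores byte order marks and UTF-16/UTF-32 input, so it is not a general answer to the missing-encoding problem.

```python
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_declared_encoding(document, default="utf-8"):
    """Best-effort guess at the encoding named in the XML declaration."""
    match = _DECL_ENCODING.match(document[:200])
    return match.group(1).decode("ascii") if match else default


def decode_xml_string(raw, encoding):
    """Decode bytes obtained from property_xml_string.

    errors="replace" tolerates a multi-byte character cut off at the
    end of the parser's buffer.
    """
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="iso-8859-1"?><caf\xe9/>'.encode("iso-8859-1")
    encoding = sniff_declared_encoding(doc)
    print(encoding, decode_xml_string(doc, encoding))
```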

    @taleinat taleinat added 3.7 (EOL) end of life 3.8 only security fixes labels Oct 14, 2018
    @jfgossage
    Mannequin

    jfgossage mannequin commented Oct 14, 2018

    Another point in favor of option 2: xml.parsers.expat provides an interface to the Expat parser that is easier to use and more complete than the SAX parser implementation, and it is the implementation most likely to be used by anyone who needs a streaming parser.
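
For comparison, a minimal sketch of driving the Expat parser directly through `xml.parsers.expat`. The helper name `collect_start_tags` is illustrative only; element names and attribute values arrive in the callback as `str`, already decoded by the parser.

```python
import xml.parsers.expat


def collect_start_tags(data):
    """Return (name, attributes) pairs in document order using pyexpat directly."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # attrs is a dict mapping attribute names to values.
        events.append((name, dict(attrs)))

    parser.StartElementHandler = on_start
    # The second argument marks this as the final chunk of input.
    parser.Parse(data, True)
    return events


if __name__ == "__main__":
    for name, attrs in collect_start_tags(b'<root a="1"><child/></root>'):
        print(name, attrs)
```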

    @iritkatriel iritkatriel added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Jan 15, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022