Expat sax parser silently ignores the InputSource protocol #46428

Closed
ygale mannequin opened this issue Feb 24, 2008 · 10 comments
Assignees
Labels
stdlib Python modules in the Lib dir topic-unicode topic-XML type-bug An unexpected behavior, bug, or error

Comments

@ygale
Mannequin

ygale mannequin commented Feb 24, 2008

BPO 2175
Nosy @loewis, @freddrake, @birkenfeld, @tiran, @ezio-melotti, @serhiy-storchaka
Dependencies
  • bpo-17089: Expat parser parses strings only when XML encoding is UTF-8
Files
  • sax_character_stream.patch: Patch for 3.x
  • sax_character_stream-2.7.patch: Patch for 2.7
  • sax_character_stream_3.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-04-02.20:31:57.464>
    created_at = <Date 2008-02-24.14:03:02.470>
    labels = ['expert-XML', 'type-bug', 'library', 'expert-unicode']
    title = 'Expat sax parser silently ignores the InputSource protocol'
    updated_at = <Date 2015-04-02.20:31:57.463>
    user = 'https://bugs.python.org/ygale'

    bugs.python.org fields:

    activity = <Date 2015-04-02.20:31:57.463>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-04-02.20:31:57.464>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)', 'Unicode', 'XML']
    creation = <Date 2008-02-24.14:03:02.470>
    creator = 'ygale'
    dependencies = ['17089']
    files = ['29061', '29062', '38696']
    hgrepos = []
    issue_num = 2175
    keywords = ['patch', 'needs review']
    message_count = 10.0
    messages = ['62901', '62903', '116975', '116984', '117170', '181383', '182055', '231555', '239311', '239936']
    nosy_count = 9.0
    nosy_names = ['loewis', 'fdrake', 'georg.brandl', 'ygale', 'christian.heimes', 'ezio.melotti', 'tshepang', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'critical'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue2175'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

    @ygale
    Mannequin Author

    ygale mannequin commented Feb 24, 2008

    The expat sax parser in xml.sax.expatreader
    does not fully support the InputSource protocol
    in xml.sax.xmlreader. It only accepts
    byte streams. It ignores the encoding
    indicated in the InputSource object and
    only uses the encoding read from
    the XML or defaults to UTF-8.

    Rather than silently doing the wrong thing,
    it should raise an error when fed a character stream,
    or when given an encoding, via the InputSource
    interface.

    And most importantly, these limitations should be mentioned
    in the documentation.
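
For readers unfamiliar with the protocol, here is a minimal sketch of the two stream types an InputSource can carry. The `TextCollector` handler and the `parse_source`, `make_byte_source`, and `make_char_source` helpers are hypothetical names introduced only for illustration. The character-stream call is the case this report is about; it is assumed to work only on Python 3.5 or later, where the fix recorded at the end of this thread landed.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource

DOCUMENT = "<doc>caf\u00e9</doc>"


class TextCollector(ContentHandler):
    """Hypothetical handler that accumulates all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so collect them all.
        self.chunks.append(content)


def parse_source(source):
    """Parse an InputSource and return the concatenated text content."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return "".join(handler.chunks)


def make_byte_source():
    # Byte stream: the parser decodes the bytes itself.
    source = InputSource()
    source.setByteStream(io.BytesIO(DOCUMENT.encode("utf-8")))
    return source


def make_char_source():
    # Character stream: the text is already decoded (assumes Python 3.5+).
    source = InputSource()
    source.setCharacterStream(io.StringIO(DOCUMENT))
    return source


if __name__ == "__main__":
    print(parse_source(make_byte_source()))
    print(parse_source(make_char_source()))
```

With the fix in place, both calls are expected to yield the same text; on the versions this report was filed against, only the byte-stream path was handled.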

    @ygale ygale mannequin added docs Documentation in the Doc dir extension-modules C modules in the Modules dir stdlib Python modules in the Lib dir topic-unicode topic-XML type-bug An unexpected behavior, bug, or error labels Feb 24, 2008
    @ygale
    Mannequin Author

    ygale mannequin commented Feb 24, 2008

    See also: bpo-1483 and bpo-2174.

    @jafo jafo mannequin assigned loewis Mar 20, 2008
    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Sep 20, 2010

    As nobody appears to be interested I'll close this in a couple of weeks unless someone objects.

    @ygale
    Mannequin Author

    ygale mannequin commented Sep 20, 2010

    Perhaps more people would be interested if
    you raise the priority. This bug can cause
    serious data corruption, or even crashes.
    It should also be tagged as "easy".

    An alternative would be to remove the expat
    sax parser from the libraries, since we don't
    support it. But that seems a little extreme.

    @birkenfeld
    Member

    I'll have a look.

    @birkenfeld birkenfeld assigned birkenfeld and unassigned loewis Sep 23, 2010
    @admin admin mannequin assigned docspython and unassigned birkenfeld Oct 29, 2010
    @serhiy-storchaka
    Member

    Here is a patch that makes xml.sax.xmlreader and related utilities support character streams. Many new tests are added (including Yitz Gale's tests from bpo-1483). Some old tests are fixed (they used a text stream as a byte stream, which doesn't work in the general case).

    @serhiy-storchaka serhiy-storchaka removed docs Documentation in the Doc dir extension-modules C modules in the Modules dir labels Feb 4, 2013
    @serhiy-storchaka
    Member

    This patch is rather complicated, and I doubt whether it is necessary to apply it to older versions. Can anyone review it?

    @serhiy-storchaka
    Member

    Ping.

    @serhiy-storchaka
    Member

    Updated to the tip, added a whatsnew entry, and fixed the documentation.

    Which parts of this patch, besides the tests, are worth applying to maintained releases?

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 2, 2015

    New changeset 84d49ad9109b by Serhiy Storchaka in branch '2.7':
    Issue bpo-2175: Added tests for xml.sax.saxutils.prepare_input_source().
    https://hg.python.org/cpython/rev/84d49ad9109b

    New changeset fa47897e7889 by Serhiy Storchaka in branch '3.4':
    Issue bpo-2175: Added tests for xml.sax.saxutils.prepare_input_source().
    https://hg.python.org/cpython/rev/fa47897e7889

    New changeset e0292b3ba245 by Serhiy Storchaka in branch 'default':
    Issue bpo-2175: Added tests for xml.sax.saxutils.prepare_input_source().
    https://hg.python.org/cpython/rev/e0292b3ba245

    New changeset 407883c52bf3 by Serhiy Storchaka in branch 'default':
    Issue bpo-2175: SAX parsers now support a character stream of InputSource object.
    https://hg.python.org/cpython/rev/407883c52bf3
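
The changesets above mention xml.sax.saxutils.prepare_input_source(). As a rough sketch, assuming Python 3.5 or later, the snippet below checks which slot of the resulting InputSource a file-like object ends up in. The `describe` helper is a hypothetical name used only for this illustration; the behaviour shown is my understanding of the helper after the fix, not a statement of what the patch itself contains.

```python
import io
from xml.sax.saxutils import prepare_input_source


def describe(stream):
    """Report which InputSource slot the given file-like object was placed in."""
    source = prepare_input_source(stream)
    return {
        "character_stream": source.getCharacterStream() is stream,
        "byte_stream": source.getByteStream() is stream,
    }


if __name__ == "__main__":
    # A text stream is expected to land in the character-stream slot,
    # a binary stream in the byte-stream slot (assumes Python 3.5+).
    print(describe(io.StringIO("<doc/>")))
    print(describe(io.BytesIO(b"<doc/>")))
```

Passing an already decoded stream this way avoids relying on the parser to guess or re-detect an encoding.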

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022