
Title: Expat sax parser silently ignores the InputSource protocol
Type: behavior Stage: resolved
Components: Library (Lib), Unicode, XML Versions: Python 3.4, Python 3.5, Python 2.7
Status: closed Resolution: fixed
Dependencies: 17089 Superseder:
Assigned To: serhiy.storchaka Nosy List: christian.heimes, ezio.melotti, fdrake, georg.brandl, loewis, python-dev, serhiy.storchaka, tshepang, ygale
Priority: critical Keywords: needs review, patch

Created on 2008-02-24 14:03 by ygale, last changed 2022-04-11 14:56 by admin. This issue is now closed.

File name Uploaded Description
sax_character_stream.patch serhiy.storchaka, 2013-02-13 17:47 Patch for 3.x
sax_character_stream-2.7.patch serhiy.storchaka, 2013-02-13 17:48 Patch for 2.7
sax_character_stream_3.patch serhiy.storchaka, 2015-03-26 07:26
Messages (10)
msg62901 - (view) Author: Yitz Gale (ygale) Date: 2008-02-24 14:03
The expat sax parser in xml.sax.expatreader
does not fully support the InputSource protocol
in xml.sax.xmlreader. It only accepts
byte streams. It ignores the encoding
indicated in the InputSource object and
only uses the encoding read from
the XML or defaults to UTF-8.

Rather than silently doing the wrong thing,
it should raise an error when fed a character stream
or when given an encoding via the InputSource.

And most importantly, these limitations should be mentioned
in the documentation.
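
The check requested above could be sketched roughly as follows. This is only an illustration of the proposal: `require_byte_stream` is a hypothetical helper, not part of xml.sax, and it uses only the standard InputSource accessors.

```python
import io
from xml.sax.xmlreader import InputSource


def require_byte_stream(source):
    """Hypothetical guard: reject InputSource features that a
    byte-stream-only parser would otherwise silently ignore."""
    if source.getCharacterStream() is not None:
        raise ValueError("character streams are not supported")
    if source.getEncoding() is not None:
        raise ValueError("an explicit InputSource encoding is not supported")
    return source


# A source carrying only a byte stream passes the check.
byte_source = InputSource()
byte_source.setByteStream(io.BytesIO(b"<doc/>"))
require_byte_stream(byte_source)

# A source carrying a character stream is rejected instead of being misread.
text_source = InputSource()
text_source.setCharacterStream(io.StringIO("<doc/>"))
try:
    require_byte_stream(text_source)
except ValueError as exc:
    print("rejected:", exc)
```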
msg62903 - (view) Author: Yitz Gale (ygale) Date: 2008-02-24 14:09
See also: #1483 and #2174.
msg116975 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-09-20 21:23
As nobody appears to be interested I'll close this in a couple of weeks unless someone objects.
msg116984 - (view) Author: Yitz Gale (ygale) Date: 2010-09-20 21:46
Perhaps more people would be interested if
you raise the priority. This bug can cause
serious data corruption, or even crashes.
It should also be tagged as "easy".

An alternative would be to remove the expat
sax parser from the libraries, since we don't
support it. But that seems a little extreme.
msg117170 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-09-23 06:45
I'll have a look.
msg181383 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-04 19:46
Here is a patch that makes xml.sax.xmlreader and related utilities support character streams. Many new tests have been added (including Yitz Gale's tests from issue1483). Some old tests have been fixed (they used a text stream as a byte stream, which does not work in the general case).
msg182055 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-13 17:50
This patch is rather complicated, and I doubt that it is necessary to apply it to the older versions. Can anyone review it?
msg231555 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-23 12:11
msg239311 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-26 07:26
Updated to the tip, added a whatsnew entry, and fixed the documentation.

Which parts of this patch, besides the tests, are worth applying to the maintained releases?
msg239936 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-04-02 18:01
New changeset 84d49ad9109b by Serhiy Storchaka in branch '2.7':
Issue #2175: Added tests for xml.sax.saxutils.prepare_input_source().

New changeset fa47897e7889 by Serhiy Storchaka in branch '3.4':
Issue #2175: Added tests for xml.sax.saxutils.prepare_input_source().

New changeset e0292b3ba245 by Serhiy Storchaka in branch 'default':
Issue #2175: Added tests for xml.sax.saxutils.prepare_input_source().

New changeset 407883c52bf3 by Serhiy Storchaka in branch 'default':
Issue #2175: SAX parsers now support a character stream of InputSource object.
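
For illustration, here is a minimal sketch of parsing from a character stream through an InputSource. It assumes Python 3.5 or later (the 'default' branch at the time of the last changeset); `TextCollector` and `parse_character_stream` are names made up for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class TextCollector(ContentHandler):
    """Collects all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces.
        self.chunks.append(content)


def parse_character_stream(text):
    """Parse already-decoded XML text supplied as a character stream."""
    source = InputSource()
    source.setCharacterStream(io.StringIO(text))

    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return "".join(handler.chunks)


result = parse_character_stream("<doc>caf\u00e9</doc>")
print(result)
```

Because the input is already text, no byte decoding step is involved in this path.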
Date User Action Args
2022-04-11 14:56:31 admin set github: 46428
2015-04-02 20:31:57 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2015-04-02 18:01:02 python-dev set nosy: + python-dev; messages: + msg239936
2015-03-26 07:26:06 serhiy.storchaka set files: + sax_character_stream_3.patch; messages: + msg239311
2014-11-23 12:11:48 serhiy.storchaka set keywords: + needs review; messages: + msg231555; versions: + Python 3.5, - Python 3.3
2014-02-03 15:42:18 BreamoreBoy set nosy: - BreamoreBoy
2013-12-18 22:08:55 serhiy.storchaka set nosy: + christian.heimes; versions: - Python 3.2
2013-02-13 21:52:51 fdrake set nosy: + fdrake
2013-02-13 17:52:22 serhiy.storchaka link issue10590 dependencies
2013-02-13 17:50:59 serhiy.storchaka set messages: + msg182055
2013-02-13 17:48:32 serhiy.storchaka set files: + sax_character_stream-2.7.patch
2013-02-13 17:47:52 serhiy.storchaka set files: + sax_character_stream.patch
2013-02-13 17:47:14 serhiy.storchaka set files: - sax_character_stream.patch
2013-02-04 19:46:21 serhiy.storchaka set files: + sax_character_stream.patch; components: - Documentation, Extension Modules; versions: + Python 3.4; keywords: + patch; nosy: + ezio.melotti; messages: + msg181383; stage: patch review
2013-01-31 10:02:25 serhiy.storchaka set dependencies: + Expat parser parses strings only when XML encoding is UTF-8
2013-01-16 18:26:43 serhiy.storchaka set assignee: docs@python -> serhiy.storchaka; nosy: + serhiy.storchaka
2012-01-11 12:31:16 tshepang set nosy: + tshepang
2011-06-12 18:34:16 terry.reedy set versions: + Python 3.3, - Python 3.1
2010-10-29 10:07:21 admin set assignee: georg.brandl -> docs@python
2010-09-23 06:45:14 georg.brandl set priority: normal -> critical; nosy: + georg.brandl; messages: + msg117170; assignee: loewis -> georg.brandl
2010-09-20 21:46:26 ygale set status: pending -> open; messages: + msg116984
2010-09-20 21:23:47 BreamoreBoy set status: open -> pending; nosy: + BreamoreBoy; messages: + msg116975
2010-06-09 22:00:39 terry.reedy set versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6, Python 2.5, Python 3.0
2008-03-20 02:42:37 jafo set priority: normal; assignee: loewis; nosy: + loewis
2008-02-24 14:09:11 ygale set messages: + msg62903
2008-02-24 14:03:02 ygale create