This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: sax.xmlreader.InputSource.setCharacterStream() does not work?
Type: Stage:
Components: XML Versions: Python 3.4
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: sourcejedi
Priority: normal Keywords:

Created on 2016-04-24 15:58 by sourcejedi, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg264111 - Author: Alan Jenkins (sourcejedi) Date: 2016-04-24 15:58
Environment: python3-3.4.3-5.fc23-x86_64 (Fedora 23 package)

I got here by spelunking, starting from <https://github.com/kurtmckee/feedparser/issues/30>. I experimented with using setCharacterStream() instead of setByteStream().

setCharacterStream() appears in the documentation, but exercising it fails:

>>> help(InputSource)
 |  setCharacterStream(self, charfile)
 |      Set the character stream for this input source. (The stream
 |      must be a Python 2.0 Unicode-wrapped file-like that performs
 |      conversion to Unicode strings.)
 |      
 |      If there is a character stream specified, the SAX parser will
 |      ignore any byte stream and will not attempt to open a URI
 |      connection to the system identifier.
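
A minimal standalone reproduction, written as a sketch that assumes the default expat-based parser used by xml.sax.parse() rather than the drv_libxml2 driver that appears in the traceback below. TextCollector and parse_from_character_stream are hypothetical names. On Python 3.4 the parse call should fail inside saxutils.prepare_input_source(); on 3.5 and later the character stream should be read and parsed.

```python
import io
import xml.sax
from xml.sax import xmlreader
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler: records element names and character data."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.text = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        # Character data may arrive in several chunks; keep them all.
        self.text.append(content)


def parse_from_character_stream(xml_text):
    """Parse already-decoded XML text supplied via setCharacterStream()."""
    source = xmlreader.InputSource()
    # A text-mode file-like object: read() returns str, not bytes.
    source.setCharacterStream(io.StringIO(xml_text))
    handler = TextCollector()
    # Expected to raise AttributeError on Python 3.4 (no system ID is set);
    # on 3.5+ the character stream is consumed instead.
    xml.sax.parse(source, handler)
    return handler


handler = parse_from_character_stream("<feed><title>caf\u00e9</title></feed>")
print(handler.elements)
print("".join(handler.text))
```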

Actually using an InputSource set up this way errors out as follows:

  File "/home/alan/.local/lib/python3.4/site-packages/feedparser-5.2.1-py3.4.egg/feedparser/api.py", line 236, in parse
  File "/usr/lib64/python3.4/site-packages/drv_libxml2.py", line 146, in parse
    source = saxutils.prepare_input_source(source)
  File "/usr/lib64/python3.4/xml/sax/saxutils.py", line 355, in prepare_input_source
    sysidfilename = os.path.join(basehead, sysid)
  File "/usr/lib64/python3.4/posixpath.py", line 79, in join
    if b.startswith(sep):
AttributeError: 'NoneType' object has no attribute 'startswith'

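This happens because prepare_input_source() in xml/sax/saxutils.py never looks at the character stream. It only checks getByteStream(), finds None, and falls through to resolving the system ID, which was never set: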

def prepare_input_source(source, base=""):
    """This function takes an InputSource and an optional base URL and
    returns a fully resolved InputSource object ready for reading."""

    if isinstance(source, str):
        source = xmlreader.InputSource(source)
    elif hasattr(source, "read"):
        f = source
        source = xmlreader.InputSource()
        source.setByteStream(f)
        if hasattr(f, "name") and isinstance(f.name, str):
            source.setSystemId(f.name)

    if source.getByteStream() is None:
        sysid = source.getSystemId()
        basehead = os.path.dirname(os.path.normpath(base))
        sysidfilename = os.path.join(basehead, sysid)
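
The failing branch is the `getByteStream() is None` test. A guard that also consults the character stream avoids resolving a system ID that was never provided. The helper below, needs_system_id, is hypothetical and only illustrates the condition; it is not presented as the CPython patch. The last lines assume Python 3.5 or later, where prepare_input_source() is expected to leave a character stream in place.

```python
import io
from xml.sax import saxutils, xmlreader


def needs_system_id(source):
    """Hypothetical helper: True only when neither stream has been set.

    Only in that case is there any reason to fall back to opening the
    system identifier (file name or URL).
    """
    return source.getCharacterStream() is None and source.getByteStream() is None


stream = io.StringIO("<root/>")
source = xmlreader.InputSource()
source.setCharacterStream(stream)

print(needs_system_id(source))                   # a character stream is set
print(needs_system_id(xmlreader.InputSource()))  # nothing set at all

# Python 3.5+: the InputSource comes back with its character stream intact.
prepared = saxutils.prepare_input_source(source)
print(prepared.getCharacterStream() is stream)
```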

msg264112 - Author: Alan Jenkins (sourcejedi) Date: 2016-04-24 16:02
It looks like this is already documented elsewhere and fixed in Python 3.5:

https://fossies.org/diffs/Python/3.4.3_vs_3.5.0/Doc/library/xml.sax.reader.rst-diff.html
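
For code that must still run on 3.4, one possible workaround (not from the original report) is to avoid setCharacterStream() and hand the parser bytes instead, since prepare_input_source() does check the byte stream. This sketch assumes the decoded text carries no XML declaration naming a different encoding; ElementCounter and byte_source_from_text are hypothetical names.

```python
import io
import xml.sax
from xml.sax import xmlreader
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler: counts start-element events."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def byte_source_from_text(xml_text, encoding="utf-8"):
    """Wrap already-decoded XML text as a byte-stream InputSource.

    Re-encoding to bytes sidesteps the character-stream path entirely,
    so prepare_input_source() finds a byte stream and never needs a
    system ID.
    """
    source = xmlreader.InputSource()
    source.setByteStream(io.BytesIO(xml_text.encode(encoding)))
    # Record which encoding the bytes use, for parsers that consult it.
    source.setEncoding(encoding)
    return source


counter = ElementCounter()
xml.sax.parse(byte_source_from_text("<feed><entry/><entry/></feed>"), counter)
print(counter.count)  # feed + two entry elements
```

The cost is an extra encode step, which is why setCharacterStream() is the nicer interface once 3.5+ can be assumed.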

History
Date                 User        Action  Args
2022-04-11 14:58:30  admin       set     github: 71025
2016-04-24 16:02:41  sourcejedi  set     status: open -> closed
                                         messages: + msg264112
2016-04-24 15:58:51  sourcejedi  create