Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sax.parser cannot get xml data from a subprocess pipe #67104

Closed
JocelynJ mannequin opened this issue Nov 21, 2014 · 12 comments
Closed

sax.parser cannot get xml data from a subprocess pipe #67104

JocelynJ mannequin opened this issue Nov 21, 2014 · 12 comments
Assignees
Labels
easy topic-XML type-bug An unexpected behavior, bug, or error

Comments

@JocelynJ
Copy link
Mannequin

JocelynJ mannequin commented Nov 21, 2014

BPO 22915
Nosy @tiran, @bitdancer, @serhiy-storchaka
Files
  • test-22915.diff: test
  • toto.py: testcase
  • sax_non_str_file_name.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-11-27.20:53:22.182>
    created_at = <Date 2014-11-21.23:54:11.033>
    labels = ['expert-XML', 'easy', 'type-bug']
    title = 'sax.parser cannot get xml data from a subprocess pipe'
    updated_at = <Date 2014-11-27.21:10:16.436>
    user = 'https://bugs.python.org/JocelynJ'

    bugs.python.org fields:

    activity = <Date 2014-11-27.21:10:16.436>
    actor = 'JocelynJ'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-11-27.20:53:22.182>
    closer = 'serhiy.storchaka'
    components = ['XML']
    creation = <Date 2014-11-21.23:54:11.033>
    creator = 'JocelynJ'
    dependencies = []
    files = ['37249', '37250', '37252']
    hgrepos = []
    issue_num = 22915
    keywords = ['patch', 'easy']
    message_count = 12.0
    messages = ['231504', '231505', '231508', '231537', '231538', '231545', '231553', '231568', '231572', '231777', '231780', '231782']
    nosy_count = 6.0
    nosy_names = ['christian.heimes', 'r.david.murray', 'python-dev', 'serhiy.storchaka', 'JocelynJ', 'Avneesh.Chadha']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue22915'
    versions = ['Python 3.4', 'Python 3.5']

    @JocelynJ
    Copy link
    Mannequin Author

    JocelynJ mannequin commented Nov 21, 2014

    With the attached code, and an xml file, I got the following error with python 3.4.2:

    $ cat toto.xml
    <?xml version='1.0' encoding='UTF-8'?>
    <test></test>
    $ python3.4 toto.py 
    Traceback (most recent call last):
      File "toto.py", line 10, in <module>
        parse(proc.stdout, ContentHandler())
      File "/usr/lib/python3.4/xml/sax/__init__.py", line 33, in parse
        parser.parse(source)
      File "/usr/lib/python3.4/xml/sax/expatreader.py", line 107, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/lib/python3.4/xml/sax/xmlreader.py", line 119, in parse
        self.prepareParser(source)
      File "/usr/lib/python3.4/xml/sax/expatreader.py", line 111, in prepareParser
        self._parser.SetBase(source.getSystemId())
    TypeError: must be str, not int

    The same test works with python2, and I would expect the code to work with python 3 too.

    Thanks,
    Jocelyn

    @JocelynJ JocelynJ mannequin added topic-XML type-bug An unexpected behavior, bug, or error labels Nov 21, 2014
    @bitdancer
    Copy link
    Member

    My guess is that this is similar to bpo-21044.

    @bitdancer
    Copy link
    Member

    I've looked at the sax code, and this does indeed have the same root cause: in python2 a dummy string was used for the 'name' attribute of io objects that are based on file descriptors, whereas in python3 the name is the integer value of the file descriptor. In test_sax we can see getSystemId being tested that it returns a filename (see test_expat_locator_withinfo).

    The fix should be analogous to the bpo-21044 fix: check that 'name' is a string and if not don't use it. I'm marking this as easy; hopefully someone will want to tackle figuring out exactly where in the code it needs to be fixed and adding tests for it.

    @bitdancer bitdancer added the easy label Nov 22, 2014
    @JocelynJ
    Copy link
    Mannequin Author

    JocelynJ mannequin commented Nov 22, 2014

    Here is a patch to add a test to test_sax.py.

    I'm not sure on the fix. Is the following one a correct one ?

         def prepareParser(self, source):
             if source.getSystemId() is not None:
    -            self._parser.SetBase(source.getSystemId())
    +            self._parser.SetBase(str(source.getSystemId()))

    Or should I remove the call to SetBase if getSystemId() is not a string ?

    @JocelynJ
    Copy link
    Mannequin Author

    JocelynJ mannequin commented Nov 22, 2014

    Forgot to attach the testcase when opening the bug.

    @bitdancer
    Copy link
    Member

    Has anyone investigated what exactly sax uses SystemId/SetBase *for*? I think think we need that info in order to decide what to do, and I'm not familiar with sax.

    @serhiy-storchaka
    Copy link
    Member

    This bug should be fixed in other place. Here is a patch.

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 23, 2014
    @bitdancer
    Copy link
    Member

    Serhiy's patch looks correct to me. Given that if the source doesn't have a name attribute it is simply not set in the existing code, this change should be "safe" (backward compatible).

    Elsewhere the possibility was raised of converting the int to a string (<fdopen: N>), but that issue is a more global one and would apply at the io module level...and if implemented this fix would automatically take advantage of it.

    So I think this should be committed.

    @JocelynJ
    Copy link
    Mannequin Author

    JocelynJ mannequin commented Nov 23, 2014

    The only explicit documentation I found on SystemId is from the java specification (it is my understanding that python sax implementation is adapted from Java one):
    http://www.saxproject.org/apidoc/org/xml/sax/InputSource.html#setSystemId%28java.lang.String%29

    The documentation says that "The system identifier is optional if there is a byte stream or a character stream".

    So, I agree that Serhiy's patch looks correct.

    Note that I'm not sure that my testcase with a subprocess is covered by Serhiy's tests, as these tests call parser() with a file object.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 27, 2014

    New changeset 27ae1a476ef7 by Serhiy Storchaka in branch '3.4':
    Issue bpo-22915: SAX parser now supports files opened with file descriptor or
    https://hg.python.org/cpython/rev/27ae1a476ef7

    New changeset ce9881eecfb4 by Serhiy Storchaka in branch 'default':
    Issue bpo-22915: SAX parser now supports files opened with file descriptor or
    https://hg.python.org/cpython/rev/ce9881eecfb4

    @serhiy-storchaka
    Copy link
    Member

    Jocelyn, in both cases the argument of parse() is a file object with integer name. Tests use one of simplest way to create such object.

    @JocelynJ
    Copy link
    Mannequin Author

    JocelynJ mannequin commented Nov 27, 2014

    Ok, I understand your tests now.

    Thanks for the fixes !

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    easy topic-XML type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants