xmllib unable to parse in UTF8 format #71307

Closed
enricoscame mannequin opened this issue May 25, 2016 · 4 comments

enricoscame mannequin commented May 25, 2016

BPO 27120
Nosy @serhiy-storchaka
Files
  • xmllib.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2016-05-27.06:03:07.297>
    created_at = <Date 2016-05-25.09:09:33.839>
    labels = ['expert-XML']
    title = 'xmllib unable to parse in UTF8 format'
    updated_at = <Date 2016-05-27.06:03:07.296>
    user = 'https://bugs.python.org/enricoscame'

    bugs.python.org fields:

    activity = <Date 2016-05-27.06:03:07.296>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-05-27.06:03:07.297>
    closer = 'serhiy.storchaka'
    components = ['XML']
    creation = <Date 2016-05-25.09:09:33.839>
    creator = 'enrico.scame'
    dependencies = []
    files = ['42991']
    hgrepos = []
    issue_num = 27120
    keywords = []
    message_count = 4.0
    messages = ['266322', '266339', '266344', '266479']
    nosy_count = 2.0
    nosy_names = ['serhiy.storchaka', 'enrico.scame']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue27120'
    versions = ['Python 2.7']

    enricoscame mannequin commented May 25, 2016

    xmllib.XMLParser seems unable to parse a UTF-8 encoded XML file
    that contains Cyrillic characters. Feeding it fails with:

    File "xmllib.pyc", line 172, in feed
    File "xmllib.pyc", line 268, in goahead
    File "xmllib.pyc", line 798, in syntax_error
    Error: Syntax error at line 8: illegal character in content
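A plausible explanation for this error, offered as an assumption rather than a confirmed diagnosis: xmllib checks content byte by byte against an "illegal character" pattern which, assuming it matches the Python 2 source, accepts only tab, CR, LF, printable ASCII (0x20-0x7E) and 0xA0-0xFF. UTF-8 encodes each Cyrillic letter as two bytes whose second byte lies in 0x80-0xBF, so letters such as П (U+041F, bytes D0 9F) contain a byte in the rejected 0x80-0x9F range. The sketch below re-creates a check of that assumed shape; `first_illegal_byte` is a hypothetical helper, not part of xmllib.

```python
import re

# Assumed shape of xmllib's content check (not copied from the attached file):
# anything other than tab, CR, LF, printable ASCII or 0xA0-0xFF is "illegal".
ILLEGAL = re.compile(b'[^\t\r\n\x20-\x7e\xa0-\xff]')


def first_illegal_byte(data):
    """Return (offset, byte value) of the first rejected byte, or None."""
    m = ILLEGAL.search(data)
    if m is None:
        return None
    # bytearray indexing yields an int on both Python 2 and Python 3.
    return m.start(), bytearray(data)[m.start()]


# "Привет" in UTF-8 starts with D0 9F; the 0x9F byte at offset 1 is rejected.
# first_illegal_byte(u"\u041f\u0440\u0438\u0432\u0435\u0442".encode("utf-8")) -> (1, 159)
```

Under this assumption, Latin-1 text (which mostly uses bytes 0xA0-0xFF) would pass the check, which would be consistent with only UTF-8 Cyrillic input failing.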

    enricoscame added the topic-XML label May 25, 2016
    serhiy-storchaka (member) commented

    Could you please provide a minimal reproducer, i.e. a minimal script and minimal data that expose the issue?

    enricoscame mannequin commented May 25, 2016

    I have attached xmllib.py; it is the copy in my python23\lib folder.

    The strings in the XML file are in Cyrillic.

    My code:
    import xmllib

    class Parser(xmllib.XMLParser):
        # a simple styling engine

        def __init__(self):
            xmllib.XMLParser.__init__(self)
            self.cursupervisore = None
            self.curdata        = ''

            # Maps tag names to (start handler, end handler) pairs.
            self.elements = {'Superv': (self.starttag_superv, self.endtag_superv),
                             # ... (further element handlers elided in the report)
                            }

        def load(self, file):
            # Feed the document to the parser one line at a time.
            while 1:
                s = file.readline()
                if not s:
                    break
                self.feed(s)
            self.close()

    def read_plant_tree(filexml):
        c = Parser()
        c.load(filexml)

    serhiy-storchaka (member) commented

    See also bpo-222587. It seems this was the reason the xmllib module was deprecated.

    Use the xml package for parsing XML (xml.etree.ElementTree, xml.dom.minidom, xml.sax, etc.).
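A minimal sketch of moving the reporter's parser to `xml.sax`, chosen here because it is event-driven like xmllib; `xml.etree.ElementTree` would work equally well. Only the `Superv` tag and the `cursupervisore`/`curdata` names come from the report; the `name` attribute, the handler bodies, the `supervisors` result list and `parse_plant_tree_bytes` are hypothetical. Expat reads the encoding declaration and hands the handler decoded Unicode text, so UTF-8 Cyrillic content should not trip a byte-level check.

```python
import xml.sax


class PlantTreeHandler(xml.sax.ContentHandler):
    """SAX handler loosely mirroring the xmllib.XMLParser subclass in the report.

    Only the 'Superv' tag comes from the report; the 'name' attribute and the
    collected (name, text) pairs are hypothetical.
    """

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.cursupervisore = None
        self.curdata = u''
        self.supervisors = []
        # Tag name -> (start handler, end handler), like xmllib's `elements` map.
        self.elements = {'Superv': (self.starttag_superv, self.endtag_superv)}

    def startElement(self, name, attrs):
        handlers = self.elements.get(name)
        if handlers:
            handlers[0](attrs)

    def endElement(self, name):
        handlers = self.elements.get(name)
        if handlers:
            handlers[1]()

    def characters(self, content):
        # Text arrives already decoded and may be split across several calls.
        self.curdata += content

    def starttag_superv(self, attrs):
        self.cursupervisore = attrs.get('name')
        self.curdata = u''

    def endtag_superv(self):
        self.supervisors.append((self.cursupervisore, self.curdata.strip()))
        self.cursupervisore = None


def read_plant_tree(filexml):
    """Parse a file name or binary file object; return (name, text) pairs."""
    handler = PlantTreeHandler()
    xml.sax.parse(filexml, handler)
    return handler.supervisors


def parse_plant_tree_bytes(data):
    """Same as read_plant_tree, but for an in-memory byte string."""
    handler = PlantTreeHandler()
    xml.sax.parseString(data, handler)
    return handler.supervisors
```

Because `xml.sax.parse` accepts a file name or a file object, the reporter's `read_plant_tree(filexml)` call shape carries over; opening the file in binary mode lets expat honour the document's own encoding declaration.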

    ezio-melotti transferred this issue from another repository Apr 10, 2022