
classification
Title: xmllib unable to parse in UTF8 format
Type:
Stage: resolved
Components: XML
Versions: Python 2.7
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: enrico.scame, serhiy.storchaka
Priority: normal
Keywords:

Created on 2016-05-25 09:09 by enrico.scame, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
xmllib.py (uploaded by enrico.scame, 2016-05-25 13:14)
Messages (4)
msg266322 - (view) Author: Enrico (enrico.scame) Date: 2016-05-25 09:09
The xmllib.XMLParser class seems to be unable to parse
an XML file that contains Cyrillic characters. Parsing fails with:


   File "xmllib.pyc", line 172, in feed
   File "xmllib.pyc", line 268, in goahead
   File "xmllib.pyc", line 798, in syntax_error
 Error: Syntax error at line 8: illegal character in content
msg266339 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-25 12:36
Could you please provide a minimal reproducer, that is, a minimal script and minimal data that expose the issue?
msg266344 - (view) Author: Enrico (enrico.scame) Date: 2016-05-25 13:14
I have attached xmllib.py. This file is in the python23\lib folder.

The strings in the XML file are in Cyrillic.

My code:
import xmllib

class Parser(xmllib.XMLParser):
    # a simple styling engine

    def __init__(self):
        xmllib.XMLParser.__init__(self)
        self.cursupervisore = None
        self.curdata        = ''

        self.elements = {'Superv':(self.starttag_superv, self.endtag_superv)
........
                        }
    def load(self, file):
        while 1:
            s = file.readline()

            if not s:
                break
            self.feed(s)
        self.close()

def read_plant_tree(filexml):
    c = Parser()
    c.load(filexml)
msg266479 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-27 06:02
See also issue222587. This seems to have been the reason why the xmllib module was deprecated.

Use the xml package for parsing XML (xml.etree.ElementTree, xml.dom.minidom, xml.sax, etc).
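
As a rough illustration of that advice, here is a minimal sketch that parses a UTF-8 document containing Cyrillic text with xml.etree.ElementTree and with xml.sax. Only the 'Superv' element name is taken from the report; the sample document, the 'Plant' root, the 'name' attribute, and the helper names (superv_names, SupervHandler, superv_texts) are assumptions made up for this example, since the reporter's data and the elided handlers were not posted.

```python
import xml.etree.ElementTree as ET
import xml.sax

# Hypothetical sample data: a UTF-8 encoded document with Cyrillic text.
# Only the 'Superv' element name comes from the report.
XML_BYTES = (
    u'<?xml version="1.0" encoding="UTF-8"?>\n'
    u'<Plant><Superv name="\u041f\u0440\u0438\u0432\u0435\u0442">'
    u'\u0442\u0435\u043a\u0441\u0442</Superv></Plant>'
).encode("utf-8")


def superv_names(data):
    """Return the 'name' attribute of every Superv element (ElementTree)."""
    root = ET.fromstring(data)
    return [el.get("name") for el in root.iter("Superv")]


class SupervHandler(xml.sax.ContentHandler):
    """SAX counterpart of the start/end tag callbacks used in the report."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.curdata = u""
        self.texts = []

    def startElement(self, name, attrs):
        # Start collecting character data afresh for each Superv element.
        if name == "Superv":
            self.curdata = u""

    def characters(self, content):
        # May be called several times per text node, so accumulate.
        self.curdata += content

    def endElement(self, name):
        if name == "Superv":
            self.texts.append(self.curdata)


def superv_texts(data):
    """Return the text content of every Superv element (SAX)."""
    handler = SupervHandler()
    xml.sax.parseString(data, handler)
    return handler.texts
```

Both helpers take the raw bytes of the file, so the parser can honour the encoding declared in the XML prolog rather than relying on a manual decode step.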
History
Date                 User              Action  Args
2022-04-11 14:58:31  admin             set     github: 71307
2016-05-27 06:03:07  serhiy.storchaka  set     status: open -> closed; stage: test needed -> resolved
2016-05-27 06:02:48  serhiy.storchaka  set     resolution: wont fix; messages: + msg266479
2016-05-25 13:14:10  enrico.scame      set     files: + xmllib.py; messages: + msg266344
2016-05-25 12:36:20  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg266339; stage: test needed
2016-05-25 09:09:33  enrico.scame      create