classification
Title: [Py30a3] xml.parsers.expat recognizes encoding="utf-8" but not encoding="utf8"
Type: behavior Stage:
Components: Library (Lib), XML Versions: Python 3.0
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, georg.brandl, mark
Priority: normal Keywords:

Created on 2008-03-12 11:03 by mark, last changed 2008-03-17 07:42 by georg.brandl. This issue is now closed.

Messages (4)
msg63471 - Author: Mark Summerfield (mark) Date: 2008-03-12 11:03
Here is how to reproduce the bug:

from xml.etree.ElementTree import parse
import io
xml1 = """<?xml version="1.0" encoding="utf8"?>
<test>text</test>"""
xml2 = """<?xml version="1.0" encoding="utf-8"?>
<test>text</test>"""
f1 = io.StringIO(xml1)
f2 = io.StringIO(xml2)
tree2 = parse(f2) # this uses "utf-8" and works fine
tree1 = parse(f1)
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    tree1 = parse(f1)
  File "/home/mark/opt/python30a3/lib/python3.0/xml/etree/ElementTree.py", line 823, in parse
    tree.parse(source, parser)
  File "/home/mark/opt/python30a3/lib/python3.0/xml/etree/ElementTree.py", line 561, in parse
    parser.feed(data)
  File "/home/mark/opt/python30a3/lib/python3.0/xml/etree/ElementTree.py", line 1201, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: unknown encoding: line 1, column 30
msg63516 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2008-03-13 21:08
Should the parser recognize "utf8"? I looked at the XML standard [1], which
referred me to the IANA's charts [2]. It appears that the only correct
way to denote UTF-8 is "UTF-8".


[1] http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
[2] http://www.iana.org/assignments/character-sets
msg63558 - Author: Mark Summerfield (mark) Date: 2008-03-15 17:52
You're right that the parser should not recognise "utf8" since it isn't
correct XML (as per the references you gave).

I made the mistake because I used the etree module and wrote an XML file
with encoding "utf8" which etree accepted. I've now switched to using
"UTF-8".
msg63621 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2008-03-17 07:42
Okay to close this, then?
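
For anyone who runs into the situation described in msg63558, where a Python codec alias such as "utf8" ends up in a written XML declaration, one possible workaround is to normalize the label before writing. This is only a sketch: `xml_encoding_label` and `xml_declaration` are hypothetical helpers, not ElementTree API. It relies on `codecs.lookup()`, which resolves aliases to Python's own codec name, and it special-cases UTF-8 only, because Python codec names do not always coincide with IANA names.

```python
import codecs

def xml_encoding_label(name):
    """Map any Python alias of the UTF-8 codec to the label "UTF-8".

    Hypothetical helper. Other encodings are returned unchanged, since
    Python's codec names are not guaranteed to be IANA names.
    """
    if codecs.lookup(name).name == "utf-8":
        return "UTF-8"
    return name

def xml_declaration(name):
    """Build an XML declaration using the normalized encoding label."""
    return '<?xml version="1.0" encoding="%s"?>' % xml_encoding_label(name)

print(xml_declaration("utf8"))
```

An unknown codec name makes `codecs.lookup()` raise `LookupError`, so a misspelled encoding is caught when the file is written rather than when it is parsed.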
History
Date                 User               Action  Args
2008-03-17 07:42:41  georg.brandl       set     status: open -> closed; resolution: wont fix; messages: + msg63621; nosy: + georg.brandl
2008-03-15 17:52:04  mark               set     messages: + msg63558
2008-03-13 21:08:32  benjamin.peterson  set     nosy: + benjamin.peterson; messages: + msg63516
2008-03-13 08:16:17  mark               set     type: behavior; components: + Library (Lib), XML
2008-03-12 11:03:55  mark               create