classification
Title: Parsing XML file with Unicode characters causes problem
Type: behavior Stage:
Components: XML Versions: Python 3.0
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, jaylogan
Priority: normal Keywords:

Created on 2008-08-31 19:43 by jaylogan, last changed 2009-02-21 03:15 by benjamin.peterson. This issue is now closed.

Files
File name Uploaded Description
read_song_xml.py jaylogan, 2008-08-31 19:43 Python program to load an XML file argument
Messages (2)
msg72211 - Author: Joshua Logan (jaylogan) Date: 2008-08-31 19:43
Python 3.0b2 will not parse the XML file located at
http://rubyquiz.com/SongLibrary.xml.gz

It complains of a UnicodeEncodeError:
'charmap' codec can't encode character '\xc8' in position 45: character maps to <undefined>

I included a sample program, just in case I was doing something wrong
while coding.

Python 3.0b2 (r30b2:65106, Jul 18 2008, 18:44:17) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
msg82519 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-02-20 07:40
The encoding used by the Windows terminal (usually cp850) cannot
encode every character, so when you print the text that you extract
from the XML file, the terminal is not able to display some
characters and print() raises the error. If you remove the print()
call, parsing works fine. You can also try writing the results to a
file using utf-8.

This issue can be closed.
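
The diagnosis above can be illustrated with a minimal sketch. The XML snippet, the helper names (artist_names, encodable, save_names) and the cp437 code page are assumptions made for illustration; they are not taken from read_song_xml.py or from the reporter's machine. Check sys.stdout.encoding to see which code page your own terminal uses. The sketch shows that parsing succeeds, that the failure comes from encoding for the console, and that writing to a file as utf-8 avoids it.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the song library; the real file is not reproduced here.
SAMPLE_XML = '<Library><Artist name="\u00c8ve"><Song name="Demo"/></Artist></Library>'


def artist_names(xml_text):
    """Return the name attribute of every Artist element."""
    root = ET.fromstring(xml_text)
    return [artist.get("name") for artist in root.iter("Artist")]


def encodable(text, encoding):
    """Report whether text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def save_names(names, path):
    """Write one name per line, always as UTF-8 regardless of the console."""
    with open(path, "w", encoding="utf-8") as handle:
        for name in names:
            handle.write(name + "\n")


# Parsing itself succeeds; no console encoding is involved yet.
names = artist_names(SAMPLE_XML)

# Assumed console code page (cp437); this is where print() would fail.
fits_console = encodable(names[0], "cp437")

# Fallback for display: substitute "?" for characters the codec lacks.
display_name = names[0].encode("cp437", errors="replace").decode("cp437")

# Writing to a file with an explicit utf-8 encoding keeps every character.
out_path = os.path.join(tempfile.mkdtemp(), "names.txt")
save_names(names, out_path)
```

The errors="replace" step loses information, so it is only suitable for display; the utf-8 file keeps the original text intact.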
History
Date User Action Args
2009-02-21 03:15:31 benjamin.peterson set status: open -> closed; resolution: not a bug
2009-02-20 07:40:17 ezio.melotti set nosy: + ezio.melotti; messages: + msg82519
2008-08-31 19:43:38 jaylogan create