Issue1290
Created on 2007-10-18 01:58 by sharmila, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||
---|---|---|
File name | Uploaded | Description |
testdata.txt | sharmila, 2007-10-18 01:58 | |
Messages (8) | |||
---|---|---|---|
msg56511 - (view) | Author: Sharmila Sivakumar (sharmila) | Date: 2007-10-18 01:58 | |
I try to load the data in the testdata.txt file into a dom. I tried

    import xml.dom.minidom as dom
    data = open('testdata.txt','r').read()
    mydom = dom.parseString(data)

I get the following error

    >>> mydom.firstChild.childNodes
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 18: ordinal not in range(128)

So I tried decoding the data and using it but it failed again.

    >>> mydom2 = dom.parseString(data.decode('utf-8'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/site-packages/_xmlplus/dom/minidom.py", line 1925, in parseString
        return expatbuilder.parseString(string)
      File "/usr/lib/python2.5/site-packages/_xmlplus/dom/expatbuilder.py", line 942, in parseString
        return builder.parseString(string)
      File "/usr/lib/python2.5/site-packages/_xmlplus/dom/expatbuilder.py", line 223, in parseString
        parser.Parse(string, True)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u014d' in position 173: ordinal not in range(128)

I am willing to fix this myself if I'm given the permission. |
|||
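For readers on a current interpreter, the steps in the report can be sketched as below. This is only an illustration under stated assumptions: the sample XML is made up (it is not the contents of testdata.txt), and the code targets Python 3, where `str` is Unicode and displaying the node list does not raise. The failure reported above is specific to Python 2 with an ASCII default encoding.

```python
import xml.dom.minidom as dom

# Hypothetical stand-in for testdata.txt: UTF-8 bytes containing U+2022 (bullet).
data = '<root><item>\u2022 first entry</item></root>'.encode('utf-8')

mydom = dom.parseString(data)            # parsing the raw bytes, as in the report
children = mydom.firstChild.childNodes   # the expression from the report

# Displaying the nodes is the step that failed on Python 2.5.
shown = repr(children)
text = children[0].firstChild.data       # text content of the <item> element
print(shown)
print(text)
```

The sketch builds the DOM and then asks for the `repr` separately, which mirrors the two distinct steps discussed in the rest of the thread.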
msg56514 - (view) | Author: Facundo Batista (facundobatista) * | Date: 2007-10-18 03:36 | |
Downloaded the testdata.txt file, and yes, it's UTF-8:

    facundo@pomcat:~/devel$ file testdata.txt
    testdata.txt: UTF-8 Unicode text

But I opened it perfectly!

    Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
    [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import xml.dom.minidom as dom
    >>> data = open('testdata.txt','r').read()
    >>> mydom = dom.parseString(data)
    >>> mydom
    <xml.dom.minidom.Document instance at 0xb7c03b0c>
    >>>

Which platform are you working on?

And yes, you have absolute permission to fix it, patches are always welcome! |
|||
msg56518 - (view) | Author: Sharmila Sivakumar (sharmila) | Date: 2007-10-18 04:41 | |
Thanks for your quick response Facundo. I'm working on Ubuntu 7.04, python 2.5.1 Python 2.5.1 (r251:54863, May 2 2007, 16:56:35) [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2 This error occurs when the default encoding is 'ascii'. When I change the default encoding to 'utf-8' it works for me too. Is, by any chance, your default encoding 'utf-8'? |
|||
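The error text quoted in this thread is what an ASCII encode of a non-ASCII character produces. A small sketch, assuming a Python 3 interpreter: there `sys.getdefaultencoding()` always reports 'utf-8', so the ASCII encode has to be requested explicitly to see the same kind of failure that Python 2 triggered implicitly.

```python
import sys

# 'ascii' on the Python 2.5 systems in this thread; 'utf-8' on Python 3.
default_encoding = sys.getdefaultencoding()

bullet = '\u2022'  # the character named in the first traceback
try:
    bullet.encode('ascii')        # explicit version of Python 2's implicit conversion
    failure = None
except UnicodeEncodeError as exc:
    failure = exc                 # keep the exception for inspection

encoded = bullet.encode('utf-8')  # the same character encodes fine as UTF-8
print(default_encoding, failure, encoded)
```

Checking `failure.reason` shows the familiar "ordinal not in range(128)" wording, which helps confirm that an ASCII conversion is involved somewhere.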
msg56519 - (view) | Author: Sharmila Sivakumar (sharmila) | Date: 2007-10-18 04:45 | |
Oops Facundo, that will work. It actually fails *after the dom construction*, when you do mydom.firstChild.childNodes. I request you to try it again. The problem is that there is some encoding and decoding done within the parser, and it uses the default encoding 'ascii'. This fails for utf-8 data. |
|||
msg56542 - (view) | Author: Raghuram Devarakonda (draghuram) | Date: 2007-10-18 20:43 | |
When I run the code in a script, I don't get the error.

***************
    marvin:cpython$ python
    Python 2.5 (r25:51908, Jan 24 2007, 12:48:15)
    [GCC 4.1.0 (SUSE Linux)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import xml.dom.minidom as dom
    >>> data = open('testdata.txt','r').read()
    >>> mydom = dom.parseString(data)
    >>> mydom.firstChild.childNodes
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 18: ordinal not in range(128)
    >>> import sys
    >>> sys.getdefaultencoding()
    'ascii'

    marvin:cpython$ python dom.py
    marvin:cpython$
***************

Can you try and see if you can run it from the script too? |
|||
msg56543 - (view) | Author: Raghuram Devarakonda (draghuram) | Date: 2007-10-18 20:44 | |
I forgot to show dom.py source.

    marvin:cpython$ cat dom.py
    import xml.dom.minidom as dom
    data = open('testdata.txt','r').read()
    mydom = dom.parseString(data)
    mydom.firstChild.childNodes |
|||
msg56556 - (view) | Author: Raghuram Devarakonda (draghuram) | Date: 2007-10-19 14:56 | |
The fact that the problem occurs only at the interactive prompt and not when the code is run from a script indicates that the real issue is in trying to print the object. Sure enough, if you modify the script to do repr(mydom.firstChild.childNodes), it hits the same problem. So the issue may have something to do with how the string is constructed in repr(). I don't have time right now to dig deeper, but the parser itself may not have any encoding/decoding issues (apart from the ability to print these high-level objects). |
|||
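The difference between the prompt and the script comes down to when `__repr__` is called. A minimal sketch, using a made-up class `Loud` (not part of minidom) and assuming Python 3: a bare expression statement in a script discards its value, while the interactive prompt hands the value to the display hook, which calls `repr()` on it.

```python
import io
import sys
from contextlib import redirect_stdout

class Loud:
    """Hypothetical object that counts how often its repr is requested."""
    calls = 0

    def __repr__(self):
        Loud.calls += 1
        return '<Loud>'

obj = Loud()

obj                          # script behaviour: the value is discarded, no repr
after_statement = Loud.calls

buf = io.StringIO()
with redirect_stdout(buf):
    sys.__displayhook__(obj) # what the interactive prompt does with a result
after_display = Loud.calls
displayed = buf.getvalue()
```

This is why dom.py ran silently while the same expression at the `>>>` prompt raised: only the prompt ever asked the node list for its representation.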
msg56719 - (view) | Author: Facundo Batista (facundobatista) * | Date: 2007-10-24 19:12 | |
CharacterData.__repr__ was building a result string that still contained a non-ASCII character. Fixed in rev 58641. |
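A sketch of the general idea behind this kind of fix, not the actual change made in rev 58641: keep the text returned by `__repr__` ASCII-only by escaping the data before it is interpolated. The class `TextLike` below is hypothetical and only imitates the shape of a character-data node; the sketch assumes Python 3, where the escaping can be done with the built-in `ascii()`.

```python
class TextLike:
    """Hypothetical stand-in for a DOM character-data node."""

    def __init__(self, data):
        self.data = data

    def __repr__(self):
        # Show at most the first 10 characters, escaped to pure ASCII.
        preview = self.data[:10]
        dots = '...' if len(self.data) > 10 else ''
        return '<DOM %s node %s%s>' % (type(self).__name__, ascii(preview), dots)

node = TextLike('\u2022 first entry in the list')
shown = repr(node)
print(shown)
```

Because the non-ASCII character is replaced by its escape sequence, the representation can be written to any output stream regardless of that stream's encoding.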
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:27 | admin | set | github: 45631 |
2007-10-24 19:12:12 | facundobatista | set | resolution: works for me -> fixed messages: + msg56719 |
2007-10-24 16:28:51 | facundobatista | set | files: - unnamed |
2007-10-24 16:28:45 | facundobatista | set | files: - unnamed |
2007-10-19 14:56:11 | draghuram | set | messages: + msg56556 |
2007-10-18 20:44:24 | draghuram | set | messages: + msg56543 |
2007-10-18 20:43:20 | draghuram | set | nosy: + draghuram messages: + msg56542 |
2007-10-18 04:45:14 | sharmila | set | files: + unnamed messages: + msg56519 |
2007-10-18 04:41:15 | sharmila | set | files: + unnamed messages: + msg56518 |
2007-10-18 03:36:31 | facundobatista | set | status: open -> closed resolution: works for me messages: + msg56514 nosy: + facundobatista |
2007-10-18 01:58:15 | sharmila | create |