classification
Title: xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.
Type: behavior
Stage:
Components: XML
Versions: Python 2.5

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: ggenellina, kawai, tksmashiw
Priority: normal
Keywords:

Created on 2009-01-23 03:52 by tksmashiw, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (10)
msg80398 - Author: Takeshi Matsuyama (tksmashiw) Date: 2009-01-23 03:52
When I build a dictionary by parsing "legacy-icon-mapping.xml" (which is
part of icon-naming-utils [http://tango.freedesktop.org/Tango_Icon_Library])
with the following script, three of the dictionary's keys are truncated if
the "buffer_text" attribute is False.

=====================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import with_statement
import sys
from xml.parsers.expat import ParserCreate
import codecs

class Database:
  """Make a dictionary which is accessible by Database.dict"""
  def __init__(self, buffer_text):
    self.cnt = None
    self.name = None
    self.data = None
    self.dict = {}
    p = ParserCreate()
    p.buffer_text = buffer_text

    p.StartElementHandler = self.start_element
    p.EndElementHandler = self.end_element
    p.CharacterDataHandler = self.char_data

    with open("/usr/share/icon-naming-utils/legacy-icon-mapping.xml", 'r') as f:
      p.ParseFile(f)

  def start_element(self, name, attrs):
    if name == 'context':
      self.cnt = attrs["dir"]
    if name == 'icon':
      self.name = attrs["name"]
  
  def end_element(self, name):
    if name == 'link':
      self.dict[self.data] = (self.cnt, self.name)

  def char_data(self, data):
    self.data = data.strip()

def print_set(aset):
  for e in aset:
    print '\t' + e

if __name__ == '__main__':
  sys.stdout = codecs.getwriter('utf_8')(sys.stdout)
  map_false_dict = Database(False).dict
  map_true_dict = Database(True).dict
  print "The keys which exist if buffer_text=False but don't exist if buffer_text=True are"
  print_set(set(map_false_dict.keys()) - set(map_true_dict.keys()))
  print "The keys which exist if buffer_text=True but don't exist if buffer_text=False are"
  print_set(set(map_true_dict.keys()) - set(map_false_dict.keys()))
=====================

The result of running this script is
======================
The keys which exist if buffer_text=False but don't exist if buffer_text=True are
	rt-descending
	ock_text_right
	lc
The keys which exist if buffer_text=True but don't exist if buffer_text=False are
	stock_text_right
	gnome-mime-application-vnd.stardivision.calc
	gtk-sort-descending
======================
I confirmed it in Python-2.5.2 on Fedora 10.
msg80432 - Author: Gabriel Genellina (ggenellina) Date: 2009-01-24 01:31
If the xml file is small enough, could you attach it to the issue? Or 
provide a download location? I could not find it myself (without 
downloading the whole package)

(Note that Python 2.5 only gets security fixes now, so unless this 
still fails with 2.6 or later, this issue is likely to be closed)
msg80435 - Author: Takeshi Matsuyama (tksmashiw) Date: 2009-01-24 04:10
Thanks for the reply!

>If the xml file is small enough, could you attach it to the issue? Or 
>provide a download location?
Sorry, I found it here:
http://webcvs.freedesktop.org/icon-theme/icon-naming-utils/legacy-icon-mapping.xml?revision=1.75&content-type=text%2Fplain&pathrev=1.75

>(Note that Python 2.5 only gets security fixes now, so unless this 
>still fails with 2.6 or later, this issue is likely to be closed)
I roughly confirmed the same problem with Python 3.0 on MS Windows two weeks
ago, but I need to verify it more carefully...
msg80438 - Author: HiroakiKawai (kawai) Date: 2009-01-24 08:48
The sample code has a bug; expat is OK.

The char_data method must append the incoming characters, because the
parser reads its input through a buffer and may deliver one run of text
in several calls:
  def char_data(self, data):
    self.data += data

You should reset it by self.data = '' at end_element().
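
A minimal sketch of this pattern, assuming Python 3 and a made-up inline document rather than legacy-icon-mapping.xml. The class name TextCollector, the helper feed_in_pieces, and the piece size of 7 are illustrative only. Feeding the text in small pieces imitates a parser that reads its input through a buffer, so the text of one element can straddle two Parse() calls:

```python
from xml.parsers.expat import ParserCreate

# Made-up sample document (no whitespace between elements).
XML = ('<icon name="view-sort-descending">'
       '<link>gtk-sort-descending</link>'
       '<link>stock_sort-descending</link>'
       '</icon>')


class TextCollector:
    """Collect the text of every <link> element (hypothetical helper)."""

    def __init__(self):
        self.data = ''
        self.links = []
        self.parser = ParserCreate()
        self.parser.buffer_text = False
        self.parser.EndElementHandler = self.end_element
        self.parser.CharacterDataHandler = self.char_data

    def char_data(self, data):
        # The handler may be called several times for one run of text,
        # so append instead of overwriting.
        self.data += data

    def end_element(self, name):
        if name == 'link':
            self.links.append(self.data)
        # Reset at every end tag so the next element starts empty.
        self.data = ''

    def feed_in_pieces(self, text, size):
        # Hand the document to expat a few characters at a time.
        for i in range(0, len(text), size):
            self.parser.Parse(text[i:i + size], False)
        self.parser.Parse('', True)


collector = TextCollector()
collector.feed_in_pieces(XML, 7)
print(collector.links)
```

Because char_data only appends, the collected text does not depend on how the parser happens to split it.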
msg80449 - Author: Takeshi Matsuyama (tksmashiw) Date: 2009-01-24 14:10
Hi kawai.
I got the correct output by modifying the code as you said, but I still
cannot understand why this happens.
Could you tell me more briefly, or point me to any documents about it?
I can't find any note in the official documentation saying that a
CharacterDataHandler should append the strings it receives rather than
just store them.
Does everyone know/understand this already? Am I the only one who is so stupid? (;;)
msg80451 - Author: HiroakiKawai (kawai) Date: 2009-01-24 14:25
That's how the XML SAX interface is specified.
msg80453 - Author: HiroakiKawai (kawai) Date: 2009-01-24 14:54
Please read "The ContentHandler.characters() callback is missing data!" 
http://www.saxproject.org/faq.html

and close this issue :)
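
For comparison, the same rule can be sketched with Python's own xml.sax module, whose ContentHandler.characters() may also be called more than once for a single run of text. This assumes Python 3; the handler name LinkHandler and the sample document are made up for illustration:

```python
import xml.sax

# Made-up sample document, fed to the parser as bytes.
DOC = b'<icon name="format-justify-right"><link>stock_text_right</link></icon>'


class LinkHandler(xml.sax.ContentHandler):
    """Collect the text of every <link> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.parts = []
        self.links = []

    def startElement(self, name, attrs):
        if name == 'link':
            self.in_link = True
            self.parts = []

    def characters(self, content):
        # Keep every piece; a single call is not guaranteed to
        # carry the whole text of the element.
        if self.in_link:
            self.parts.append(content)

    def endElement(self, name):
        if name == 'link':
            self.links.append(''.join(self.parts))
            self.in_link = False


handler = LinkHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
# Feed five bytes at a time to imitate buffered input.
for i in range(0, len(DOC), 5):
    parser.feed(DOC[i:i + 5])
parser.close()
print(handler.links)
```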
msg80454 - Author: Takeshi Matsuyama (tksmashiw) Date: 2009-01-24 15:21
A correction to my previous message: briefly -> in detail

>Please read "The ContentHandler.characters() callback is missing data!" 
>http://www.saxproject.org/faq.html
I have just read the site above. It is now very clear to me.
Thanks kawai, and I'm sorry for taking up your time, ggenellina.
msg80638 - Author: Takeshi Matsuyama (tksmashiw) Date: 2009-01-27 09:57
From msg80438
>You should reset it by self.data = '' at end_element().

It seems that we should reset it at start_element() like this,
============================
def start_element(self, name, attrs):
  ...abbr...
  if name == 'link':
    self.data = ''
=============================
otherwise unwanted spaces, tabs, and newlines from between the elements
get mixed into "self.data".
That's all, thanks.
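
Putting both points together, here is a rough Python 3 sketch of the Database class with the fix applied. It parses a small made-up sample instead of the real file: the root element name "mapping", the piece_size parameter, and the sample entries are assumptions, and only the element names context, icon, and link come from the script above.

```python
from xml.parsers.expat import ParserCreate

# Made-up sample with indentation between elements.
SAMPLE = """<mapping>
  <context dir="actions">
    <icon name="view-sort-descending">
      <link>gtk-sort-descending</link>
      <link>stock_sort-descending</link>
    </icon>
  </context>
</mapping>"""


class Database:
    """Make a dictionary which is accessible by Database.dict"""

    def __init__(self, text, buffer_text, piece_size=16):
        self.cnt = None
        self.name = None
        self.data = ''
        self.dict = {}
        p = ParserCreate()
        p.buffer_text = buffer_text
        p.StartElementHandler = self.start_element
        p.EndElementHandler = self.end_element
        p.CharacterDataHandler = self.char_data
        # Feed the text in small pieces to imitate buffered file input.
        for i in range(0, len(text), piece_size):
            p.Parse(text[i:i + piece_size], False)
        p.Parse('', True)

    def start_element(self, name, attrs):
        if name == 'context':
            self.cnt = attrs["dir"]
        if name == 'icon':
            self.name = attrs["name"]
        if name == 'link':
            # Drop whitespace collected since the previous element.
            self.data = ''

    def end_element(self, name):
        if name == 'link':
            self.dict[self.data] = (self.cnt, self.name)

    def char_data(self, data):
        # Append: one run of text may arrive in several calls.
        self.data += data


map_false_dict = Database(SAMPLE, False).dict
map_true_dict = Database(SAMPLE, True).dict
print(sorted(map_false_dict) == sorted(map_true_dict))
```

With the reset in start_element(), the indentation between elements never reaches the keys, so no strip() call is needed.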
msg80851 - Author: Takeshi Matsuyama (tksmashiw) Date: 2009-01-31 02:56
Could someone close this?
History
Date                 User               Action  Args
2022-04-11 14:56:44  admin              set     github: 49286
2009-01-31 03:16:24  benjamin.peterson  set     status: open -> closed; resolution: not a bug
2009-01-31 02:56:03  tksmashiw          set     messages: + msg80851
2009-01-27 09:57:58  tksmashiw          set     messages: + msg80638
2009-01-24 15:21:18  tksmashiw          set     messages: + msg80454
2009-01-24 14:54:30  kawai              set     messages: + msg80453
2009-01-24 14:25:26  kawai              set     messages: + msg80451
2009-01-24 14:10:16  tksmashiw          set     messages: + msg80449
2009-01-24 08:48:18  kawai              set     nosy: + kawai; messages: + msg80438
2009-01-24 04:10:26  tksmashiw          set     messages: + msg80435
2009-01-24 01:31:03  ggenellina         set     nosy: + ggenellina; messages: + msg80432
2009-01-23 03:52:20  tksmashiw          create