
Author lemburg
Recipients lemburg
Date 2010-11-29.11:10:53
Message-id <1291029056.3.0.883986671592.issue10575@psf.upfronthosting.co.za>
In-reply-to
Content
The script patches only the numeric value (field 8) into the table; it does not update the digit field (field 7).
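For readers unfamiliar with the field numbers, here is a minimal sketch of where fields 6 to 8 sit in a semicolon-separated UnicodeData.txt record. The helper name `numeric_fields` is hypothetical, and the two sample records are quoted for illustration only; check them against the UnicodeData.txt of the Unicode version in use.

```python
# Zero-based field positions in a semicolon-separated UnicodeData.txt record.
DECIMAL_FIELD, DIGIT_FIELD, NUMERIC_FIELD = 6, 7, 8


def numeric_fields(record):
    """Return the decimal, digit and numeric fields of one record as strings."""
    fields = record.split(";")
    return {
        "decimal": fields[DECIMAL_FIELD],
        "digit": fields[DIGIT_FIELD],
        "numeric": fields[NUMERIC_FIELD],
    }


# Illustrative sample records (assumed to match UnicodeData.txt).
DIGIT_THREE = "0033;DIGIT THREE;Nd;0;EN;;3;3;3;N;;;;;"
ONE_HALF = (
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;"
    "<compat> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;"
)

print(numeric_fields(DIGIT_THREE))  # all three fields are filled in
print(numeric_fields(ONE_HALF))     # only the numeric field is filled in
```

A character whose record has a numeric value in field 8 but an empty field 7 is treated as numeric but not as a digit.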

As a result, the ideographs used for Chinese digits are not recognized as digits and are not evaluated by int(), long() and float():

    http://en.wikipedia.org/wiki/Numbers_in_Chinese_culture

>>> unicode('三', 'utf-8')
u'\u4e09'

>>> int(unicode('三', 'utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'decimal' codec can't encode character u'\u4e09' in position 0: invalid decimal Unicode string
> <stdin>(1)<module>()

>>> import unicodedata
>>> unicodedata.digit(unicode('三', 'utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not a digit

The code point refers to the digit 3.
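
The gap between the two fields can be inspected without triggering exceptions by passing a default to the unicodedata lookups. This is a sketch only: it assumes a Python 3 interpreter (the session above is Python 2), the helper name `describe` is hypothetical, and the values printed depend on the Unicode database compiled into the interpreter.

```python
import unicodedata


def describe(ch):
    """Collect the decimal, digit and numeric properties of one character.

    Each lookup is given None as a default, so a missing property shows
    up as None instead of raising ValueError.
    """
    return {
        "decimal": unicodedata.decimal(ch, None),
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }


# Compare ASCII DIGIT THREE with the CJK ideograph U+4E09.
for ch in ("3", "\u4e09"):
    print("U+%04X" % ord(ch), describe(ch))
```

On an interpreter that carries the patched numeric data but no digit data for the ideograph, the entry for U+4E09 would be expected to show a numeric value while its digit and decimal entries stay None.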
History
Date                 User     Action  Args
2010-11-29 11:10:56  lemburg  set     recipients: + lemburg
2010-11-29 11:10:56  lemburg  set     messageid: <1291029056.3.0.883986671592.issue10575@psf.upfronthosting.co.za>
2010-11-29 11:10:54  lemburg  link    issue10575 messages
2010-11-29 11:10:54  lemburg  create