Author loewis
Recipients doerwalter, loewis
Date 2009-06-23.21:35:59
Message-id <1245792961.5.0.460725375108.issue6331@psf.upfronthosting.co.za>
In-reply-to
Content
I think the patch is incorrect: the default value for the script
property ought to be Unknown, not Common (despite UCD.html saying the
contrary; see UTR#24 and Scripts.txt).

I'm puzzled why you use a hard-coded list of script names. The set of
scripts will certainly change across Unicode versions, and I think it
would be better to learn the script names from Scripts.txt.
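
As a rough illustration of learning the names from the data file, here is a minimal Python sketch. It assumes the usual Scripts.txt line layout ("range ; Script_Name # comment"); the sample lines and the helper names parse_scripts and script_of are made up for this example and are not part of the patch. Code points not listed fall back to Unknown.

```python
# Illustrative excerpt in the Scripts.txt layout; trailing comments are ignored.
SAMPLE = """\
# Scripts.txt (excerpt for illustration only)
0000..001F    ; Common # Cc  [32] <control>..<control>
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0370..0373    ; Greek # illustrative range
00AA          ; Latin # FEMININE ORDINAL INDICATOR
"""


def parse_scripts(lines):
    """Return (ranges, names) parsed from Scripts.txt-style lines.

    ranges is a list of (first, last, script) tuples; names lists the
    script names in order of first appearance, so nothing is hard-coded.
    """
    ranges = []
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_range, script = [part.strip() for part in line.split(";")]
        first, _, last = code_range.partition("..")
        first = int(first, 16)
        last = int(last, 16) if last else first  # single code point
        if script not in names:
            names.append(script)
        ranges.append((first, last, script))
    return ranges, names


def script_of(ranges, code_point, default="Unknown"):
    """Linear lookup; unlisted code points get the default (Unknown)."""
    for first, last, script in ranges:
        if first <= code_point <= last:
            return script
    return default


ranges, names = parse_scripts(SAMPLE.splitlines())
print(names)
print(script_of(ranges, ord("A")), script_of(ranges, 0x0378))
```

A real generator would read the full file and build a compact table rather than scanning ranges linearly; the point here is only that the name list falls out of the data.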

Out of curiosity: how does the addition of the script property affect
the number of distinct database records, and the total size of the database?
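
One way to answer this would be to count the unique records with and without the new field. The sketch below is only a toy model: the record layout (category, bidirectional class, script) and the sample rows are assumptions for illustration, not the actual database layout.

```python
def count_distinct(records):
    """Number of unique records, i.e. rows a deduplicated table would hold."""
    return len(set(records))


# Hypothetical per-code-point records: (category, bidirectional, script).
chars = [
    ("Lu", "L", "Latin"),
    ("Lu", "L", "Greek"),
    ("Lu", "L", "Latin"),
    ("Nd", "EN", "Common"),
]

without_script = count_distinct(record[:2] for record in chars)
with_script = count_distinct(chars)
print(without_script, with_script)
```

Records that were identical before can become distinct once the script is part of the record, which is why the count (and hence the table size) may grow.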

I think a common application would be lower-cased script names, for more
efficient comparison; UCD has also changed the spelling of the script
names over time (they used to be all-capital). So I propose that
a) two functions are provided: one with the original script names, and
one with the lower-case script names
b) cached versions of interned script name strings are kept in separate
arrays, to avoid PyString_FromString every time.
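
The proposal in a) and b) could look roughly like this when modelled in Python rather than C. This is a sketch only: the names script, script_lower, SCRIPT_NAMES and the tiny _INDEX table are hypothetical, and sys.intern stands in for interning the strings once at the C level.

```python
import sys

# Illustrative subset of script names; index 0 is the default (Unknown).
SCRIPT_NAMES = ("Unknown", "Common", "Latin", "Greek")

# Two parallel caches of interned strings, built once, so a lookup
# never has to create a new string object.
_NAMES = tuple(sys.intern(name) for name in SCRIPT_NAMES)
_LOWER = tuple(sys.intern(name.lower()) for name in SCRIPT_NAMES)

# Hypothetical code point -> script index table (toy data).
_INDEX = {ord("A"): 2, 0x03B1: 3, ord("1"): 1}


def script(ch):
    """Script name as spelled in the data file."""
    return _NAMES[_INDEX.get(ord(ch), 0)]


def script_lower(ch):
    """Lower-case script name, for cheap comparisons."""
    return _LOWER[_INDEX.get(ord(ch), 0)]


print(script("A"), script_lower("\u03b1"), script("?"))
```

Because both functions return objects from the cached tuples, repeated calls for the same script hand back the same string object, so callers can compare by identity if they wish.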

I'm doubtful that script names need to be provided for old database
versions, so I would be happy to not record the script for old versions,
and raise an exception if somebody tries to get the script for an old
database version - surely applications of the old database records won't
be accessing the script property, anyway.
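
A minimal sketch of that behaviour, assuming a per-version database object; the class name UCD, the choice of ValueError, and the sample data are all hypothetical.

```python
class UCD:
    """Toy model of a database object tied to one Unicode version."""

    def __init__(self, version, script_table=None):
        self.unidata_version = version
        # None means no script data was recorded for this version.
        self._script_table = script_table

    def script(self, ch):
        if self._script_table is None:
            raise ValueError(
                "script property is not recorded for Unicode %s"
                % self.unidata_version
            )
        return self._script_table.get(ord(ch), "Unknown")


current = UCD("5.1.0", {ord("A"): "Latin"})
old = UCD("3.2.0")

print(current.script("A"))
try:
    old.script("A")
except ValueError as exc:
    print("error:", exc)
```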
History
Date                 User    Action  Args
2009-06-23 21:36:01  loewis  set     recipients: + loewis, doerwalter
2009-06-23 21:36:01  loewis  set     messageid: <1245792961.5.0.460725375108.issue6331@psf.upfronthosting.co.za>
2009-06-23 21:36:00  loewis  link    issue6331 messages
2009-06-23 21:35:59  loewis  create