Message93265
> * U+0027 APOSTROPHE
hardcoded (see below)
> * U+00AD SOFT HYPHEN (SHY)
has the "Format (Cf)" property and thus is included automatically
> * U+2019 RIGHT SINGLE QUOTATION MARK
hardcoded (see below)
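The general categories behind the three answers above can be checked with the standard `unicodedata` module. This is only a small illustrative sketch: the `categories` dictionary is a name made up for this example and is not part of the patch.

```python
import unicodedata

# The three characters discussed above.
chars = [u"\u0027", u"\u00ad", u"\u2019"]

# Map each character to its Unicode general category.
categories = dict((ch, unicodedata.category(ch)) for ch in chars)

for ch in chars:
    print("U+%04X %s %s" % (ord(ch), categories[ch], unicodedata.name(ch)))
```

Only SOFT HYPHEN reports `Cf`; the apostrophe and the right single quotation mark are punctuation (`Po` and `Pf`), which is why the general category alone does not pick them up.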
I hardcoded some characters into Tools/unicode/makeunicodedata.py:
>>> print ' '.join([u':', u'\xb7', u'\u0387', u'\u05f4', u'\u2027',
... u'\ufe13', u'\ufe55', u'\uff1a'] + [u"'", u'.', u'\u2018', u'\u2019',
... u'\u2024', u'\ufe52', u'\uff07', u'\uff0e'])
: · · ״ ‧ ︓ ﹕ ： ' . ‘ ’ ․ ﹒ ＇ ．
These cannot currently be extracted automatically, because neither
DerivedCoreProperties.txt nor the source file for the property
"Word_Break(C) = MidLetter or MidNumLet" is available to the script.
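If the property source file were available, the extraction could look roughly like the sketch below. It assumes the usual Unicode Character Database layout (`code point or range ; value # comment`), as used by files such as auxiliary/WordBreakProperty.txt. The function name `parse_property_lines` and the `SAMPLE` lines are hypothetical: the sample is written in that layout with values taken from the lists above, and is not a verbatim copy of any data file or of makeunicodedata.py.

```python
def parse_property_lines(lines, wanted):
    """Collect code points whose property value is in `wanted`."""
    found = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2 or fields[1] not in wanted:
            continue
        # The first field is either "XXXX" or a range "XXXX..YYYY".
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        found.update(range(start, end + 1))
    return found


# Hypothetical sample in UCD layout, not real file contents.
SAMPLE = """\
0027          ; MidNumLet # Po       APOSTROPHE
003A          ; MidLetter # Po       COLON
2018..2019    ; MidNumLet # Pi..Pf   LEFT..RIGHT SINGLE QUOTATION MARK
0041..005A    ; ALetter # L&  [26] LATIN CAPITAL LETTER A..Z
""".splitlines()

mid = parse_property_lines(SAMPLE, set(["MidLetter", "MidNumLet"]))
print(sorted("%04X" % cp for cp in mid))
```

With real data the resulting set would replace the hardcoded lists, so the characters would follow whatever Unicode version the data files belong to.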
As I said, the patch is only a second-best solution: the correct
approach would be to implement the word-breaking algorithm as described
in the newest standard. This patch is just an improvement over the
current situation.
Date | User | Action | Args
2009-09-29 09:20:13 | christoph | set | recipients: + christoph, lemburg, ggenellina, ezio.melotti
2009-09-29 09:20:13 | christoph | set | messageid: <1254216013.11.0.953096626201.issue6412@psf.upfronthosting.co.za>
2009-09-29 09:20:11 | christoph | link | issue6412 messages
2009-09-29 09:20:11 | christoph | create |