Message264707
Extract of nfc_nfkc():
/* Hangul Composition. We don't need to check for <LV,T>
   pairs, since we always have decomposed data. */
code = PyUnicode_READ(kind, data, i);
if (LBase <= code && code < (LBase+LCount) &&
    i + 1 < len &&
    VBase <= PyUnicode_READ(kind, data, i+1) &&
    PyUnicode_READ(kind, data, i+1) <= (VBase+VCount)) {
    int LIndex, VIndex;
    LIndex = code - LBase;
    VIndex = PyUnicode_READ(kind, data, i+1) - VBase;
    code = SBase + (LIndex*VCount+VIndex)*TCount;
    i+=2;
    if (i < len &&
        TBase <= PyUnicode_READ(kind, data, i) &&
        PyUnicode_READ(kind, data, i) <= (TBase+TCount)) {
        code += PyUnicode_READ(kind, data, i)-TBase;
        i++;
    }
    output[o++] = code;
    continue;
}
With the input string (U+1101, U+116E, U+11A7), we get:
* LIndex = 1
* VIndex = 13
code = SBase + (LIndex*VCount+VIndex)*TCount + (ch3 - TBase)
     = 0xAC00 + (1 * 21 + 13) * 28 + 0
     = 0xafb8
Constants:
* LBase = 0x1100, LCount = 19
* VBase = 0x1161, VCount = 21
* TBase = 0x11A7, TCount = 28
* SBase = 0xAC00
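The problem may be that we consume the 3rd character even though (ch3 - TBase) is equal to 0: U+11A7 is TBase itself, and the range check on the 3rd character uses inclusive bounds (TBase <= ch3 <= TBase+TCount).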
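To make the hypothesis easier to check outside of unicodedata.c, here is a minimal standalone sketch. It is not the actual code nor a proposed patch: compose() and its strict flag are hypothetical names introduced for illustration. With strict=0 it uses the same inclusive bounds as the extract above; with strict=1 it assumes the bounds of the Unicode Standard's Hangul composition algorithm (V below VBase+VCount, T strictly between TBase and TBase+TCount).

```c
#include <stdint.h>

/* Hangul constants, as listed in the message. */
enum {
    SBase = 0xAC00,
    LBase = 0x1100, LCount = 19,
    VBase = 0x1161, VCount = 21,
    TBase = 0x11A7, TCount = 28
};

/* Hypothetical helper, not part of CPython: compose the jamo sequence
   starting at s[0]. Returns the composed syllable, or 0 if s does not
   start with an <L, V> pair. *used receives the number of code points
   consumed (0, 2 or 3).
   strict == 0: inclusive upper bounds, as in the quoted extract.
   strict != 0: assumed Unicode bounds (V < VBase+VCount,
                TBase < T < TBase+TCount). */
static uint32_t compose(const uint32_t *s, int len, int strict, int *used)
{
    uint32_t code;
    int v_ok, t_ok;

    *used = 0;
    if (len < 2 || !(LBase <= s[0] && s[0] < LBase + LCount))
        return 0;

    v_ok = strict ? (VBase <= s[1] && s[1] <  VBase + VCount)
                  : (VBase <= s[1] && s[1] <= VBase + VCount);
    if (!v_ok)
        return 0;

    code = SBase + ((s[0] - LBase) * VCount + (s[1] - VBase)) * TCount;
    *used = 2;

    if (len > 2) {
        t_ok = strict ? (TBase <  s[2] && s[2] <  TBase + TCount)
                      : (TBase <= s[2] && s[2] <= TBase + TCount);
        if (t_ok) {
            /* With the inclusive bounds, s[2] == TBase lands here and
               adds 0: the character is consumed without changing code. */
            code += s[2] - TBase;
            (*used)++;
        }
    }
    return code;
}
```

For the input (U+1101, U+116E, U+11A7), both modes return 0xAFB8, but the inclusive bounds report 3 code points consumed whereas the strict bounds report 2, leaving U+11A7 in the output.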
Date | User | Action | Args
2016-05-03 10:01:41 | vstinner | set | recipients: + vstinner, arigo, ezio.melotti
2016-05-03 10:01:41 | vstinner | set | messageid: <1462269701.85.0.753085346225.issue26917@psf.upfronthosting.co.za>
2016-05-03 10:01:41 | vstinner | link | issue26917 messages
2016-05-03 10:01:41 | vstinner | create |