
Author vstinner
Recipients arigo, ezio.melotti, vstinner
Date 2016-05-03.10:01:41
Message-id <1462269701.85.0.753085346225.issue26917@psf.upfronthosting.co.za>
In-reply-to
Content
Extract of nfc_nfkc():

      /* Hangul Composition. We don't need to check for <LV,T>
         pairs, since we always have decomposed data. */
      code = PyUnicode_READ(kind, data, i);
      if (LBase <= code && code < (LBase+LCount) &&
          i + 1 < len &&
          VBase <= PyUnicode_READ(kind, data, i+1) &&
          PyUnicode_READ(kind, data, i+1) <= (VBase+VCount)) {
          int LIndex, VIndex;
          LIndex = code - LBase;
          VIndex = PyUnicode_READ(kind, data, i+1) - VBase;
          code = SBase + (LIndex*VCount+VIndex)*TCount;
          i+=2;
          if (i < len &&
              TBase <= PyUnicode_READ(kind, data, i) &&
              PyUnicode_READ(kind, data, i) <= (TBase+TCount)) {
              code += PyUnicode_READ(kind, data, i)-TBase;
              i++;
          }
          output[o++] = code;
          continue;
      }

With the input string (U+1101, U+116E, U+11A7), we get:

* LIndex = 1
* VIndex = 13


code = SBase + (LIndex*VCount+VIndex)*TCount + (ch3 - TBase)
= 0xAC00 + (1 * 21 + 13) * 28 + 0
= 0xafb8

Constants:

* LBase = 0x1100, LCount = 19
* VBase = 0x1161, VCount = 21
* TBase = 0x11A7, TCount = 28
* SBase = 0xAC00
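
The arithmetic above can be checked with a short Python sketch. It only uses the constants and the input string quoted in this message; the names ch1, ch2 and TIndex are introduced here for illustration (ch3 is the third character, as in the formula).

```python
# Constants as listed in this message.
SBase, LBase, VBase, TBase = 0xAC00, 0x1100, 0x1161, 0x11A7
LCount, VCount, TCount = 19, 21, 28

# The input string from this message, one code point per variable.
ch1, ch2, ch3 = 0x1101, 0x116E, 0x11A7

LIndex = ch1 - LBase   # index of the leading consonant
VIndex = ch2 - VBase   # index of the vowel
TIndex = ch3 - TBase   # 0, because ch3 is TBase itself

# Same formula as in the extract, with the third character included.
code = SBase + (LIndex * VCount + VIndex) * TCount + TIndex

print(LIndex, VIndex, TIndex, hex(code))
```

The third character adds 0 to the result, so the composed code point is the same as for the two-character input (U+1101, U+116E).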

The problem may be that we consume the 3rd character (ch3 = U+11A7) even though (ch3 - TBase) is equal to 0, so it contributes nothing to the composed code point.
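
To make the suspicion concrete, here is a minimal Python sketch that compares the trailing-consonant check as written in the extract with a strict check. It is not the C code of nfc_nfkc(): the helper compose_hangul() and its strict parameter are hypothetical, and the strict bound (TBase < ch < TBase+TCount) is an assumption based on the Hangul composition algorithm in the Unicode Standard, where a T index of 0 means "no trailing consonant".

```python
# Constants as listed in this message.
SBase, LBase, VBase, TBase = 0xAC00, 0x1100, 0x1161, 0x11A7
LCount, VCount, TCount = 19, 21, 28


def compose_hangul(chars, strict=True):
    """Compose a leading <L, V[, T]> sequence of code points.

    Returns (code, consumed), or None if chars does not start with an
    <L, V> pair. Hypothetical helper, for illustration only.
    """
    if len(chars) < 2:
        return None
    l, v = chars[0], chars[1]
    # The L and V checks use half-open ranges in both modes;
    # only the T check is switched by `strict`.
    if not (LBase <= l < LBase + LCount and VBase <= v < VBase + VCount):
        return None
    code = SBase + ((l - LBase) * VCount + (v - VBase)) * TCount
    consumed = 2
    if len(chars) > 2:
        t = chars[2]
        if strict:
            # Assumed fix: TBase itself is not a trailing consonant.
            is_t = TBase < t < TBase + TCount
        else:
            # Inclusive bounds, as in the extract.
            is_t = TBase <= t <= TBase + TCount
        if is_t:
            code += t - TBase
            consumed = 3
    return code, consumed


text = [0x1101, 0x116E, 0x11A7]
print(compose_hangul(text, strict=False))  # third character is consumed
print(compose_hangul(text, strict=True))   # third character is left alone
```

With the inclusive check, three code points are consumed and U+11A7 disappears from the output; with the strict check, the same syllable is produced but U+11A7 remains for the caller to copy.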
History
Date  User  Action  Args
2016-05-03 10:01:41  vstinner  set  recipients: + vstinner, arigo, ezio.melotti
2016-05-03 10:01:41  vstinner  set  messageid: <1462269701.85.0.753085346225.issue26917@psf.upfronthosting.co.za>
2016-05-03 10:01:41  vstinner  link  issue26917 messages
2016-05-03 10:01:41  vstinner  create