
Author terry.reedy
Recipients gpolo, kbk, loewis, roger.serwy, serhiy.storchaka, terry.reedy
Date 2014-02-03.02:21:06
Message-id <1391394066.96.0.578116780492.issue20368@psf.upfronthosting.co.za>
Content
The core of the patch is a wrapper that traps UnicodeDecodeErrors, corrects the strings, and re-decodes. A Python version might look like this:

def unicodeFromTclStringAndSize(s, size):
  try:
    return <PyUnicode_DecodeUTF8(s, size, NULL)>
  except UnicodeDecodeError:
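    if b'\xc0\x80' in s:
      # Tcl encodes the null character as b'\xc0\x80'. bytes are immutable,
      # so keep the result of replace() and use the new, shorter length.
      s = s.replace(b'\xc0\x80', b'\x00')
      return <PyUnicode_DecodeUTF8(s, len(s), NULL)>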
    else:
      raise
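
For anyone who wants to try the idea at a prompt, here is a minimal runnable sketch of the same logic. It assumes bytes.decode('utf-8') is an acceptable stand-in for PyUnicode_DecodeUTF8 and drops the size argument. The name decode_tcl_utf8 is made up for illustration and is not part of the patch.

```python
def decode_tcl_utf8(s: bytes) -> str:
    """Decode UTF-8 bytes from Tcl, tolerating Tcl's encoding of NUL.

    Hypothetical helper: bytes.decode() stands in for the C-level
    PyUnicode_DecodeUTF8 call used in the real patch.
    """
    try:
        return s.decode('utf-8')
    except UnicodeDecodeError:
        if b'\xc0\x80' in s:
            # Tcl writes the null character as the overlong pair C0 80,
            # which strict UTF-8 rejects. Substitute a real NUL and retry.
            fixed = s.replace(b'\xc0\x80', b'\x00')
            return fixed.decode('utf-8')
        # Some other decoding problem: let the original error propagate.
        raise


print(repr(decode_tcl_utf8(b'spam\xc0\x80eggs')))
```

As in the pseudocode, the fast path costs nothing extra for ordinary strings. The replacement only runs after a decode has already failed, and input that is still invalid after the substitution raises UnicodeDecodeError from the second decode.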

This is used in a couple of additional wrappers, and all direct decode calls are replaced with the wrappers. New tests are added. Overall, a great idea, and I want to see this patch in 3.4. But how many of the replacement sites are exercised by the tests?

There are a few changes that seem unrelated to nulls, which might have been left for another patch. Example:

-#if TCL_UTF_MAX==3
         return PyUnicode_FromKindAndData(
-            PyUnicode_2BYTE_KIND, Tcl_GetUnicode(value),
+            sizeof(Tcl_UniChar), Tcl_GetUnicode(value),
             Tcl_GetCharLength(value));
-#else
-        return PyUnicode_FromKindAndData(
-            PyUnicode_4BYTE_KIND, Tcl_GetUnicode(value),
-            Tcl_GetCharLength(value));
-#endif

Do you know if this code block is tested?