Message210075
The core of the patch is a wrapper that traps UnicodeDecodeErrors, corrects the strings, and re-decodes. A Python version might look like
def unicodeFromTclStringAndSize(s, size):
    try:
        return <PyUnicode_DecodeUTF8(s, size, NULL)>
    except UnicodeDecodeError:
        if b'\xc0\x80' in s:
            # bytes are immutable, so keep the result of replace()
            s = s.replace(b'\xc0\x80', b'\x00')
            return <PyUnicode_DecodeUTF8(s, len(s), NULL)>
        else:
            raise
This is used in a couple of additional wrappers, and all direct decode calls are replaced with calls to the wrappers. New tests are added. Overall, a great idea, and I want to see this patch in 3.4. But how many of the replacement sites are exercised by the tests?
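To make the wrapper above concrete, here is a minimal runnable sketch in pure Python. It is only an illustration, not code from the patch: the name decode_tcl_utf8 is hypothetical, and bytes.decode('utf-8') stands in for PyUnicode_DecodeUTF8. It relies on Python's strict UTF-8 decoder rejecting the overlong two-byte sequence C0 80 that Tcl uses for an embedded null.

```python
def decode_tcl_utf8(data: bytes) -> str:
    """Decode bytes coming from Tcl, tolerating its C0 80 encoding of NUL.

    Hypothetical helper for illustration; the real patch does this in C.
    """
    try:
        # Fast path: ordinary, valid UTF-8.
        return data.decode('utf-8')
    except UnicodeDecodeError:
        if b'\xc0\x80' in data:
            # bytes are immutable, so the result of replace() must be kept.
            fixed = data.replace(b'\xc0\x80', b'\x00')
            # Retry; any other invalid byte still raises UnicodeDecodeError.
            return fixed.decode('utf-8')
        raise


print(repr(decode_tcl_utf8(b'a\xc0\x80b')))  # prints 'a\x00b'
```

Input that is invalid for some other reason, such as b'\xff', still raises UnicodeDecodeError, which matches the bare raise in the pseudocode.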
There are a few changes that seem unrelated to nulls, which might have been left for another patch. Example:
-#if TCL_UTF_MAX==3
         return PyUnicode_FromKindAndData(
-            PyUnicode_2BYTE_KIND, Tcl_GetUnicode(value),
+            sizeof(Tcl_UniChar), Tcl_GetUnicode(value),
             Tcl_GetCharLength(value));
-#else
-        return PyUnicode_FromKindAndData(
-            PyUnicode_4BYTE_KIND, Tcl_GetUnicode(value),
-            Tcl_GetCharLength(value));
-#endif
Do you know if this code block is tested?
Date | User | Action | Args
2014-02-03 02:21:07 | terry.reedy | set | recipients: + terry.reedy, loewis, kbk, gpolo, roger.serwy, serhiy.storchaka
2014-02-03 02:21:06 | terry.reedy | set | messageid: <1391394066.96.0.578116780492.issue20368@psf.upfronthosting.co.za>
2014-02-03 02:21:06 | terry.reedy | link | issue20368 messages
2014-02-03 02:21:06 | terry.reedy | create |