classification
Title: Tkinter: handle the null character
Type: behavior Stage: resolved
Components: Extension Modules, Tkinter Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: gpolo, kbk, loewis, python-dev, roger.serwy, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2014-01-23 15:15 by serhiy.storchaka, last changed 2014-02-04 23:31 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
tkinter_null_character.patch serhiy.storchaka, 2014-01-23 15:15 review
tkinter_null_character_2.patch serhiy.storchaka, 2014-02-03 10:08 review
Messages (7)
msg208954 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-23 15:15
Tcl/Tk uses a modified UTF-8 encoding to represent strings as C strings (char*). Because C strings are NUL-terminated, the null character is represented as the illegal UTF-8 sequence \xc0\x80.

The current Tkinter code is largely unaware of this. It has special handling for the "\xc0\x80" string (i.e. a single encoded null character) in one place, but doesn't handle an encoded null character contained in a larger string. As a result, Tkinter may truncate strings containing the null character, or return a wrong result.

The proposed patch fixes many issues with the null character (converting from Tcl to Python strings). NUL is still forbidden in string arguments of many methods.

Also the patch enhances error handling for variable-related commands.
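
For illustration, here is a minimal Python sketch of the encoding problem described above. It is not code from the patch: the helpers to_modified_utf8() and c_string_view() are hypothetical names used only to show why a real zero byte truncates a C string, and why the \xc0\x80 form is rejected by a strict UTF-8 decoder.

```python
def to_modified_utf8(text):
    """Encode text as UTF-8, writing U+0000 as the overlong pair C0 80.

    Hypothetical helper: in UTF-8 the byte 0x00 only ever encodes U+0000,
    so a plain byte replacement is enough for this illustration.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def c_string_view(data):
    """Return the bytes a NUL-terminated C string reader would see."""
    return data.split(b"\x00", 1)[0]


text = "ab\x00cd"
plain = text.encode("utf-8")
modified = to_modified_utf8(text)

# Plain UTF-8 contains a real zero byte, so a C string reader stops early.
truncated = c_string_view(plain)

# The modified form has no zero byte, so the whole string survives.
intact = c_string_view(modified)

# A strict UTF-8 decoder rejects the overlong sequence.
try:
    modified.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False
```

Any code that decodes Tcl results with a strict UTF-8 decoder therefore has to deal with \xc0\x80 wherever it occurs in the string, not only when it is the whole string.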
msg210012 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-02 20:44
If there are no objections I'll commit this patch tomorrow.
msg210075 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-02-03 02:21
The core of the patch is a wrapper that traps UnicodeDecodeErrors, corrects the strings, and re-decodes. A Python version might look like

def unicodeFromTclStringAndSize(s, size):
  s = s[:size]
  try:
    return s.decode('utf-8')  # stands in for PyUnicode_DecodeUTF8(s, size, NULL)
  except UnicodeDecodeError:
    if b'\xc0\x80' in s:
      # bytes are immutable, so the result of replace() must be kept
      s = s.replace(b'\xc0\x80', b'\x00')
      return s.decode('utf-8')
    else:
      raise

This is used in a couple of additional wrappers and all direct decode calls are replaced with wrappers. New tests are added. Overall, a great idea, and I want to see this patch in 3.4. But, how many of the replacement sites are exercised by the tests?

There are a few changes that seem unrelated to nulls, which might have been left for another patch. Example:

-#if TCL_UTF_MAX==3
         return PyUnicode_FromKindAndData(
-            PyUnicode_2BYTE_KIND, Tcl_GetUnicode(value),
+            sizeof(Tcl_UniChar), Tcl_GetUnicode(value),
             Tcl_GetCharLength(value));
-#else
-        return PyUnicode_FromKindAndData(
-            PyUnicode_4BYTE_KIND, Tcl_GetUnicode(value),
-            Tcl_GetCharLength(value));
-#endif

Do you know if this code block is tested?
msg210106 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-03 10:08
> But, how many of the replacement sites are exercised by the tests?

I added tests for most of the replacement sites, and the updated patch has even more tests.

split() and splitlist() -- tested. Unfortunately they are tested only with a bytes argument, because these methods reject a unicode string argument containing NUL.

Tcl_Obj.string, Tcl_Obj.typename and Tcl_Obj.__str__() -- not tested. There are no explicit tests for these properties and methods. It seems that Tcl_Obj.typename can't be tested for NUL.

eval(), evalfile() -- tested.

Variable's methods -- tested.

exprstring() -- tested. I added tests for exprstring(), exprdouble(), exprlong(), exprboolean() in the patch.

record() -- not tested. There are no explicit tests for record() and I have no idea how it can be used in Python.

C functions:

FromObj() and Tkapp_CallResult() -- implicitly tested in a lot of tests, in particular in test_passing_values and test_user_command.

PythonCmd() -- tested in test_user_command.


> There are a few changes that seem unrelated to nulls, which might have been left for another patch.

They just make the code more robust. For example, Tcl can be compiled with TCL_UTF_MAX=6. In that case Python will work correctly most of the time, but can work incorrectly or crash on specific rare data. With the proposed changes it will raise SystemError early. Yes, it is worth a separate issue.

> Do you know if this code block is tested?

It is implicitly tested in the many tests that exercise non-ASCII strings.
msg210118 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-02-03 12:14
With the additional tests, it seems reasonable to apply.
msg210155 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-03 19:39
New changeset a6ba6db9edb4 by Serhiy Storchaka in branch '2.7':
Issue #20368: Add tests for Tkinter methods exprstring(), exprdouble(),
http://hg.python.org/cpython/rev/a6ba6db9edb4

New changeset 825c8db8b1e2 by Serhiy Storchaka in branch '3.3':
Issue #20368: Add tests for Tkinter methods exprstring(), exprdouble(),
http://hg.python.org/cpython/rev/825c8db8b1e2

New changeset 28ec384e7dcc by Serhiy Storchaka in branch 'default':
Issue #20368: Add tests for Tkinter methods exprstring(), exprdouble(),
http://hg.python.org/cpython/rev/28ec384e7dcc

New changeset 65c29c07bb31 by Serhiy Storchaka in branch '2.7':
Issue #20368: The null character now correctly passed from Tcl to Python (in
http://hg.python.org/cpython/rev/65c29c07bb31

New changeset 08e3343f01a5 by Serhiy Storchaka in branch '3.3':
Issue #20368: The null character now correctly passed from Tcl to Python.
http://hg.python.org/cpython/rev/08e3343f01a5

New changeset 321b714653e3 by Serhiy Storchaka in branch 'default':
Issue #20368: The null character now correctly passed from Tcl to Python.
http://hg.python.org/cpython/rev/321b714653e3
msg210278 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-04 23:31
New changeset d83ce3a2d954 by Christian Heimes in branch '3.3':
Issue #20515: Fix NULL pointer dereference introduced by issue #20368
http://hg.python.org/cpython/rev/d83ce3a2d954

New changeset 145032f626d3 by Christian Heimes in branch 'default':
Issue #20515: Fix NULL pointer dereference introduced by issue #20368
http://hg.python.org/cpython/rev/145032f626d3
History
Date User Action Args
2014-02-04 23:31:42 python-dev set messages: + msg210278
2014-02-03 21:49:03 serhiy.storchaka set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2014-02-03 19:39:30 python-dev set nosy: + python-dev
messages: + msg210155
2014-02-03 12:14:21 terry.reedy set messages: + msg210118
2014-02-03 10:08:28 serhiy.storchaka set files: + tkinter_null_character_2.patch
messages: + msg210106
2014-02-03 02:21:06 terry.reedy set messages: + msg210075
2014-02-02 20:44:03 serhiy.storchaka set assignee: serhiy.storchaka
messages: + msg210012
2014-01-23 15:15:02 serhiy.storchaka create