Message56933
A correction for the problem found by GvR in change 58692:
> There's one mystery: if I remove ob_sstate from the PyStringObject struct,
> some (unicode) string literals are mutilated, e.g. ('\\1', '\1') prints
> ('\\1', '\t'). This must be an out of bounds write or something that I
> can't track down. (It doesn't help that it doesn't occur in debug mode.
> And no, make clean + recompilation doesn't help either.)
>
> So, in the mean time, I just keep the field, renamed to 'ob_placeholder'.
I think I found the problem. It reproduces on Windows with a slightly
different input:
>>> ('\\2','\1')
('\\2', '\n')
(the win32 release build is of the kind "optimized with debug info", so
using the debugger is possible)
The problem is in unicodeobject.c::PyUnicode_DecodeUnicodeEscape:
- the input buffer is not null-terminated
- when decoding an octal escape, we increment s without checking whether
it is still within the bounds of the buffer.
In my case, the "\1" was followed by a "2" in memory, hence the bogus
chr(0o12) == '\n'.
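
The overrun described above can be modelled in a few lines of Python. This is only a sketch of the logic, not the C code from unicodeobject.c: the function name decode_octal_escape, its parameters, and the check_bounds flag are hypothetical, and a bytes object stands in for raw memory, with `end` marking where the real input stops.

```python
def decode_octal_escape(buf, start, end, check_bounds=True):
    """Decode up to three octal digits beginning at buf[start].

    `end` is the logical end of the input. Bytes at or after `end` model
    whatever happens to follow the input in memory.
    Returns (code_point, next_index).
    """
    s = start
    value = buf[s] - ord("0")
    s += 1
    for _ in range(2):  # an octal escape has at most three digits
        if check_bounds and s >= end:
            break  # the fix: stop at the end of the input
        if s < len(buf) and ord("0") <= buf[s] <= ord("7"):
            value = value * 8 + (buf[s] - ord("0"))
            s += 1
        else:
            break
    return value, s


# "Memory" holds the one-digit escape \1 followed by an unrelated "2".
memory = b"\\12"
logical_end = 2  # the input is only the two bytes \ and 1

buggy, _ = decode_octal_escape(memory, 1, logical_end, check_bounds=False)
fixed, _ = decode_octal_escape(memory, 1, logical_end, check_bounds=True)
print(repr(chr(buggy)))  # '\n'
print(repr(chr(fixed)))  # '\x01'
```

Without the bounds check, the stray "2" is consumed as a second octal digit, which gives 0o12; with it, decoding stops after the single digit.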
Also corrected a potential problem when the string ends with a \:
PyUnicode_DecodeUnicodeEscape("\\t", 1) should return an error, because
with a length of 1 only the backslash is part of the input.
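
The expected behaviour for a trailing backslash can be checked from Python through the "unicode_escape" codec. This is a sketch against a current CPython rather than the 2007 build discussed here, and the helper decodes_cleanly is a hypothetical name used only for illustration.

```python
def decodes_cleanly(data):
    """Return True if `data` decodes with the unicode_escape codec."""
    try:
        data.decode("unicode_escape")
    except UnicodeDecodeError:
        return False
    return True


# Equivalent of passing "\\t" with length 1: only the backslash is input.
lone_backslash = b"\\t"[:1]

print(decodes_cleanly(lone_backslash))        # False
print(decodes_cleanly(b"\\t"))                # True
print(repr(b"\\1".decode("unicode_escape")))  # '\x01'
```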

| Date | User | Action | Args |
|---|---|---|---|
| 2007-10-29 21:21:29 | amaury.forgeotdarc | set | spambayes_score: 0.0153734 -> 0.0153734; recipients: + amaury.forgeotdarc, gvanrossum |
| 2007-10-29 21:21:29 | amaury.forgeotdarc | set | spambayes_score: 0.0153734 -> 0.0153734; messageid: <1193692889.31.0.0582486390862.issue1359@psf.upfronthosting.co.za> |
| 2007-10-29 21:21:29 | amaury.forgeotdarc | link | issue1359 messages |
| 2007-10-29 21:21:28 | amaury.forgeotdarc | create | |