
Author amaury.forgeotdarc
Recipients amaury.forgeotdarc, gvanrossum
Date 2007-10-29.21:21:28
Message-id <1193692889.31.0.0582486390862.issue1359@psf.upfronthosting.co.za>
Content
A correction for the problem found by GvR in change 58692:

> There's one mystery: if I remove ob_sstate from the PyStringObject struct,
> some (unicode) string literals are mutilated, e.g. ('\\1', '\1') prints
> ('\\1', '\t').  This must be an out of bounds write or something that I
> can't track down.  (It doesn't help that it doesn't occur in debug mode.
> And no, make clean + recompilation doesn't help either.)
> 
> So, in the mean time, I just keep the field, renamed to 'ob_placeholder'.

I think I found the problem. It reproduces on Windows, with a slightly
different input:
    >>> ('\\2','\1')
    ('\\2', '\n')
(The win32 release build is "optimized with debug info", so the debugger
can be used.)

The problem is in unicodeobject.c::PyUnicode_DecodeUnicodeEscape:
- the input buffer is not null-terminated;
- when decoding an octal escape, we increment s without checking that it
is still within the bounds of the input.
In my case, the "\1" was followed by a "2" in memory, hence the bogus
chr(0o12) == '\n'.
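
The out-of-bounds read can be modelled in a few lines of Python. This is
only a sketch of the logic, not the code in unicodeobject.c: the helper
name decode_octal and the variables memory and size are made up for the
illustration, and memory stands in for whatever bytes happen to follow
the string in the process.

```python
OCTAL_DIGITS = b"01234567"


def decode_octal(memory: bytes, start: int, end: int) -> str:
    """Decode an octal escape whose first digit is at memory[start].

    Hypothetical helper: at most three digits are consumed, and none at
    or beyond index `end`.
    """
    s = start
    x = memory[s] - ord("0")
    s += 1
    # Accept up to two more octal digits, stopping at `end`.
    for _ in range(2):
        if s < end and memory[s] in OCTAL_DIGITS:
            x = (x << 3) + (memory[s] - ord("0"))
            s += 1
        else:
            break
    return chr(x)


# The string is the two bytes \1; a stray "2" happens to follow it in memory.
memory = b"\\12"
size = 2

# Buggy behaviour: the limit is the end of memory, not the end of the string.
unchecked = decode_octal(memory, 1, len(memory))
# Fixed behaviour: the limit is the declared size of the input.
checked = decode_octal(memory, 1, size)
```

With the limit set to the declared size, the stray "2" is never read and
the escape decodes to chr(1).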

The patch also corrects a potential problem when the string ends with a \:
PyUnicode_DecodeUnicodeEscape("\\t", 1) should return an error, because
only the backslash lies within the given size.
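
The trailing-backslash check can be sketched the same way. Again this is
an assumption-laden model rather than the patch itself: the function name
check_escapes_complete is hypothetical, every escape is treated as a
single character after the backslash, and ValueError stands in for the
decoding error the C function would report.

```python
def check_escapes_complete(data: bytes, size: int) -> None:
    """Raise ValueError if a backslash is the last byte within `size`.

    Hypothetical helper; only the first `size` bytes of `data` count as
    input, mirroring a C buffer passed with an explicit length.
    """
    s = 0
    while s < size:
        if data[s] != ord("\\"):
            s += 1
            continue
        s += 1  # step over the backslash
        if s >= size:
            # Nothing left inside the input to complete the escape.
            raise ValueError("backslash at end of string")
        s += 1  # step over the escape character (simplified: one byte)


# Size 2 covers both bytes of \t, so the escape is complete.
check_escapes_complete(b"\\t", 2)

# Size 1 covers only the backslash, so this is reported as an error.
try:
    check_escapes_complete(b"\\t", 1)
    truncated_rejected = False
except ValueError:
    truncated_rejected = True
```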
Files
File name Uploaded
unicodeEscape.diff amaury.forgeotdarc, 2007-10-29.21:21:28
History
Date                 User                Action  Args
2007-10-29 21:21:29  amaury.forgeotdarc  set     spambayes_score: 0.0153734 -> 0.015373402
                                                 recipients: + amaury.forgeotdarc, gvanrossum
2007-10-29 21:21:29  amaury.forgeotdarc  set     spambayes_score: 0.0153734 -> 0.0153734
                                                 messageid: <1193692889.31.0.0582486390862.issue1359@psf.upfronthosting.co.za>
2007-10-29 21:21:29  amaury.forgeotdarc  link    issue1359 messages
2007-10-29 21:21:28  amaury.forgeotdarc  create