Issue1359
Created on 2007-10-29 21:21 by amaury.forgeotdarc, last changed 2007-10-30 00:29 by gvanrossum.
| File name |
Uploaded |
Description |
Edit |
Remove |
|
unicodeEscape.diff
|
amaury.forgeotdarc,
2007-10-29 21:21
|
|
|
|
|
msg56933 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) |
Date: 2007-10-29 21:21 |
|
A correction for the problem found by GvR in change 58692:
> There's one mystery: if I remove ob_sstate from the PyStringObject struct,
> some (unicode) string literals are mutilated, e.g. ('\\1', '\1') prints
> ('\\1', '\t'). This must be an out of bounds write or something that I
> can't track down. (It doesn't help that it doesn't occur in debug mode.
> And no, make clean + recompilation doesn't help either.)
>
> So, in the mean time, I just keep the field, renamed to 'ob_placeholder'.
I think I found the problem. It reproduces on Windows, with a slightly
different input
>>> ('\\2','\1')
('\\2', '\n')
(the win32 release build is of the kind "optimized with debug info", so
using the debugger is possible)
The problem is in unicodeobject.c::PyUnicode_DecodeUnicodeEscape:
- the input buffer is not null-terminated
- when decoding octal escape, we increment s without checking if it is
still in the limits.
In my case, the "\1" was followed by a "2" in memory, hence the bogus
chr(0o12) == '\n'.
Also corrected a potential problem when the string ends with a \:
PyUnicode_DecodeUnicodeEscape("\\t", 1) should return an error.
|
|
msg56935 - (view) |
Author: Guido van Rossum (gvanrossum) |
Date: 2007-10-29 21:45 |
|
Thanks!!
The patch as given doesn't quite work -- it gives an error message on
string literals like '\s'. By reverting some of your code and only
keeping the bounds checks for the octal escape code I got it to work
though. See my checkin.
Committed revision 58707.
This needs to be backported too; it seems a pure coincidence it didn't
trigger before!
|
|
msg56937 - (view) |
Author: Guido van Rossum (gvanrossum) |
Date: 2007-10-29 21:58 |
|
(Hold on, the assert I added triggers in debug mode.)
|
|
msg56938 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) |
Date: 2007-10-29 22:01 |
|
I don't like this assert either: this function is part of the public
API, and should not crash the interpreter just because of a trailing \.
To test easily:
import ctypes
decode = ctypes.pythonapi.PyUnicodeUCS2_DecodeUnicodeEscape
decode.restype = ctypes.py_object
decode(b'\\1', 1, None)
This should gently raise a UnicodeDecodeError IMO.
|
|
msg56939 - (view) |
Author: Guido van Rossum (gvanrossum) |
Date: 2007-10-29 22:08 |
|
Better fix:
Committed revision 58708.
|
|
| Date |
User |
Action |
Args |
| 2007-10-30 00:29:54 | gvanrossum | set | status: open -> closed |
| 2007-10-29 22:08:51 | gvanrossum | set | messages:
+ msg56939 |
| 2007-10-29 22:01:58 | amaury.forgeotdarc | set | messages:
+ msg56938 |
| 2007-10-29 21:58:39 | gvanrossum | set | status: closed -> open messages:
+ msg56937 |
| 2007-10-29 21:45:50 | gvanrossum | set | status: open -> closed assignee: gvanrossum resolution: accepted messages:
+ msg56935 |
| 2007-10-29 21:21:29 | amaury.forgeotdarc | create | |
|