Issue 1359: py3k: out of bounds read in PyUnicode_DecodeUnicodeEscape

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/45700

classification

Title:	py3k: out of bounds read in PyUnicode_DecodeUnicodeEscape
Type:		Stage:
Components:	Interpreter Core	Versions:	Python 3.0

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	gvanrossum	Nosy List:	amaury.forgeotdarc, gvanrossum
Priority:	normal	Keywords:

Created on 2007-10-29 21:21 by amaury.forgeotdarc, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
unicodeEscape.diff	amaury.forgeotdarc, 2007-10-29 21:21

Messages (5)
msg56933 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2007-10-29 21:21
A correction for the problem found by GvR in change 58692: > There's one mystery: if I remove ob_sstate from the PyStringObject struct, > some (unicode) string literals are mutilated, e.g. ('\\1', '\1') prints > ('\\1', '\t'). This must be an out of bounds write or something that I > can't track down. (It doesn't help that it doesn't occur in debug mode. > And no, make clean + recompilation doesn't help either.) > > So, in the mean time, I just keep the field, renamed to 'ob_placeholder'. I think I found the problem. It reproduces on Windows, with a slightly different input >>> ('\\2','\1') ('\\2', '\n') (the win32 release build is of the kind "optimized with debug info", so using the debugger is possible) The problem is in unicodeobject.c::PyUnicode_DecodeUnicodeEscape: - the input buffer is not null-terminated - when decoding octal escape, we increment s without checking if it is still in the limits. In my case, the "\1" was followed by a "2" in memory, hence the bogus chr(0o12) == '\n'. Also corrected a potential problem when the string ends with a \: PyUnicode_DecodeUnicodeEscape("\\t", 1) should return an error.
msg56935 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-29 21:45
Thanks!! The patch as given doesn't quite work -- it gives an error message on string literals like '\s'. By reverting some of your code and only keeping the bounds checks for the octal escape code I got it to work though. See my checkin. Committed revision 58707. This needs to be backported too; it seems a pure coincidence it didn't trigger before!
msg56937 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-29 21:58
(Hold on, the assert I added triggers in debug mode.)
msg56938 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2007-10-29 22:01
I don't like this assert either: this function is part of the public API, and should not crash the interpreter just because of a trailing \. To test easily: import ctypes decode = ctypes.pythonapi.PyUnicodeUCS2_DecodeUnicodeEscape decode.restype = ctypes.py_object decode(b'\\1', 1, None) This should gently raise a UnicodeDecodeError IMO.
msg56939 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-29 22:08
Better fix: Committed revision 58708.

History
Date	User	Action	Args
2022-04-11 14:56:27	admin	set	github: 45700
2007-10-30 00:29:54	gvanrossum	set	status: open -> closed
2007-10-29 22:08:51	gvanrossum	set	messages: + msg56939
2007-10-29 22:01:58	amaury.forgeotdarc	set	messages: + msg56938
2007-10-29 21:58:39	gvanrossum	set	status: closed -> open messages: + msg56937
2007-10-29 21:45:50	gvanrossum	set	status: open -> closed assignee: gvanrossum resolution: accepted messages: + msg56935
2007-10-29 21:21:29	amaury.forgeotdarc	create