Message 258092 - Python tracker


Author	serhiy.storchaka
Recipients	gvanrossum, serhiy.storchaka, vstinner
Date	2016-01-12.09:54:14
Message-id	<1452592454.92.0.0368970992087.issue26090@psf.upfronthosting.co.za>
In-reply-to

Content
C code often uses %.<number><format> in PyUnicode_FromFormat(). %.200s protects against unbounded output when a broken pointer points to random data that is not null-terminated. %.200R is used to limit the size of human-readable messages.

In all these cases the formatted string can look well-formed with short data, but malformed with long data: an unclosed quote, a truncated backslash escape, or a � decoded from a truncated UTF-8 sequence.
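The three failure modes can be reproduced without touching the C API. The sketch below only mimics the truncation in pure Python; it does not call PyUnicode_FromFormat(), and the helper names truncated_repr and truncated_cstr are made up for illustration. It assumes %.<n>R cuts the repr to n characters and %.<n>s cuts the C string to n bytes before decoding with the "replace" error handler.

```python
# Pure-Python imitation of precision-limited %R and %s.
# Assumption: this mirrors the effect of truncation, not the C implementation.

def truncated_repr(obj, precision):
    """Mimic %.<precision>R: take repr() and keep the first `precision` characters."""
    return repr(obj)[:precision]

def truncated_cstr(data, precision):
    """Mimic %.<precision>s: keep the first `precision` bytes, decode with 'replace'."""
    return data[:precision].decode("utf-8", "replace")

short = truncated_repr("spam", 10)        # fits, quotes are balanced
long_ = truncated_repr("spam" * 10, 10)   # closing quote is lost
escaped = truncated_repr("ab\x00cd", 5)   # cut in the middle of the \x00 escape
text = truncated_cstr("naïve".encode("utf-8"), 3)  # cut in the middle of "ï"

print(short)    # 'spam'
print(long_)    # 'spamspams
print(escaped)  # 'ab\x
print(text)     # na�
```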

I propose to make truncation in PyUnicode_FromFormat() smarter.

1. A truncated %R should keep at least the closing character (the quote or ">").
2. Truncated output should include "..." or "[...]" as a truncation marker.
3. \c, \OOO, \xXX, \uXXXX, and \UXXXXXXXX should not be truncated. It is better to omit these sequences entirely (cut the string before them) than to output them truncated.
4. Don't truncate a UTF-8 sequence in the middle of a character for %s.
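As a rough illustration of what these four rules could mean in practice, here is a minimal pure-Python sketch. It is not the proposed C patch: the names smart_truncate_repr and smart_truncate_utf8, the choice of "..." as the marker, and the rule that the marker and closing character count toward the limit are all assumptions made for the example.

```python
import re

# One token is either a complete backslash escape or a single character,
# so cutting only at token boundaries never splits an escape (rule 3).
_TOKEN = re.compile(
    r"\\(?:x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8}|[0-7]{1,3}|.)|.",
    re.DOTALL,
)

def smart_truncate_repr(obj, limit, marker="..."):
    """Hypothetical helper: truncate repr(obj) following rules 1-3."""
    text = repr(obj)
    if len(text) <= limit:
        return text
    closer = text[-1]  # rule 1: keep the closing quote or ">"
    budget = max(limit - len(marker) - len(closer), 0)
    end = 0
    for match in _TOKEN.finditer(text):
        if match.end() > budget:
            break
        end = match.end()
    return text[:end] + marker + closer  # rule 2: visible truncation marker

def smart_truncate_utf8(data, limit):
    """Hypothetical helper: cut UTF-8 bytes on a character boundary (rule 4)."""
    if len(data) <= limit:
        return data.decode("utf-8", "replace")
    end = limit
    # Step back while the first dropped byte is a continuation byte (10xxxxxx).
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end].decode("utf-8", "replace")

print(smart_truncate_repr("spam" * 10, 10))             # 'spams...'
print(smart_truncate_repr("ab\x00cd", 9))               # 'ab...'
print(smart_truncate_utf8("naïve".encode("utf-8"), 3))  # na
```

In this sketch the result never exceeds the limit as long as the limit is at least the length of the marker plus one; whether the marker should count toward the precision is a design decision the proposal leaves open.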

History
Date	User	Action	Args
2016-01-12 09:54:14	serhiy.storchaka	set	recipients: + serhiy.storchaka, gvanrossum, vstinner
2016-01-12 09:54:14	serhiy.storchaka	set	messageid: <1452592454.92.0.0368970992087.issue26090@psf.upfronthosting.co.za>
2016-01-12 09:54:14	serhiy.storchaka	link	issue26090 messages
2016-01-12 09:54:14	serhiy.storchaka	create