More correct string truncating in PyUnicode_FromFormat() #70278

serhiy-storchaka · 2016-01-12T09:54:15Z

BPO	26090
Nosy	@gvanrossum, @vstinner, @ezio-melotti, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2016-01-12.09:54:14.845>
labels = ['interpreter-core', 'type-feature']
title = 'More correct string truncating in PyUnicode_FromFormat()'
updated_at = <Date 2016-06-21.07:22:59.885>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2016-06-21.07:22:59.885>
actor = 'Drekin'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2016-01-12.09:54:14.845>
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 26090
keywords = []
message_count = 5.0
messages = ['258092', '258095', '258108', '258111', '258118']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'vstinner', 'ezio.melotti', 'serhiy.storchaka', 'Drekin']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue26090'
versions = ['Python 3.6']

serhiy-storchaka · 2016-01-12T09:54:15Z

C code often uses %.<number><format> in PyUnicode_FromFormat(). %.200s guards against unbounded output when a broken pointer points to random data that is not null-terminated. %.200R is used to limit the size of human-readable messages.

In all these cases the formatted string can look well-formed with short data but malformed with long data (an unclosed quote, a truncated backslash escape, or � decoded from a truncated UTF-8 sequence).
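The three failure modes can be reproduced at the Python level, where a precision in %-formatting also cuts the formatted text at a fixed length. This is only an illustration by analogy, not the C code path of PyUnicode_FromFormat(); the variable names are made up for the example.

```python
# Illustration only: Python-level precision truncation, used here as a
# stand-in for the C-level %.<number>R / %.<number>s behaviour.

# 1. Unclosed quote: the repr is cut before its closing quote.
unclosed_quote = "%.10r" % ("a" * 50)

# 2. Truncated backslash escape: the cut falls between "\" and "n".
cut_escape = "%.4r" % "ab\n"

# 3. Truncated UTF-8 sequence: cutting the encoded bytes in the middle of
#    a two-byte character yields U+FFFD when decoded with errors="replace".
encoded = "caf\u00e9".encode("utf-8")          # the last character takes 2 bytes
cut_utf8 = encoded[:4].decode("utf-8", errors="replace")

print(unclosed_quote)
print(cut_escape)
print(ascii(cut_utf8))
```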

I propose to make truncation in PyUnicode_FromFormat() smarter.

A truncated %R should keep at least the closing character (the quote or ">").
Truncated output should include "..." or "[...]" as a truncation marker.
\c, \OOO, \xXX, \uXXXX, and \UXXXXXXXX should not be truncated. It is better to omit such a sequence entirely (cut the string before it) than to output it truncated.
%s should not truncate a UTF-8 sequence in the middle of a character.
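A rough sketch of the first three rules in pure Python, to make the intended result concrete. The helper name truncated_repr, its marker parameter, and the regular expression are all hypothetical; nothing like this exists in CPython, and the real change would have to be made in C.

```python
import re

# One complete backslash escape of the kinds listed above.
_ESCAPE = re.compile(
    r"\\(?:x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8}|[0-7]{1,3}|.)",
    re.DOTALL,
)


def truncated_repr(obj, limit, marker="..."):
    """Hypothetical helper: repr(obj) limited to at most `limit` characters."""
    text = repr(obj)
    if len(text) <= limit:
        return text
    closer = text[-1]                       # closing quote, ">" and so on
    budget = limit - len(marker) - len(closer)
    if budget <= 0:
        return text[:limit]                 # limit too small: plain cut
    pos = 0
    while pos < budget:
        match = _ESCAPE.match(text, pos)
        step = match.end() - pos if match else 1
        if pos + step > budget:
            break                           # cut before the escape, not inside it
        pos += step
    return text[:pos] + marker + closer


print(truncated_repr("a" * 50, 10))
print(truncated_repr("ab\n" + "c" * 20, 8))
```

The sketch walks the repr from the start so that a pair of backslashes is never mistaken for the beginning of a new escape. It does not address the %s rule about UTF-8 sequences, which concerns bytes rather than already decoded text.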

vstinner · 2016-01-12T10:01:47Z

See my old issue bpo-10833 which proposed to *remove* the arbitrary limit
on strings. It was rejected.

gvanrossum · 2016-01-12T16:29:09Z

Could we make this feature available at the Python level too? It sounds
really useful.

--Guido (mobile)

serhiy-storchaka · 2016-01-12T17:49:09Z

I think we can make this feature available with classic formatting ('%.100r'), but with new-style formatting ('{0!r:.100}', and especially with f-strings) it may not be so easy.
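For reference, plain (non-smart) truncation of a repr is already spelled the same way in all three styles today; in the new-style forms the !r conversion goes before the format spec. A minimal check, with made-up variable names:

```python
value = "x" * 300

old_style = "%.100r" % value               # classic %-formatting
new_style = "{0!r:.100}".format(value)     # str.format(): conversion, then spec
f_string = f"{value!r:.100}"               # f-string, same spec syntax

# All three cut the repr at 100 characters and lose the closing quote.
print(old_style == new_style == f_string)
print(len(old_style), old_style.endswith("'"))
```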

gvanrossum · 2016-01-12T19:08:39Z

Well it seems a little odd to spend effort on a corner case of the C-level
error messages if we can't even replicate it in pure Python.

serhiy-storchaka added the interpreter-core (Objects, Python, Grammar, and Parser dirs) and type-feature (A feature request or enhancement) labels Jan 12, 2016

ezio-melotti transferred this issue from another repository Apr 10, 2022

gvanrossum mentioned this issue Jun 23, 2022

re.compile() repr end quote truncated #70256

Closed
