More correct string truncating in PyUnicode_FromFormat() #70278

Open
serhiy-storchaka opened this issue Jan 12, 2016 · 5 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Member

BPO 26090
Nosy @gvanrossum, @vstinner, @ezio-melotti, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2016-01-12.09:54:14.845>
labels = ['interpreter-core', 'type-feature']
title = 'More correct string truncating in PyUnicode_FromFormat()'
updated_at = <Date 2016-06-21.07:22:59.885>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2016-06-21.07:22:59.885>
actor = 'Drekin'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2016-01-12.09:54:14.845>
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 26090
keywords = []
message_count = 5.0
messages = ['258092', '258095', '258108', '258111', '258118']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'vstinner', 'ezio.melotti', 'serhiy.storchaka', 'Drekin']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue26090'
versions = ['Python 3.6']

@serhiy-storchaka
Member Author

C code often uses %.<number><format> in PyUnicode_FromFormat(). %.200s protects against unbounded output when a broken pointer points to random data that is not null-terminated. %.200R is used to limit the size of human-readable messages.

In all these cases the formatted string can look well-formed with short data but malformed with long data (an unclosed quote, a truncated backslash escape, or a � decoded from a truncated UTF-8 sequence).
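The three failure modes can be reproduced from Python as a rough illustration. This is only a sketch: it uses Python-level `%` formatting and byte slicing as stand-ins for the C-level `%.<number>R` and `%.<number>s`, and it assumes the truncated bytes are decoded with the "replace" error handler, as the description above implies.

```python
# Plain truncation, used here as a stand-in for the C-level behaviour.
long_text = "a" * 20

# 1. The closing quote is lost.
cut_quote = "%.10r" % long_text

# 2. The cut lands in the middle of a backslash escape.
cut_escape = "%.4r" % "\n\n"

# 3. The cut lands inside the two-byte UTF-8 encoding of "é"
#    (assumption: decoding uses the "replace" error handler).
cut_utf8 = "café".encode("utf-8")[:4].decode("utf-8", "replace")

print(cut_quote)
print(cut_escape)
print(ascii(cut_utf8))
```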

I propose to make truncation in PyUnicode_FromFormat() smarter.

  1. Truncated %R should keep at least the final character (the closing quote or ">").
  2. Truncated output should include "..." or "[...]" as a truncation marker.
  3. \c, \OOO, \xXX, \uXXXX, and \UXXXXXXXX should not be truncated. It is better to omit such a sequence entirely (cut the string before it) than to output it truncated.
  4. %s should not truncate in the middle of a multi-byte UTF-8 sequence.
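The four rules above could be prototyped in pure Python roughly as follows. This is a minimal sketch, not the proposed C implementation: `truncate_repr` and `truncate_utf8` are hypothetical helper names, the "..." marker is one of the two options mentioned, and `truncate_utf8` assumes its input is otherwise valid UTF-8.

```python
import re

# One repr() token: a complete backslash escape, or a single ordinary character.
_TOKEN = re.compile(
    r"\\x[0-9a-fA-F]{2}"      # \xXX
    r"|\\u[0-9a-fA-F]{4}"     # \uXXXX
    r"|\\U[0-9a-fA-F]{8}"     # \UXXXXXXXX
    r"|\\[0-7]{1,3}"          # \OOO
    r"|\\."                   # \c (for example \n or \\)
    r"|.",                    # any other single character
    re.DOTALL,
)


def truncate_repr(obj, limit, marker="..."):
    """Hypothetical helper: repr(obj) cut to at most `limit` characters."""
    text = repr(obj)
    if len(text) <= limit:
        return text
    closer = text[-1]                     # rule 1: keep the quote or ">"
    budget = limit - len(marker) - 1      # room left for real content
    if budget <= 0:
        return text[:limit]               # limit too small to do better
    kept = []
    used = 0
    for match in _TOKEN.finditer(text):
        token = match.group()
        if used + len(token) > budget:    # rule 3: never split an escape
            break
        kept.append(token)
        used += len(token)
    return "".join(kept) + marker + closer  # rule 2: add the marker


def truncate_utf8(data, limit):
    """Hypothetical helper: cut bytes without splitting a UTF-8 character."""
    if len(data) <= limit:
        return data
    end = limit
    steps = 0
    # Rule 4: back up over continuation bytes (0b10xxxxxx), at most 3 of
    # them, so the cut lands on a character boundary.
    while end > 0 and steps < 3 and (data[end] & 0xC0) == 0x80:
        end -= 1
        steps += 1
    return data[:end]


print(truncate_repr("a" * 20, 10))
print(truncate_repr("ab" + "\n" * 10, 10))
print(truncate_utf8("café".encode("utf-8"), 4))
```

The result may be shorter than the limit, because a whole escape or character is dropped rather than split.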

@serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jan 12, 2016
@vstinner
Member

See my old issue bpo-10833 which proposed to *remove* the arbitrary limit
on strings. It was rejected.

@gvanrossum
Member

Could we make this feature available at the Python level too? It sounds
really useful.

--Guido (mobile)

@serhiy-storchaka
Member Author

I think we can make this feature available with classic formatting ('%.100r'), but with new-style formatting ('{0!r:.100}'), especially with f-strings, this may not be so easy.
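For reference, all three spellings already accept a precision at the Python level, and each performs a plain cut. The snippet below only illustrates that current behaviour (on a Python version with f-strings). The variable names are illustrative.

```python
value = "x" * 200

old_style = "%.100r" % value              # classic formatting
new_style = "{0!r:.100}".format(value)    # conversion comes before the spec
f_string = f"{value!r:.100}"              # same spec inside an f-string

# All three give the same 100 characters, with no closing quote.
print(old_style == new_style == f_string)
print(len(old_style), old_style.endswith("'"))
```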

@gvanrossum
Member

Well it seems a little odd to spend effort on a corner case of the C-level
error messages if we can't even replicate it in pure Python.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022