Title: PyLong_FromString documentation should state that the string must be null-terminated
Type: enhancement
Stage: patch review
Components: Documentation
Versions: Python 3.11
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Cryvate, Rosuav, cryvate, docs@python, josh.r, rfk
Priority: normal
Keywords: easy, patch

Created on 2012-06-04 00:32 by rfk, last changed 2021-06-18 04:13 by josh.r.

Files: PyLong_FromString-doc.patch (uploaded by rfk, 2012-06-04 00:32)
Pull Requests: PR 26774 (open; linked by Cryvate, 2021-06-17 18:51)
Messages (3)
msg162242 - Author: Ryan Kelly (rfk) Date: 2012-06-04 00:32
PyLong_FromString will raise a ValueError if the given string doesn't contain a null byte after the digits. For example, this will result in a ValueError:

   char *pend;
   PyLong_FromString("1234 extra", &pend, 10);

While this will successfully read the number and set the pointer to the extra data:

   char *pend;
   PyLong_FromString("1234\0extra", &pend, 10);

The requirement for a null-terminated string of digits is not clear from the docs.  Suggested re-wording attached.
msg212559 - Author: Chris Angelico (Rosuav) Date: 2014-03-02 15:23
Patch doesn't apply to current Python trunk (18 months later). Do you know what version you wrote this against? The current wording is different.
msg396032 - Author: Josh Rosenberg (josh.r) (Python triager) Date: 2021-06-18 04:13
The description is nonsensical as is, and I'm not sure the patch goes far enough. C-style strings are *defined* to end at the NUL terminator. If the function really needs a NUL after the int, saying that pend "points to the first character which follows the representation of the number" is highly misleading: the NUL isn't logically a character in the C-string way of looking at things.

The patch is also wrong; the digits need not end in a NUL byte (trailing whitespace is allowed).

AFAICT, the function really uses pend for two purposes:

1. If it succeeds in parsing, then pend reports the end of the string, nothing else
2. If it fails because the string is not a legal input (contains non-numeric characters, whitespace that is neither leading nor trailing, or whatever), pend points to the first offending character, i.e. the first one that couldn't be massaged to meet the rules for int().

#1 is a mostly useless bit of info (strlen would be equally informative, and if the value parsed, you rarely care how long it was anyway), so pend is, practically speaking, solely for error-checking/reporting.

The rewrite should basically say what is allowed (making it clear that anything beyond a single parsable integer value with optional leading/trailing whitespace is illegal), and make it clear that pend always points to the end of the string on success (not just after the representation of the number; it's after the trailing whitespace too) and on failure indicates where parsing failed.
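
The acceptance rule described in this message can be sketched in plain C. This is only an illustrative model, not CPython's implementation: the helper name model_parse_decimal is hypothetical, it handles base 10 only, it does not compute the integer value, and it ignores details such as digit-group underscores. Where exactly the real function leaves pend when no digits are found at all is an assumption here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, NOT CPython code: models the rule
 * "optional whitespace, optional sign, digits, optional whitespace, NUL".
 * On success returns true and sets *end to the terminating NUL.
 * On failure returns false and sets *end to the first offending character. */
static bool model_parse_decimal(const char *text, const char **end)
{
    assert(text != NULL && end != NULL);
    const char *p = text;

    while (isspace((unsigned char)*p)) p++;      /* optional leading whitespace */
    if (*p == '+' || *p == '-') p++;             /* optional sign */

    const char *digits = p;
    while (isdigit((unsigned char)*p)) p++;      /* the digits themselves */
    if (p == digits) {                           /* no digits at all (assumed behaviour) */
        *end = p;
        return false;
    }

    while (isspace((unsigned char)*p)) p++;      /* optional trailing whitespace */

    *end = p;
    return *p == '\0';                           /* anything but NUL here is an error */
}
```

Under this model, "1234 extra" fails with the end pointer at the 'e', while "  1234  " and "1234\0extra" both succeed with the end pointer at the NUL, which matches the two roles of pend listed above.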
Date                 User         Action  Args
2021-06-18 04:13:42  josh.r       set     nosy: + josh.r; messages: + msg396032
2021-06-17 18:51:27  Cryvate      set     nosy: + Cryvate; pull_requests: + pull_request25360; stage: patch review
2021-06-17 13:37:54  cryvate      set     nosy: + cryvate
2021-06-17 13:05:26  iritkatriel  set     keywords: + easy; type: enhancement; versions: + Python 3.11
2014-03-02 15:23:13  Rosuav       set     nosy: + Rosuav; messages: + msg212559
2012-06-04 00:32:53  rfk          create