classification
Title: PyLong_FromString documentation wrong on numbers with leading zero and base=0
Type: enhancement
Stage: resolved
Components: Documentation
Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Mariatta, cheryl.sabella, cubinator, docs@python, mark.dickinson, martin.panter, terry.reedy
Priority: normal
Keywords:

Created on 2017-03-07 21:27 by cubinator, last changed 2017-04-24 09:33 by cheryl.sabella. This issue is now closed.

Pull Requests
URL      Status  Linked
PR 915   merged  cheryl.sabella, 2017-03-30 19:51
PR 1266  merged  Mariatta, 2017-04-24 03:56
PR 1267  merged  Mariatta, 2017-04-24 03:56
PR 1268  merged  Mariatta, 2017-04-24 04:02
Messages (13)
msg289188 - (view) Author: Cubi (cubinator) Date: 2017-03-07 21:27
Calling PyLong_FromString(str, NULL, 0) fails if str contains a decimal number with leading zeros, even though the documentation says such strings should be parsed as decimal numbers:

"If base is 0, the radix will be determined based on the leading characters of str: if str starts with '0x' or '0X', radix 16 will be used; if str starts with '0o' or '0O', radix 8 will be used; if str starts with '0b' or '0B', radix 2 will be used; otherwise radix 10 will be used"

Examples:
PyLong_FromString("15", NULL, 0); // Returns int(15) (Correct)
PyLong_FromString("0xF", NULL, 0); // Returns int(15) (Correct)
PyLong_FromString("015", NULL, 0); // Should return int(15), but raises ValueError: invalid literal for int() with base 0: '015'

Version information:
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32
msg289190 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-03-07 21:47
My guess is this is supposed to emulate (or is actually the implementation of) the "int" constructor and the Python syntax. In these cases, numbers with leading zeros are disallowed. This was to help with Python 2 porting, where a leading zero specified an octal number.

>>> 010
    010
      ^
SyntaxError: invalid token
>>> int("010", 0)
ValueError: invalid literal for int() with base 0: '010'

Maybe it is better to fix the documentation.
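
The behaviour discussed here can be checked from Python, on the assumption (stated in this thread) that int() with base 0 follows the same rules as PyLong_FromString. This is only a minimal sketch; the helper name parse_base0 is hypothetical and exists purely for illustration:

```python
def parse_base0(text):
    """Return int(text, 0), or None if the string is rejected (hypothetical helper)."""
    try:
        return int(text, 0)
    except ValueError:
        return None

print(parse_base0("15"))    # plain decimal
print(parse_base0("0xF"))   # hexadecimal prefix
print(parse_base0("015"))   # leading zero is rejected when base is 0
print(int("015", 10))       # an explicit base of 10 accepts leading zeros
```

If leading zeros must be accepted, passing an explicit base of 10 instead of 0 appears to be the simplest option.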
msg289219 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2017-03-08 08:47
Yes, PyLong_FromString is directly used by the implementation of int, and is also used in parsing of numeric integer literals in source:

https://github.com/python/cpython/blob/cb41b2766de646435743b6af7dd152751b54e73f/Python/ast.c#L4084

So I agree that this is a documentation bug. There's also no mention of the support for underscores in the documentation.
msg290877 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2017-03-30 19:52
I have a pull request ready for the documentation, but I didn't understand the underscore usage, so I couldn't add that.

If you explain it, then I can try to add it.
msg290879 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-03-30 20:29
String arguments to int are quoted int literals.  From
https://docs.python.org/3/reference/lexical_analysis.html#literals
'Underscores are ignored for determining the numeric value of the literal. They can be used to group digits for enhanced readability. One underscore can occur between digits, and after base specifiers like 0x.'
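
The underscore rule quoted above can be illustrated with int() and base 0. This is a sketch assuming Python 3.6 or later; the helper name accepts is hypothetical:

```python
# Assumes Python 3.6+, where underscores in integer literals are allowed.
def accepts(text):
    """Report whether int(text, 0) accepts the string (hypothetical helper)."""
    try:
        int(text, 0)
        return True
    except ValueError:
        return False

print(int("1_000", 0))     # single underscore between digits is ignored
print(int("0x_ff", 0))     # single underscore after a base specifier
print(accepts("1__000"))   # doubled underscore
print(accepts("_1"))       # leading underscore
print(accepts("1_"))       # trailing underscore
```

Only a single underscore between digits or directly after a base specifier is accepted; the last three strings are rejected.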

For your patch, I would summarize this by expanding 'Leading spaces are ignored.' to the following (in patch comment also).
"Leading spaces and single underscores after a base specifier and between digits are ignored."
msg290981 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2017-04-01 12:02
Thank you.  I've added that change.

For the backporting, I think that would only be applicable to 3.6 and 3.7?
msg291019 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-04-02 03:20
Underscores are only applicable to 3.6+, but the original concern about leading zeros applies to 3.5.

On GitHub I suggested dropping the details and just referring to the Lexical Analysis section <https://docs.python.org/3.5/reference/lexical_analysis.html#integer-literals> instead.

FWIW here is my understanding of integer literals (with base=0):

* 0x10 => hexadecimal
* 0b10 => binary
* 0o10 => octal (corresponds to 010 in Python 2)
* 01 => illegal (avoids conflict with Python 2)
* 00 => zero (special case; was treated as octal zero in Python 2)
* 10 => decimal (must not start with digit 0)

If you want to spell out the rules, in my mind there are four special prefixes, 0x, 0b, 0o and 0, and the default is decimal if none of those prefixes apply.
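
The cases listed above can be checked with a short script. This is a sketch that uses int(text, 0) as a stand-in for PyLong_FromString with base 0; the function name classify and the "illegal" marker are illustrative choices, not part of any API:

```python
cases = ["0x10", "0b10", "0o10", "01", "00", "10"]

def classify(text):
    """Return the parsed value, or "illegal" if base 0 rejects the string."""
    try:
        return int(text, 0)
    except ValueError:
        return "illegal"

results = {text: classify(text) for text in cases}
for text, value in results.items():
    print(f"{text!r:>7} -> {value}")
```

The string "00" is the notable special case: it starts with the digit 0 yet still parses, because its value is zero.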
msg292189 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-04-24 03:54
New changeset 26896f2832324dde85cdd63d525571ca669f6f0b by Mariatta (csabella) in branch 'master':
bpo-29751: Improve PyLong_FromString documentation (GH-915)
https://github.com/python/cpython/commit/26896f2832324dde85cdd63d525571ca669f6f0b
msg292190 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-04-24 04:02
New changeset d51d093b9bbca108f59bad0f1730c48ebf5b2e14 by Mariatta in branch '3.5':
[3.5] bpo-29751: Improve PyLong_FromString documentation (GH-915) (#1267)
https://github.com/python/cpython/commit/d51d093b9bbca108f59bad0f1730c48ebf5b2e14
msg292191 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-04-24 04:05
New changeset ea0efa3bc1d0b832da75519c6f85d767ae44feda by Mariatta in branch '3.6':
[3.6] bpo-29751: Improve PyLong_FromString documentation (GH-915) (#1266)
https://github.com/python/cpython/commit/ea0efa3bc1d0b832da75519c6f85d767ae44feda
msg292192 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-04-24 04:05
New changeset 9eb5ca0774f94215be48442100c829db2484e146 by Mariatta in branch 'master':
bpo-29751: add Cheryl Sabella to Misc/ACKS (GH-1268)
https://github.com/python/cpython/commit/9eb5ca0774f94215be48442100c829db2484e146
msg292193 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2017-04-24 04:06
I merged the PR, backported it to 3.5 and 3.6, and added Cheryl to Misc/ACKS.

Thanks everyone :)
msg292214 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2017-04-24 09:33
Oh, I didn't expect that.  That is so cool!  Thanks Mariatta.  :-)
History
Date                 User              Action  Args
2017-04-24 09:33:13  cheryl.sabella    set     messages: + msg292214
2017-04-24 04:06:38  Mariatta          set     status: open -> closed; resolution: fixed; messages: + msg292193; stage: backport needed -> resolved
2017-04-24 04:05:21  Mariatta          set     messages: + msg292192
2017-04-24 04:05:03  Mariatta          set     messages: + msg292191
2017-04-24 04:02:32  Mariatta          set     messages: + msg292190
2017-04-24 04:02:00  Mariatta          set     pull_requests: + pull_request1380
2017-04-24 03:56:23  Mariatta          set     pull_requests: + pull_request1379
2017-04-24 03:56:20  Mariatta          set     pull_requests: + pull_request1378
2017-04-24 03:56:13  Mariatta          set     stage: patch review -> backport needed
2017-04-24 03:54:11  Mariatta          set     nosy: + Mariatta; messages: + msg292189
2017-04-02 03:20:32  martin.panter     set     messages: + msg291019
2017-04-01 12:02:46  cheryl.sabella    set     messages: + msg290981
2017-03-30 20:29:01  terry.reedy       set     nosy: + terry.reedy; messages: + msg290879
2017-03-30 20:06:28  Mariatta          set     stage: patch review; versions: + Python 3.6, Python 3.7
2017-03-30 19:52:32  cheryl.sabella    set     nosy: + cheryl.sabella; messages: + msg290877
2017-03-30 19:51:35  cheryl.sabella    set     pull_requests: + pull_request815
2017-03-09 18:37:52  brett.cannon      set     title: PyLong_FromString fails on decimals with leading zero and base=0 -> PyLong_FromString documentation wrong on numbers with leading zero and base=0
2017-03-08 08:47:57  mark.dickinson    set     messages: + msg289219
2017-03-07 21:53:10  serhiy.storchaka  set     assignee: docs@python; type: behavior -> enhancement; components: + Documentation, - Interpreter Core; nosy: + mark.dickinson, docs@python
2017-03-07 21:47:26  martin.panter     set     nosy: + martin.panter; messages: + msg289190
2017-03-07 21:27:48  cubinator         create