This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Numeric Literals vs string "1_1" on input int() or float() or literal_eval
Type: behavior
Stage: resolved
Components:
Versions: Python 3.9, Python 3.8, Python 3.7, Python 3.6
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: mark.dickinson, rhettinger, serhiy.storchaka, zd nex
Priority: normal
Keywords:

Created on 2020-03-13 17:08 by zd nex, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (4)
msg364112 - Author: zd nex (zd nex) Date: 2020-03-13 17:08
Currently, if Python code contains 1_1, it is treated as the number 11. When a user calls int("1_1"), the result is also 11, and ast.literal_eval also produces the number 11 instead of a string. How can a user get an error such as SyntaxError from int() or literal_eval() for obviously wrong input (on some keyboards . is next to _), for example with int(input()) in the REPL? In Python 2.7 this was checked, but now even a string is handled as a number. Is there a reason for this?

I understand the reasoning behind PEP 515, that int(1_1) can produce 11, but why does int("1_1") produce it as well? Previously users relied on literal_eval for safe checking of values, but now a user can enter 1_1 and it is converted to a number. Is there any plan to make the behavior of these functions controllable? I was recently working with some students who had used Python 2.7, and they also found this confusing. The funniest part is that when they do the same thing in JavaScript, parseInt("1_1") gives 1; in old Python this was an error, and now we give them 11.

I would suggest making it possible to check strings strictly, as in Python 2.7. That way users could still use _ in code to group digits, but it would also be possible to catch user input that was meant to be something else, for example in a try/except in a console program.
msg364115 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-03-13 18:17
I prefer what we have now.  The language is consistent across script input, int(), and literal_eval().  

In my courses, I haven't encountered any issues with Python allowing "1_1".  The actual problem my learners encounter is the need to strip commas from input.
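
For readers who want to see the behavior under discussion, here is a small sketch (not part of the original message) of how underscores are treated on Python 3.6 and later. The variable names are illustrative only.

```python
import ast

text = "1_1"

from_literal = 1_1                          # numeric literal in source code
from_int = int(text)                        # explicit str-to-int conversion
from_literal_eval = ast.literal_eval(text)  # parsed as a Python literal

print(from_literal, from_int, from_literal_eval)  # 11 11 11
print(float("1_1.5"))                             # 11.5

# Underscores are only accepted singly, between digits.
for bad in ("1__1", "_1", "1_"):
    try:
        int(bad)
    except ValueError:
        print(f"int({bad!r}) raises ValueError")
```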
msg364117 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-13 18:29
You can validate the input before using it.

    if '_' in s: raise ValueError

or

    if not re.fullmatch('[0-9]+', s): raise ValueError

Do you want to accept "۱۲۳۴" or "       12       "? If not then validate the input before using int().

Also, do not use ast.literal_eval() with untrusted input without validation. It is not a "safe eval" and may even crash the interpreter.
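
The validation suggested above could be wrapped in a helper along these lines. This is only a sketch: `parse_strict_int` is a hypothetical name, not an existing function, and the rule "a non-empty run of ASCII digits" is one possible choice among many.

```python
import re

# Assumption: "strict" here means one or more ASCII digits and nothing else.
_ASCII_DIGITS = re.compile(r"[0-9]+")


def parse_strict_int(s):
    """Hypothetical helper: validate first, then delegate to int()."""
    if not _ASCII_DIGITS.fullmatch(s):
        raise ValueError(f"not a plain decimal integer: {s!r}")
    return int(s)


print(parse_strict_int("123"))

# Inputs that int() accepts but this helper rejects.
for candidate in ("1_1", "۱۲۳۴", "       12       "):
    try:
        parse_strict_int(candidate)
    except ValueError as exc:
        print("rejected:", exc)
```

Using `fullmatch` rather than `match` matters here, because the whole string has to consist of digits, not just its beginning.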
msg364149 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-03-14 10:41
[Raymond]

> I prefer what we have now.  The language is consistent [...]

Agreed. I don't see value in having two different sets of rules, one for numeric literals and one for explicit str-to-int conversions. And if we *were* to adopt a different set of rules for str-to-int conversions, what would those rules be? There are a lot of fairly arbitrary choices to make (whitespace before/after/between sign and digits, digit sets, leading zeros, characters permitted as signs, permissible digit separators).
The decision would be easier if there were a widespread standard that could help us choose a particular ruleset, but I'm not aware of any such standard.

Much cleaner and simpler to have the rules for str-to-int match those for numeric literal parsing. (And similarly for floats.)

[zd nex]
> I would suggest that it would be possible to strictly check strings [...]

As Serhiy pointed out, it already is possible, in a variety of ways. If you're arguing for something like `int("+123", strict=True)`, you'd need to say exactly what "strict=True" should mean, make a case that your particular choice is sufficiently standard and useful to others to make it worth adding to core Python, consider how it would interact with the "base" argument, and a whole lot more. If you want to take that forward, I think that's something you'd need to bring up on the python-ideas mailing list for further discussion. I'll close here.
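
To illustrate how many choices a "strict" mode would have to pin down, here is a rough user-level sketch. `strict_int` and its keyword arguments are hypothetical and do not correspond to any existing or proposed API. Each flag stands for one of the arbitrary decisions listed above, and this version sidesteps the "base" question by handling base 10 only.

```python
import re


def strict_int(s, *, allow_sign=True, allow_leading_zeros=False):
    """Hypothetical strict base-10 parser; every rule is a judgment call."""
    sign = r"[+-]?" if allow_sign else ""
    digits = r"[0-9]+" if allow_leading_zeros else r"(?:0|[1-9][0-9]*)"
    # No whitespace, no underscores, ASCII digits only.
    if not re.fullmatch(sign + digits, s):
        raise ValueError(f"rejected by strict rules: {s!r}")
    return int(s)


print(strict_int("+123"))
print(strict_int("007", allow_leading_zeros=True))

for candidate in ("1_1", " 12", "007"):
    try:
        strict_int(candidate)
    except ValueError as exc:
        print(exc)
```

A different author could reasonably flip any of these defaults, which is the difficulty with standardizing a single meaning for "strict".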
History
Date                 User              Action  Args
2022-04-11 14:59:28  admin             set     github: 84137
2020-03-14 10:41:48  mark.dickinson    set     status: open -> closed; nosy: + mark.dickinson; messages: + msg364149; resolution: rejected; stage: resolved
2020-03-13 18:29:31  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg364117
2020-03-13 18:17:26  rhettinger        set     nosy: + rhettinger; messages: + msg364115
2020-03-13 17:08:02  zd nex            create