classification
Title: PEP 515: Tokenizer: allow underscores for grouping in numeric literals
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: brett.cannon
Nosy List: brett.cannon, encukou, eric.smith, ethan.furman, georg.brandl, mark.dickinson, ned.deily, python-dev, rhettinger, scoder, serhiy.storchaka, skrah, xiang.zhang, yselivanov
Priority: release blocker
Keywords: patch

Created on 2016-02-10 17:50 by georg.brandl, last changed 2016-09-11 16:20 by georg.brandl. This issue is now closed.

Files
File name | Uploaded by | Date
numeric_underscores_strict.patch | serhiy.storchaka | 2016-02-11 10:06
numeric_underscores_v4_full.diff | georg.brandl | 2016-02-11 12:41
numeric_underscores_final_v6.diff | georg.brandl | 2016-05-15 06:57
numeric_underscores_final_v7.diff | georg.brandl | 2016-05-18 04:52
numeric_underscores_final_v8.diff | georg.brandl | 2016-05-22 07:35
Messages (41)
msg260026 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-10 17:50
As discussed on python-ideas: https://mail.python.org/pipermail/python-ideas/2016-February/038354.html

The rules are: 
Underscores are allowed anywhere in numeric literals, except:

* at the beginning of a literal (obviously)
* at the end of a literal
* directly after a dot (since the underscore could start an attribute name)
* directly after a sign in exponents (for consistency with leading signs)
* in the middle of the "0x", "0o" or "0b" base specifiers

Currently this only touches literals, not the inputs of int() or float().  Whether they should accept this syntax is debatable (I'd vote no).

Otherwise missing: doc updates.

Review question: is PyMem_RawStrdup/RawFree the right API to use here?
msg260029 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-10 18:25
I prefer a simpler and stricter rule:

* Underscores are allowed only between digits in numeric literals.

Thus 1__2, 12_, 1_.2, 1_e2, 1e_2, 1_j, 0x_12 are not allowed.

It is easier to make the rule more lenient later if needed.
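
The strict rule in this message can be sketched as a small check: an underscore is acceptable only when the characters on both sides of it are digits. The helper name `underscores_only_between_digits` is hypothetical, and the function only looks at underscore placement; it does not validate the rest of the literal.

```python
import string


def underscores_only_between_digits(literal: str) -> bool:
    """Return True if every "_" in `literal` has a digit on both sides.

    Hypothetical helper illustrating the strict rule only; it does not
    check that `literal` is otherwise a valid number.
    """
    # Hex literals count a-f as digits; everything else uses 0-9 only.
    if literal[:2].lower() == "0x":
        digits = set(string.hexdigits)
    else:
        digits = set(string.digits)
    for i, ch in enumerate(literal):
        if ch != "_":
            continue
        # A leading or trailing underscore has no digit on one side.
        if i == 0 or i == len(literal) - 1:
            return False
        if literal[i - 1] not in digits or literal[i + 1] not in digits:
            return False
    return True


# The seven examples listed in the message, plus a few that should pass.
rejected = ["1__2", "12_", "1_.2", "1_e2", "1e_2", "1_j", "0x_12"]
accepted = ["1_000", "0xdead_beef", "1_000.000_1", "1e1_0"]
```

Under this sketch every entry in `rejected` fails the check and every entry in `accepted` passes it.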
msg260031 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-10 18:43
It sure is more strict, but I don't think it's simpler (and it's definitely not simpler to implement).

(Also 1_j is pretty nice, I wouldn't want to lose that.)

We can also check what other languages do.

* Rust: very much like this, but trailing underscores allowed.
* Perl 5: same as here, but underscores after dot and trailing underscores allowed.
* Ruby: only between digits.

* Swift: the grammar productions say it's basically the same as Rust.  The textual description says "between digits".
msg260033 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-10 19:46
* Java: only between digits. [1]
* Julia: only between digits. [2] (not well specified)
* C# 7.0 (proposal): only between digits, but adjacent underscores allowed. [3]
* Ada: only between digits. [4] (strong but very simple rules)
* D: very much like proposed patch, but trailing underscores allowed. [5]
* Perl 5: only between digits as documented (23__500 is not legal), but actually more lenient. [6]

[1] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html
[2] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/
[3] https://github.com/dotnet/roslyn/issues/216
[4] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4
[5] http://dlang.org/spec/lex.html#integerliteral
[6] http://perldoc.perl.org/perldata.html#Scalar-value-constructors
msg260036 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-10 20:25
> I prefer a simpler and stricter rule:
> * Underscores are allowed only between digits in numeric literals.

+1.  But in any case we need a PEP for this change.
msg260037 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-10 20:52
C++14 uses the same strict rule as Ada, but uses apostrophes instead of underscores. [1]

Thus there are two groups of languages, implementing strict or lenient rules:

* Strict: Ada, C++, Java, C#, Ruby, Julia, Perl (as documented), Swift (textual description).
* Lenient: D, Rust, Perl (actually), Swift (grammar productions).

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html
msg260046 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-10 22:21
PEP 515 is written up and posted to python-dev.
msg260057 - (view) Author: Petr Viktorin (encukou) * Date: 2016-02-10 23:09
Regarding the patch: if trailing underscores are not allowed, `0 if 1_____else 1` should be illegal.
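
One way to check this point, assuming an interpreter with the final PEP 515 behaviour (Python 3.6 or later), is to ask the compiler directly. The helper name `compiles_as_expression` is hypothetical.

```python
def compiles_as_expression(source: str) -> bool:
    """Return True if `source` compiles as a single Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Trailing underscores directly followed by a keyword are rejected.
trailing_ok = compiles_as_expression("0 if 1_____else 1")
# A single underscore between digits is accepted.
grouped_ok = compiles_as_expression("0 if 1_000 else 1")
```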
msg260077 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-11 08:08
New patch matching revision of PEP.
msg260081 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-11 10:06
The proposed patch implements the strict underscore rules. The implementation is not more complex.
msg260093 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-11 11:40
This patch includes int(), float(), complex() operations, as well as _pydecimal.
msg260102 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-11 12:41
New patch with minimal doc updates.
msg260229 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-02-13 10:27
I like the feature for literals, but I'm not sure about conversions from string. It slows down the conversion for (IMO) a very small benefit.

Other languages allow it, but I've never attempted to use the feature:

$ ocaml
        OCaml version 4.02.1

# float_of_string "__1____2____.___e___101_";;
- : float = 1.2e+102
msg260230 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-13 11:21
It's mostly for consistency. For example, ``int(x, 0)`` is defined by the docs as "interpret x as in a literal".  Other bases have special cases as well, e.g. "0x" is accepted by base 16.

In the current version of the conversions, the string is scanned for "_" before doing the more expensive allocation+copy.
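
For reference, the behaviour that eventually shipped can be observed as follows, assuming Python 3.6 or later. This is only a sketch of the consistency argument (literal syntax and `int(x, 0)` agree); the helper `parses_as_int` is hypothetical.

```python
def parses_as_int(text: str, base: int = 10) -> bool:
    """Return True if int(text, base) succeeds (hypothetical helper)."""
    try:
        int(text, base)
    except ValueError:
        return False
    return True


# Base 0 means "interpret the string as an integer literal".
from_literal_syntax = int("0x_ff", 0)      # same value as the literal 0x_ff
plain_grouping = int("1_000")              # 1000
prefix_with_explicit_base = int("0x10", 16)  # the "0x" prefix is accepted by base 16
grouped_float = float("1_000.5")           # 1000.5

# Doubled, leading, and trailing underscores are rejected.
bad_inputs = ["1__000", "_1", "1_"]
```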
msg260231 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-02-13 11:39
If the string conversions stay, may I suggest two functions:

  1) PyUnicode_NumericAsAscii()
  2) PyUnicode_NumericAsAsciiWS()

The first one eliminates only underscores, the second one both
underscores and leading/trailing whitespace.

Decimal must support both:

  https://hg.python.org/cpython/file/default/Modules/_decimal/_decimal.c#l1890
msg260232 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-02-13 11:40
Correction: The explanation of the functions should be reversed.
msg260233 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-13 11:47
Thanks, I hadn't looked at cdecimal yet - I was planning to ask you to do the necessary changes there :)

But there are a few versions of this (e.g. converting unicode digits to ASCII) scattered throughout the codebase, it would make sense to consolidate on this occasion.
msg260235 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-02-13 12:05
> Georg Brandl added the comment:
> 
> Thanks, I hadn't looked at cdecimal yet - I was planning to ask you to do the necessary changes there :)

Oh, well. :)

> But there are a few versions of this (e.g. converting unicode digits to ASCII) scattered throughout the codebase, it would make sense to consolidate on this occasion.

Yes, actually I have to look at the _decimal version again, it contains
some optimizations that may only work for _decimal:

  https://hg.python.org/cpython/file/default/Modules/_decimal/_decimal.c#l1943

I *did* optimize it for speed at the time, I hope general functions won't be
slower.
msg260239 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-02-13 13:28
I still wonder about the complexity of all this for decimal. We now have two grammars on top of each other, this being the actual one for decimal:

  http://speleotrove.com/decimal/daconvs.html


For string conversions I'd prefer a lax way (similar to OCaml) that would somehow be specified in terms of preprocessing, same as the leading/trailing whitespace removal. Short of "ignore all underscores" it isn't easy though.
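
A minimal sketch of the "preprocessing" idea, assuming the simplest possible rule (strip surrounding whitespace, then ignore all underscores). The helper name `lax_normalize` is hypothetical and this is not taken from any of the attached patches.

```python
from decimal import Decimal


def lax_normalize(text: str) -> str:
    """Strip surrounding whitespace, then drop every underscore.

    Hypothetical helper: the result is handed to the ordinary parser,
    so the number grammar itself stays unchanged.
    """
    return text.strip().replace("_", "")


# The OCaml example from earlier in the thread reduces to "12.e101".
ocaml_style = "__1____2____.___e___101_"
cleaned = lax_normalize(ocaml_style)
as_float = float(cleaned)

# The same preprocessing in front of Decimal.
as_decimal = Decimal(lax_normalize("  1_000.5  "))
```

The appeal of this design is that it can be described in one sentence, at the cost of accepting inputs such as the OCaml example that the literal rules would reject.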
msg260240 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-13 13:52
Hm. On the one hand there is a spec, so it can be argued that underscores don't belong to Decimal.

On the other hand, if we get Decimal literals at one point, there will be a strong argument for allowing underscores in them as in all other number literals.

Although supporting them in strings can also be added at that time.
msg260241 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-02-13 13:53
Raymond, you've also worked on Decimal - do you have an opinion on allowing underscores in Decimal(string) conversions?
msg262037 - (view) Author: Stefan Behnel (scoder) * Date: 2016-03-19 10:38
Nice one. While reimplementing it for Cython, I noticed that the grammar described in the PEP isn't exactly as it's implemented, though. The grammar says

    digit (["_"] digit)*

whereas the latest patch (v4) says

    `digit` (`digit` | "_")*

and also implements it that way. The former doesn't allow underscores at the end of a literal.

And the regexes in tokenize.py seem happy to accept "0x___", for example. Is that intended?
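
The difference between the two productions quoted in this message can be made concrete with regular expressions. This is a sketch that assumes `digit` means `[0-9]`; the pattern names are hypothetical.

```python
import re

# PEP grammar: digit (["_"] digit)*
PEP_DIGITS = re.compile(r"[0-9](?:_?[0-9])*")
# Patch v4 grammar: digit (digit | "_")*
PATCH_V4_DIGITS = re.compile(r"[0-9](?:[0-9]|_)*")


def matches(pattern: "re.Pattern[str]", text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


samples = ["1_000", "1_", "1__2"]
pep_results = {s: matches(PEP_DIGITS, s) for s in samples}
v4_results = {s: matches(PATCH_V4_DIGITS, s) for s in samples}
```

With these patterns the PEP form accepts only `1_000`, while the v4 form also accepts the trailing underscore in `1_` and the doubled underscore in `1__2`.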
msg262043 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-03-19 12:32
The last patch isn't up to date with the PEP; Serhiy's patch is the closest one.
msg262050 - (view) Author: Stefan Behnel (scoder) * Date: 2016-03-19 15:07
Ah, thanks. Here's my implementation then:

https://github.com/cython/cython/pull/499/files

It seems that tests for valid complex literals are missing. I've added these to the end of the list:

    '1_00_00.5j',
    '1_00_00.5e5',
    '1_00_00j',
    '1_00_00e5_1',
    '1e1_0',
    '.1_4',
    '.1_4e1',
    '.1_4j',
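
As a quick sanity check of these literals, assuming Python 3.6 or later, each one can be evaluated and compared with the same literal written without underscores. This sketch uses `ast.literal_eval` rather than the test-suite machinery.

```python
import ast

literals = [
    "1_00_00.5j",
    "1_00_00.5e5",
    "1_00_00j",
    "1_00_00e5_1",
    "1e1_0",
    ".1_4",
    ".1_4e1",
    ".1_4j",
]

# Evaluate each literal with and without its underscores.
with_underscores = {lit: ast.literal_eval(lit) for lit in literals}
without_underscores = {
    lit: ast.literal_eval(lit.replace("_", "")) for lit in literals
}
```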
msg265587 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-05-15 06:41
New patch; implements the accepted version of the PEP. I added the additional tests, thanks Stefan!
msg265589 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-05-15 07:01
Note: the changes for format()ting ("_" as thousands separator) are still missing. Eric, would you consider doing this part?
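
For illustration, this is how the `_` grouping option behaves once the formatting part is available, assuming Python 3.6 or later. Decimal presentation types group by thousands; `b`, `o`, `x`, and `X` group every four digits.

```python
# "_" as a thousands separator for integers and floats.
grouped_decimal = format(1234567, "_d")
grouped_float = f"{1234567.891:_.2f}"

# For hexadecimal and binary output the separator is inserted every 4 digits.
grouped_hex = format(0xDEADBEEF, "_x")
grouped_binary = format(255, "_b")

print(grouped_decimal)  # 1_234_567
print(grouped_float)    # 1_234_567.89
print(grouped_hex)      # dead_beef
print(grouped_binary)   # 1111_1111
```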
msg265616 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2016-05-15 13:57
Yes, I'll read PEP 515 and work on the formatting.
msg265804 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-05-18 04:52
Thanks Eric!

Serhiy, do you want to do a review? The v6/v7 patches are based on your "strict" patch with the constructor changes adapted from v4.

New version v7 addresses the review comments from Stefan and Martin.
msg265821 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-18 08:21
Added comments on Rietveld.
msg265825 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-05-18 10:59
Thanks, Georg! The decimal parts look good to me. I understand that
people wonder about the relaxed rules for Decimal -- we have discussed
that here:

    https://mail.python.org/pipermail/python-dev/2016-March/143557.html


I don't think that it will be a problem in practice.
msg266024 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2016-05-21 20:38
I've created issue 27080 to track the formatting part of this.
msg266058 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-05-22 07:35
Thanks for the detailed review, Serhiy! Next try incoming.
msg272834 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-08-16 05:05
@Serhiy/anyone: can I get another review, so that we can commit this in time for beta? Thanks!
msg273384 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-08-22 17:02
Hi Georg, I left several comments on Rietveld. Hope it helps.
msg275148 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-09-08 20:44
Georg, do you think you will be able to get this in for 3.6b1? If not I can commit it while I'm at the core sprint. I'll wait until tomorrow to see if you reply, otherwise I'm just going to address patch comments and then commit it on your behalf.
msg275262 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-09-09 05:21
Please go ahead. Thanks for taking care of this!
msg275364 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-09-09 18:04
I'll get this committed today (patch still applies and passes the tests, so it should only take addressing the review comments and the What's New entry).
msg275461 - (view) Author: Roundup Robot (python-dev) Date: 2016-09-09 21:57
New changeset 8a881dafe335 by Brett Cannon in branch 'default':
Issue #26331: Implement the parsing part of PEP 515.
https://hg.python.org/cpython/rev/8a881dafe335
msg275463 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-09-09 21:58
All applied! And Eric said he will handle the patch for format() which should cover the other half of PEP 515. Once Eric's side is done I guess we can mark PEP 515 as final.
msg275551 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2016-09-10 03:08
I'm done with the formatting (issue 27080), so PEP 515 can be marked as final.
msg275804 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2016-09-11 16:20
Thanks Brett!
History
Date | User | Action | Args
2016-09-11 16:20:07 | georg.brandl | set | messages: + msg275804
2016-09-10 03:08:34 | eric.smith | set | messages: + msg275551
2016-09-09 21:58:37 | brett.cannon | set | status: open -> closed; resolution: fixed; messages: + msg275463; stage: patch review -> resolved
2016-09-09 21:57:18 | python-dev | set | nosy: + python-dev; messages: + msg275461
2016-09-09 18:04:15 | brett.cannon | set | assignee: brett.cannon; messages: + msg275364
2016-09-09 05:21:21 | georg.brandl | set | messages: + msg275262
2016-09-08 21:28:09 | brett.cannon | set | priority: deferred blocker -> release blocker; nosy: + ned.deily
2016-09-08 21:27:56 | brett.cannon | set | priority: normal -> deferred blocker
2016-09-08 20:44:39 | brett.cannon | set | nosy: + brett.cannon; messages: + msg275148
2016-08-22 17:02:01 | xiang.zhang | set | nosy: + xiang.zhang; messages: + msg273384
2016-08-17 10:59:09 | haypo | set | title: Tokenizer: allow underscores for grouping in numeric literals -> PEP 515: Tokenizer: allow underscores for grouping in numeric literals
2016-08-16 05:05:32 | georg.brandl | set | messages: + msg272834
2016-05-22 07:35:48 | georg.brandl | set | files: + numeric_underscores_final_v8.diff; messages: + msg266058
2016-05-21 20:38:30 | eric.smith | set | messages: + msg266024
2016-05-18 10:59:54 | skrah | set | messages: + msg265825
2016-05-18 08:21:05 | serhiy.storchaka | set | messages: + msg265821
2016-05-18 04:52:42 | georg.brandl | set | files: + numeric_underscores_final_v7.diff; messages: + msg265804
2016-05-15 13:57:06 | eric.smith | set | messages: + msg265616
2016-05-15 07:01:42 | georg.brandl | set | nosy: + eric.smith; messages: + msg265589
2016-05-15 06:58:06 | georg.brandl | set | files: - numeric_underscores_v2.diff
2016-05-15 06:58:00 | georg.brandl | set | files: - numeric_underscores.diff
2016-05-15 06:57:55 | georg.brandl | set | files: - numeric_underscores_v3_full.diff
2016-05-15 06:57:42 | georg.brandl | set | files: + numeric_underscores_final_v6.diff
2016-05-15 06:57:34 | georg.brandl | set | files: - numeric_underscores_final_v5.diff
2016-05-15 06:41:49 | georg.brandl | set | files: + numeric_underscores_final_v5.diff; messages: + msg265587
2016-03-19 15:07:20 | scoder | set | messages: + msg262050
2016-03-19 12:32:11 | georg.brandl | set | messages: + msg262043
2016-03-19 10:38:34 | scoder | set | nosy: + scoder; messages: + msg262037
2016-02-13 13:53:30 | georg.brandl | set | nosy: + rhettinger; messages: + msg260241
2016-02-13 13:52:48 | georg.brandl | set | messages: + msg260240
2016-02-13 13:28:33 | skrah | set | messages: + msg260239
2016-02-13 12:05:06 | skrah | set | messages: + msg260235
2016-02-13 11:47:46 | georg.brandl | set | messages: + msg260233
2016-02-13 11:40:36 | skrah | set | messages: + msg260232
2016-02-13 11:39:09 | skrah | set | messages: + msg260231
2016-02-13 11:21:19 | georg.brandl | set | messages: + msg260230
2016-02-13 10:27:52 | skrah | set | nosy: + skrah; messages: + msg260229
2016-02-11 17:50:07 | mark.dickinson | set | nosy: + mark.dickinson
2016-02-11 12:41:53 | georg.brandl | set | files: + numeric_underscores_v4_full.diff; messages: + msg260102
2016-02-11 11:40:14 | georg.brandl | set | files: + numeric_underscores_v3_full.diff; messages: + msg260093
2016-02-11 10:06:35 | serhiy.storchaka | set | files: + numeric_underscores_strict.patch; messages: + msg260081
2016-02-11 08:08:41 | georg.brandl | set | files: + numeric_underscores_v2.diff; messages: + msg260077
2016-02-10 23:09:42 | encukou | set | nosy: + encukou; messages: + msg260057
2016-02-10 22:21:43 | georg.brandl | set | messages: + msg260046
2016-02-10 20:52:18 | serhiy.storchaka | set | messages: + msg260037
2016-02-10 20:25:04 | yselivanov | set | nosy: + yselivanov; messages: + msg260036
2016-02-10 19:46:55 | serhiy.storchaka | set | messages: + msg260033
2016-02-10 18:43:40 | georg.brandl | set | messages: + msg260031
2016-02-10 18:25:53 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg260029
2016-02-10 17:54:50 | ethan.furman | set | nosy: + ethan.furman
2016-02-10 17:50:24 | georg.brandl | create |