classification
Title: Make Fraction(string) handle non-ascii slashes
Type: enhancement Stage: resolved
Components: Library (Lib), Unicode Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, frederic.grosshans, gregory.p.smith, mark.dickinson, rhettinger, serhiy.storchaka, terry.reedy, weightwatchers-carlanderson
Priority: normal Keywords:

Created on 2021-03-16 18:09 by weightwatchers-carlanderson, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (16)
msg388865 - (view) Author: Carl Anderson (weightwatchers-carlanderson) Date: 2021-03-16 18:09
Fraction works with a regular slash:

>>> from fractions import Fraction
>>> Fraction("1/2")
Fraction(1, 2)

but other similar slashes, such as the fraction slash ⁄ (U+2044), raise an error:

>>> Fraction("0⁄2")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.7/fractions.py", line 138, in __new__
    numerator)
ValueError: Invalid literal for Fraction: '0⁄2'


This seems to come from the (?:/(?P<denom>\d+))? section of the regex _RATIONAL_FORMAT in fractions.py
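A minimal sketch of why the match fails, using a simplified stand-in pattern rather than the actual _RATIONAL_FORMAT from fractions.py (the names SIMPLE_RATIONAL and matches are made up for illustration). The denominator group is introduced by a literal ASCII "/", so a string containing U+2044 never matches:

```python
import re

# Simplified, hypothetical stand-in for the rational pattern; the real
# _RATIONAL_FORMAT in fractions.py also handles signs, decimals and exponents.
SIMPLE_RATIONAL = re.compile(r"\A(?P<num>\d+)(?:/(?P<denom>\d+))?\Z")


def matches(text):
    """Return True if text looks like 'int' or 'int/int' with an ASCII slash."""
    return SIMPLE_RATIONAL.match(text) is not None


ascii_ok = matches("1/2")                # ASCII SOLIDUS (U+002F)
fraction_slash_ok = matches("1\u20442")  # FRACTION SLASH (U+2044)
```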
msg388866 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-03-16 18:50
There's a bigger issue here about what characters should be accepted in numeric literals. The Unicode minus sign (U+2212) "−" is also not currently accepted for Fractions or any other built-in numeric type.

> but there are other similar slashes such as (0x2044) in which it throws an error

Do you have a proposal for the set of slashes that should be accepted, or a non-arbitrary rule for determining that set?  U+2044 (FRACTION SLASH), U+2215 (DIVISION SLASH) and U+FF0F (FULLWIDTH SOLIDUS) all seem like potential candidates. Are there others?
msg388867 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-03-16 18:54
Seems worth noting that Unicode fractions like ⅔ produce a FRACTION SLASH character when normalized:

>>> unicodedata.normalize('NFKC', '⅔')
'2⁄3'
>>> list(map(unicodedata.name, unicodedata.normalize('NFKC', '⅔')))
['DIGIT TWO', 'FRACTION SLASH', 'DIGIT THREE']
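As a rough sketch of how that normalization could be used today (fraction_from_vulgar is a hypothetical helper, not an existing API), one could normalize with NFKC and then swap the FRACTION SLASH for an ASCII slash before calling Fraction:

```python
import unicodedata
from fractions import Fraction

FRACTION_SLASH = "\u2044"


def fraction_from_vulgar(text):
    """Parse a string such as a single vulgar fraction character (hypothetical helper)."""
    # NFKC expands a vulgar fraction into digit, FRACTION SLASH, digit.
    normalized = unicodedata.normalize("NFKC", text)
    return Fraction(normalized.replace(FRACTION_SLASH, "/"))


two_thirds = fraction_from_vulgar("\u2154")  # VULGAR FRACTION TWO THIRDS
```

This only makes sense for a lone fraction: a mixed number such as '1⅔' normalizes to '12⁄3', which would be misread as twelve thirds.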
msg388869 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-03-16 19:04
Related: #6632
msg388884 - (view) Author: Carl Anderson (weightwatchers-carlanderson) Date: 2021-03-16 21:08
from https://en.wikipedia.org/wiki/Slash_(punctuation) there is

U+002F / SOLIDUS
U+2044 ⁄ FRACTION SLASH
U+2215 ∕ DIVISION SLASH
U+29F8 ⧸ BIG SOLIDUS
U+FF0F ／ FULLWIDTH SOLIDUS (fullwidth version of solidus)
U+1F67C 🙼 VERY HEAVY SOLIDUS

In XML and HTML, the slash can also be represented with the character entity &#47; or &#x2F; or &sol;.

there are a couple more listed here:

https://unicode-search.net/unicode-namesearch.pl?term=SLASH
msg388886 - (view) Author: Carl Anderson (weightwatchers-carlanderson) Date: 2021-03-16 21:20
I guess if we are doing slashes, then the division sign ÷ (U+00F7) should be included too. 

There are at least 2 minus signs too (U+002D, U+02D7).
msg388892 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-16 23:22
I think we should stick with the forward slash.  That is what the rest of the language does.  Adding more options is a recipe for confusion.

>>> 38 / 5
7.6
>>> 38 ∕ 5
SyntaxError: invalid character '∕' (U+2215)
msg389132 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-03-20 00:36
I agree with Raymond, at least for now.  I would expect the string argument to Fraction to be quoted legal Python code.  Without a lot of thought and discussion leading to a change in Python's design with respect to Unicode and operators, this limits '/' to ASCII '/'.

I believe that we accept non-ascii digits in at least some places, but operators are a different case.
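A small sketch of that distinction, using int() as the example (try_int is a hypothetical helper): non-ASCII decimal digits are accepted, while the Unicode MINUS SIGN mentioned earlier in this thread is not.

```python
def try_int(text):
    """Return int(text), or None if int() rejects the string."""
    try:
        return int(text)
    except ValueError:
        return None


arabic_indic = try_int("\u0661\u0662\u0663")  # ARABIC-INDIC DIGITS ONE, TWO, THREE
unicode_minus = try_int("\u22121")            # MINUS SIGN (U+2212) followed by '1'
ascii_minus = try_int("-1")                   # HYPHEN-MINUS (U+002D) followed by '1'
```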
msg389141 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-03-20 03:12
DrRacket supports fraction conversions but insists on a forward slash, just like we do.

Welcome to DrRacket, version 7.9.0.17--2020-12-24(f6b7f93/a) [cs].
Language: racket, with debugging; memory limit: 128 MB.
> (/ 1 2)
1/2
> (string->number "3/5")
3/5
> (string->number "2⁄3")
#f
msg389151 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-03-20 08:59
It would be nice to have a utility function in unicodedata to convert Unicode characters to their ASCII equivalents (if they exist). It would allow explicitly converting all slashes to / (and all digits to 0-9) before passing the string to the Fraction constructor.

AFAIK there is a special Unicode document and tables for this.
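The digit half of this idea can already be sketched with the existing unicodedata.decimal function; digits_to_ascii below is a hypothetical name, not something unicodedata provides. Slashes have no comparable lookup, so they would need a separate hand-maintained table.

```python
import unicodedata


def digits_to_ascii(text):
    """Replace every Unicode decimal digit with its ASCII 0-9 equivalent."""
    out = []
    for ch in text:
        # decimal() returns the digit value, or the default (None) for non-digits.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)


converted = digits_to_ascii("\u0663/\u0664")  # ARABIC-INDIC DIGITS THREE and FOUR
```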
msg389152 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2021-03-20 09:10
Carl: can you say more about the problem that motivated this issue?
msg389158 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-20 14:41
Usually, constructors try to accept the format returned by repr(obj), or even str(obj). That is the case for Fraction:

>>> str(fractions.Fraction(1, 2))
'1/2'
>>> fractions.Fraction("1/2")
Fraction(1, 2)

It works as expected.

I dislike the idea of trying to handle more Unicode characters which "look like" "/", or characters like "⅔". It sounds like a can of worms, and I don't think that such a feature belongs in the stdlib. You can easily write your own helper function that accepts a string and returns a Fraction.

If someone is motivated to accept more characters, I would prefer to have a unified proposal covering all Python number types (int, float, Fraction, complex, etc.) and listing all characters. Maybe a PEP would make sense.
msg389186 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-03-20 22:01
The proposal I like is for Unicode numeric normalization functions that return the ASCII equivalent to exist.

These ideally belong in a third party PyPI library anyways, as they're the kind of thing that needs updating every time a new Unicode revision comes out.  And there are often multiple cultural interpretations for some symbols, despite any standard, so you'd wind up with a variety of functions and options for which behavior to obtain.  That isn't the kind of thing that makes for a good stdlib.

Doing this by default within the language syntax itself (and thus stdlib constructors) is potentially dangerous and confusing, as everything in existence in the world today that processes Python source code already has baked-in single-ASCII-token assumptions.  While parsing and tooling could be evolved for that, it'd be a major ecosystem-impacting change.
msg389309 - (view) Author: Carl Anderson (weightwatchers-carlanderson) Date: 2021-03-22 12:25
>Carl: can you say more about the problem that motivated this issue?

@mark.dickinson

I was parsing a large corpus of ingredient strings from web-scraped recipes. My code to interpret strings such as "1/2 cup sugar" would fall over every so often due to this issue, as they used the fraction slash and other visually similar characters.
msg389399 - (view) Author: Carl Anderson (weightwatchers-carlanderson) Date: 2021-03-23 18:19
>The proposal I like is for a unicode numeric normalization functions that return the ascii equivalent to exist.

@Gregory P. Smith 
this makes sense to me. That does feel like the cleanest solution. 
I'm currently doing s = s.replace("⁄","/"), but it would be good to have a well-maintained normalization method that contains all the relevant mappings. Using it as an independent preprocessing step before Fraction would work well.
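A slightly more general version of that workaround might look like the sketch below. The table uses some of the slash look-alikes listed earlier in this thread; which characters to include is a judgment call, and SLASH_TABLE and parse_quantity are hypothetical names.

```python
from fractions import Fraction

# Slash look-alikes taken from the list earlier in this thread;
# the selection is an assumption, not an agreed-upon set.
SLASH_TABLE = str.maketrans({
    "\u2044": "/",  # FRACTION SLASH
    "\u2215": "/",  # DIVISION SLASH
    "\u29f8": "/",  # BIG SOLIDUS
    "\uff0f": "/",  # FULLWIDTH SOLIDUS
})


def parse_quantity(ingredient):
    """Split 'QUANTITY rest' and return (Fraction, rest); hypothetical helper."""
    quantity, _, rest = ingredient.translate(SLASH_TABLE).partition(" ")
    return Fraction(quantity), rest


amount, item = parse_quantity("1\u20442 cup sugar")
```

Keeping the translation as its own step leaves Fraction untouched and makes the accepted character set explicit in the calling code.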
msg391776 - (view) Author: Frédéric Grosshans-André (frederic.grosshans) Date: 2021-04-24 13:09
@Gregory P. Smith 

unicodedata.numeric, in the standard library, already handles non-ASCII fractions in many scripts. The current “problem” is that it outputs a float (even for integers):

>>> unicodedata.numeric('⅔')
0.6666666666666666

The UnicodeData.txt file from the Unicode standard it takes its data from, however, contains the corresponding “ascii fractions”. For example, below are two lines of this file for two (very) different ways of encoding two thirds

2154;VULGAR FRACTION TWO THIRDS;No;0;ON;<fraction> 0032 2044 0033;;;2/3;N;FRACTION TWO THIRDS;;;;
1245B;CUNEIFORM NUMERIC SIGN TWO THIRDS DISH;Nl;0;L;;;;2/3;N;;;;;

Adding an exact value extraction to unicodedata should be doable, either via a function or an extra keyword to the unicodedata.numeric function.

The only information that would be lost (but which is unavailable now anyway) would be for the few codepoints which encode reducible fractions. As of Unicode 13.0, these codepoints are

* ↉ U+2189 VULGAR FRACTION ZERO THIRDS
* 𐧷 U+109F7 MEROITIC CURSIVE FRACTION TWO TWELFTHS
* 𐧸 U+109F8 MEROITIC CURSIVE FRACTION THREE TWELFTHS
* 𐧹 U+109F9 MEROITIC CURSIVE FRACTION FOUR TWELFTHS
* 𐧻 U+109FB MEROITIC CURSIVE FRACTION SIX TWELFTHS
* 𐧽 U+109FD MEROITIC CURSIVE FRACTION EIGHT TWELFTHS
* 𐧾 U+109FE MEROITIC CURSIVE FRACTION NINE TWELFTHS
* 𐧿 U+109FF MEROITIC CURSIVE FRACTION TEN TWELFTHS
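A minimal sketch of the exact-value extraction proposed above, run against the two sample UnicodeData.txt lines quoted in this message. This is not an existing unicodedata API; exact_values, SAMPLE_LINES and NUMERIC_FIELD are names made up for illustration.

```python
from fractions import Fraction

# The two sample lines quoted above from UnicodeData.txt.
SAMPLE_LINES = [
    "2154;VULGAR FRACTION TWO THIRDS;No;0;ON;<fraction> 0032 2044 0033;;;2/3;N;FRACTION TWO THIRDS;;;;",
    "1245B;CUNEIFORM NUMERIC SIGN TWO THIRDS DISH;Nl;0;L;;;;2/3;N;;;;;",
]

NUMERIC_FIELD = 8  # position of the numeric value among the ';'-separated fields


def exact_values(lines):
    """Map each character to its exact numeric value as a Fraction."""
    values = {}
    for line in lines:
        fields = line.split(";")
        if fields[NUMERIC_FIELD]:  # skip characters without a numeric value
            values[chr(int(fields[0], 16))] = Fraction(fields[NUMERIC_FIELD])
    return values


EXACT = exact_values(SAMPLE_LINES)
```

Because Fraction reduces its arguments, a value such as "2/12" would come back as Fraction(1, 6), which is the loss of information described above for the reducible codepoints.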
History
Date | User | Action | Args
2022-04-11 14:59:42 | admin | set | github: 87686
2021-04-27 14:19:43 | vstinner | set | nosy: - vstinner
2021-04-24 13:09:34 | frederic.grosshans | set | nosy: + frederic.grosshans; messages: + msg391776
2021-03-23 18:19:03 | weightwatchers-carlanderson | set | messages: + msg389399
2021-03-22 12:25:08 | weightwatchers-carlanderson | set | messages: + msg389309
2021-03-20 22:02:11 | gregory.p.smith | set | status: open -> closed; resolution: rejected; stage: resolved
2021-03-20 22:01:36 | gregory.p.smith | set | nosy: + gregory.p.smith; messages: + msg389186
2021-03-20 14:41:27 | vstinner | set | messages: + msg389158
2021-03-20 09:10:27 | mark.dickinson | set | messages: + msg389152
2021-03-20 08:59:59 | serhiy.storchaka | set | nosy: + vstinner, serhiy.storchaka; messages: + msg389151; components: + Unicode
2021-03-20 03:12:23 | rhettinger | set | messages: + msg389141
2021-03-20 00:36:24 | terry.reedy | set | nosy: + terry.reedy; messages: + msg389132; title: Fraction only handles regular slashes ("/") and fails with other similar slashes -> Make Fraction(string) handle non-ascii slashes
2021-03-16 23:22:58 | rhettinger | set | nosy: + rhettinger; messages: + msg388892
2021-03-16 21:20:10 | weightwatchers-carlanderson | set | messages: + msg388886
2021-03-16 21:11:22 | ezio.melotti | set | nosy: + ezio.melotti
2021-03-16 21:08:59 | weightwatchers-carlanderson | set | messages: + msg388884
2021-03-16 19:04:47 | mark.dickinson | set | messages: + msg388869
2021-03-16 18:54:43 | mark.dickinson | set | messages: + msg388867
2021-03-16 18:50:29 | mark.dickinson | set | nosy: + mark.dickinson; messages: + msg388866
2021-03-16 18:09:04 | weightwatchers-carlanderson | create |