msg388865 - (view) |
Author: Carl Anderson (weightwatchers-carlanderson) |
Date: 2021-03-16 18:09 |
Fraction works with a regular slash:
>>> from fractions import Fraction
>>> Fraction("1/2")
Fraction(1, 2)
but there are other similar slashes, such as the fraction slash ⁄ (U+2044), for which it raises an error:
>>> Fraction("0⁄2")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.7/fractions.py", line 138, in __new__
    numerator)
ValueError: Invalid literal for Fraction: '0⁄2'
This seems to come from the (?:/(?P<denom>\d+))? section of the regex _RATIONAL_FORMAT in fractions.py
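A simplified stand-in for that part of the pattern shows the behaviour. The pattern below is an assumption for illustration only (the real _RATIONAL_FORMAT also covers signs, decimals and exponents); it just demonstrates that a literal `/` in the regex cannot match U+2044:

```python
import re

# Simplified, hypothetical stand-in for the numerator/denominator part of
# the pattern; this is NOT the actual _RATIONAL_FORMAT from fractions.py.
SIMPLE_RATIONAL = re.compile(r"(?P<num>\d+)(?:/(?P<denom>\d+))?")

# ASCII solidus: the optional denominator group matches.
ascii_match = SIMPLE_RATIONAL.fullmatch("0/2")

# U+2044 FRACTION SLASH: the literal "/" in the pattern does not match it,
# so the whole string fails to match.
other_match = SIMPLE_RATIONAL.fullmatch("0\u20442")

print(ascii_match.group("num"), ascii_match.group("denom"))
print(other_match)
```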
|
msg388866 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2021-03-16 18:50 |
There's a bigger issue here about what characters should be accepted in numeric literals. The Unicode minus sign (U+2212) "−" is also not currently accepted for Fractions or any other built-in numeric type.
> but there are other similar slashes such as (0x2044) in which it throws an error
Do you have a proposal for the set of slashes that should be accepted, or a non-arbitrary rule for determining that set? U+2044 (FRACTION SLASH), U+2215 (DIVISION SLASH) and U+FF0F (FULLWIDTH SOLIDUS) all seem like potential candidates. Are there others?
|
msg388867 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2021-03-16 18:54 |
Seems worth noting that Unicode fractions like ⅔ produce a FRACTION SLASH character when normalized:
>>> import unicodedata
>>> unicodedata.normalize('NFKC', '⅔')
'2⁄3'
>>> list(map(unicodedata.name, unicodedata.normalize('NFKC', '⅔')))
['DIGIT TWO', 'FRACTION SLASH', 'DIGIT THREE']
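A caller who wants a Fraction from such a character could build on that normalization. The sketch below is only an illustration: `fraction_from_vulgar` is a hypothetical name, and it deliberately accepts a single character, because normalizing a mixed number such as "1½" yields "11⁄2", which would be misread as eleven halves.

```python
import unicodedata
from fractions import Fraction


def fraction_from_vulgar(ch):
    """Hypothetical helper: convert one vulgar-fraction character to a Fraction.

    Relies on NFKC turning e.g. U+2154 into '2', FRACTION SLASH, '3'.
    """
    if len(ch) != 1:
        # Guard against mixed numbers like "1½", whose NFKC form is ambiguous.
        raise ValueError("expected a single character")
    decomposed = unicodedata.normalize("NFKC", ch)
    # Fraction() only accepts the ASCII solidus, so swap U+2044 for "/".
    return Fraction(decomposed.replace("\u2044", "/"))


print(fraction_from_vulgar("\u2154"))  # VULGAR FRACTION TWO THIRDS
```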
|
msg388869 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2021-03-16 19:04 |
Related: #6632
|
msg388884 - (view) |
Author: Carl Anderson (weightwatchers-carlanderson) |
Date: 2021-03-16 21:08 |
from https://en.wikipedia.org/wiki/Slash_(punctuation) there is
U+002F / SOLIDUS
U+2044 ⁄ FRACTION SLASH
U+2215 ∕ DIVISION SLASH
U+29F8 ⧸ BIG SOLIDUS
U+FF0F ／ FULLWIDTH SOLIDUS (fullwidth version of solidus)
U+1F67C 🙼 VERY HEAVY SOLIDUS
In XML and HTML, the slash can also be represented with the character entity &sol; or &#47; or &#x2F;.
there are a couple more listed here:
https://unicode-search.net/unicode-namesearch.pl?term=SLASH
|
msg388886 - (view) |
Author: Carl Anderson (weightwatchers-carlanderson) |
Date: 2021-03-16 21:20 |
I guess if we are doing slashes, then the division sign ÷ (U+00F7) should be included too.
There are at least 2 minus signs too (U+002D, U+02D7).
|
msg388892 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2021-03-16 23:22 |
I think we should stick with forward slashes. That is what the rest of the language does. Adding more options is a recipe for confusion.
>>> 38 / 5
7.6
>>> 38 ∕ 5
SyntaxError: invalid character '∕' (U+2215)
|
msg389132 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2021-03-20 00:36 |
I agree with Raymond, at least for now. I would expect the string argument to Fraction to be quoted legal Python code. Without a lot of thought and discussion leading to a change in Python's design with respect to Unicode and operators, this limits '/' to ASCII '/'.
I believe that we accept non-ascii digits in at least some places, but operators are a different case.
|
msg389141 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2021-03-20 03:12 |
Dr Racket supports fraction conversions but insists on a forward slash just like we do.
Welcome to DrRacket, version 7.9.0.17--2020-12-24(f6b7f93/a) [cs].
Language: racket, with debugging; memory limit: 128 MB.
> (/ 1 2)
1/2
> (string->number "3/5")
3/5
> (string->number "2⁄3")
#f
|
msg389151 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2021-03-20 08:59 |
It would be nice to have a utility function in unicodedata to convert Unicode characters to their ASCII equivalents (if they exist). It would allow explicitly converting all slashes to / (and all digits to 0-9) before passing the string to the Fraction constructor.
AFAIK there is a special Unicode document and tables for this.
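No such function is shown to exist in this thread. As a caller-side sketch of the digit half only, the existing `unicodedata.decimal` lookup can be used; the name `ascii_digits` is hypothetical, and slashes would still need a separate mapping:

```python
import unicodedata


def ascii_digits(text):
    """Hypothetical helper: replace decimal digits from any script with 0-9.

    Characters without a decimal-digit value are passed through unchanged.
    """
    out = []
    for ch in text:
        # unicodedata.decimal returns the default (None here) when the
        # character has no decimal-digit property.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)


# U+0663 and U+0664 are ARABIC-INDIC DIGIT THREE and FOUR.
print(ascii_digits("\u0663/\u0664"))
```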
|
msg389152 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2021-03-20 09:10 |
Carl: can you say more about the problem that motivated this issue?
|
msg389158 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2021-03-20 14:41 |
Usually, constructors try to accept the format returned by repr(obj), or even str(obj). That is the case for Fraction:
>>> str(fractions.Fraction(1, 2))
'1/2'
>>> fractions.Fraction("1/2")
Fraction(1, 2)
It works as expected.
I dislike the idea of trying to handle more Unicode characters which "look like" "/", or characters like "⅔". It sounds like a can of worms, and I don't think that such a feature belongs in the stdlib. You can easily write your own helper function accepting a string and returning a fraction.
If someone is motivated to accept more characters, I would prefer to have a unified proposal covering all Python number types (int, float, Fraction, complex, etc.) and listing all characters. Maybe a PEP would make sense.
|
msg389186 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2021-03-20 22:01 |
The proposal I like is for Unicode numeric normalization functions to exist that return the ASCII equivalent.
These ideally belong in a third-party PyPI library anyway, as they're the kind of thing that needs updating every time a new Unicode revision comes out. And there are often multiple cultural interpretations for some symbols, despite any standard, so you'd wind up with a variety of functions and options for which behavior to obtain. That isn't the kind of thing that makes for a good stdlib.
Doing this by default within the language syntax itself (and thus stdlib constructors) is potentially dangerous and confusing, as everything in existence in the world today that processes Python source code already has baked-in single-ASCII-token assumptions. While parsing and tooling could be evolved for that, it'd be a major ecosystem-impacting change.
|
msg389309 - (view) |
Author: Carl Anderson (weightwatchers-carlanderson) |
Date: 2021-03-22 12:25 |
>Carl: can you say more about the problem that motivated this issue?
@mark.dickinson
I was parsing a large corpus of ingredient strings from web-scraped recipes. My code to interpret strings such as "1/2 cup sugar" would fall over every so often due to this issue, as they used the fraction slash and other visually similar characters.
|
msg389399 - (view) |
Author: Carl Anderson (weightwatchers-carlanderson) |
Date: 2021-03-23 18:19 |
>The proposal I like is for a unicode numeric normalization functions that return the ascii equivalent to exist.
@Gregory P. Smith
this makes sense to me. That does feel like the cleanest solution.
I'm currently doing s = s.replace("⁄","/"), but it would be good to have a well-maintained normalization method that contains all the relevant mappings. Using it as an independent preprocessing step before Fraction would work well.
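For the recipe use case, that single replace could be widened into a small translation table. This is only a sketch under assumptions: the three characters are the candidates named earlier in this thread, not a complete or maintained list, and `split_quantity` and its pattern are hypothetical.

```python
import re
from fractions import Fraction

# Slash look-alikes mentioned in this thread; an illustrative, incomplete set.
_SLASHES = str.maketrans({
    "\u2044": "/",  # FRACTION SLASH
    "\u2215": "/",  # DIVISION SLASH
    "\uff0f": "/",  # FULLWIDTH SOLIDUS
})

# Leading quantity such as "1/2" or "3", then the rest of the line.
_QUANTITY = re.compile(r"\s*(\d+(?:/\d+)?)\s+(.*)")


def split_quantity(line):
    """Hypothetical helper: return (Fraction or None, remaining text)."""
    match = _QUANTITY.match(line.translate(_SLASHES))
    if match is None:
        return None, line
    return Fraction(match.group(1)), match.group(2)


print(split_quantity("1\u20442 cup sugar"))
```

Lines without a leading quantity are returned unchanged with None, so the caller can decide how to handle them.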
|
msg391776 - (view) |
Author: Frédéric Grosshans-André (frederic.grosshans) |
Date: 2021-04-24 13:09 |
@Gregory P. Smith
unicodedata.numeric, in the standard library, already handles non-ASCII fractions in many scripts. The current “problem” is that it outputs a float (even for integers):
>>> unicodedata.numeric('⅔')
0.6666666666666666
The UnicodeData.txt file from the Unicode standard, which it takes its data from, does however contain the corresponding “ASCII fractions”. For example, below are two lines of this file for two (very) different ways of encoding two thirds:
2154;VULGAR FRACTION TWO THIRDS;No;0;ON;<fraction> 0032 2044 0033;;;2/3;N;FRACTION TWO THIRDS;;;;
1245B;CUNEIFORM NUMERIC SIGN TWO THIRDS DISH;Nl;0;L;;;;2/3;N;;;;;
Adding an exact value extraction to unicodedata should be doable, either via a new function or an extra keyword argument to the unicodedata.numeric function.
The only information that would be lost (but which is unavailable now anyway) would be for the few codepoints which encode reducible fractions. As of Unicode 13.0, these codepoints are
* ↉ U+2189 VULGAR FRACTION ZERO THIRDS
* 𐧷 U+109F7 MEROITIC CURSIVE FRACTION TWO TWELFTHS
* 𐧸 U+109F8 MEROITIC CURSIVE FRACTION THREE TWELFTHS
* 𐧹 U+109F9 MEROITIC CURSIVE FRACTION FOUR TWELFTHS
* 𐧻 U+109FB MEROITIC CURSIVE FRACTION SIX TWELFTHS
* 𐧽 U+109FD MEROITIC CURSIVE FRACTION EIGHT TWELFTHS
* 𐧾 U+109FE MEROITIC CURSIVE FRACTION NINE TWELFTHS
* 𐧿 U+109FF MEROITIC CURSIVE FRACTION TEN TWELFTHS
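Until something like that exists, an exact value can be approximated on the caller side from the float. This is a workaround sketch, not the proposed API: `exact_numeric` is a hypothetical name, and the denominator bound of 1000 is an assumption chosen to be larger than the small denominators involved.

```python
import unicodedata
from fractions import Fraction


def exact_numeric(ch, max_denominator=1000):
    """Hypothetical helper: recover a Fraction from unicodedata.numeric().

    unicodedata.numeric returns a float (and raises ValueError for
    characters without a numeric value); limit_denominator then finds the
    closest fraction with a small denominator.
    """
    value = unicodedata.numeric(ch)
    return Fraction(value).limit_denominator(max_denominator)


print(exact_numeric("\u2154"))  # VULGAR FRACTION TWO THIRDS
```

As with the proposal above, the result is always in lowest terms, so the reducible codepoints listed are not distinguishable from their reduced values.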
|
|
Date | User | Action | Args |
2022-04-11 14:59:42 | admin | set | github: 87686 |
2021-04-27 14:19:43 | vstinner | set | nosy: - vstinner |
2021-04-24 13:09:34 | frederic.grosshans | set | nosy: + frederic.grosshans; messages: + msg391776 |
2021-03-23 18:19:03 | weightwatchers-carlanderson | set | messages: + msg389399 |
2021-03-22 12:25:08 | weightwatchers-carlanderson | set | messages: + msg389309 |
2021-03-20 22:02:11 | gregory.p.smith | set | status: open -> closed; resolution: rejected; stage: resolved |
2021-03-20 22:01:36 | gregory.p.smith | set | nosy: + gregory.p.smith; messages: + msg389186 |
2021-03-20 14:41:27 | vstinner | set | messages: + msg389158 |
2021-03-20 09:10:27 | mark.dickinson | set | messages: + msg389152 |
2021-03-20 08:59:59 | serhiy.storchaka | set | nosy: + vstinner, serhiy.storchaka; messages: + msg389151; components: + Unicode |
2021-03-20 03:12:23 | rhettinger | set | messages: + msg389141 |
2021-03-20 00:36:24 | terry.reedy | set | nosy: + terry.reedy; messages: + msg389132; title: Fraction only handles regular slashes ("/") and fails with other similar slashes -> Make Fraction(string) handle non-ascii slashes |
2021-03-16 23:22:58 | rhettinger | set | nosy: + rhettinger; messages: + msg388892 |
2021-03-16 21:20:10 | weightwatchers-carlanderson | set | messages: + msg388886 |
2021-03-16 21:11:22 | ezio.melotti | set | nosy: + ezio.melotti |
2021-03-16 21:08:59 | weightwatchers-carlanderson | set | messages: + msg388884 |
2021-03-16 19:04:47 | mark.dickinson | set | messages: + msg388869 |
2021-03-16 18:54:43 | mark.dickinson | set | messages: + msg388867 |
2021-03-16 18:50:29 | mark.dickinson | set | nosy: + mark.dickinson; messages: + msg388866 |
2021-03-16 18:09:04 | weightwatchers-carlanderson | create | |