Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_decimal: Implement the previously rejected changes from #7442. #79878

Closed
vstinner opened this issue Jan 9, 2019 · 13 comments
Closed

_decimal: Implement the previously rejected changes from #7442. #79878

vstinner opened this issue Jan 9, 2019 · 13 comments
Assignees
Labels
3.8 only security fixes stdlib Python modules in the Lib dir

Comments

@vstinner
Copy link
Member

vstinner commented Jan 9, 2019

BPO 35697
Nosy @rhettinger, @vstinner, @skrah
PRs
  • bpo-35697, decimal: Fix locale formatting #11474
  • bpo-35697, decimal: Fix locale formatting #11474
  • bpo-35697, decimal: Fix locale formatting #11474
  • Files
  • decimal_locale.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/skrah'
    closed_at = <Date 2019-04-16.15:36:25.707>
    created_at = <Date 2019-01-09.11:31:48.426>
    labels = ['3.8', 'library']
    title = '_decimal: Implement the previously rejected changes from python/cpython#51691.'
    updated_at = <Date 2019-04-16.15:36:25.706>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-04-16.15:36:25.706>
    actor = 'vstinner'
    assignee = 'skrah'
    closed = True
    closed_date = <Date 2019-04-16.15:36:25.707>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2019-01-09.11:31:48.426>
    creator = 'vstinner'
    dependencies = []
    files = ['48038']
    hgrepos = []
    issue_num = 35697
    keywords = ['patch', 'patch', 'patch']
    message_count = 13.0
    messages = ['333302', '333303', '333304', '333306', '333307', '333310', '333311', '333313', '333316', '333317', '333321', '333542', '340348']
    nosy_count = 3.0
    nosy_names = ['rhettinger', 'vstinner', 'skrah']
    pr_nums = ['11474', '11474', '11474']
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue35697'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2019

    The decimal module support formatting a number in the "n" formatting type if the LC_NUMERIC locale uses a different encoding than the LC_CTYPE locale.

    Example with attached decimal_locale.py on Fedora 29 with Python 3.7.2:

    $ python3 decimal_locale.py 
    LC_NUMERIC locale: uk_UA.koi8u
    decimal_point: ',' = ',' = U+002c
    thousands_sep: '\xa0' = '\xa0' = U+00a0
    Traceback (most recent call last):
      File "/home/vstinner/decimal_locale.py", line 16, in <module>
        text = format(num, "n")
    ValueError: invalid decimal point or unsupported combination of LC_CTYPE and LC_NUMERIC

    Attached PR modify the _decimal module to support this corner case.

    Note: I already wrote PR 5191 last year, but I abandoned the PR in the meanwhile.

    --

    Supporting non-ASCII decimal point and thousands separator has a long history and a list of now fixed issues:

    I even wrote an article about these bugs :-)
    #5191

    Python 3.7.2 now supports different encodings for LC_NUMERIC, LC_MONETARY and LC_CTYPE locales. format(int, "n") sets temporarily LC_CTYPE to LC_NUMERIC to decode decimal_point and thousands_sep from the correct encoding. The LC_CTYPE locale is only changed if it's different than LC_NUMERIC locale and if the decimal point and/or thousands separator is non-ASCII. It's implemented in this function:

    int
    _Py_GetLocaleconvNumeric(struct lconv *lc,
                             PyObject **decimal_point, PyObject **thousands_sep)

    Function used by locale.localeconv() and format() (for "n" type).

    I decided to fix the bug when I was fixing other locale bugs because we now got enough bug reports.

    Copy of my msg309980:

    """

    I would not consider this a bug in Python, but rather in the locale settings passed to setlocale().

    Past 10 years, I repeated to every single user I met that "Python 3 is right, your system setup is wrong". But that's a waste of time. People continue to associate Python3 and Unicode to annoying bugs, because they don't understand how locales work.

    Instead of having to repeat to each user that "hum, maybe your config is wrong", I prefer to support this non convential setup and work as expected ("it just works"). With my latest implementation, setlocale() is only done when LC_CTYPE and LC_NUMERIC are different, which is the corner case which "shouldn't occur in practice".
    """

    @vstinner vstinner added 3.8 only security fixes stdlib Python modules in the Lib dir labels Jan 9, 2019
    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2019

    Oh, I was wrong: bpo-25812 has not been fixed yet.

    @skrah skrah mannequin self-assigned this Jan 9, 2019
    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 9, 2019

    Since bpo-7442 (again, *I* discovered this and it is *mentioned* in the
    _decimal sources), there have been zero bug reports about decimal.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2019

    Ok, I wrote PR 11474. Correct result with this PR:

    $ ./python decimal_locale.py 
    LC_NUMERIC locale: uk_UA.koi8u
    decimal_point: ',' = ',' = U+002c
    thousands_sep: '\xa0' = '\xa0' = U+00a0
    format: '1\xa0200,5' = 1 200,5 = U+0031 U+00a0 U+0032 U+0030 U+0030 U+002c U+0035

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2019

    Stefan Krah:

    Since bpo-7442 (again, *I* discovered this and it is *mentioned* in the
    _decimal sources), ...

    I closed bpo-7442 in the meanwhile, so I opened this new issue specific to the decimal module.

    ... there have been zero bug reports about decimal.

    Well, here you have :-) A bug report about decimal.

    I let you decide what to do with this bug. I wrote a fix. It's up to you to merge it, reject it or do nothing :-)

    I wanted to make sure that the bug is fixed in all parts of the Python stdlib.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 9, 2019

    Don't you find it strange to close bpo-7442 in mutual agreement and now
    mention the word "bug" 50 times?

    @skrah skrah mannequin changed the title decimal: formatter error if LC_NUMERIC uses a different encoding than LC_CTYPE _decimal: Implement the previously rejected changes from #7442. Jan 9, 2019
    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 9, 2019

    Also Marc-Andre does not consider this a bug in bpo-31900. The
    presentation of this issue is increasingly bizarre.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2019

    Stefan Krah:

    Don't you find it strange to close bpo-7442 in mutual agreement and now
    mention the word "bug" 50 times?

    We agreed to close the bug in 2014. In the meanwhile, more and more people reported the same bug (multiple similar bug reports and more and more frequent messages). So I decided to fix the bug instead of explaining to users that they must not do that :-)

    Also Marc-Andre does not consider this a bug in bpo-31900. The
    presentation of this issue is increasingly bizarre.

    Aha, maybe I misunderstood him when he wrote (msg309981):

    "Sounds like a good compromise :-)"

    --

    I'm not sure of what you are asking here. You are the maintainer of the decimal module. If you consider that it's not worth it, just close the issue. It's up to you ;-)

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2019

    Oh, I forgot to mention the context: I reported this issue as follow-up to discussions on bpo-35638: "Introduce fixed point locale aware format type for floating point numbers".

    @vstinner
    Copy link
    Member Author

    vstinner commented Jan 9, 2019

    Extract of Stefan Krah's msg333296 of bpo-35638:

    So why would you think that I'm not aware of that issue?

    Oh, I never said that.

    I wrote "By the way, the decimal module doesn't support properly the following corner case: LC_NUMERIC using an encoding different than LC_CTYPE encoding. I wrote #5191 but I abandonned my change." as a reminder for myself. I found again the bug when I wrote my article, and I realized that I abandoned my PR when I had my burnout, and not because the fix was "officially" rejected.

    So I opened this issue to get an official statement :-)

    It has low priority for me and I hesitate to depend on the official locale functions in decimal because I don't want to be involved in additional issue reports in that area.

    As I wrote in my PR, I'm not very happy of my proposed implementation. But let's discuss options to fix it :-)

    I don't understand "I don't want to be involved in additional issue reports in that area". If the bug is fixed, why do you expect more bug reports? You wrote that nobody reported any issue related to formatting a decimal number using the locale since 2009.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jan 9, 2019

    I mean issue reports like bpo-33954 or bpo-35195. These are just
    two examples, and I'm NOT claiming that they are related.

    But if functions like _PyUnicode_InsertThousandsGrouping()
    *were* used in _decimal, I'd feel compelled to investigate.

    Now I don't have to. I'd investigate bpo-35195 for example, perhaps
    just to find out that it has an entirely different cause.

    Sometimes not being dependent on API functions is a virtue if
    the code does not change very often.

    @rhettinger
    Copy link
    Contributor

    Sometimes not being dependent on API functions is a virtue if the code does not change very often.

    That is a reasonable position to take.

    @vstinner
    Copy link
    Member Author

    I'm no longer interested to rewrite my patch to avoid _Py_GetLocaleconvNumeric() which comes from the internal API, so I close my PR.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants