pygettext3.7 Does Not Recognize gettext Calls Within fstrings #80491

Closed
AllieFitter mannequin opened this issue Mar 16, 2019 · 8 comments
Labels
3.10 (only security fixes), type-bug (an unexpected behavior, bug, or error)

Comments

@AllieFitter
Mannequin

AllieFitter mannequin commented Mar 16, 2019

BPO 36310
Nosy @ericvsmith, @abadger, @isidentical, @jack1142
PRs
  • bpo-36310: Allow pygettext.py to detect calls to gettext in f-strings. #19875
Files
  • f-string-gettext.py
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-11-09.22:55:09.217>
    created_at = <Date 2019-03-16.02:07:27.696>
    labels = ['type-bug', '3.10']
    title = 'pygettext3.7 Does Not Recognize gettext Calls Within fstrings'
    updated_at = <Date 2020-11-09.22:55:09.216>
    user = 'https://bugs.python.org/AllieFitter'

    bugs.python.org fields:

    activity = <Date 2020-11-09.22:55:09.216>
    actor = 'BTaskaya'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-11-09.22:55:09.217>
    closer = 'BTaskaya'
    components = ['Demos and Tools']
    creation = <Date 2019-03-16.02:07:27.696>
    creator = 'Allie Fitter'
    dependencies = []
    files = ['48504']
    hgrepos = []
    issue_num = 36310
    keywords = ['patch']
    message_count = 8.0
    messages = ['338049', '341984', '341988', '341991', '341993', '342031', '348417', '380622']
    nosy_count = 5.0
    nosy_names = ['eric.smith', 'a.badger', 'BTaskaya', 'Allie Fitter', 'jack1142']
    pr_nums = ['19875']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue36310'
    versions = ['Python 3.10']

    @AllieFitter
    Mannequin Author

    AllieFitter mannequin commented Mar 16, 2019

    pygettext can't see gettext function calls when they're inside an f-string:

    foo.py

        from gettext import gettext as _
        
        foo = f'{_("foo bar baz")}'

    Running pygettext3.7 -kgt -d message -D -v -o locales/message.pot foo.py results in:

    locales/message.pot
    # SOME DESCRIPTIVE TITLE.
    # Copyright (C) YEAR ORGANIZATION
    # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2019-03-15 21:02-0500\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Generated-By: pygettext.py 1.5\n"

    Change foo.py to:

        from gettext import gettext as _
        
        foo = f'' + _("foo bar baz") + ''

    Results in:

    # SOME DESCRIPTIVE TITLE.
    # Copyright (C) YEAR ORGANIZATION
    # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2019-03-15 21:05-0500\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Generated-By: pygettext.py 1.5\n"
    
    
    #: foo.py:3
    msgid "foo bar baz"
    msgstr ""
    

    Running on Ubuntu 18.04.

    @AllieFitter AllieFitter mannequin added the 3.7 (EOL) and type-bug labels Mar 16, 2019
    @abadger
    Mannequin

    abadger mannequin commented May 9, 2019

    Eric, I'm CC'ing you on this issue because I'm not sure if you've considered f-strings and gettext and figured out a way to make them work together. If you have, I can look into adding support for extracting the strings to pygettext, but at the moment I'm not sure if it's a style that we want to propagate or not.

    The heart of the problem is that the gettext function has to run before string interpolation occurs. With .format() and the other formatting methods in Python, this is achievable rather naturally. For instance:

        from gettext import gettext as _
    
        first = "foo"
        last = "baz"
        foo = _("{first}, bar, and {last}").format(**globals())

    will lead to the string first being gettext substituted like:

    "{first}, bar, y {last}"
    

    and then interpolated:

    "foo, bar, y baz"
    

    However, trying to do the same with f-strings translates more like this:

        foo = _(f"{first}, bar, and {last}") 
        foo = _("{first}, bar, and {last}".format(**globals()))  # This is the equivalent of the f-string

    So the interpolation happens first:

    "foo, bar, and baz"
    

    Then, when gettext substitution is tried, it won't be able to find the string it knows to look for ("{first}, bar, and {last}") so no translation will occur.
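The ordering difference can be reproduced without a real catalog. This is a minimal sketch, assuming a hypothetical in-memory `CATALOG` dict and a stand-in `_()` in place of `gettext.gettext`; only the lookup order matters here:

```python
# Hypothetical stand-in for a compiled message catalog (not a real .mo file).
CATALOG = {"{first}, bar, and {last}": "{first}, bar, y {last}"}


def _(message):
    """Stand-in for gettext.gettext: return the translation, or the input if unknown."""
    return CATALOG.get(message, message)


first = "foo"
last = "baz"

# Lookup first, interpolation second: the msgid matches the catalog key.
translated = _("{first}, bar, and {last}").format(first=first, last=last)

# Interpolation first, lookup second: "foo, bar, and baz" is not a catalog key,
# so the lookup falls back to the untranslated text.
untranslated = _(f"{first}, bar, and {last}")

print(translated)
print(untranslated)
```

The first call returns the translated, interpolated text; the second returns the English text unchanged because the lookup key no longer matches.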

    Allie Fitter's code corrects this ordering problem but introduces other issues. Taking the sample string:

        foo = f'{_("{first}, bar, and {last}")}'

    f-string interpolation runs first, but it sees that it has to invoke the _() function so the f-string machinery itself runs gettext:

    f'{"{first}, bar, y {last}"}'
    

    The machinery then simply returns that string so we end up with:

    '{first}, bar, y {last}'

    which is not quite right but can be fixed by nesting f-strings:

        foo = f'{_(f"{first}, bar, and {last}")}'

    which results in:

    f'{f"{first}, bar, y {last}"}'
    

    which results in:

    f'{"foo, bar, y baz"}'
    

    And finally:

    "foo, bar, y baz"
    

    So that recipe works, but is that what we want to tell people to do? It seems quite messy that we have to run the gettext function within the command and use nested f-strings. Is there, or should there be, a different way to make this work?

    @ericvsmith
    Member

    Thanks for adding me, Toshio.

        foo = f'{_(f"{first}, bar, and {last}")}'

    Wow, that's extremely creative.

    I agree that this isn't the best we can do. PEP-501 has some ideas, but it might be too general purpose and powerful for this. Let me think about the nested f-string above and see if I can't think of a better way.

    As an aside, this code:

    foo = _("{first}, bar, and {last}").format(**globals())

    Is better written with format_map():

    foo = _("{first}, bar, and {last}").format_map(globals())

    It does not create a new dict like the ** version does.
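One way to observe that `format_map()` uses the mapping directly is to pass a dict subclass that defines `__missing__`. A small sketch; the `Default` class and the template are hypothetical and only there for illustration:

```python
class Default(dict):
    """Dict subclass that supplies a placeholder for missing keys."""

    def __missing__(self, key):
        return f"<{key}>"


names = Default(first="foo")
template = "{first}, bar, and {last}"

# format_map() looks keys up on the mapping itself, so __missing__ is consulted.
via_format_map = template.format_map(names)

# format(**names) unpacks into a fresh plain dict of keyword arguments,
# so the subclass behaviour is lost and the missing key raises KeyError.
try:
    template.format(**names)
    format_raised = False
except KeyError:
    format_raised = True

print(via_format_map, format_raised)
```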

    @AllieFitter
    Mannequin Author

    AllieFitter mannequin commented May 9, 2019

    Just as context, my use case for this is interpolating translated strings into HTML.

        html = f'''\
        <h1>{_("Some Title")}</h1>
        <p>{_("Some longer text")}</p>
        '''

    @ericvsmith
    Member

    I was going to say "use eval()", but maybe we need some sort of "eval_fstring()" that basically only understood f-strings and produced a callable that captured all of the variables (like a closure). Maybe that would help.

    @ericvsmith
    Member

    Of course, this wouldn't be any safer than eval'ing arbitrary user provided code.

    @ericvsmith
    Member

    I've put some more thought into this, and this is the best I can come up with, using today's Python.

    The basic idea is that you have a function _f(), which takes a normal (non-f) string. It does a lookup to find the translated string (again, a non-fstring), turns that into an f-string, then compiles it and returns the code object. Then the caller evals the returned code object to convert it to a string.

    The ugly part, of course, is the eval. You can't just say:
    _f("{val}")
    you have to say:
    eval(_f("{val}"))
    You can't reduce this to a single function call: the eval() has to take place right here. It is possible to play games with stack frames, but that doesn't always work (see PEP-498 for details, where it talks about locals() and globals(), which is part of the same problem).

    But I don't see much choice. Since a translated f-string can do anything (like f'{subprocess.run("script to rm all files")}'), I'm not sure it's the eval that's the worst thing here. The translated text absolutely has to be trusted: that's the worst thing. Even an eval_fstring(), that only understood how to exec code objects that are f-strings, would still be exposed to arbitrary expressions and side effects in the translated strings.

    The advantage of compiling it and caching is that you get most of the performance advantages of f-strings, after the first time a string is used. The code generation still has to happen, though. It's just the parsing that's being saved. I can't say how significant that is.

    See the sample code in the attached file.

    @isidentical
    Sponsor Member

    New changeset bfc6b63 by jack1142 in branch 'master':
    bpo-36310: Allow pygettext.py to detect calls to gettext in f-strings. (GH-19875)
    bfc6b63
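For readers curious how calls inside f-strings can be found at all, here is a minimal sketch using the standard `ast` module. It is not the code from the changeset above; the `find_gettext_calls()` helper and its `keywords` parameter are hypothetical names. It relies on the fact that the parser exposes expressions inside an f-string as ordinary nodes under `JoinedStr`.

```python
import ast


def find_gettext_calls(source, keywords=("_",)):
    """Return the sorted msgids of calls like _("literal") found in source.

    ast.walk() also visits expressions nested inside f-strings, so calls
    there are found the same way as top-level calls.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in keywords
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append(node.args[0].value)
    return sorted(found)


SAMPLE = '''
from gettext import gettext as _

foo = f'{_("foo bar baz")}'
bar = _("plain call")
'''

msgids = find_gettext_calls(SAMPLE)
print(msgids)
```

Only calls whose first argument is a plain string literal are collected, so a call such as `_(f"...")`, which has no fixed msgid, is skipped.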

    @isidentical isidentical added the 3.10 label and removed the 3.7 (EOL) label Nov 9, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022