classification
Title: pygettext3.7 Does Not Recognize gettext Calls Within fstrings
Type: behavior Stage: resolved
Components: Demos and Tools Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Allie Fitter, BTaskaya, a.badger, eric.smith, jack1142
Priority: normal Keywords: patch

Created on 2019-03-16 02:07 by Allie Fitter, last changed 2020-11-09 22:55 by BTaskaya. This issue is now closed.

Files
File name Uploaded Description
f-string-gettext.py eric.smith, 2019-07-24 23:43
Pull Requests
URL Status Linked
PR 19875 merged jack1142, 2020-05-03 01:24
Messages (8)
msg338049 - Author: Allie Fitter (Allie Fitter) Date: 2019-03-16 02:07
pygettext can't see gettext function calls when they're inside an f-string:

foo.py

    from gettext import gettext as _
    
    foo = f'{_("foo bar baz")}'

Running `pygettext3.7 -kgt -d message -D -v -o locales/message.pot foo.py` results in:

locales/message.pot
    # SOME DESCRIPTIVE TITLE.
    # Copyright (C) YEAR ORGANIZATION
    # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2019-03-15 21:02-0500\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Generated-By: pygettext.py 1.5\n"

Change foo.py to:

    from gettext import gettext as _
    
    foo = f'' + _("foo bar baz") + ''


Results in:

    # SOME DESCRIPTIVE TITLE.
    # Copyright (C) YEAR ORGANIZATION
    # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
    #
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2019-03-15 21:05-0500\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Generated-By: pygettext.py 1.5\n"
    
    
    #: foo.py:3
    msgid "foo bar baz"
    msgstr ""


Running on Ubuntu 18.04.
msg341984 - Author: Toshio Kuratomi (a.badger) * Date: 2019-05-09 15:31
Eric, I'm CC'ing you on this issue because I'm not sure if you've considered f-strings and gettext and figured out a way to make them work together.  If you have, I can look into adding support for extracting the strings to pygettext, but at the moment I'm not sure whether it's a style that we want to propagate.

The heart of the problem is that the gettext function has to run before string interpolation occurs.  With .format() and the other formatting methods in Python, this is achievable rather naturally.  For instance:

    from gettext import gettext as _

    first = "foo"
    last = "baz"
    foo = _("{first}, bar, and {last}").format(**globals())

will lead to the string first being translated by gettext, like:

    "{first}, bar, y {last}"

and then interpolated:

    "foo, bar, y baz"

However, trying to do the same with f-strings translates more like this:

    foo = _(f"{first}, bar, and {last}") 
    foo = _("{first}, bar, and {last}".format(**globals()))  # This is the equivalent of the f-string

So the interpolation happens first:

    "foo, bar, and baz"

Then, when gettext substitution is tried, it won't be able to find the string it knows to look for ("{first}, bar, and {last}"), so no translation will occur.

Allie Fitter's code corrects this ordering problem but introduces other issues.  Taking the sample string:

    foo = f'{_("{first}, bar, and {last}")}'

f-string interpolation runs first, but it sees that it has to invoke the _() function so the f-string machinery itself runs gettext:

    f'{"{first}, bar, y {last}"}'

The machinery then simply returns that string so we end up with:

    '{first}, bar, y {last}'

which is not quite right but can be fixed by nesting f-strings:

    foo = f'{_(f"{first}, bar, and {last}")}'

which results in:

    f'{f"{first}, bar, y {last}"}'

which results in:

    f'{"foo, bar, y baz"}'

And finally:

    "foo, bar, y baz"

So, that recipe works but is that what we want to tell people to do?  It seems quite messy that we have to run the gettext function within the command and use nested f-strings so is there/should there be a different way to make this work?
msg341988 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-05-09 17:44
Thanks for adding me, Toshio.

    foo = f'{_(f"{first}, bar, and {last}")}'

Wow, that's extremely creative.

I agree that this isn't the best we can do. PEP 501 has some ideas, but it might be too general purpose and powerful for this. Let me think about the nested f-string above and see if I can't think of a better way.

As an aside, this code:

    foo = _("{first}, bar, and {last}").format(**globals())

is better written with format_map():

    foo = _("{first}, bar, and {last}").format_map(globals())

It does not create a new dict like the ** version does.
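
One way to observe that difference, as a rough sketch: pass a `dict` subclass that defines `__missing__`. `format_map()` uses the mapping object directly, so the hook fires; `**` unpacks it into a fresh plain dict, so the hook is lost. The `Placeholder` class below is hypothetical and exists only for this demonstration.

```python
class Placeholder(dict):
    """Hypothetical mapping that supplies a visible marker for missing keys."""

    def __missing__(self, key):
        return f"<{key}>"


template = "{first}, bar, and {last}"
values = Placeholder(first="foo")

# format_map() consults `values` itself, so __missing__ handles "last".
via_format_map = template.format_map(values)

# format(**values) copies the items into new keyword arguments; the copy has no
# __missing__ hook, so the missing "last" raises KeyError.
try:
    via_format = template.format(**values)
except KeyError as exc:
    via_format = f"KeyError: {exc}"

print(via_format_map)  # foo, bar, and <last>
print(via_format)      # KeyError: 'last'
```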
msg341991 - Author: Allie Fitter (Allie Fitter) Date: 2019-05-09 18:22
Just as context, my use case for this is interpolating translated strings into HTML.

    html = f'''\
    <h1>{_("Some Title")}</h1>
    <p>{_("Some longer text")}</p>
    '''
msg341993 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-05-09 18:33
I was going to say "use eval()", but maybe we need some sort of "eval_fstring()" that only understands f-strings and produces a callable that captures all of the variables (like a closure). Maybe that would help.
msg342031 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-05-10 01:25
Of course, this wouldn't be any safer than eval'ing arbitrary user provided code.
msg348417 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-07-24 23:43
I've put some more thought into this, and this is the best I can come up with, using today's Python.

The basic idea is that you have a function _f(), which takes a normal (non-f) string. It does a lookup to find the translated string (again, a non-fstring), turns that into an f-string, then compiles it and returns the code object. Then the caller evals the returned code object to convert it to a string.

The ugly part, of course, is the eval. You can't just say:

    _f("{val}")

you have to say:

    eval(_f("{val}"))

You can't reduce this to a single function call: the eval() has to take place right here. It is possible to play games with stack frames, but that doesn't always work (see PEP 498 for details, where it talks about locals() and globals(), which is part of the same problem).

But I don't see much choice. Since a translated f-string can do anything (like f'{subprocess.run("script to rm all files")}'), I'm not sure it's the eval that's the worst thing here. The translated text absolutely has to be trusted: that's the worst thing. Even an eval_fstring(), that only understood how to exec code objects that are f-strings, would still be exposed to arbitrary expressions and side effects in the translated strings.

The advantage of compiling it and caching is that you get most of the performance advantages of f-strings, after the first time a string is used. The code generation still has to happen, though. It's just the parsing that's being saved. I can't say how significant that is.

See the sample code in the attached file.
msg380622 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-11-09 22:50
New changeset bfc6b63102d37ccb58a71711e2342143cd9f4d86 by jack1142 in branch 'master':
bpo-36310: Allow pygettext.py to detect calls to gettext in f-strings. (GH-19875)
https://github.com/python/cpython/commit/bfc6b63102d37ccb58a71711e2342143cd9f4d86
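
For readers who want to see why calls inside f-strings are discoverable at all, here is a minimal sketch of an extractor built on the standard `ast` module. It is not the merged patch and does not mirror how pygettext.py is structured; the function name and the `keywords` parameter are assumptions made for this example. It relies only on the fact that the parser represents an f-string as a `JoinedStr` node whose embedded expressions are ordinary AST nodes, so `ast.walk` reaches them.

```python
import ast

SOURCE = '''
from gettext import gettext as _

foo = f'{_("foo bar baz")}'
bar = _("plain call")
'''


def find_gettext_msgids(source, keywords=("_",)):
    """Return the sorted string literals passed to any function named in `keywords`.

    Hypothetical helper: it only handles calls of the form keyword("literal").
    """
    msgids = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in keywords
            and len(node.args) == 1
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            msgids.append(node.args[0].value)
    return sorted(msgids)


msgids = find_gettext_msgids(SOURCE)
print(msgids)  # ['foo bar baz', 'plain call']
```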
History
Date User Action Args
2020-11-09 22:55:09 BTaskaya set status: open -> closed; stage: patch review -> resolved; resolution: fixed; versions: + Python 3.10, - Python 3.7
2020-11-09 22:50:54 BTaskaya set nosy: + BTaskaya; messages: + msg380622
2020-05-03 01:24:21 jack1142 set keywords: + patch; stage: patch review; pull_requests: + pull_request19186
2020-05-02 02:20:01 jack1142 set nosy: + jack1142
2019-07-24 23:43:49 eric.smith set files: + f-string-gettext.py; messages: + msg348417
2019-05-10 01:25:22 eric.smith set messages: + msg342031
2019-05-09 18:33:01 eric.smith set messages: + msg341993
2019-05-09 18:22:59 Allie Fitter set messages: + msg341991
2019-05-09 17:44:22 eric.smith set messages: + msg341988
2019-05-09 15:31:37 a.badger set nosy: + eric.smith, a.badger; messages: + msg341984
2019-03-16 02:07:27 Allie Fitter create