Title: pygettext3.7 Does Not Recognize gettext Calls Within fstrings
Author: Allie Fitter (Allie Fitter) Date: 2019-03-16 02:07
pygettext can't see gettext function calls when they're inside of an f-string:

    from gettext import gettext as _
    foo = f'{_("foo bar baz")}'

Running `pygettext3.7 -kgt -d message -D -v -o locales/message.pot` results in:

    # Copyright (C) YEAR ORGANIZATION
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2019-03-15 21:02-0500\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Generated-By: 1.5\n"

Change to:

    from gettext import gettext as _
    foo = f'' + _("foo bar baz") + ''

Results in:

    # Copyright (C) YEAR ORGANIZATION
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2019-03-15 21:05-0500\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Generated-By: 1.5\n"
    msgid "foo bar baz"
    msgstr ""

Running on Ubuntu 18.04.
Author: Toshio Kuratomi (a.badger) Date: 2019-05-09 15:31
Eric, I'm CC'ing you on this issue because I'm not sure whether you've considered f-strings and gettext and figured out a way to make them work together. If you have, I can look into adding support for extracting the strings to pygettext, but at the moment I'm not sure if it's a style that we want to propagate or not.

The heart of the problem is that the gettext function has to run before string interpolation occurs.  With .format() and the other formatting methods in Python, this is achievable rather naturally.  For instance:

    from gettext import gettext as _

    first = "foo"
    last = "baz"
    foo = _("{first}, bar, and {last}").format(**globals())

will lead to the string first being translated by gettext into:

    "{first}, bar, y {last}"

and then interpolated:

    "foo, bar, y baz"

However, trying to do the same with f-strings translates more like this:

    foo = _(f"{first}, bar, and {last}") 
    foo = _("{first}, bar, and {last}".format(**globals()))  # This is the equivalent of the f-string

So the interpolation happens first:

    "foo, bar, and baz"

Then, when gettext substitution is tried, it won't be able to find the string it knows to look for ("{first}, bar, and {last}")  so no translation will occur.
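The ordering difference can be reproduced without a real message catalog. The following is a minimal sketch, assuming a hypothetical one-entry dictionary `CATALOG` and a stand-in `_()` function in place of a real gettext translation:

```python
# Hypothetical one-entry catalog standing in for a real gettext translation.
CATALOG = {"{first}, bar, and {last}": "{first}, bar, y {last}"}


def _(message):
    # Stand-in for gettext: return the translation, or the message itself if unknown.
    return CATALOG.get(message, message)


first = "foo"
last = "baz"

# Lookup happens on the template, interpolation happens afterwards.
translated_then_formatted = _("{first}, bar, and {last}").format(first=first, last=last)

# The f-string is interpolated before _() is called, so the lookup misses.
formatted_then_translated = _(f"{first}, bar, and {last}")

print(translated_then_formatted)  # foo, bar, y baz
print(formatted_then_translated)  # foo, bar, and baz
```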

Allie Fitter's code corrects this ordering problem but introduces other issues.  Taking the sample string:

    foo = f'{_("{first}, bar, and {last}")}'

f-string interpolation runs first, but it sees that it has to invoke the _() function so the f-string machinery itself runs gettext:

    f'{"{first}, bar, y {last}"}'

The machinery then simply returns that string so we end up with:

    '{first}, bar, y {last}'

which is not quite right but can be fixed by nesting f-strings:

    foo = f'{_(f"{first}, bar, and {last}")}'

which results in:

    f'{f"{first}, bar, y {last}"}'

which results in:

    f'{"foo, bar, y baz"}'

And finally:

    "foo, bar, y baz"

So, that recipe works but is that what we want to tell people to do?  It seems quite messy that we have to run the gettext function within the command and use nested f-strings so is there/should there be a different way to make this work?
Author: Eric V. Smith (eric.smith) Date: 2019-05-09 17:44
Thanks for adding me, Toshio.

    foo = f'{_(f"{first}, bar, and {last}")}'

Wow, that's extremely creative.

I agree that this isn't the best we can do. PEP 501 has some ideas, but it might be too general purpose and powerful for this. Let me think about the nested f-string above and see if I can't think of a better way.

As an aside, this code:

    foo = _("{first}, bar, and {last}").format(**globals())

Is better written with format_map():

    foo = _("{first}, bar, and {last}").format_map(globals())

It does not create a new dict like the ** version does.
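One way to observe that difference is with a dict subclass. In this small sketch, the `Defaulting` class and its placeholder text are made up for illustration: format_map() uses the mapping object directly, so its `__missing__` hook is honoured, while the `**` version unpacks the keys into plain keyword arguments and the hook is lost.

```python
class Defaulting(dict):
    # Hypothetical mapping that supplies a placeholder for unknown keys.
    def __missing__(self, key):
        return f"<{key}?>"


values = Defaulting(first="foo")
template = "{first}, bar, and {last}"

# format_map() looks keys up on the mapping itself, so __missing__ is called.
via_format_map = template.format_map(values)

# format(**values) copies the items into keyword arguments; "last" is simply absent.
try:
    template.format(**values)
    missing = None
except KeyError as exc:
    missing = exc.args[0]

print(via_format_map)  # foo, bar, and <last?>
print(missing)         # last
```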
Author: Allie Fitter (Allie Fitter) Date: 2019-05-09 18:22
Just as context, my use case for this is interpolating translated strings into HTML.

    html = f'''\
    <h1>{_("Some Title")}</h1>
    <p>{_("Some longer text")}</p>
    '''
Author: Eric V. Smith (eric.smith) Date: 2019-05-09 18:33
I was going to say "use eval()", but maybe what we need is some sort of "eval_fstring()" that only understands f-strings and produces a callable that captures all of the variables (like a closure). Maybe that would help.
Author: Eric V. Smith (eric.smith) Date: 2019-05-10 01:25
Of course, this wouldn't be any safer than eval'ing arbitrary user provided code.
Author: Eric V. Smith (eric.smith) Date: 2019-07-24 23:43
I've put some more thought into this, and this is the best I can come up with, using today's Python.

The basic idea is that you have a function _f(), which takes a normal (non-f) string. It does a lookup to find the translated string (again, a non-fstring), turns that into an f-string, then compiles it and returns the code object. Then the caller evals the returned code object to convert it to a string.

The ugly part, of course, is the eval. You can't just call _f() on the string and use the result: you have to pass what _f() returns to eval() at the call site.
You can't reduce this to a single function call: the eval() has to take place right here. It is possible to play games with stack frames, but that doesn't always work (see PEP 498 for details, where it talks about locals() and globals(), which is part of the same problem).

But I don't see much choice. Since a translated f-string can do anything (like f'{"script to rm all files"}'), I'm not sure it's the eval that's the worst thing here. The translated text absolutely has to be trusted: that's the worst thing. Even an eval_fstring(), that only understood how to exec code objects that are f-strings, would still be exposed to arbitrary expressions and side effects in the translated strings.

The advantage of compiling it and caching is that you get most of the performance advantages of f-strings, after the first time a string is used. The code generation still has to happen, though. It's just the parsing that's being saved. I can't say how significant that is.

See the sample code in the attached file.
Author: Batuhan Taskaya (BTaskaya) Date: 2020-11-09 22:50
New changeset bfc6b63102d37ccb58a71711e2342143cd9f4d86 by jack1142 in branch 'master':
bpo-36310: Allow to detect calls to gettext in f-strings. (GH-19875)
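
The changeset itself is not shown in this thread. As a rough illustration of how an extractor can see calls inside f-strings, the sketch below walks the ast of a source string and collects literal arguments of calls to `_`. The function name `find_messages` and its `keywords` parameter are hypothetical, and this is not claimed to be how pygettext implements it.

```python
import ast


def find_messages(source, keywords=("_",)):
    # Collect string literals passed as the only argument to a keyword function.
    # ast.walk() also descends into the expressions embedded in f-strings.
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in keywords
            and len(node.args) == 1
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append(node.args[0].value)
    return found


sample = '''foo = f'{_("foo bar baz")}'\n'''
print(find_messages(sample))  # ['foo bar baz']
```

A call such as `_(f"{first}")` is skipped by this sketch, because its argument is an f-string rather than a plain literal and so has no fixed msgid to extract.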
