classification
Title: Make i18n tools available from `python -m`
Type:
Stage:
Components:
Versions:
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: a.badger, barry, bbkane
Priority: normal
Keywords:

Created on 2019-05-07 16:23 by bbkane, last changed 2019-05-11 15:26 by a.badger.

Messages (5)
msg341771 - Author: Benjamin Kane (bbkane) Date: 2019-05-07 16:23
Localizing a Python application involves using the `gettext` standard library module to read .mo files. There are three scripts to assist with this in https://github.com/bbkane/cpython/tree/master/Tools/i18n :
- makelocalealias.py: Convert the X11 locale.alias file into a mapping dictionary suitable for locale.py.
- msgfmt.py: Generate a binary message catalog (.mo) from a textual translation description (.po).
- pygettext.py: Generate .pot files identical to what GNU xgettext generates for C and C++ code (once translated, these can be compiled by msgfmt.py).

I recently wrote a tutorial on localizing a Python script ( https://github.com/bbkane/arcade/blob/bbkane/add_localization_example/doc/examples/text_loc_example.rst ), and I had to tell my users (a student audience) to download these scripts from GitHub. I would much rather have asked them to use a built-in tool available through the `-m` switch (similar to `python -m json.tool`), so this issue proposes adding one.

The docs ( https://docs.python.org/3/library/gettext.html#internationalizing-your-programs-and-modules ) mention these scripts, but do not provide any information on how to get them.

Possible solutions:

- Turn gettext.py into a package and put these scripts into a `tool` submodule (similar to json.tool)
- Add a separate package (i18n, perhaps) and put these scripts there
- Add links to these scripts, and instructions for using them, to the docs
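The first option relies on the same mechanism as `python -m json.tool`: `-m pkg.sub` imports the package and then runs the submodule as `__main__`. The sketch below checks that end to end with a throwaway package; `i18n_demo` and its echoing `tool.py` are hypothetical stand-ins, not a proposed layout for gettext.

```python
import os
import subprocess
import sys
import tempfile
import textwrap


def run_tool_submodule() -> str:
    """Build a throwaway package with a `tool` submodule and run it via `python -m`."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "i18n_demo")  # hypothetical package name
        os.makedirs(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        with open(os.path.join(pkg, "tool.py"), "w") as f:
            f.write(textwrap.dedent('''
                import sys

                def main(argv=None):
                    argv = sys.argv[1:] if argv is None else argv
                    print("tool called with", argv)
                    return 0

                if __name__ == "__main__":
                    sys.exit(main())
            '''))
        # -m finds modules on sys.path; PYTHONPATH makes the temp package importable.
        env = {**os.environ, "PYTHONPATH": root}
        result = subprocess.run(
            [sys.executable, "-m", "i18n_demo.tool", "test.py"],
            cwd=root, env=env, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


print(run_tool_submodule())  # tool called with ['test.py']
```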
msg341876 - Author: Barry A. Warsaw (barry) (Python committer) Date: 2019-05-08 14:24
One other suggestion: put the bulk of Tools/i18n/pygettext.py into Lib/_pygettext.py, then import its main() in both Lib/gettext.py and Tools/i18n/pygettext.py.  Then just call that main().
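Barry's suggestion keeps one `main(argv=None)` in a private module and turns both entry points into thin wrappers around it. A minimal sketch of that shape; the option set (`-o/--output` plus input files) is an assumption for illustration, and the body is a placeholder rather than pygettext.py's actual logic.

```python
"""Hypothetical Lib/_pygettext.py: one shared entry point (sketch only)."""
import argparse
import sys


def main(argv=None) -> int:
    # argv=None makes argparse fall back to sys.argv[1:].
    parser = argparse.ArgumentParser(
        prog="pygettext", description="Extract translatable strings (sketch)."
    )
    parser.add_argument("-o", "--output", default="messages.pot",
                        help="where to write the .pot template")
    parser.add_argument("inputs", nargs="+", help="Python source files to scan")
    args = parser.parse_args(argv)
    # Placeholder: the extraction code moved out of Tools/i18n/pygettext.py would run here.
    print(f"would scan {', '.join(args.inputs)} and write {args.output}")
    return 0


# Lib/gettext.py and Tools/i18n/pygettext.py would then each end with:
#     if __name__ == "__main__":
#         from _pygettext import main
#         sys.exit(main())
if __name__ == "__main__":
    sys.exit(main())
```

Importing `main` inside the `__main__` guard keeps it out of the `gettext` module namespace when gettext is imported normally.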
msg341999 - Author: Toshio Kuratomi (a.badger) Date: 2019-05-09 21:47
Note: I've been testing how our gettext module differs from GNU gettext and have run into a few bugs and missing features that make msgfmt unusable and limit pygettext's usefulness.

* msgfmt doesn't seem to store the charset from the .po file in the .mo file. I think this might have been okay for the lgettext() and gettext() methods under Python 2, as those probably passed the byte strings from the .mo files through verbatim. Under Python 3, however, we have to decode the byte strings to text, and we can't do that without knowing the charset. This leads to a UnicodeDecodeError on any .mo file that contains non-ASCII characters (which will be the majority of them).

* So far, I have found that pygettext doesn't understand how to extract strings from ngettext().  This means that your code can't use plural forms if you want to use pygettext to extract the strings.

These deficiencies are probably things that need to be fixed if we're going to continue to promote these tools in the documentation.
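Supporting ngettext() mostly means recognizing calls whose first two arguments are both msgids (singular and plural). A minimal sketch of such an extractor, built on the standard `ast` module and not taken from pygettext.py; the `KEYWORDS` table and `extract()` function are hypothetical, and only literal string arguments are collected.

```python
import ast

# Hypothetical keyword table: callable name -> indexes of the msgid arguments.
# ngettext(singular, plural, n) contributes two msgids, not one.
KEYWORDS = {"_": (0,), "gettext": (0,), "ngettext": (0, 1)}


def extract(source: str, filename: str = "<string>") -> list:
    """Return sorted (lineno, msgids) pairs for literal-string gettext calls."""
    found = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        # Match both bare names (_("x")) and attributes (gettext.ngettext(...)).
        if isinstance(node.func, ast.Name):
            name = node.func.id
        elif isinstance(node.func, ast.Attribute):
            name = node.func.attr
        else:
            continue
        positions = KEYWORDS.get(name)
        if positions is None or len(node.args) <= max(positions):
            continue
        args = [node.args[i] for i in positions]
        # Skip calls whose msgids are not string literals (e.g. variables).
        if all(isinstance(a, ast.Constant) and isinstance(a.value, str) for a in args):
            found.append((node.lineno, tuple(a.value for a in args)))
    return sorted(found)


SAMPLE = '''
import gettext
_ = gettext.gettext
print(_("Hello"))
n = 3
print(gettext.ngettext("%d file", "%d files", n) % n)
'''

print(extract(SAMPLE))  # [(4, ('Hello',)), (6, ('%d file', '%d files'))]
```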
msg342005 - Author: Toshio Kuratomi (a.badger) Date: 2019-05-09 23:09
A note about the msgfmt problem: it looks like GNU gettext's msgfmt has a similar problem, but the msgfmt from pybabel does not. This may mean that we need to change the gettext *Translations objects to be more tolerant of non-ASCII encodings (perhaps defaulting to UTF-8 instead of ASCII).
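For context, `gettext.GNUTranslations` takes the charset from the `Content-Type` line of the catalog's header entry (the empty msgid) and decodes every message with it; when no charset is declared, it falls back to ASCII. The sketch below uses a small, hypothetical `build_mo()` writer (not msgfmt.py) to produce in-memory catalogs with and without that header.

```python
import gettext
import io
import struct


def build_mo(messages, charset="UTF-8"):
    """Hypothetical helper: serialize {msgid: msgstr} into little-endian GNU .mo bytes.

    If charset is given, a header entry (msgid "") with a Content-Type line is
    added; if it is None, no header is written at all.
    """
    entries = dict(messages)
    if charset is not None:
        entries[""] = f"Content-Type: text/plain; charset={charset}\n"
    keys = sorted(entries)  # msgids are kept sorted; "" (the header) comes first
    ids = strs = b""
    spans = []  # (id_offset, id_len, str_offset, str_len) within each blob
    for key in keys:
        kid, kstr = key.encode("utf-8"), entries[key].encode("utf-8")
        spans.append((len(ids), len(kid), len(strs), len(kstr)))
        ids += kid + b"\0"
        strs += kstr + b"\0"
    n = len(keys)
    id_start = 7 * 4 + 16 * n  # 7-word header + two tables of n (length, offset) pairs
    str_start = id_start + len(ids)
    header = struct.pack("<7I", 0x950412DE, 0, n, 7 * 4, 7 * 4 + 8 * n, 0, 0)
    id_table = b"".join(struct.pack("<2I", ln, id_start + off) for off, ln, _, _ in spans)
    str_table = b"".join(struct.pack("<2I", ln, str_start + off) for _, _, off, ln in spans)
    return header + id_table + str_table + ids + strs


# With a charset in the header, non-ASCII translations decode fine.
t = gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "¡Hola, mundo!"})))
print(t.charset(), t.gettext("Hello"))  # UTF-8 ¡Hola, mundo!

# Without it, the catalog is decoded as ASCII and loading fails.
try:
    gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "¡Hola, mundo!"}, charset=None)))
except UnicodeDecodeError as exc:
    print("no charset in header:", type(exc).__name__)
```

If a later release switches the fallback to UTF-8, as suggested above, the `except` branch simply will not run.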
msg342198 - Author: Toshio Kuratomi (a.badger) Date: 2019-05-11 15:26
Scratch what I said in https://bugs.python.org/issue36837#msg342005

GNU msgfmt does extract the charset correctly. (My previous test failed to write any output, so it was using the .mo file I had written out with msgfmt.py. I realized that this morning when I figured out why my C test program wasn't finding any message catalog.)

For reference, the three ways to extract strings with the three tools are:
* pygettext.py test.py
* pybabel extract -o messages.pot test.py
* xgettext -o messages.pot test.py

And the three ways to compile catalogs with the three tools are:
* msgfmt3.7.py es_MX/LC_MESSAGES/domain.po
* msgfmt es_MX/LC_MESSAGES/testc.po -o es_MX/LC_MESSAGES/testc.mo
* pybabel compile -D test -d . [--use-fuzzy]
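As written, all three commands leave the compiled catalog under `<locale>/LC_MESSAGES/<domain>.mo`, which is the layout `gettext.find()` and `gettext.translation()` search. A minimal sketch of that lookup, using a temporary localedir and an empty placeholder file (hypothetical setup), so only the path resolution is exercised, not the catalog contents:

```python
import gettext
import os
import tempfile


def demo(domain: str = "test", locale_name: str = "es_MX"):
    """Create <localedir>/<locale>/LC_MESSAGES/<domain>.mo and ask gettext.find() for it."""
    with tempfile.TemporaryDirectory() as localedir:
        mo_dir = os.path.join(localedir, locale_name, "LC_MESSAGES")
        os.makedirs(mo_dir)
        # An empty file suffices here: find() only checks that the path exists.
        open(os.path.join(mo_dir, f"{domain}.mo"), "wb").close()
        exact = gettext.find(domain, localedir, languages=[locale_name])
        # Plain "es" expands to es_ES and es, so it does not match an es_MX directory.
        generic = gettext.find(domain, localedir, languages=["es"])
        return (os.path.relpath(exact, localedir) if exact else None, generic)


print(demo())
```

With a real compiled catalog in place, `gettext.translation("test", localedir, languages=["es_MX"])` resolves the same path and returns a `GNUTranslations` object.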
History
Date                 User      Action  Args
2019-05-11 15:26:52  a.badger  set     messages: + msg342198
2019-05-09 23:09:26  a.badger  set     messages: + msg342005
2019-05-09 21:47:11  a.badger  set     nosy: + a.badger
                                       messages: + msg341999
2019-05-08 14:24:28  barry     set     nosy: + barry
                                       messages: + msg341876
2019-05-07 16:23:26  bbkane    create