gettext: GNUTranslations doesn't parse properly comments in description #80420
Comments
When a translation .po file contains a comment in its headers, the comment is kept when msgfmt compiles the file to .mo. Example with test.po: Compile it with "msgfmt". Parse the output file messages.mo using the test.py script:
import gettext, pprint
with open("messages.mo", "rb") as fp:
    t = gettext.GNUTranslations()
    t._parse(fp)
pprint.pprint(t._info)
Output on Python 3.7.2: Output of Fedora Python 2.7.15, which contains a fix: I'm not sure that keeping the comment as part of the plural forms is correct. Shouldn't comments be ignored? I ran my test on Fedora 29: msgfmt 0.19.8.1, Python 3.7.2. Links:
Fedora has had a patch since 2007 to ignore comments: I can easily convert the patch to a PR, maybe with a test. The question is more whether the fix is correct or not. |
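To make the question concrete, here is a minimal sketch of how a comment line can end up inside the Plural-Forms value. It assumes the header is parsed as "Key: value" lines, with any line lacking a colon appended to the previous key as a continuation. The function name parse_header and the skip_comments flag are hypothetical and are not part of the gettext module; the header text is made up for illustration.

```python
def parse_header(text, skip_comments=False):
    """Hypothetical helper: parse a catalog header into a dict.

    Lines without a colon are treated as continuations of the previous
    key, which is how a comment can leak into the preceding value.
    """
    info = {}
    last_key = None
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        # Assumed fix: drop any line that starts with '#'.
        if skip_comments and line.startswith("#"):
            continue
        if ":" in line:
            key, value = line.split(":", 1)
            last_key = key.strip().lower()
            info[last_key] = value.strip()
        elif last_key:
            info[last_key] += "\n" + line
    return info


# Illustrative header, not taken from a real catalog.
header = (
    "Content-Type: text/plain; charset=UTF-8\n"
    "Plural-Forms: nplurals=2; plural=(n != 1);\n"
    "# example comment\n"
)

with_comment = parse_header(header)
without_comment = parse_header(header, skip_comments=True)
```

With skip_comments left off, the comment is glued onto the plural-forms entry; with it on, the entry holds only the plural expression.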
After some research I found a few references to comments being marked as starting with #-#-#-#-# and ending with #-#-#-#-#, not just starting with #. In the gettext-0.19.8.1 sources, for example:
$ grep -r '#-#-#-#-' | head
gettext-tools/misc/po-mode.el:#-#-#-#-# file name reference #-#-#-#-#
gettext-tools/misc/po-mode.el: (let* ((marker-regex "^#-#-#-#-# \\(.*\\) #-#-#-#-#\n")
gettext-tools/src/msgl-cat.c: char *id = xasprintf ("#-#-#-#-# %s #-#-#-#-#",
Or, more precisely, in: # Verify msgcat of two files, when the header entries have different comments
I'm surprised, however, not to find much "#-#-#-#-#" in the source code, as if they just look for a single #, like you do here. I'm not sure which is better, eliminating lines delimited by a pair of #-#-#-#-# or lines starting with a #; both look OK to me (we're only talking about the header here, not the msgstr, so it won't have much impact). Personally I'd go for eliminating #-#-#-#-# lines, as this is the only case we've seen and is the "documented" one in the GNU gettext test cases. |
I found a .po file with "#" in its headers on the Internet, from the Sympa mailing list project: # #-#-#-#-# blank_web_help_et.po (sympa) #-#-#-#-# # #-#-#-#-# tmp_web_help_et.po (et) #-#-#-#-# #, fuzzy There are 2 header lines starting with >"#-#-#-#-# < and ending with > #-#-#-#-#\n"<. |
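The two candidate rules discussed above can be compared side by side. This is only a sketch: the function names is_msgcat_marker and is_hash_comment are hypothetical, the first two sample lines come from the Sympa file quoted above, and the last two are made up for contrast.

```python
MARKER = "#-#-#-#-#"


def is_msgcat_marker(line):
    """Stricter rule: the line both starts and ends with #-#-#-#-#."""
    line = line.strip()
    return line.startswith(MARKER) and line.endswith(MARKER)


def is_hash_comment(line):
    """Looser rule: the line starts with a single '#'."""
    return line.strip().startswith("#")


samples = [
    "#-#-#-#-# blank_web_help_et.po (sympa) #-#-#-#-#",
    "#-#-#-#-# tmp_web_help_et.po (et) #-#-#-#-#",
    "# a plain comment",                # illustrative, not from a real file
    "Content-Transfer-Encoding: 8bit",  # ordinary header line
]

# One (strict, loose) pair per sample line.
results = [(is_msgcat_marker(s), is_hash_comment(s)) for s in samples]
```

Both rules drop the msgcat marker lines; they differ only on lines that start with a bare "#", which the stricter rule keeps.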
I hacked gettext.py to parse all files of my system. I found 3 .mo files which contain "#" in headers: /usr/share/locale/fa/LC_MESSAGES/digikam.mo: {'content-transfer-encoding': '8bit\n' /usr/share/locale/ia/LC_MESSAGES/akonadicontact5-serializer.mo: {'content-transfer-encoding': '8bit\n' /usr/share/locale/ml/LC_MESSAGES/ktraderclient5.mo: {'content-transfer-encoding': '8bit', |
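A similar survey can probably be done without patching gettext.py. The sketch below is an assumption about how one might do it, not the script that was actually used: it walks a directory, loads each .mo file with gettext.GNUTranslations, and reports files whose parsed header values contain "#". The name headers_with_hash is hypothetical. Note that a Python version that already skips marker lines while parsing will not report files whose only "#" was in such a line.

```python
import gettext
from pathlib import Path


def headers_with_hash(root):
    """Yield (path, info) for .mo files whose parsed header contains '#'."""
    for path in sorted(Path(root).rglob("*.mo")):
        try:
            with open(path, "rb") as fp:
                # info() is the public accessor for the parsed header dict.
                info = gettext.GNUTranslations(fp).info()
        except Exception:
            # Survey script: skip unreadable or malformed catalogs.
            continue
        if any("#" in value for value in info.values()):
            yield path, info
```

Iterating over headers_with_hash("/usr/share/locale") and printing each pair would give a listing comparable to the one above.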
The 'last-translator': '# ANI PETER|അനി പീറ്റര്\u200d <peter.ani@gmail.com>', case does not look like an issue: the value does *not* start with #. The # is in the middle of the line, and the line starts with "Last-Translator". |
/usr/share/locale/fa/LC_MESSAGES/digikam.mo: I downloaded the .po file using: svn cat svn://anonsvn.kde.org/home/kde/trunk/l10n-kf5/fa/messages/extragear-graphics/digikam.po > fa_digikam.po It contains many comments in headers. Extract: (...) |
/usr/share/locale/ml/LC_MESSAGES/ktraderclient5.mo: svn cat svn://anonsvn.kde.org/home/kde/trunk/l10n-kf5/ml/messages/kde-workspace/ktraderclient5.po > ml_ktraderclient5.po Extract: msgid "" |
That's literally sick. Looks like we have to trust the "\n", not the file wrapping, but this means that: msgstr "" is valid, too? I have to try it! HAHA it is:
$ cat ~/clones/python-docs-fr/glossary.po | head -n 20
# Copyright (C) 2001-2018, Python Software Foundation
# For licence information, see README file.
#
msgid ""
msgstr ""
"Pr"
"oj"
"ec"
"t-"
"Id"
"-V"
"er"
"si"
"on"
":"
" P"
"ython 3.6\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-12-21 09:48+0100\n"
"PO-Revision-Date: 2019-03-08 14:48+0100\n"
$ msgcat ~/clones/python-docs-fr/glossary.po | head -n 20
# Copyright (C) 2001-2018, Python Software Foundation
# For licence information, see README file.
#
msgid ""
msgstr ""
"Project-Id-Version: Python 3.6\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-12-21 09:48+0100\n"
"PO-Revision-Date: 2019-03-08 14:48+0100\n"
"Last-Translator: Jules Lasne <jules.lasne@gmail.com>\n"
"Language-Team: FRENCH <traductions@lists.afpy.org>\n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 2.0.2\n"
"# Pouette\n" |
I tested further, and when we have this horrible mess in the .po files: msgstr "" We have a clean string in the .mo file. So there is nothing to fear from: "Plural-Forms: nplurals=1; plural=0;\n" It will be stored cleanly in the .mo as: Plural-Forms: nplurals=1; plural=0; So you can safely remove lines starting and ending with #-#-#-#-#. |
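The reason the wrapping does not matter can be sketched in a few lines: adjacent quoted fragments are concatenated, and only the escaped "\n" inside them marks the end of a header line. The helper join_po_string is hypothetical, and it uses ast.literal_eval to decode each fragment, which only approximates the C-style escapes that PO files use. The fragments are the ones from the glossary.po experiment above.

```python
import ast


def join_po_string(lines):
    """Concatenate the quoted fragments of a multi-line PO string."""
    parts = []
    for line in lines:
        line = line.strip()
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            # Approximation: decode the fragment as a Python string literal.
            parts.append(ast.literal_eval(line))
    return "".join(parts)


# The deliberately mangled header line, one fragment per source line.
wrapped = [
    '""',
    '"Pr"', '"oj"', '"ec"', '"t-"', '"Id"', '"-V"',
    '"er"', '"si"', '"on"', '":"', '" P"',
    r'"ython 3.6\n"',
]

joined = join_po_string(wrapped)
```

Here joined is the single line "Project-Id-Version: Python 3.6" followed by a newline, which matches what msgcat printed after normalizing the file.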
Julien: Why not fix Python 3.7? You approved #13218 (the Python 3.7 backport) but then you closed it. Only the Azure Pipelines PR check failed, on "ERROR: test_drain_raises (test.test_asyncio.test_streams.StreamTests)", which is unrelated. |