classification
Title: gettext: GNUTranslations doesn't parse properly comments in description
Type:
Stage: resolved
Components: Library (Lib)
Versions: Python 3.8, Python 3.7, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: mdk, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2019-03-08 13:58 by vstinner, last changed 2019-05-09 22:24 by vstinner. This issue is now closed.

Files
File name Uploaded Description
parse.py vstinner, 2019-03-08 13:58
comments.po vstinner, 2019-03-08 13:58
messages.mo vstinner, 2019-03-08 13:59
Pull Requests
URL Status Linked
PR 12255 merged mdk, 2019-03-09 22:53
PR 13218 closed miss-islington, 2019-05-09 14:23
Messages (12)
msg337476 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-08 13:58
When a translation .po file contains a comment in headers, it's kept when compiled as .mo by msgfmt.

Example with test.po:
---
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms:  nplurals=2; plural=(n != 1);\n"
"#-#-#-#-#  plo.po (PACKAGE VERSION)  #-#-#-#-#\n"
---

Compile it with "msgfmt", then parse the output file messages.mo using this test.py script:
---
import gettext, pprint
with open("messages.mo", "rb") as fp:
    t = gettext.GNUTranslations()
    t._parse(fp)
    pprint.pprint(t._info)
---

Output on Python 3.7.2:
---
{'content-type': 'text/plain; charset=UTF-8',
 'plural-forms': 'nplurals=2; plural=(n != 1);\n'
                 '#-#-#-#-#  plo.po (PACKAGE VERSION)  #-#-#-#-#'}
---

Output of Fedora Python 2.7.15 which contains a fix:
---
{'content-type': 'text/plain; charset=UTF-8',
 'plural-forms': 'nplurals=2; plural=(n != 1);'}
---

I'm not sure that keeping the comment as part of plural forms is correct. Shouldn't comments be ignored?

I ran my test on Fedora 29: msgfmt 0.19.8.1, Python 3.7.2.

Links:

* https://bugs.python.org/issue1448060#msg27754
* https://bugs.python.org/issue1475523
* https://bugzilla.redhat.com/show_bug.cgi?id=252136

Fedora has a patch since 2007 to ignore comments:
https://src.fedoraproject.org/rpms/python2/blob/master/f/python-2.5.1-plural-fix.patch

I can easily convert the patch to a PR, maybe with a test. The real question is whether the fix is correct.
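
The behaviour reported above can be reproduced without a .mo file. What follows is a simplified sketch of a header-parsing loop of the kind discussed in this issue, not a copy of gettext.py: the function name parse_header and its skip_comments flag are hypothetical and exist only for illustration. A line without a colon is treated as a continuation of the previous key, which is how the comment ends up inside 'plural-forms'.

```python
def parse_header(raw, skip_comments=False):
    """Parse the metadata text (msgstr of the empty msgid) into a dict.

    Hypothetical helper for illustration only. Keys are lower-cased,
    matching the output shown in the report.
    """
    info = {}
    last_key = None
    for line in raw.split("\n"):
        item = line.strip()
        if not item:
            continue
        # Optionally drop lines that start and end with the marker.
        if (skip_comments
                and item.startswith("#-#-#-#-#")
                and item.endswith("#-#-#-#-#")):
            continue
        if ":" in item:
            key, value = item.split(":", 1)
            last_key = key.strip().lower()
            info[last_key] = value.strip()
        elif last_key is not None:
            # No colon: append the line to the previous key's value.
            info[last_key] += "\n" + item
    return info


HEADER = (
    "Content-Type: text/plain; charset=UTF-8\n"
    "Plural-Forms:  nplurals=2; plural=(n != 1);\n"
    "#-#-#-#-#  plo.po (PACKAGE VERSION)  #-#-#-#-#\n"
)

if __name__ == "__main__":
    import pprint
    pprint.pprint(parse_header(HEADER))
    pprint.pprint(parse_header(HEADER, skip_comments=True))
```

With skip_comments left off, the marker line is appended to 'plural-forms'; with it on, the value contains only the plural expression.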
msg337477 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-08 13:59
Attached files:

* comments.po: PO file with a comment in headers
* messages.mo: comments.po compiled with msgfmt
* parse.py: Python script to parse messages.mo
msg337486 - Author: Julien Palard (mdk) * (Python committer) Date: 2019-03-08 14:43
After some research, I found a few references describing these comments as lines that start with #-#-#-#-# and end with #-#-#-#-#, not just lines that start with #.

In gettext-0.19.8.1 sources for example:

$ grep -r '#-#-#-#-' | head
gettext-tools/misc/po-mode.el:#-#-#-#-#  file name reference  #-#-#-#-#
gettext-tools/misc/po-mode.el:  (let* ((marker-regex "^#-#-#-#-#  \\(.*\\)  #-#-#-#-#\n")
gettext-tools/src/msgl-cat.c:                  char *id = xasprintf ("#-#-#-#-#  %s  #-#-#-#-#",

Or more precisely in `gettext-tools/tests/msgcat-10`:

# Verify msgcat of two files, when the header entries have different comments
# but the same contents. The resulting header entry is not marked fuzzy,
# because the #-#-#-#-# are only in comments and do not necessarily require
# translator attention; in other words, an msgstr which is valid in both input
# files is also valid in the result.

I'm surprised, however, not to find much "#-#-#-#-#" in the source code, as if they just look for a single # like you do here.

I'm not sure which is better: eliminating lines enclosed in a pair of #-#-#-#-#, or lines starting with a #. Both look OK to me (we're only talking about the header here, not the msgstr, so it won't have much impact).

Personally, I'd go for eliminating the #-#-#-#-# lines, as this is the only case we've seen, and it is the "documented" one in the GNU gettext test cases.
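
To make the two options concrete, here is a minimal sketch of both candidate filters. The function names are hypothetical, and the third sample line is a made-up example that does not come from any file in this issue.

```python
MARKER = "#-#-#-#-#"


def starts_with_hash(line):
    """Broad rule: treat any line starting with '#' as a comment."""
    return line.lstrip().startswith("#")


def is_msgcat_marker(line):
    """Narrow rule: the line must start and end with '#-#-#-#-#'."""
    stripped = line.strip()
    return stripped.startswith(MARKER) and stripped.endswith(MARKER)


LINES = [
    "Plural-Forms: nplurals=2; plural=(n != 1);",
    "#-#-#-#-#  plo.po (PACKAGE VERSION)  #-#-#-#-#",
    "# some other line",  # made-up example line
]

if __name__ == "__main__":
    for line in LINES:
        print(starts_with_hash(line), is_msgcat_marker(line), repr(line))
```

The narrow rule only removes what msgcat is known to generate, while the broad rule would also drop any other header line that happens to begin with '#'.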
msg337490 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-08 15:20
I found a .po file with "#" in headers on the Internet, Sympa mailing list project:
https://www.sympa.org/distribution/sympa-6.0.10/po-wwsympa/et.po:

# #-#-#-#-#  blank_web_help_et.po (sympa)  #-#-#-#-#
# Sympa online help internationalisation.
# Copyright (C) 2007
# This file is distributed under the same license as Sympa.
# FIRST AUTHOR <david.verdin@cru.fr>, 2007.
#
# #-#-#-#-#  tmp_web_help_et.po (et)  #-#-#-#-#
# translation of et.po to 
# translation of et.po to
# #-#-#-#-#  et.po (PACKAGE VERSION)  #-#-#-#-#
# Copyright (C) 2005 Free Software Foundation, Inc.
# #-#-#-#-#  et.po (PACKAGE VERSION)  #-#-#-#-#
# #-#-#-#-#  et.po (PACKAGE VERSION)  #-#-#-#-#
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL>, YEAR.
# Copyright (C) YEAR Free Software Foundation, Inc.
# FIRST AUTHOR <EMAIL>, YEAR.#.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER.
# root <root@vykk.vil.ee>, 2005.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: et\n"
"POT-Creation-Date: 2007-11-13 14:50+0200\n"
"PO-Revision-Date: 2007-10-22 00:03+0200\n"
"Last-Translator: Alar Sing <alar.sing@etv.ee>\n"
"Language-Team: Estonian\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"#-#-#-#-#  blank_web_help_et.po (sympa)  #-#-#-#-#\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"#-#-#-#-#  tmp_web_help_et.po (et)  #-#-#-#-#\n"
"X-Generator: Pootle 1.0.2\n"

There are 2 header lines starting with >"#-#-#-#-# < and ending with >  #-#-#-#-#\n"<.
msg337491 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-08 15:20
I hacked gettext.py to parse all .mo files on my system. I found 3 .mo files which contain "#" in headers:

/usr/share/locale/fa/LC_MESSAGES/digikam.mo:

{'content-transfer-encoding': '8bit\n'
                              '#-#-#-#-#  digikamimageplugin_channelmixer.po '
                              '(digikamimageplugin_channelmixer)  #-#-#-#-#',
 'content-type': 'text/plain; charset=UTF-8',
 'language': 'fa',
 'language-team': 'Farsi (Persian) <>',
 'last-translator': 'Mohammad Reza Mirdamadi <mohi@ubuntu.ir>',
 'mime-version': '1.0',
 'plural-forms': 'nplurals=1; plural=0;',
 'po-revision-date': '2012-01-13 15:00+0330',
 'pot-creation-date': '2018-03-18 03:11+0100',
 'project-id-version': 'digikam',
 'report-msgid-bugs-to': 'http://bugs.kde.org',
 'x-generator': 'KBabel 1.11.4'}

/usr/share/locale/ia/LC_MESSAGES/akonadicontact5-serializer.mo:

{'content-transfer-encoding': '8bit\n'
                              '#-#-#-#-#  akonadi_kalarm_resource.po  '
                              '#-#-#-#-#',
 'content-type': 'text/plain; charset=UTF-8',
 'language': 'ia',
 'language-team': 'Interlingua <kde-i18n-it@kde.org>',
 'last-translator': 'g.sora <g.sora@tiscali.it>',
 'mime-version': '1.0',
 'plural-forms': 'nplurals=2; plural=n != 1;',
 'po-revision-date': '2011-11-29 19:38+0100',
 'pot-creation-date': '2018-11-12 06:56+0100',
 'project-id-version': '',
 'report-msgid-bugs-to': 'http://bugs.kde.org',
 'x-generator': 'Lokalize 1.2'}

/usr/share/locale/ml/LC_MESSAGES/ktraderclient5.mo:

{'content-transfer-encoding': '8bit',
 'content-type': 'text/plain; charset=UTF-8',
 'language': 'ml',
 'language-team': 'Swathanthra|സ്വതന്ത്ര Malayalam|മലയാളം '
                  'Computing|കമ്പ്യൂട്ടിങ്ങ് <smc-discuss@googlegroups.com>',
 'last-translator': '# ANI PETER|അനി പീറ്റര്\u200d <peter.ani@gmail.com>',
 'mime-version': '1.0',
 'plural-forms': 'nplurals=2; plural=(n != 1);',
 'po-revision-date': '2008-07-10 22:04+0530',
 'pot-creation-date': '2018-09-14 06:47+0200',
 'project-id-version': 'ktraderclient',
 'report-msgid-bugs-to': 'http://bugs.kde.org',
 'x-generator': 'KBabel 1.11.4'}
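
A survey like this one can be approximated without modifying gettext.py. The following is a sketch under assumptions, not the script actually used above: the function name find_headers_with_hash is hypothetical, and it relies only on the public gettext.GNUTranslations(fp) constructor and its info() method. On a Python that already includes the fix from this issue, the marker lines should be skipped during parsing, so only cases such as the 'last-translator' value would be reported.

```python
import gettext
import os


def find_headers_with_hash(root):
    """Return (path, key, value) for each header value containing '#'.

    Hypothetical helper: walks `root`, parses every *.mo file and
    inspects the metadata dict returned by info().
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".mo"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fp:
                    info = gettext.GNUTranslations(fp).info()
            except Exception:
                # Unreadable or malformed catalog: skip it.
                continue
            for key, value in info.items():
                if "#" in value:
                    hits.append((path, key, value))
    return hits


if __name__ == "__main__":
    import pprint
    pprint.pprint(find_headers_with_hash("/usr/share/locale"))
```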
msg337492 - Author: Julien Palard (mdk) * (Python committer) Date: 2019-03-08 15:27
The

 'last-translator': '# ANI PETER|അനി പീറ്റര്\u200d <peter.ani@gmail.com>',

case does not look like an issue: it does *not* start with #. The # is in the middle of the line, and the line starts with "Last-Translator".
msg337493 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-08 15:30
/usr/share/locale/fa/LC_MESSAGES/digikam.mo:

I downloaded the .po file using:

svn cat svn://anonsvn.kde.org/home/kde/trunk/l10n-kf5/fa/messages/extragear-graphics/digikam.po > fa_digikam.po

It contains many comments in headers. Extract:

(...)
# MaryamSadat Razavi <razavi@itland.ir>, 2007.
# Nasim Daniarzadeh <daniarzadeh@itland.ir>, 2007.
# Nazanin Kazemi <kazemi@itland.ir>, 2007.
# Mohammad Reza Mirdamadi <mohi@ubuntu.ir>, 2011, 2012.
msgid ""
msgstr ""
"Project-Id-Version: digikam\n"
"Report-Msgid-Bugs-To: http://bugs.kde.org\n"
"POT-Creation-Date: 2019-03-08 03:08+0100\n"
"PO-Revision-Date: 2012-01-13 15:00+0330\n"
"Last-Translator: Mohammad Reza Mirdamadi <mohi@ubuntu.ir>\n"
"Language-Team: Farsi (Persian) <>\n"
"Language: fa\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"#-#-#-#-#  digikamimageplugin_channelmixer.po "
"(digikamimageplugin_channelmixer)  #-#-#-#-#\n"
"X-Generator: Lokalize 1.2\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-#  digikamimageplugin_refocus.po (digikamimageplugin_refocus)  #-#-#-"
"#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-#  digikamimageplugin_oilpaint.po (digikamimageplugin_oilpaint)  #-#-"
"#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-#  digikamimageplugin_perspective.po "
"(digikamimageplugin_perspective)  #-#-#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-#  digikamimageplugin_freerotation.po "
"(digikamimageplugin_freerotation)  #-#-#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-#  digikamimageplugins.po (digikamimageplugins)  #-#-#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-#  digikamimageplugin_raindrop.po (digikamimageplugin_raindrop)  #-#-"
"#-#-#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-#  digikamimageplugin_blowup.po (digikamimageplugin_blowup)  #-#-#-#-"
"#\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-#  digikamimageplugin_charcoal.po (digikamimageplugin_charcoal)  #-#-"
"#-#-#\n"
(...)
msg337494 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-08 15:38
/usr/share/locale/ml/LC_MESSAGES/ktraderclient5.mo:

svn cat svn://anonsvn.kde.org/home/kde/trunk/l10n-kf5/ml/messages/kde-workspace/ktraderclient5.po > ml_ktraderclient5.po

Extract:

msgid ""
msgstr ""
"Project-Id-Version: ktraderclient\n"
"Report-Msgid-Bugs-To: http://bugs.kde.org\n"
"POT-Creation-Date: 2018-08-16 09:14+0200\n"
"PO-Revision-Date: 2008-07-10 22:04+0530\n"
"Last-Translator: # ANI PETER|അനി പീറ്റര്<200d> <peter.ani@gmail.com>\n"
"Language-Team: Swathanthra|സ്വതന്ത്ര Malayalam|മലയാളം Computing|കമ്പ്യൂട്ടിങ്ങ് <smc-"
"discuss@googlegroups.com>\n"
"Language: ml\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: KBabel 1.11.4\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
msg337495 - Author: Julien Palard (mdk) * (Python committer) Date: 2019-03-08 15:38
That's literally sick. Looks like we have to trust the "\n", not the file wrapping, but this means that:

msgstr ""
"Pro"
"jec"
"t-I"
"d-V"
"ers"
"ion"
": "
"dig"
"ika"
"m\n"
"Report-Msgid-Bugs-To: http://bugs.kde.org\n"

is valid, too? I have to try it!

HAHA it is:

$ cat ~/clones/python-docs-fr/glossary.po | head -n 20
# Copyright (C) 2001-2018, Python Software Foundation
# For licence information, see README file.
#
msgid ""
msgstr ""
"Pr"
"oj"
"ec"
"t-"
"Id"
"-V"
"er"
"si"
"on"
":"
" P"
"ython 3.6\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-12-21 09:48+0100\n"
"PO-Revision-Date: 2019-03-08 14:48+0100\n"

$ msgcat ~/clones/python-docs-fr/glossary.po | head -n 20
# Copyright (C) 2001-2018, Python Software Foundation
# For licence information, see README file.
#
msgid ""
msgstr ""
"Project-Id-Version: Python 3.6\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-12-21 09:48+0100\n"
"PO-Revision-Date: 2019-03-08 14:48+0100\n"
"Last-Translator: Jules Lasne <jules.lasne@gmail.com>\n"
"Language-Team: FRENCH <traductions@lists.afpy.org>\n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: Poedit 2.0.2\n"
"# Pouette\n"
msg337497 - Author: Julien Palard (mdk) * (Python committer) Date: 2019-03-08 15:56
I tested further, and when we have this horrible mess in the .po files:

msgstr ""
"Pro"
"jec"
"t-I"
"d-V"
"ers"
"ion"
": "
"dig"
"ika"
"m\n"

We get a clean string in the .mo file.

So there is nothing to fear from:

"Plural-Forms: nplurals=1; plural=0;\n"
"#-#-#-#-#  digikamimageplugin_raindrop.po (digikamimageplugin_raindrop)  #-#-"
"#-#-#\n"
"X-Generator: KBabel 1.11.4\n"

It will be nicely stored in the .mo as:

Plural-Forms: nplurals=1; plural=0;
#-#-#-#-#  digikamimageplugin_raindrop.po (digikamimageplugin_raindrop)  #-#-#-#-#
X-Generator: KBabel 1.11.4

So you can safely remove lines starting and ending with #-#-#-#-#.
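
The reasoning above can be sketched in a few lines: adjacent string fragments are concatenated first, and only the embedded "\n" characters define the logical lines. This is an illustration under assumptions, not the merged patch; logical_lines and drop_markers are hypothetical names, and the fragments are taken from the digikam extract.

```python
MARKER = "#-#-#-#-#"

# Fragments as wrapped in the .po file; the marker is split across two.
FRAGMENTS = [
    "Plural-Forms: nplurals=1; plural=0;\n",
    "#-#-#-#-#  digikamimageplugin_raindrop.po (digikamimageplugin_raindrop)  #-#-",
    "#-#-#\n",
    "X-Generator: KBabel 1.11.4\n",
]


def logical_lines(fragments):
    """Join the wrapped fragments, then split on the real newlines."""
    return "".join(fragments).splitlines()


def drop_markers(lines):
    """Remove lines that both start and end with the marker."""
    return [
        line for line in lines
        if not (line.startswith(MARKER) and line.endswith(MARKER))
    ]


if __name__ == "__main__":
    for line in drop_markers(logical_lines(FRAGMENTS)):
        print(line)
```

Checking fragment by fragment would miss the wrapped marker, because the trailing "#-#-#" piece on its own does not start with the full marker. Checking after the join avoids that.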
msg341981 - Author: Julien Palard (mdk) * (Python committer) Date: 2019-05-09 14:22
New changeset afd1e6d2f0f5aaf4030d13342809ec0915dedf81 by Julien Palard in branch 'master':
bpo-36239: Skip comments in gettext infos (GH-12255)
https://github.com/python/cpython/commit/afd1e6d2f0f5aaf4030d13342809ec0915dedf81
msg342002 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-09 22:24
Julien: Why not fix Python 3.7?

You approved https://github.com/python/cpython/pull/13218 (Python 3.7 backport) but then you closed it. Only Azure Pipelines PR failed on "ERROR: test_drain_raises (test.test_asyncio.test_streams.StreamTests)" which is unrelated.
History
Date User Action Args
2019-05-09 22:24:38 vstinner set messages: + msg342002
2019-05-09 19:31:19 mdk set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2019-05-09 14:23:00 miss-islington set pull_requests: + pull_request13129
2019-05-09 14:22:33 mdk set messages: + msg341981
2019-03-10 12:50:25 serhiy.storchaka set nosy: + serhiy.storchaka
2019-03-09 22:53:15 mdk set keywords: + patch; stage: patch review; pull_requests: + pull_request12241
2019-03-08 15:56:14 mdk set messages: + msg337497
2019-03-08 15:38:22 mdk set messages: + msg337495
2019-03-08 15:38:08 vstinner set messages: + msg337494
2019-03-08 15:30:19 vstinner set messages: + msg337493
2019-03-08 15:27:47 mdk set messages: + msg337492
2019-03-08 15:20:58 vstinner set messages: + msg337491
2019-03-08 15:20:04 vstinner set messages: + msg337490
2019-03-08 14:43:54 mdk set messages: + msg337486
2019-03-08 13:59:53 vstinner set messages: + msg337477
2019-03-08 13:59:12 vstinner set files: + messages.mo
2019-03-08 13:58:57 vstinner set files: + comments.po
2019-03-08 13:58:51 vstinner set files: + parse.py
2019-03-08 13:58:19 vstinner create