This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: gettext - Non ascii chars in header
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: Michael.Müller, eric.araujo, flipmcf, jwilk
Priority: normal Keywords:

Created on 2013-12-06 11:20 by Michael.Müller, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded | Description
messages.po | Michael.Müller, 2013-12-07 00:28 | Example Messages.po
Messages (7)
msg205361 - Author: Michael Müller (Michael.Müller) Date: 2013-12-06 11:20
When the header of a translation file (xxx.po) contains non-ASCII characters, the following error is raised:

  File "D:\Python33\lib\gettext.py", line 410, in translation
    t = _translations.setdefault(key, class_(fp))
  File "D:\Python33\lib\gettext.py", line 160, in __init__
    self._parse(fp)
  File "D:\Python33\lib\gettext.py", line 265, in _parse
    item = b_item.decode().strip()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 51: invalid continuation byte

Translation file header:

"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2013-12-06 11:47\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE+Mitteleuropäische Zeit\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: utf-8\n"
"Generated-By: pygettext.py 1.5\n"

The problem is the PO-Revision-Date line, which is followed by the time zone name in the current locale's language (here "Mitteleuropäische Zeit"); pygettext.py adds this automatically.
After removing it, the file works without any problems.

Relevant part of the current pygettext.py code:
[Line 444] ...
    def write(self, fp):
        options = self.__options
        timestamp = time.strftime('%Y-%m-%d %H:%M+%Z')
        # The time stamp in the header doesn't have the same format as that
        # generated by xgettext...
        print(pot_header % {'time': timestamp, 'version': __version__}, file=fp)
...

To avoid this, it would be better to use time.gmtime() and not append the time zone:

...
    def write(self, fp):
        options = self.__options
        timestamp = time.strftime('%Y-%m-%d %H:%M', time.gmtime())
...
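A minimal sketch of an alternative, assuming the goal is to keep the zone information while keeping the header ASCII-only: format the offset numerically with %z (e.g. +0100, the style GNU xgettext headers use) instead of the locale-dependent zone name that %Z produces. po_timestamp is a hypothetical helper, not part of pygettext.py.

```python
from datetime import datetime, timedelta, timezone


def po_timestamp(when=None):
    """Return 'YYYY-MM-DD HH:MM+ZZZZ' with a numeric UTC offset.

    Hypothetical replacement for the %Z-based timestamp in pygettext.py.
    """
    if when is None:
        # Current local time as an aware datetime, so %z has an offset to format.
        when = datetime.now().astimezone()
    # %z yields e.g. '+0100'; unlike %Z it never contains a localized zone name.
    return when.strftime("%Y-%m-%d %H:%M%z")


if __name__ == "__main__":
    cet = timezone(timedelta(hours=1))
    print(po_timestamp(datetime(2013, 12, 6, 11, 47, tzinfo=cet)))  # 2013-12-06 11:47+0100
    print(po_timestamp())
```

Either this or the gmtime() variant keeps locale-dependent text out of the header; the numeric form additionally preserves the local offset.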
msg205388 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2013-12-06 17:32
It looks like there are two issues here. First, what is the correct format for the PO-Revision-Date line (a human-readable string or a numeric time zone offset)? Second, gettext supports UTF-8 for .po files, which should handle ä without problems, so maybe pygettext uses the wrong encoding when saving the file. Can you tell us the encoding of your xxx.po?
msg205422 - Author: Michael Müller (Michael.Müller) Date: 2013-12-07 00:28
The encoding used is UTF-8.
I've attached the test file I used to this comment.

Second, regarding the PO-Revision-Date:
It should be human-readable. It is unimportant for the program itself; it is meant for the translator of the xxx.po file.
Normally the whole header could be removed during compilation (except the encoding, of course).
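To illustrate that, for loading a catalog, the encoding is the one header field that matters: a simplified sketch of reading the charset from the Content-Type line. It follows the same idea as Python's gettext module (split the header into "key: value" lines, then take the text after "charset="), but parse_header and header_charset are hypothetical helpers and ignore details such as continuation lines.

```python
# Header fields taken from the report.
HEADER = (
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2013-12-06 11:47\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE+Mitteleuropäische Zeit\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: utf-8\n"
)


def parse_header(header):
    """Split a header msgstr into {lowercased key: value} (simplified)."""
    info = {}
    for line in header.split("\n"):
        if ":" in line:
            # Split on the first colon only, so times like 11:47 stay intact.
            key, value = line.split(":", 1)
            info[key.strip().lower()] = value.strip()
    return info


def header_charset(header):
    """Return the charset named in the Content-Type field, or None."""
    content_type = parse_header(header).get("content-type", "")
    if "charset=" not in content_type:
        return None
    return content_type.split("charset=", 1)[1].strip()


if __name__ == "__main__":
    print(header_charset(HEADER))  # utf-8
```

The date and translator fields play no part in this lookup, although other tools may still read them.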
msg205788 - Author: Jakub Wilk (jwilk) Date: 2013-12-10 11:17
See also issue18128.
Date headers are not only for humans; I've seen software that parses them.
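A sketch of the kind of machine parsing mentioned here, assuming the numeric-offset format "YYYY-MM-DD HH:MM+ZZZZ": datetime.strptime with %z accepts an offset such as +0100, whereas a localized zone name like "Mitteleuropäische Zeit" cannot be parsed this way. parse_po_date is a hypothetical helper, not taken from any particular tool.

```python
from datetime import datetime


def parse_po_date(value):
    """Parse a 'YYYY-MM-DD HH:MM+ZZZZ' header date; return None if it does not match."""
    try:
        return datetime.strptime(value.strip(), "%Y-%m-%d %H:%M%z")
    except ValueError:
        # Placeholders or localized zone names do not fit the numeric format.
        return None


if __name__ == "__main__":
    print(parse_po_date("2013-12-06 11:47+0100"))
    print(parse_po_date("YEAR-MO-DA HO:MI+ZONE+Mitteleuropäische Zeit"))  # None
```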
msg240785 - Author: Michael McFadden (flipmcf) * Date: 2015-04-13 22:24
I'm having no luck reproducing this issue.

Regarding pygettext.py generating .po files:

I've used the messages.po file provided by the OP, and also a .po file generated by pygettext.py containing the offending PO-Revision-Date header.

As a side note, I'm not sure how you can get pygettext.py to generate anything but "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE" in Python 3.3+, because it's a static line.

Regarding gettext.py:

1: Tools/msgfmt.py -o locales/testlang/LC_MESSAGES/messages.mo locales/testlang/LC_MESSAGES/messages.po

 (all is good)

2: Run this simple script:
  import gettext

  t = gettext.translation('messages',
                          localedir='mytests/locales',
                          languages=['testlang'])

  _ = t.gettext

  print('------------')
  print(_('TestMe'))
  print('------------')

Works without exceptions.

The only way I can reproduce this issue is by saving my .po file in a non-UTF-8 encoding (Latin-1) and running the code above. Then it fails exactly as reported.
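The Latin-1 observation is consistent with the original traceback, where gettext calls b_item.decode() (i.e. UTF-8) on each header line. A minimal sketch, assuming a hand-built in-memory .mo file: build_mo is a hypothetical helper that writes the little-endian .mo layout without a hash table. The same header loads when encoded as UTF-8 and fails at position 51 of the PO-Revision-Date line, where Latin-1 "ä" is the byte 0xe4, when encoded as Latin-1.

```python
import gettext
import io
import struct

# Header lines taken from the report (abridged).
HEADER_LINES = [
    "Project-Id-Version: PACKAGE VERSION",
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE+Mitteleuropäische Zeit",
    "MIME-Version: 1.0",
    "Content-Type: text/plain; charset=utf-8",
]


def build_mo(messages):
    """Hypothetical helper: build a little-endian .mo file from (msgid, msgstr) byte pairs."""
    messages = sorted(messages)  # msgids are stored in sorted order
    n = len(messages)
    orig_table = 7 * 4  # the fixed header is seven 32-bit words
    trans_table = orig_table + 8 * n
    data_start = trans_table + 8 * n
    tables, strings = [], b""
    for index in (0, 1):  # first all msgids, then all msgstrs
        for pair in messages:
            tables.append(struct.pack("<2I", len(pair[index]), data_start + len(strings)))
            strings += pair[index] + b"\x00"  # strings are NUL-terminated
    header = struct.pack("<7I", 0x950412DE, 0, n, orig_table, trans_table, 0, data_start)
    return header + b"".join(tables) + strings


def load(encoding):
    """Encode the header with the given codec and load it with gettext."""
    header = ("\n".join(HEADER_LINES) + "\n").encode(encoding)
    mo = build_mo([(b"", header), (b"Hello", b"Hallo")])
    return gettext.GNUTranslations(io.BytesIO(mo))


def try_load(encoding):
    """Return (translations, None) on success, or (None, error) on a decode failure."""
    try:
        return load(encoding), None
    except UnicodeDecodeError as error:
        return None, error


if __name__ == "__main__":
    ok, _ = try_load("utf-8")
    print(ok.gettext("Hello"))  # Hallo
    _, error = try_load("latin-1")
    print(error)  # 'utf-8' codec can't decode byte 0xe4 in position 51: invalid continuation byte
```

This suggests that the file which produced the traceback was not actually UTF-8 encoded, even though its header declares charset=utf-8.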

Can you help me recreate your issue?
msg241123 - Author: Michael McFadden (flipmcf) * Date: 2015-04-15 16:12
I don't think issue18128 is related.

This ticket: POT-Creation-Date 
issue18128: PO-Revision-Date
msg241125 - Author: Michael McFadden (flipmcf) * Date: 2015-04-15 16:17
This might be fixed by issue17156, which would explain why I can't recreate it.

https://github.com/python/cpython/commit/f4273cfd16fa502f0eb8a0a8fd1c537ec63e47db
History
Date | User | Action | Args
2022-04-11 14:57:55 | admin | set | github: 64106
2017-11-09 18:14:06 | serhiy.storchaka | set | status: pending -> closed; resolution: out of date; stage: resolved
2017-09-22 20:02:43 | serhiy.storchaka | set | status: open -> pending
2015-04-15 16:17:54 | flipmcf | set | messages: + msg241125
2015-04-15 16:12:44 | flipmcf | set | messages: + msg241123
2015-04-13 22:24:20 | flipmcf | set | nosy: + flipmcf; messages: + msg240785
2013-12-10 11:17:43 | jwilk | set | nosy: + jwilk; messages: + msg205788
2013-12-07 00:28:24 | Michael.Müller | set | files: + messages.po; messages: + msg205422
2013-12-06 17:32:27 | eric.araujo | set | nosy: + eric.araujo; messages: + msg205388
2013-12-06 11:20:33 | Michael.Müller | create |