python3 gettext.lgettext sometimes returns bytes, not string #73941

petri · 2017-03-08T09:17:48Z

BPO	29755
Nosy	@loewis, @warsaw, @serhiy-storchaka, @petri
PRs	bpo-29755: Fixed the lgettext() family of functions in the gettext module. #2266; [3.6] bpo-29755: Fixed the lgettext() family of functions in the gettext module. (GH-2266) #2297; [3.5] bpo-29755: Fixed the lgettext() family of functions in the gettext module. (GH-2266) #2298

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2017-06-20.15:13:52.235>
created_at = <Date 2017-03-08.09:17:48.201>
labels = ['3.7', 'type-bug', 'library']
title = 'python3 gettext.lgettext sometimes returns bytes, not string'
updated_at = <Date 2017-06-20.15:13:52.233>
user = 'https://github.com/petri'

bugs.python.org fields:

activity = <Date 2017-06-20.15:13:52.233>
actor = 'serhiy.storchaka'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2017-06-20.15:13:52.235>
closer = 'serhiy.storchaka'
components = ['Library (Lib)']
creation = <Date 2017-03-08.09:17:48.201>
creator = 'petri'
dependencies = []
files = []
hgrepos = []
issue_num = 29755
keywords = []
message_count = 7.0
messages = ['289220', '296268', '296275', '296434', '296450', '296451', '296452']
nosy_count = 4.0
nosy_names = ['loewis', 'barry', 'serhiy.storchaka', 'petri']
pr_nums = ['2266', '2297', '2298']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue29755'
versions = ['Python 3.5', 'Python 3.6', 'Python 3.7']

petri · 2017-03-08T09:17:48Z

On Debian stable (Python 3.4), with the LANGUAGE environment variable set to "C" or "en_US.UTF-8", the following produces a string:

import gettext

d = gettext.textdomain('apt-listchanges')
print(gettext.lgettext("Informational notes"))

However, when the language is set to, for example, fi_FI.UTF-8, it outputs a bytes object. The same apparently happens with some other languages, too.

Why is this? The discrepancy is not documented anywhere, AFAIK. Is this a bug, or is it intended behavior that depends on some (undocumented) circumstances? Given that both of the examples above specify UTF-8 as the encoding, the return type does not depend directly on the encoding.

The docs say lgettext should merely return the translation in a particular encoding. They do not say that the return value will be switched from a string to bytes as well.

I saw this originally in the Debian bug tracker and thought the issue merits at least clarification here as well (link to Debian bug below for reference).

(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=818728)

No idea if this happens on Python > 3.4 or on other platforms. I would guess so, but have not had time to confirm.

serhiy-storchaka · 2017-06-18T11:35:52Z

In Python 2 both gettext() and lgettext() are meant to return 8-bit strings. The only difference between them is that gettext() encodes the translation back to the encoding of the translation file if the output encoding is not explicitly specified, while lgettext() encodes it to the preferred locale encoding. ugettext() returns Unicode strings.

In Python 3 ugettext() is renamed to gettext() and always returns Unicode strings. lgettext() should return a byte string, as in Python 2. The problem is that if the translation is not found, the untranslated message is usually returned, and that message is a Unicode string in Python 3. It should be encoded to a byte string, so that lgettext() always returns the same type -- bytes.
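
A minimal sketch of the type mismatch described above, using a plain dict as a stand-in for a message catalog. The names `CATALOG`, `buggy_lookup`, and `fixed_lookup` are hypothetical and are not part of the gettext module, and the catalog entry is placeholder data; this only illustrates the idea of encoding the fallback, not the actual change in PR 2266.

```python
# Hypothetical stand-in for a compiled message catalog (not the gettext API).
# The "translation" below is placeholder data for illustration only.
CATALOG = {"Informational notes": "(translated) Informational notes"}


def buggy_lookup(message, encoding="utf-8"):
    """Mimic the reported behaviour: bytes on a hit, str on a miss."""
    translated = CATALOG.get(message)
    if translated is None:
        # The untranslated message leaks out as a str.
        return message
    return translated.encode(encoding)


def fixed_lookup(message, encoding="utf-8"):
    """Encode the fallback too, so the return type is always bytes."""
    return CATALOG.get(message, message).encode(encoding)


if __name__ == "__main__":
    for msg in ("Informational notes", "Not in the catalog"):
        print(type(buggy_lookup(msg)).__name__, type(fixed_lookup(msg)).__name__)
```

With the buggy variant the caller sees a different type depending on whether a translation exists, which matches the language-dependent behaviour in the original report.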

PR 2266 fixes lgettext() and related functions, updates the documentation, and adds tests.

Frankly, the usefulness of lgettext() in Python 3 looks questionable to me. gettext() can be used instead, with the result explicitly encoded to the desired charset.
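
A minimal sketch of that alternative, assuming the locale's preferred encoding is the desired charset when none is given. `translate_to_bytes` is a hypothetical helper name, not a gettext function.

```python
import gettext
import locale


def translate_to_bytes(message, encoding=None):
    """Translate with gettext() (always a str in Python 3), then encode explicitly."""
    if encoding is None:
        # Assumption of this sketch: default to the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False)
    text = gettext.gettext(message)
    return text.encode(encoding)


if __name__ == "__main__":
    gettext.textdomain("apt-listchanges")
    print(translate_to_bytes("Informational notes", "utf-8"))
```

Because gettext() returns a str whether or not a translation is found, the encoded result is bytes in both cases.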

warsaw · 2017-06-18T14:59:10Z

I agree with everything @serhiy.storchaka said, including the questionable utility of the l* methods in Python 3. ;)

Thanks also for updating the documentation. Reading the existing docs over now, it's shocking how imprecise "the translation is returned in the preferred system encoding" is.

I have some suggestions about the PR, so I'll comment over there.

serhiy-storchaka · 2017-06-20T14:13:32Z

New changeset 26cb465 by Serhiy Storchaka in branch 'master':
bpo-29755: Fixed the lgettext() family of functions in the gettext module. (#2266)
26cb465

serhiy-storchaka · 2017-06-20T15:06:51Z

New changeset a1115e1 by Serhiy Storchaka in branch '3.6':
[3.6] bpo-29755: Fixed the lgettext() family of functions in the gettext module. (GH-2266) (#2297)
a1115e1

serhiy-storchaka · 2017-06-20T15:07:02Z

New changeset 29c89d0 by Serhiy Storchaka in branch '3.5':
[3.5] bpo-29755: Fixed the lgettext() family of functions in the gettext module. (GH-2266) (#2298)
29c89d0

serhiy-storchaka · 2017-06-20T15:13:52Z

As for the original issue in the Debian bug tracker, lgettext() and ugettext() are the two right ways of doing localization in Python 2 (depending on whether you format the output as 8-bit strings or as Unicode strings), but gettext() is the right way in Python 3.

petri mannequin added the "type-bug" (An unexpected behavior, bug, or error) label Mar 8, 2017

serhiy-storchaka added the "stdlib" (Python modules in the Lib dir) and "3.7 (EOL)" (end of life) labels Mar 8, 2017

serhiy-storchaka self-assigned this Jun 17, 2017

serhiy-storchaka closed this as completed Jun 20, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022
