Message 289220 - Python tracker

Author	petri
Recipients	petri
Date	2017-03-08.09:17:47
Message-id	<1488964668.24.0.723245092961.issue29755@psf.upfronthosting.co.za>

Content
On Debian stable (Python 3.4), with the LANGUAGE environment variable set to "C" or "en_US.UTF-8", the following produces a string:

import gettext

d = gettext.textdomain('apt-listchanges')
print(gettext.lgettext("Informational notes"))

However, when the language is set to, for example, fi_FI.UTF-8, the same code outputs a bytes object. The same apparently happens with some other languages, too.
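
To make the difference easy to check under different LANGUAGE settings, a small script along these lines could be used. This is only a sketch: describe_result is a hypothetical helper name, not part of gettext, and the getattr lookup is there on the assumption that lgettext may not exist on every Python version.

```python
import gettext


def describe_result(value):
    """Return a short label for the type of a gettext-style return value."""
    if isinstance(value, bytes):
        return "bytes"
    if isinstance(value, str):
        return "str"
    return type(value).__name__


# Look lgettext up defensively, since it may be missing on some Python versions.
lgettext = getattr(gettext, "lgettext", None)
if lgettext is not None:
    gettext.textdomain('apt-listchanges')
    # Report only the type, which is what differs between the two cases.
    print(describe_result(lgettext("Informational notes")))
else:
    print("gettext.lgettext is not available on this Python version")
```

Running it once with LANGUAGE=C and once with LANGUAGE=fi_FI.UTF-8 should show whether the return type changes on a given system.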

Why is this? As far as I know, the discrepancy is not documented anywhere. Is this a bug, or intended behavior that depends on some (undocumented) circumstances? Since both of the examples above specify UTF-8 as the encoding, the type of the return value cannot depend directly on the encoding.

The docs say lgettext should merely return the translation in a particular encoding. They do not say that the type of the return value will also switch from a string to bytes.

I saw this originally in the Debian bug tracker and thought the issue merits at least clarification here as well (link to Debian bug below for reference).

(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=818728)

No idea if this happens on Python > 3.4 or on other platforms. I would guess so, but I have not had time to confirm.

History
Date	User	Action	Args
2017-03-08 09:17:48	petri	set	recipients: + petri
2017-03-08 09:17:48	petri	set	messageid: <1488964668.24.0.723245092961.issue29755@psf.upfronthosting.co.za>
2017-03-08 09:17:48	petri	link	issue29755 messages
2017-03-08 09:17:47	petri	create