python3 gettext.lgettext sometimes returns bytes, not string #73941

Closed
petri mannequin opened this issue Mar 8, 2017 · 7 comments

Comments

petri mannequin commented Mar 8, 2017

BPO 29755
Nosy @loewis, @warsaw, @serhiy-storchaka, @petri
PRs
  • bpo-29755: Fixed the lgettext() family of functions in the gettext module. #2266
  • [3.6] bpo-29755: Fixed the lgettext() family of functions in the gettext module. (GH-2266) #2297
  • [3.5] bpo-29755: Fixed the lgettext() family of functions in the gettext module. (GH-2266) #2298
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2017-06-20.15:13:52.235>
    created_at = <Date 2017-03-08.09:17:48.201>
    labels = ['3.7', 'type-bug', 'library']
    title = 'python3 gettext.lgettext sometimes returns bytes, not string'
    updated_at = <Date 2017-06-20.15:13:52.233>
    user = 'https://github.com/petri'

    bugs.python.org fields:

    activity = <Date 2017-06-20.15:13:52.233>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2017-06-20.15:13:52.235>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2017-03-08.09:17:48.201>
    creator = 'petri'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 29755
    keywords = []
    message_count = 7.0
    messages = ['289220', '296268', '296275', '296434', '296450', '296451', '296452']
    nosy_count = 4.0
    nosy_names = ['loewis', 'barry', 'serhiy.storchaka', 'petri']
    pr_nums = ['2266', '2297', '2298']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue29755'
    versions = ['Python 3.5', 'Python 3.6', 'Python 3.7']

    petri mannequin commented Mar 8, 2017

    On Debian stable (Python 3.4), with the LANGUAGE environment variable set to "C" or "en_US.UTF-8", the following produces a string:

    import gettext

    d = gettext.textdomain('apt-listchanges')
    print(gettext.lgettext("Informational notes"))

    However, when the language is set to, for example, fi_FI.UTF-8, it outputs a bytes object. The same apparently happens with some other languages, too.

    Why is this? The discrepancy is not documented anywhere, AFAIK. Is this a bug, or intended behavior that depends on some (undocumented) circumstances? Given that both of the above examples specify UTF-8 as the encoding, the return type does not depend directly on the encoding.

    The docs say lgettext should merely return the translation in a particular encoding. They do not say the return value will be switched from a string to bytes as well.

    I saw this originally in the Debian bug tracker and thought the issue merits at least clarification here as well (link to Debian bug below for reference).

    (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=818728)

    No idea if this happens on Python > 3.4 or on other platforms. I would guess so, but have not had time to confirm.
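To check other Python versions or platforms, a small probe along these lines could report which type comes back. This is only a sketch: `lgettext_return_type` is a hypothetical helper name, and the `getattr` guard is there because newer Python releases no longer provide `gettext.lgettext()`.

```python
import gettext
import warnings


def lgettext_return_type(message, domain):
    """Return the type produced by gettext.lgettext(), or None if it is unavailable."""
    lgettext = getattr(gettext, "lgettext", None)
    if lgettext is None:
        return None
    previous = gettext.textdomain()  # remember the current global domain
    gettext.textdomain(domain)
    try:
        with warnings.catch_warnings():
            # Some versions emit a DeprecationWarning for lgettext().
            warnings.simplefilter("ignore", DeprecationWarning)
            return type(lgettext(message))
    finally:
        gettext.textdomain(previous)  # restore the global domain


print(lgettext_return_type("Informational notes", "apt-listchanges"))
```

Running it once with LANGUAGE set to "C" and once with a language that has an installed catalog should show whether the return type changes.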

    @petri petri mannequin added the type-bug An unexpected behavior, bug, or error label Mar 8, 2017
    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir 3.7 (EOL) end of life labels Mar 8, 2017
    @serhiy-storchaka serhiy-storchaka self-assigned this Jun 17, 2017
    @serhiy-storchaka
    Member

    In Python 2, both gettext() and lgettext() are meant to return 8-bit strings. The only difference between them is that gettext() encodes the translation back to the encoding of the translation file if the output encoding is not explicitly specified, while lgettext() encodes it to the preferred locale encoding. ugettext() returns Unicode strings.

    In Python 3, ugettext() is renamed to gettext() and always returns Unicode strings. lgettext() should return a byte string, as in Python 2. The problem is that if the translation is not found, the untranslated message is usually returned, which is a Unicode string in Python 3. It should be encoded to a byte string, so that lgettext() always returns the same type: bytes.
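The type inconsistency described above can be modelled without touching the real module. The following is a minimal sketch, not the code from PR 2266: `catalog` is a plain dict standing in for a message catalog, the translated string is a made-up placeholder, and both function names are hypothetical.

```python
import locale


def lookup_before_fix(catalog, message, encoding=None):
    """Model of the reported behavior: bytes on a hit, str on a miss."""
    encoding = encoding or locale.getpreferredencoding(False)
    if message in catalog:
        return catalog[message].encode(encoding)
    return message  # the untranslated fallback is returned as str


def lookup_after_fix(catalog, message, encoding=None):
    """Model of the intended behavior: the fallback is encoded too."""
    encoding = encoding or locale.getpreferredencoding(False)
    return catalog.get(message, message).encode(encoding)


# Made-up catalog; the translation text is a placeholder, not real data.
catalog = {"Informational notes": "Tiedotteet"}

for lookup in (lookup_before_fix, lookup_after_fix):
    hit = lookup(catalog, "Informational notes")
    miss = lookup(catalog, "No such message")
    print(lookup.__name__, type(hit).__name__, type(miss).__name__)
```

In this model the first function returns bytes for a known message and str for an unknown one, while the second returns bytes in both cases.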

    PR 2266 fixes lgettext() and related functions, updates the documentation, and adds tests.

    Frankly, the usefulness of lgettext() in Python 3 looks questionable to me. gettext() can be used instead, explicitly encoding the result to the desired charset.
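The alternative suggested here, calling gettext() and encoding the result explicitly, might look like the sketch below. `translate_to_bytes` is a hypothetical helper, the domain name is taken from the original report, and UTF-8 is an assumed target charset.

```python
import gettext


def translate_to_bytes(message, domain, encoding="utf-8", languages=None):
    """Translate with gettext() (always str in Python 3), then encode explicitly."""
    # fallback=True yields a NullTranslations object when no catalog is found,
    # so gettext() then returns the message unchanged instead of raising.
    translator = gettext.translation(domain, languages=languages, fallback=True)
    text = translator.gettext(message)
    return text.encode(encoding)


print(translate_to_bytes("Informational notes", "apt-listchanges"))
```

Because the encoding step is applied to whatever gettext() returns, the result is bytes whether or not a translation was found.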

    @warsaw
    Member

    warsaw commented Jun 18, 2017

    I agree with everything @serhiy.storchaka said, including the questionable utility of the l* methods in Python 3. ;)

    Thanks also for updating the documentation. Reading the existing docs over now, it's shocking how imprecise "the translation is returned in the preferred system encoding" is.

    I have some suggestions about the PR, so I'll comment over there.

    @serhiy-storchaka
    Member

    New changeset 26cb465 by Serhiy Storchaka in branch 'master':
    bpo-29755: Fixed the lgettext() family of functions in the gettext module. (#2266)
    26cb465

    @serhiy-storchaka
    Member

    New changeset a1115e1 by Serhiy Storchaka in branch '3.6':
    [3.6] bpo-29755: Fixed the lgettext() family of functions in the gettext module. (GH-2266) (#2297)
    a1115e1

    @serhiy-storchaka
    Member

    New changeset 29c89d0 by Serhiy Storchaka in branch '3.5':
    [3.5] bpo-29755: Fixed the lgettext() family of functions in the gettext module. (GH-2266) (#2298)
    29c89d0

    @serhiy-storchaka
    Member

    As for the original issue in the Debian bug tracker: in Python 2, lgettext() and ugettext() are the two correct ways to do localization (depending on whether you format the output as 8-bit strings or as Unicode strings), but in Python 3, gettext() is the right way.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022