string.Formatter returns str for empty unicode template #60155

Closed
AlekseySivokon mannequin opened this issue Sep 16, 2012 · 9 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@AlekseySivokon
Mannequin

AlekseySivokon mannequin commented Sep 16, 2012

BPO 15951
Nosy @ericvsmith, @ezio-melotti, @bitdancer, @cjerdonek
Files
  • issue-15951-test-1.patch
  • issue-15951-2-branch27.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-05-31.12:11:58.608>
    created_at = <Date 2012-09-16.11:55:56.012>
    labels = ['type-bug', 'library']
    title = 'string.Formatter returns str for empty unicode template'
    updated_at = <Date 2020-05-31.12:11:58.607>
    user = 'https://bugs.python.org/AlekseySivokon'

    bugs.python.org fields:

    activity = <Date 2020-05-31.12:11:58.607>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-05-31.12:11:58.608>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2012-09-16.11:55:56.012>
    creator = 'Aleksey.Sivokon'
    dependencies = []
    files = ['27204', '27222']
    hgrepos = []
    issue_num = 15951
    keywords = ['patch']
    message_count = 9.0
    messages = ['170551', '170552', '170555', '170559', '170560', '170562', '170571', '170576', '170693']
    nosy_count = 5.0
    nosy_names = ['eric.smith', 'ezio.melotti', 'r.david.murray', 'chris.jerdonek', 'Aleksey.Sivokon']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue15951'
    versions = ['Python 2.7']

    @AlekseySivokon
    Mannequin Author

    AlekseySivokon mannequin commented Sep 16, 2012

    The expected behavior of string.Formatter() is to return unicode strings for unicode templates and byte strings for str templates. That is exactly what it does, with one frustrating exception: for an empty unicode template it returns a byte str. A test follows:

    import string
    template = u""
    result = string.Formatter().format(template)
    assert isinstance(result, unicode)
    # AssertionError
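One way to picture the behavior the report expects is a subclass that coerces the result to the template's type. This is only a sketch: `TemplateTypedFormatter` and `text_type` are hypothetical names, not part of the standard library and not the patch attached to this issue. On Python 2 the coercion is an implicit ASCII decode, so it can raise UnicodeDecodeError when the formatted result contains non-ASCII bytes.

```python
import string

# type(u"") is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")


class TemplateTypedFormatter(string.Formatter):
    """Hypothetical subclass: make the result type follow the template type."""

    def format(self, format_string, *args, **kwargs):
        result = string.Formatter.format(self, format_string, *args, **kwargs)
        if isinstance(format_string, text_type) and not isinstance(result, text_type):
            # On Python 2 this is an implicit ASCII decode and may raise
            # UnicodeDecodeError for non-ASCII byte strings.
            result = text_type(result)
        return result


formatted = TemplateTypedFormatter().format(u"")
empty_result_is_text = isinstance(formatted, text_type)
```

With this wrapper the empty-template case returns a text string, at the cost of a possible decode error for byte-string arguments.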

    @AlekseySivokon AlekseySivokon mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Sep 16, 2012
    @cjerdonek
    Member

    Adding failing test. Patch coming next.

    @cjerdonek
    Member

    Here are some related failing cases that I found:

    >>> import string
    >>> f = string.Formatter()
    >>> f.format(u"{0}", "")
    ''
    >>> f.format(u"{0}", 1)
    '1'
    >>> f.format(u"{0}", "a")
    'a'
    >>> f.format(u"{0}{1}", "a", "b")
    'ab'
    >>> f.format("{0}", u"a") 
    u'a'

    Note that PEP-3101 says the following:

    "In all cases, the type of the format string dominates - that
    is, the result of the conversion will always result in an object
    that contains the same representation of characters as the
    input format string."

    @cjerdonek
    Member

    Actually, I'm going to defer on creating a patch because this covers more scenarios than I originally thought and so may require more time.

    @bitdancer
    Member

    Formatting with unicode is a bit of a mess in 2.7. It would be consistent with the rest of Python 2 for

      >>> f.format("{0}", u"a")
      u'a'

    to be correct.

    See also bpo-7300 and bpo-15276.

    @cjerdonek
    Member

    What about cases like this?

    >>> f.format(u'{0}', '\xe9')
    '\xe9'

    It seems that fixing this issue for non-empty strings would cause cases like this, which currently work, to raise UnicodeDecodeError:

    >>> unicode('\xe9')
     ...
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

    Would that be acceptable?
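The failure mode can be reproduced without `string.Formatter` at all, since it comes from decoding a non-ASCII byte with the ASCII codec. A minimal sketch, in which the variable names are illustrative only:

```python
raw = b"\xe9"  # a single non-ASCII byte, as in the example above

try:
    raw.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    # The ASCII codec rejects any byte outside range(128).
    decode_error = exc

# Decoding succeeds once an encoding that covers the byte is named explicitly.
as_latin1 = raw.decode("latin-1")
```

The explicit `decode("ascii")` stands in for the implicit conversion Python 2 performs when a str is coerced to unicode under the default encoding.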

    @bitdancer
    Member

    Note that I didn't say it was correct, I just said it was consistent :)

    And no, breaking stuff that currently works is a non-starter for 2.7.

    @cjerdonek
    Member

    I filed bpo-15952 for the behavior difference between format(value) and value.__format__() and the related lack of documentation re: unicode format strings.

    Given that the expected behavior for the current issue doesn't seem to be documented (aside from PEP-3101, which is probably too late to follow), we should probably agree on what the behavior should be (as well as documenting it) before or while addressing this issue.

    @cjerdonek
    Member

    Attached is a proposed patch.

    Some explanation behind the patch that stems from the above comments:

    The following is an example of Formatter.format() returning str in the current implementation that would break if we made Formatter.format() return unicode whenever format_string is unicode:

    >>> f.format(u"{0}", "\xc3\xa9")  # UTF-8 encoded "e-acute".
    '\xc3\xa9'

    (It would break with a UnicodeDecodeError because 'ascii' is the default encoding.)

    Since we can't change Formatter.format(format_string) to return unicode whenever format_string is unicode without breaking existing code, I believe the best we can do is to document the departure from PEP-3101. Since the caller has to handle return values of type str anyway, I don't think it helps to ensure that more return values are unicode.
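If callers must handle str return values themselves, one option is to normalize the result at the call site with an explicit encoding. The sketch below assumes the byte strings are UTF-8 encoded; `ensure_text` is a hypothetical helper, not something proposed in the attached patch.

```python
def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    # `bytes` is an alias of `str` on Python 2.6+, so this check covers both
    # Python 2 byte strings and Python 3 bytes objects.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


decoded = ensure_text(b"\xc3\xa9")  # UTF-8 encoded "e-acute"
unchanged = ensure_text(u"a")
```

Naming the encoding explicitly avoids relying on the 'ascii' default that causes the UnicodeDecodeError described above.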

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022