center, ljust and rjust are inconsistent with unicode parameters #47696

Closed
ignas mannequin opened this issue Jul 25, 2008 · 8 comments
Labels
stdlib Python modules in the Lib dir topic-unicode type-feature A feature request or enhancement

Comments

@ignas
Mannequin

ignas mannequin commented Jul 25, 2008

BPO 3446
Nosy @rhettinger, @abalkin, @vstinner, @ezio-melotti

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/abalkin'
closed_at = <Date 2010-12-15.19:40:58.120>
created_at = <Date 2008-07-25.15:58:22.979>
labels = ['type-feature', 'library', 'expert-unicode']
title = 'center, ljust and rjust are inconsistent with unicode parameters'
updated_at = <Date 2010-12-15.19:40:58.119>
user = 'https://bugs.python.org/ignas'

bugs.python.org fields:

activity = <Date 2010-12-15.19:40:58.119>
actor = 'belopolsky'
assignee = 'belopolsky'
closed = True
closed_date = <Date 2010-12-15.19:40:58.120>
closer = 'belopolsky'
components = ['Library (Lib)', 'Unicode']
creation = <Date 2008-07-25.15:58:22.979>
creator = 'ignas'
dependencies = []
files = []
hgrepos = []
issue_num = 3446
keywords = []
message_count = 8.0
messages = ['70258', '82514', '82660', '83669', '87121', '87122', '87123', '123483']
nosy_count = 5.0
nosy_names = ['rhettinger', 'belopolsky', 'vstinner', 'ezio.melotti', 'ignas']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue3446'
versions = ['Python 2.6', 'Python 2.7']

@ignas
Mannequin Author

ignas mannequin commented Jul 25, 2008

Not all combinations of unicode/non-unicode parameters work for ljust,
center and rjust. Passing a unicode character to them as the fill
character when the string is ascii fails with an error.

This doctest fails in three places, though I would expect it to pass.

def doctest_strings():
    """
      >>> uni = u"a"
      >>> ascii = "a"
      >>> uni.center(5, ascii)
      u'aaaaa'

      >>> uni.center(5, uni)
      u'aaaaa'

      >>> ascii.center(5, ascii)
      'aaaaa'

      >>> ascii.center(5, uni)
      u'aaaaa'

      >>> uni.ljust(5, ascii)
      u'aaaaa'

      >>> uni.ljust(5, uni)
      u'aaaaa'

      >>> ascii.ljust(5, ascii)
      'aaaaa'

      >>> ascii.ljust(5, uni)
      u'aaaaa'

      >>> uni.rjust(5, ascii)
      u'aaaaa'

      >>> uni.rjust(5, uni)
      u'aaaaa'

      >>> ascii.rjust(5, ascii)
      'aaaaa'

      >>> ascii.rjust(5, uni)
      u'aaaaa'
    """

@ignas ignas mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir topic-unicode labels Jul 25, 2008
@ezio-melotti
Member

Indeed this behavior doesn't seem to be documented.

When the string is unicode and the fillchar is non-unicode, Python
implicitly tries to decode the fillchar (raising a TypeError if it's
not in range(0, 128)):
>>> u'x'.center(5, 'y') # unicode string, non-unicode (str) fillchar
u'yyxyy' # the fillchar is decoded


When the string is non-unicode it only accepts a non-unicode fillchar
(e.g. 'x'.center(5, 'y')) and it raises a TypeError if the fillchar is
unicode:
>>> 'x'.center(5, u'y') # non-unicode (str) string, unicode fillchar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: center() argument 2 must be char, not unicode

If it tries to decode the fillchar when the string is unicode, it could
also try to encode the unicode fillchar (and possibly raise a TypeError)
when the string is non-unicode.

Py3, instead, seems to have the opposite behavior. It implicitly encodes
unicode fillchars into byte strings when the string is a byte string but
it doesn't decode a byte fillchar if the string is unicode:

>>> b'x'.center(5, 'y') # byte string, unicode fillchar
b'yyxyy' # the fillchar is encoded
>>> 'x'.center(5, b'y') # unicode string, byte fillchar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: The fill character cannot be converted to Unicode

The doc [1] says that "The methods on bytes and bytearray
objects don’t accept strings as their arguments, just as the methods on
strings don’t accept bytes as their arguments." so b'x'.center(5, 'y')
should probably raise an error on Py3 (I could open a new issue for this).

[1] In the note

@rhettinger
Contributor

In Py2.x, I think the desired behavior should match str.join(). If
either input is unicode, the output is unicode. If both are ascii, ascii
should come out.

For Py3.x, I think the goal was to have str.join() enforce that both
inputs are unicode. If either is bytes, then you have to know the
encoding.
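
As a rough illustration of the Py3.x point above (not code from this issue): when the fill character arrives as bytes, the caller has to decode it with a known encoding before padding a str. The helper name `center_text` and its ASCII default are assumptions made for this sketch.

```python
def center_text(text, width, fill=" ", encoding="ascii"):
    """Center a str, decoding a bytes fill character explicitly.

    Hypothetical helper: the name and the ASCII default are assumptions,
    not part of the standard library.
    """
    if isinstance(fill, bytes):
        # The caller supplies the encoding; nothing is guessed here.
        fill = fill.decode(encoding)
    return text.center(width, fill)


print(center_text("x", 5, "y"))
print(center_text("x", 5, b"y"))
print(center_text("x", 5, b"\xe9", encoding="latin-1"))
```

Passing `b"\xe9"` without naming an encoding fails with a UnicodeDecodeError under the ASCII default, which is the "you have to know the encoding" situation described above.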

@vstinner
Member

Regarding Python 3: bytes.center accepts unicode as its second argument,
which I consider an error:

>>> b"x".center(5, b"\xe9")
b'\xe9\xe9x\xe9\xe9'
>>> b"x".center(5, "\xe9")
b'\xe9\xe9x\xe9\xe9'

The second example must fail with a TypeError.

str.center has the right behaviour:

>>> "x".center(5, "\xe9")
'ééxéé'
>>> "x".center(5, b"\xe9")
TypeError: The fill character cannot be converted to Unicode

@vstinner
Member

vstinner commented May 4, 2009

haypo> About Python3, bytes.center accepts unicode as second argument,
haypo> which is an error for me

OK, it's fixed by r71013 (issue bpo-5499).

@vstinner
Member

vstinner commented May 4, 2009

This issue only concerns Python 2.x. Python 3.x has the right
behaviour: it disallows mixing bytes with characters.
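
A small sketch for checking this on a current Python 3 interpreter: it tries each str/bytes combination for the three methods and records either the padded result or the name of the exception raised. The helper `try_call` and the `results` dictionary are names made up for this example.

```python
def try_call(method_name, target, fill):
    """Return the padded result, or the exception class name on a TypeError."""
    try:
        return getattr(target, method_name)(5, fill)
    except TypeError as exc:
        return type(exc).__name__


# (string, fill character) pairs: two matching, two mixed.
combinations = (("x", "y"), (b"x", b"y"), ("x", b"y"), (b"x", "y"))

results = {}
for method_name in ("center", "ljust", "rjust"):
    for target, fill in combinations:
        key = (method_name, type(target).__name__, type(fill).__name__)
        results[key] = try_call(method_name, target, fill)

for key, value in results.items():
    print(key, "->", value)
```

On a current Python 3, the matching combinations should pad normally and the mixed ones should report TypeError.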

@vstinner
Member

vstinner commented May 4, 2009

The question is why str.{ljust,rjust,center} don't accept a unicode
argument, whereas unicode.{ljust,rjust,center} accept an ASCII string.
Other string methods, like str.count(), accept a unicode argument
(encoding the unicode string to bytes using the utf8 charset).

To be consistent with other string methods, str.{ljust,rjust,center}
should accept a unicode string and convert it to a byte string using
utf8, like str.count does. But I hate such implicit conversions (I
prefer the Python3 way: disallow mixing bytes and characters), so I will
not contribute to such a patch.

Can you write such a patch?

--

str.{ljust,rjust,center} use PyArg_ParseTuple(args, "n|c:...", ...)
and getarg('c') which only accepts a string of 1 byte.

unicode.{ljust,rjust,center} use PyArg_ParseTuple(args, "n|O&:...", ...,
convert_uc, ...) where convert_uc looks something like:

  def convert_uc(o):
      # Python 2 pseudocode for the C converter.
      try:
          u = unicode(o)
      except Exception:
          raise TypeError("The fill character cannot be converted to Unicode")
      if len(u) != 1:
          raise TypeError("The fill character must be exactly one character long")
      return u[0]

convert_uc() accepts a byte string containing exactly one ASCII character.

string_count() uses PyArg_ParseTuple(args, "O...", ...) and then tests
the substring type.
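
For readers on Python 3, here is a loose approximation of the converter logic sketched above. It is an illustration only, not the CPython implementation: it assumes the 2.x default encoding was ASCII and models the implicit conversion as an explicit ASCII decode of bytes input.

```python
def convert_uc(obj):
    """Approximate the 2.x fill-character converter in Python 3 terms.

    Illustrative only: bytes are decoded as ASCII to mimic the implicit
    conversion that unicode.{ljust,rjust,center} performed in Python 2.
    """
    if isinstance(obj, bytes):
        try:
            text = obj.decode("ascii")
        except UnicodeDecodeError:
            raise TypeError("The fill character cannot be converted to Unicode")
    elif isinstance(obj, str):
        text = obj
    else:
        raise TypeError("The fill character cannot be converted to Unicode")
    if len(text) != 1:
        raise TypeError("The fill character must be exactly one character long")
    return text


print(convert_uc("y"))
print(convert_uc(b"y"))
```

A one-character ASCII byte string is accepted, while a non-ASCII byte such as `b"\xe9"` or a two-character string is rejected with a TypeError.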

@abalkin
Member

abalkin commented Dec 6, 2010

As a feature request for 2.x, I think this should be rejected.

Any objections?

The "behavior" part seems to have been fixed.

@abalkin abalkin self-assigned this Dec 6, 2010
@abalkin abalkin added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Dec 6, 2010
@abalkin abalkin closed this as completed Dec 15, 2010
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022