center, ljust and rjust are inconsistent with unicode parameters #47696

ignas · 2008-07-25T15:58:23Z

BPO	3446
Nosy	@rhettinger, @abalkin, @vstinner, @ezio-melotti

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}


GitHub fields:

assignee = 'https://github.com/abalkin'
closed_at = <Date 2010-12-15.19:40:58.120>
created_at = <Date 2008-07-25.15:58:22.979>
labels = ['type-feature', 'library', 'expert-unicode']
title = 'center, ljust and rjust are inconsistent with unicode parameters'
updated_at = <Date 2010-12-15.19:40:58.119>
user = 'https://bugs.python.org/ignas'

bugs.python.org fields:

activity = <Date 2010-12-15.19:40:58.119>
actor = 'belopolsky'
assignee = 'belopolsky'
closed = True
closed_date = <Date 2010-12-15.19:40:58.120>
closer = 'belopolsky'
components = ['Library (Lib)', 'Unicode']
creation = <Date 2008-07-25.15:58:22.979>
creator = 'ignas'
dependencies = []
files = []
hgrepos = []
issue_num = 3446
keywords = []
message_count = 8.0
messages = ['70258', '82514', '82660', '83669', '87121', '87122', '87123', '123483']
nosy_count = 5.0
nosy_names = ['rhettinger', 'belopolsky', 'vstinner', 'ezio.melotti', 'ignas']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue3446'
versions = ['Python 2.6', 'Python 2.7']

ignas · 2008-07-25T15:58:22Z

Not all combinations of unicode/non-unicode parameters work for ljust,
center and rjust. Passing a unicode character to them as a parameter
when the string is ascii fails with an error.

This doctest fails in 3 places, though I would expect it to pass.

def doctest_strings():
    """

      >>> uni = u"a"
      >>> ascii = "a"

      >>> uni.center(5, ascii)
      u'aaaaa'

      >>> uni.center(5, uni)
      u'aaaaa'

      >>> ascii.center(5, ascii)
      'aaaaa'

      >>> ascii.center(5, uni)
      u'aaaaa'

      >>> uni.ljust(5, ascii)
      u'aaaaa'

      >>> uni.ljust(5, uni)
      u'aaaaa'

      >>> ascii.ljust(5, ascii)
      'aaaaa'

      >>> ascii.ljust(5, uni)
      u'aaaaa'

      >>> uni.rjust(5, ascii)
      u'aaaaa'

      >>> uni.rjust(5, uni)
      u'aaaaa'

      >>> ascii.rjust(5, ascii)
      'aaaaa'

      >>> ascii.rjust(5, uni)
      u'aaaaa'

"""

ezio-melotti · 2009-02-20T06:13:58Z

Indeed, this behavior doesn't seem to be documented.

When the string is unicode and the fillchar is non-unicode, Python
implicitly tries to decode the fillchar (and possibly raises a
TypeError if it's not in range(0,128)):
>>> u'x'.center(5, 'y') # unicode string, non-unicode (str) fillchar
u'yyxyy' # the fillchar is decoded


When the string is non-unicode it only accepts a non-unicode fillchar
(e.g. 'x'.center(5, 'y')) and it raises a TypeError if the fillchar is
unicode:
>>> 'x'.center(5, u'y') # non-unicode (str) string, unicode fillchar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: center() argument 2 must be char, not unicode

If it tries to decode the fillchar when the string is unicode, it could
also try to encode the unicode fillchar (and possibly raise a TypeError)
when the string is non-unicode.

Py3, instead, seems to have the opposite behavior. It implicitly encodes
unicode fillchars into byte strings when the string is a byte string but
it doesn't decode a byte fillchar if the string is unicode:

>>> b'x'.center(5, 'y') # byte string, unicode fillchar
b'yyxyy' # the fillchar is encoded
>>> 'x'.center(5, b'y') # unicode string, byte fillchar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: The fill character cannot be converted to Unicode

The doc [1] says that "The methods on bytes and bytearray
objects don’t accept strings as their arguments, just as the methods on
strings don’t accept bytes as their arguments." so b'x'.center(5, 'y')
should probably raise an error on Py3 (I could open a new issue for this).

In the note

rhettinger · 2009-02-24T07:19:36Z

In Py2.x, I think the desired behavior should match str.join(). If
either input is unicode, the output is unicode. If both are ascii, ascii
should come out.

For Py3.x, I think the goal was to have str.join() enforce that both
inputs are unicode. If either are bytes, then you have to know the
encoding.
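
A minimal Python 3 sketch of the "you have to know the encoding" point: the bytes fill character is decoded explicitly before str.center() is called, rather than converted implicitly. The helper name center_mixed and its encoding parameter are hypothetical, introduced only for illustration.

```python
def center_mixed(text, width, fill=" ", encoding="ascii"):
    """Center a str, accepting either a str or a bytes fill character.

    Hypothetical helper: a bytes fill character is decoded explicitly
    with `encoding`; nothing is converted behind the caller's back.
    """
    if isinstance(fill, (bytes, bytearray)):
        # Raises UnicodeDecodeError if the bytes do not fit the encoding.
        fill = bytes(fill).decode(encoding)
    return text.center(width, fill)


print(center_mixed("x", 5, "y"))
print(center_mixed("x", 5, b"y"))
print(center_mixed("x", 5, b"\xe9", encoding="latin-1"))
```

With the default encoding="ascii", a fill character such as b"\xe9" fails with UnicodeDecodeError, so the caller has to name the encoding.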

vstinner · 2009-03-17T12:10:58Z

About Python3, bytes.center accepts unicode as second argument, which
is an error for me:

>>> b"x".center(5, b"\xe9")
b'\xe9\xe9x\xe9\xe9'
>>> b"x".center(5, "\xe9")
b'\xe9\xe9x\xe9\xe9'

The second example must fail with a TypeError.

str.center has the right behaviour:

>>> "x".center(5, "\xe9")
'ééxéé'
>>> "x".center(5, b"\xe9")
TypeError: The fill character cannot be converted to Unicode

vstinner · 2009-05-04T12:36:41Z

haypo> About Python3, bytes.center accepts unicode as second argument,
haypo> which is an error for me

OK, it's fixed thanks to r71013 (issue bpo-5499).

vstinner · 2009-05-04T12:38:30Z

This issue only concerns Python 2.x. Python 3.x has the right
behaviour: it disallows mixing bytes with characters.
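
A quick way to check this on a Python 3 interpreter, written as a sketch rather than as part of any test suite. The helper name accepts is hypothetical. It tries each of the three methods with matching and mismatched fill types and reports whether the call raised TypeError.

```python
def accepts(method, *args):
    """Return True if the call succeeds, False if it raises TypeError."""
    try:
        method(*args)
    except TypeError:
        return False
    return True


for name in ("center", "ljust", "rjust"):
    str_with_str = accepts(getattr("x", name), 5, "y")
    bytes_with_bytes = accepts(getattr(b"x", name), 5, b"y")
    str_with_bytes = accepts(getattr("x", name), 5, b"y")
    bytes_with_str = accepts(getattr(b"x", name), 5, "y")
    print(name, str_with_str, bytes_with_bytes, str_with_bytes, bytes_with_str)
```

On a Python 3 release that includes the fix mentioned above, the two same-type columns should be True and the two mixed columns False for all three methods.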

vstinner · 2009-05-04T12:58:26Z

The question is why str.{ljust,rjust,center} don't accept a unicode
argument, whereas unicode.{ljust,rjust,center} accept an ASCII string.
Other string methods accept a unicode argument, like str.count() (it
encodes the unicode string to bytes using the utf8 charset).

To be consistent with other string methods, str.{ljust,rjust,center}
should accept a unicode string and convert it to a byte string using
utf8, like str.count does. But I hate such implicit conversions (I
prefer the Python3 way: disallow mixing bytes and characters), so I will
not contribute to such a patch.

Can you write such a patch?

--

str.{ljust,rjust,center} use PyArg_ParseTuple(args, "n|c:...", ...)
and getarg('c'), which only accepts a string of 1 byte.

unicode.{ljust,rjust,center} use PyArg_ParseTuple(args, "n|O&:...",
..., convert_uc, ...) where convert_uc looks something like:

  def convert_uc(o):
      # Python 2 pseudocode for the C converter
      try:
          u = unicode(o)
      except Exception:
          raise TypeError("The fill character cannot be converted to Unicode")
      if len(u) != 1:
          raise TypeError("The fill character must be exactly one character long")
      return u[0]

convert_uc() accepts a byte string of 1 ASCII character.

string_count() uses PyArg_ParseTuple(args, "O...", ...) and then tests
the substring type.
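
For readers on Python 3, here is a rough emulation of what the convert_uc pseudocode does. It is a sketch under two assumptions: the 2.x default encoding is ASCII, and only str and bytes inputs are handled (the real converter is C code and is not reproduced here).

```python
def convert_uc(obj):
    """Python 3 emulation of the 2.x fill-character conversion (sketch)."""
    if isinstance(obj, bytes):
        # Assumption: mimic the 2.x implicit decode with the ASCII default.
        try:
            obj = obj.decode("ascii")
        except UnicodeDecodeError:
            raise TypeError(
                "The fill character cannot be converted to Unicode"
            ) from None
    if not isinstance(obj, str):
        raise TypeError("The fill character cannot be converted to Unicode")
    if len(obj) != 1:
        raise TypeError(
            "The fill character must be exactly one character long"
        )
    return obj


print(convert_uc(b"y"))
print(convert_uc("\xe9"))
```

A non-ASCII byte string such as b"\xe9" and a two-character string such as "ab" both end in TypeError, which matches the asymmetry discussed in this comment: the unicode methods tolerate an ASCII byte string, while the str methods take only a single byte.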

abalkin · 2010-12-06T18:25:14Z

As a feature request for 2.x, I think this should be rejected.

Any objections?

The "behavior" part seems to have been fixed.

ignas mannequin added the type-bug, stdlib, and topic-unicode labels Jul 25, 2008

abalkin self-assigned this Dec 6, 2010

abalkin added the type-feature label and removed the type-bug label Dec 6, 2010

abalkin closed this as completed Dec 15, 2010

ezio-melotti transferred this issue from another repository Apr 10, 2022
