classification
Title: center, ljust and rjust are inconsistent with unicode parameters
Type: enhancement
Stage:
Components: Library (Lib), Unicode
Versions: Python 2.7, Python 2.6
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: belopolsky
Nosy List: belopolsky, ezio.melotti, ignas, rhettinger, vstinner
Priority: normal
Keywords:

Created on 2008-07-25 15:58 by ignas, last changed 2010-12-15 19:40 by belopolsky. This issue is now closed.

Messages (8)
msg70258 - Author: Ignas Mikalajūnas (ignas) Date: 2008-07-25 15:58
Not all combinations of unicode/non-unicode parameters work for ljust,
center and rjust. Passing a unicode fill character to them when the
string is ascii fails with an error.

This doctest fails in 3 places, though I would expect it to pass.

def doctest_strings():
    """

      >>> uni = u"a"
      >>> ascii = "a"

      >>> uni.center(5, ascii)
      u'aaaaa'

      >>> uni.center(5, uni)
      u'aaaaa'

      >>> ascii.center(5, ascii)
      'aaaaa'

      >>> ascii.center(5, uni)
      u'aaaaa'

      >>> uni.ljust(5, ascii)
      u'aaaaa'

      >>> uni.ljust(5, uni)
      u'aaaaa'

      >>> ascii.ljust(5, ascii)
      'aaaaa'

      >>> ascii.ljust(5, uni)
      u'aaaaa'

      >>> uni.rjust(5, ascii)
      u'aaaaa'

      >>> uni.rjust(5, uni)
      u'aaaaa'

      >>> ascii.rjust(5, ascii)
      'aaaaa'

      >>> ascii.rjust(5, uni)
      u'aaaaa'

    """
msg82514 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-02-20 06:13
Indeed this behavior doesn't seem to be documented.

When the string is unicode and the fillchar is non-unicode, Python
implicitly tries to decode the fillchar (and raises a TypeError if it
is not in range(0, 128)):
>>> u'x'.center(5, 'y') # unicode string, non-unicode (str) fillchar
u'yyxyy' # the fillchar is decoded


When the string is non-unicode it only accepts a non-unicode fillchar
(e.g. 'x'.center(5, 'y')) and it raises a TypeError if the fillchar is
unicode:
>>> 'x'.center(5, u'y') # non-unicode (str) string, unicode fillchar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: center() argument 2 must be char, not unicode

If it tries to decode the fillchar when the string is unicode, it could
also try to encode the unicode fillchar (and possibly raise a TypeError)
when the string is non-unicode.

Py3, instead, seems to have the opposite behavior. It implicitly encodes
unicode fillchars into byte strings when the string is a byte string but
it doesn't decode a byte fillchar if the string is unicode:

>>> b'x'.center(5, 'y') # byte string, unicode fillchar
b'yyxyy' # the fillchar is encoded
>>> 'x'.center(5, b'y') # unicode string, byte fillchar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: The fill character cannot be converted to Unicode

The doc [1] says that "The methods on bytes and bytearray
objects don’t accept strings as their arguments, just as the methods on
strings don’t accept bytes as their arguments." so b'x'.center(5, 'y')
should probably raise an error on Py3 (I could open a new issue for this).

[1]: http://docs.python.org/3.0/library/stdtypes.html#bytes-and-byte-array-methods
(in the note)
msg82660 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-24 07:19
In Py2.x, I think the desired behavior should match str.join(). If
either input is unicode, the output is unicode. If both are ascii, ascii
should come out.

For Py3.x, I think the goal was to have str.join() enforce that both
inputs are unicode. If either is bytes, then you have to know the
encoding.
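
A minimal sketch of the join-like promotion rule described above, written in Python 3 terms where bytes stands in for the 2.x str type and str for 2.x unicode. The helper name center_promoting and the choice of ASCII as the implicit codec are assumptions made for illustration; this is not what any released Python does.

```python
def center_promoting(text, width, fillchar):
    """Center `text`, promoting to str if either argument is str.

    Hypothetical helper: bytes play the role of the 2.x str type and
    str the role of 2.x unicode. ASCII is assumed as the implicit codec.
    """
    if isinstance(text, bytes) and isinstance(fillchar, bytes):
        # Both are byte strings: the result stays a byte string.
        return text.center(width, fillchar)
    # At least one side is str: decode any bytes side, then center as str.
    if isinstance(text, bytes):
        text = text.decode("ascii")
    if isinstance(fillchar, bytes):
        fillchar = fillchar.decode("ascii")
    return text.center(width, fillchar)


print(center_promoting(b"x", 5, b"y"))  # b'yyxyy'
print(center_promoting(b"x", 5, "y"))   # yyxyy
print(center_promoting("x", 5, b"y"))   # yyxyy
```

Non-ASCII bytes would raise UnicodeDecodeError here, which is the kind of implicit-conversion failure the rest of this discussion is concerned with.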
msg83669 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-17 12:10
About Python3: bytes.center accepts unicode as its second argument,
which I consider an error:

>>> b"x".center(5, b"\xe9")
b'\xe9\xe9x\xe9\xe9'
>>> b"x".center(5, "\xe9")
b'\xe9\xe9x\xe9\xe9'

The second example must fail with a TypeError.

str.center has the right behaviour:

>>> "x".center(5, "\xe9")
'ééxéé'
>>> "x".center(5, b"\xe9")
TypeError: The fill character cannot be converted to Unicode
msg87121 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-05-04 12:36
haypo> About Python3, bytes.center accepts unicode as second argument,
haypo> which is an error for me

OK, it's fixed by r71013 (issue #5499).
msg87122 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-05-04 12:38
This issue only concerns Python 2.x. Python 3.x has the right
behaviour: it disallows mixing bytes with characters.
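
A short check of that behaviour, assuming a Python 3 interpreter that includes the fix mentioned above. The helper name try_center is made up for this example; it only wraps the built-in center() methods.

```python
def try_center(text, fillchar):
    """Return the centered result, or the name of the exception raised."""
    try:
        return text.center(5, fillchar)
    except TypeError as exc:
        return type(exc).__name__


print(try_center("x", "y"))    # yyxyy
print(try_center(b"x", b"y"))  # b'yyxyy'
print(try_center("x", b"y"))   # TypeError
print(try_center(b"x", "y"))   # TypeError
```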
msg87123 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-05-04 12:58
The question is why str.{ljust,rjust,center} don't accept a unicode
argument, whereas unicode.{ljust,rjust,center} accept an ASCII string.
Other string methods accept a unicode argument, like str.count() (which
encodes the unicode string to bytes using the utf8 charset).

To be consistent with other string methods, str.{ljust,rjust,center}
should accept a unicode string and convert it to a byte string using
utf8, like str.count does. But I hate such implicit conversions (I
prefer the Python3 way: disallow mixing bytes and characters), so I
will not contribute to such a patch.

Can you write such a patch?

--

str.{ljust,rjust,center} use PyArg_ParseTuple(args, "n|c:...", ...) 
and getarg('c') which only accepts a string of 1 byte.

unicode.{ljust,rjust,center} use
PyArg_ParseTuple(args, "n|O&:...", ..., convert_uc, ...), where
convert_uc looks something like:

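  def convert_uc(o):
      # Python 2 pseudocode for the C helper: coerce the fill argument
      # to unicode, then require exactly one character.
      try:
          u = unicode(o)
      except Exception:
          raise TypeError("The fill character cannot be converted to Unicode")
      if len(u) != 1:
          raise TypeError("The fill character must be exactly one character long")
      return u[0]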

convert_uc() accepts a byte string of one ASCII character.

string_count() uses PyArg_ParseTuple(args, "O...", ...) and then tests
the substring type.
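
The difference between the two fill-character parsing strategies can be sketched in Python 3 terms. The function names parse_fill_c and parse_fill_convert_uc are hypothetical stand-ins for the C-level behaviour described above, with bytes playing the role of the 2.x str type. The real code is C and differs in detail, and the first error message is only modelled on the traceback quoted earlier in this issue.

```python
def parse_fill_c(obj):
    """Mimic getargs 'c': accept only a byte string of length 1."""
    if not isinstance(obj, bytes) or len(obj) != 1:
        raise TypeError("argument must be char, not %s" % type(obj).__name__)
    return obj


def parse_fill_convert_uc(obj):
    """Mimic convert_uc: accept str, or bytes that decode as ASCII."""
    try:
        u = obj if isinstance(obj, str) else obj.decode("ascii")
    except (AttributeError, UnicodeDecodeError):
        raise TypeError("The fill character cannot be converted to Unicode")
    if len(u) != 1:
        raise TypeError(
            "The fill character must be exactly one character long")
    return u


print(parse_fill_c(b"y"))           # b'y'
print(parse_fill_convert_uc(b"y"))  # y
print(parse_fill_convert_uc("y"))   # y
```

The asymmetry reported in this issue falls out of the sketch: the 'c' style parser has no path for a text fill character, while the converter style parser has one for byte strings.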
msg123483 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-06 18:25
As a feature request for 2.x, I think this should be rejected.

Any objections?

The "behavior" part seems to have been fixed.
History
Date User Action Args
2010-12-15 19:40:58 belopolsky set status: pending -> closed
    nosy: rhettinger, belopolsky, vstinner, ezio.melotti, ignas
2010-12-06 18:25:13 belopolsky set status: open -> pending
    type: behavior -> enhancement
    assignee: belopolsky
    nosy: + belopolsky
    messages: + msg123483
    resolution: rejected
2009-05-04 12:58:26 vstinner set messages: + msg87123
2009-05-04 12:38:30 vstinner set messages: + msg87122
    versions: + Python 2.7, - Python 2.5, Python 2.4, Python 3.0
2009-05-04 12:36:41 vstinner set messages: + msg87121
2009-03-17 12:10:59 vstinner set nosy: + vstinner
    messages: + msg83669
2009-02-24 07:19:36 rhettinger set nosy: + rhettinger
    messages: + msg82660
2009-02-20 06:13:59 ezio.melotti set nosy: + ezio.melotti
    messages: + msg82514
    versions: + Python 2.6, Python 3.0
2008-07-25 15:58:23 ignas create