Author vstinner
Recipients ezio.melotti, ignas, rhettinger, vstinner
Date 2009-05-04.12:58:26
SpamBayes Score 7.03219e-07
Marked as misclassified No
Message-id <1241441908.38.0.732899725763.issue3446@psf.upfronthosting.co.za>
In-reply-to
Content
The question is why str.{ljust,rjust,center} don't accept a unicode 
argument, whereas unicode.{ljust,rjust,center} do accept an ASCII byte 
string. Other string methods accept a unicode argument, e.g. 
str.count() (which encodes the unicode string to bytes using the utf8 
charset).
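As a pure-Python sketch of that behavior (the function name and the utf8 assumption are illustrative only, not CPython's actual C code):

```python
def bytes_count(haystack, needle):
    # Illustrative re-creation, in Python 3 terms, of Python 2's
    # str.count() accepting a unicode needle: implicitly encode the
    # text needle to bytes (utf8 assumed here), then search.
    if isinstance(needle, str):
        needle = needle.encode("utf8")
    return haystack.count(needle)
```

So bytes_count(b"spam spam", "spam") and bytes_count(b"spam spam", b"spam") give the same result.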

To be consistent with the other string methods, str.{ljust,rjust,center} 
should accept unicode strings and convert them to byte strings using 
utf8, as str.count() does. But I hate such implicit conversions (I 
prefer the Python3 way: disallow mixing bytes and characters), so I 
will not contribute such a patch.
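For comparison, Python 3 enforces that separation outright: the padding methods on bytes and str each reject a fill character of the other type. A quick check, runnable on any Python 3:

```python
def mixing_rejected():
    # Python 3 refuses to mix bytes and text fill characters in
    # ljust/rjust/center: both directions raise TypeError.
    results = []
    try:
        b"abc".center(7, "*")   # str fill char for a bytes object
    except TypeError:
        results.append("bytes+str rejected")
    try:
        "abc".center(7, b"*")   # bytes fill char for a str object
    except TypeError:
        results.append("str+bytes rejected")
    return results
```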

Can you write such patch?

--

str.{ljust,rjust,center} use PyArg_ParseTuple(args, "n|c:...", ...), 
and the getarg('c') converter only accepts a string of exactly 1 byte.

unicode.{ljust,rjust,center} use 
PyArg_ParseTuple(args, "n|O&:...", ..., convert_uc, ...), where 
convert_uc looks something like:

  def convert_uc(o):
      try:
          u = unicode(o)
      except Exception:
          raise TypeError("The fill character cannot be converted to Unicode")
      if len(u) != 1:
          raise TypeError("The fill character must be exactly one character long")
      return u[0]

convert_uc() therefore also accepts a byte string containing a single 
ASCII character.
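A Python-level sketch of a converter with that rule (an illustration of the behavior described above, written for Python 3; the name convert_fillchar is hypothetical, not the actual C converter):

```python
def convert_fillchar(o):
    # Accept a 1-character text string, or a 1-byte ASCII byte
    # string (decoded to text), mirroring convert_uc's behavior.
    if isinstance(o, bytes):
        try:
            o = o.decode("ascii")
        except UnicodeDecodeError:
            raise TypeError("The fill character cannot be converted to Unicode")
    if not isinstance(o, str):
        raise TypeError("The fill character cannot be converted to Unicode")
    if len(o) != 1:
        raise TypeError("The fill character must be exactly one character long")
    return o
```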

string_count() uses PyArg_ParseTuple(args, "O...", ...) and then tests 
the type of the substring itself.
History
Date User Action Args
2009-05-04 12:58:28vstinnersetrecipients: + vstinner, rhettinger, ezio.melotti, ignas
2009-05-04 12:58:28vstinnersetmessageid: <1241441908.38.0.732899725763.issue3446@psf.upfronthosting.co.za>
2009-05-04 12:58:26vstinnerlinkissue3446 messages
2009-05-04 12:58:26vstinnercreate