classification
Title: Allow 1-character ASCII unicode where 1-character str is required
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 2.7
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: alexandre.vassalotti, benjamin.peterson, gvanrossum, pitrou, serhiy.storchaka, terry.reedy, vstinner
Priority: normal
Keywords: patch

Created on 2013-12-18 15:59 by serhiy.storchaka, last changed 2015-02-16 22:35 by terry.reedy. This issue is now closed.

Files
File name Uploaded
getargs_c_unicode.patch serhiy.storchaka, 2013-12-18 15:59
getargs_c_unicode_2.patch serhiy.storchaka, 2014-03-04 14:27
Messages (15)
msg206529 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-18 15:59
In most cases where a str object is required, a unicode object is allowed too. The "s" and "z" codes (with modifiers) in PyArg_Parse*() accept both str and unicode instances. But the "c" code accepts only a 1-character str, not unicode. This makes it harder to write version-agnostic code with unicode_literals imported (2.7 functions require bytes literals, 3.x functions require unicode literals) and breaks pickle compatibility (see issue13566).

This change will affect:

* str.ljust(), str.rjust() and str.center();
* '%c' % char;
* mmap.write_byte();
* array constructor and item setter for 'c' type;
* datetime.isoformat();
* bsddb.set_re_delim() and bsddb.set_re_pad();
* msvcrt.putch() and msvcrt.ungetch();
* swi.block.padstring().
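
A minimal pure-Python sketch of the acceptance rule proposed here, for illustration only. The real change is in the C argument parser; the helper name `coerce_char` and the exact error messages are assumptions, not taken from the patch. The sketch runs on both 2.7 and 3.x.

```python
def coerce_char(value):
    """Return a 1-byte string for a 1-character bytes or ASCII text value.

    Hypothetical helper: it mirrors the proposed rule for the "c" code,
    not the actual C implementation.
    """
    text_type = type(u"")  # unicode on 2.7, str on 3.x
    if isinstance(value, bytes):  # bytes is an alias of str on 2.7
        if len(value) != 1:
            raise TypeError("must be char, not a string of length %d" % len(value))
        return value
    if isinstance(value, text_type):
        if len(value) != 1:
            raise TypeError("must be char, not a string of length %d" % len(value))
        if ord(value) >= 128:
            # Assumption: non-ASCII text is rejected rather than encoded.
            raise TypeError("must be an ASCII char")
        return value.encode("ascii")
    raise TypeError("must be char, not %s" % type(value).__name__)


# A fill character written as a unicode literal becomes usable with bytes.
padded = b"ab".ljust(4, coerce_char(u"-"))
```

Code that must run unchanged on 2.7 can call such a helper at the call site, which gives the same effect without any change to the interpreter.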
msg206533 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-18 16:02
I don't like the heuristic of "ASCII only" characters. Accepting that may lead to bugs if later you pass a non-ASCII character.

And isn't it too late to change that in Python 2.7? That version was released 3 years ago and is widely used in production.
msg206534 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-12-18 16:04
I guess it makes porting to Python 3 easier, but can we do this in a
stable release?
msg206545 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-18 17:39
> I don't like the heuristic of "ASCII only" characters. Accepting that may
> lead to bugs if later you pass a non-ASCII character.

What behavior do you propose for non-ASCII values?

> And is it not too late to change that in Python 2.7? Version released 3
> years ago and widely used in production.

Python 3 was released 5 years ago, but many people still support and write
new software on Python 2. While 2.7 is in use, new 2.7 releases that
help with porting to and interoperability with Python 3 will be desirable.

See also issue19099.
msg206719 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-12-20 23:55
Victor and Stefan are correct. 2.7 is a fixed version of Python. CPython 2.7.z, z >= 1, only gets bug (and build) fixes. A 'new 2.7 version' would be 2.8, which will not happen. The fact that you propose to change the unambiguous doc shows that this is an enhancement, not a bugfix. This change would have had to be done in 2.7.0.
msg206722 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-12-21 10:35
Well, generally I'd be against adding features, but this particular one
could be rationalized in the same way as PEP 414.  So I'm simply unsure
whether the feature should be added, but *if* it's added, it should
be backed by a pronouncement either from the RM or Guido.
msg206725 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-12-21 11:07
PEP 414 was about adding a feature to 3.3 well before the first alpha release.

What Guido has recently said about 2.7 is that after 3 1/2 years we should concentrate on build issues such as came up with the new OSX and de-emphasize or even cease fixing bugs. He thinks that by now, people will have worked around the ones that matter.
msg212614 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-03 07:32
Re-opened due to Python-Dev discussion http://comments.gmane.org/gmane.comp.python.devel/146057.
msg212643 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-03 15:34
This sounds reasonable to me, but the patch lacks tests.
msg212644 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-03 15:36
However, do note that the semantics will end up different from other uses of unicode. e.g.:

>>> "aa".strip(u"b")
u'aa'

In str.strip(), passing a unicode parameter returns a unicode string. In str.ljust(), passing a unicode parameter will return a byte string.
msg212647 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-03-03 18:02
The behavior of str.strip is not what I would expect from reading 
'''str.strip([chars])
    Return a copy of the string with the leading and trailing characters removed.'''
but I guess it is consistent with a general rule that when mixing bytes and unicode (which was not always), the bytes are latin-1 decoded to unicode. However, the 'not always' part (str.strip yes, str.ljust no) made Python a bit inconsistent with itself.

Adding the unicode_literals import made Python more inconsistent with itself. "You can change byte literals to unicode literals, but if you do and you use one of the stdlib text apis that are bytes only, your program breaks."  This patch moves the inconsistency around a bit but does not remove it.  People who stick with 2.7 will have to live with inconsistency one way or another.

The turtle color issue is quite different in that it involves text names ('red') or encodings ('#aabbcc') for color tuples that are quoted as text in order to be passed on to tkinter and tk (which wants unicode anyway).
msg212673 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-03 21:28
> However, do note that the semantics will end up different from other uses of
> unicode. e.g.:
>
> >>> "aa".strip(u"b")
> u'aa'

And this behavior is weird.

>>> print 'À\n'.strip('\n')
À
>>> print 'À\n'.strip(u'\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The self argument of str.strip is usually a variable, but the chars argument is
almost always a literal and so is affected by the unicode_literals future import.
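
The traceback above comes from Python 2 implicitly decoding the byte string with the ASCII codec once the chars argument is unicode. A small sketch that makes this step explicit so it can be run on both 2.7 and 3.x; the function name `strip_like_py2` is hypothetical, and the explicit `decode("ascii")` stands in for the implicit coercion.

```python
def strip_like_py2(data, chars):
    """Mimic Python 2 str.strip(): unicode chars force an ASCII decode of data."""
    text_type = type(u"")  # unicode on 2.7, str on 3.x
    if isinstance(chars, text_type):
        # Explicit stand-in for the implicit coercion; raises
        # UnicodeDecodeError when data holds non-ASCII bytes.
        return data.decode("ascii").strip(chars)
    # Byte-string chars: plain byte-wise strip, no decoding involved.
    return data.strip(chars)


# UTF-8 encoding of "À" followed by a newline: b'\xc3\x80\n'
utf8_data = u"\u00c0\n".encode("utf-8")
```

With a byte-string chars argument the call succeeds and returns bytes; with a unicode chars argument the same data fails during decoding, which is the asymmetry shown in the session above.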
msg212721 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-04 14:27
Here is a patch with tests. Not all affected methods are tested, because some of the affected methods and modules have no tests at all.
msg236099 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-16 10:10
What is your decision, Guido and Benjamin?
msg236116 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-02-16 22:26
Let's not do this. The time to meddle with Python 2.7 details is long gone.
History
Date User Action Args
2015-02-16 22:35:33 terry.reedy set stage: patch review -> resolved
2015-02-16 22:26:16 gvanrossum set status: open -> closed; resolution: wont fix; messages: + msg236116
2015-02-16 10:10:22 serhiy.storchaka set nosy: + gvanrossum; messages: + msg236099
2014-05-22 22:06:50 skrah set nosy: - skrah
2014-03-04 14:27:22 serhiy.storchaka set files: + getargs_c_unicode_2.patch; messages: + msg212721
2014-03-03 21:28:28 serhiy.storchaka set messages: + msg212673
2014-03-03 18:02:59 terry.reedy set messages: + msg212647
2014-03-03 15:36:46 pitrou set messages: + msg212644
2014-03-03 15:34:49 pitrou set messages: + msg212643
2014-03-03 07:32:24 serhiy.storchaka set status: closed -> open; resolution: not a bug -> (no value); messages: + msg212614; stage: resolved -> patch review
2013-12-21 11:07:55 terry.reedy set messages: + msg206725
2013-12-21 10:35:14 skrah set messages: + msg206722
2013-12-20 23:55:56 terry.reedy set status: open -> closed; type: enhancement; nosy: + terry.reedy; messages: + msg206719; resolution: not a bug; stage: patch review -> resolved
2013-12-18 17:39:56 serhiy.storchaka set messages: + msg206545
2013-12-18 16:04:13 skrah set nosy: + skrah, benjamin.peterson; messages: + msg206534
2013-12-18 16:02:38 vstinner set nosy: + vstinner; messages: + msg206533
2013-12-18 15:59:26 serhiy.storchaka create