classification
Title: Improve %s support for unicode
Type: Stage:
Components: Interpreter Core Versions:
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: effbot Nosy List: effbot, lemburg, nascheme (3)
Priority: normal Keywords: patch

Created on 2005-03-09 01:43 by nascheme, last changed 2005-08-22 20:57 by nascheme.

Files
File name Uploaded Description
unicode_format.txt nascheme, 2005-03-09 01:43
pyobject_text.txt nascheme, 2005-03-10 21:13 version 2 of patch
Messages (8)
msg47901 - Author: Neil Schemenauer (nascheme) Date: 2005-03-09 01:43
"'%s' % unicode_string" produces a unicode result.  I
think the following code should also return a unicode
string:

class Wrapper:
    def __str__(self):
        return unicode_string
'%s' % Wrapper()

That behavior would make it easier to write library
code that can work with either str objects or unicode
objects.

The fix is pretty simple (see the attached patch).
Perhaps the PyObject_Text function should be called
_PyObject_Text instead.  Alternatively, if the function
is made public then we should document it and perhaps
also provide a builtin function called 'text' that uses it.
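
The request above can be illustrated with a small sketch. The label()
helper and the Wrapper constructor are hypothetical names introduced
only for this example; they are not part of the attached patch. The
point is that a library helper relying on nothing but '%s' should give
the same kind of result for a unicode string and for an object whose
__str__ returns that string.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only; label() and this Wrapper are hypothetical.


class Wrapper(object):
    """Wraps a string and hands it back unchanged from __str__."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value


def label(obj):
    # Library-style helper that relies only on %s formatting.
    return '<%s>' % obj


# Formatting the unicode string directly.
direct = label(u'caf\xe9')

# Formatting an object whose __str__ returns the same unicode string.
# The report asks that this produce the same unicode result as above.
wrapped = label(Wrapper(u'caf\xe9'))
```

On an interpreter with a single text type (Python 3) the two results are
trivially equal; the report concerns interpreters where str and unicode
are distinct types.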


msg47902 - Author: Marc-Andre Lemburg (lemburg) Date: 2005-03-09 10:10
Nice patch. 

Only nit: PyObject_Text() should check that the result of
tp_str() is indeed either a string or unicode instance
(possibly from a subclass). Otherwise, the function wouldn't
be able to guarantee this feature - which is what it's all
about.
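
A minimal Python-level sketch of the check described above, assuming the
conversion simply calls the type's __str__ and validates what comes
back. The function name text() mirrors the builtin proposed in this
thread, but the body, the STRING_TYPES tuple, and the Good/Bad classes
are illustrative assumptions, not the code in the attached patch.

```python
import sys

# Python 2 has both str and unicode; Python 3 has only str.
if sys.version_info[0] >= 3:
    STRING_TYPES = (str,)
else:
    STRING_TYPES = (str, unicode)  # noqa: F821 (name exists on Python 2 only)


def text(obj):
    """Hypothetical model of a checked string conversion."""
    if isinstance(obj, STRING_TYPES):
        # Already a string of an accepted type (assumption: returned as-is).
        return obj
    result = type(obj).__str__(obj)
    # isinstance() also accepts subclasses of the string types.
    if not isinstance(result, STRING_TYPES):
        raise TypeError("__str__ returned non-string (type %s)"
                        % type(result).__name__)
    return result


class Good(object):
    def __str__(self):
        return u"ok"


class Bad(object):
    def __str__(self):
        return 42  # not a string, so text() should reject it
```

With this check in place, callers can rely on text() returning a string
instance or raising TypeError, which is the guarantee asked for above.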
msg47903 - Author: Neil Schemenauer (nascheme) Date: 2005-03-10 21:12
Attaching a better patch.  Add a builtin function called
"text".  Change PyObject_Text to check the return types as
suggested by Marc-Andre.  Update the documentation and the tests.
msg47904 - Author: Neil Schemenauer (nascheme) Date: 2005-03-10 21:13
attempt to attach patch again
msg47905 - Author: Neil Schemenauer (nascheme) Date: 2005-04-20 21:00
Assigning to effbot for review.  He had mentioned something
about __text__ at one point.
msg47906 - Author: Marc-Andre Lemburg (lemburg) Date: 2005-04-20 21:27
Looks OK to me; not sure what you mean by __text__ -
__str__ already took on that role long ago.
msg47907 - Author: Neil Schemenauer (nascheme) Date: 2005-04-20 21:46
Here's a quote from him:
> I'm beginning to think that we need an extra method (__text__), that
> can return any kind of string that's compatible with Python's text model.
>
> (in today's CPython, that's an 8-bit string with ASCII only, or a Unicode
> string.  future Python's may support more string types, at least at
> the C implementation level).
>
> I'm not sure we can change __str__ or __unicode__ without breaking
> code in really obscure ways (but I'd be happy to be proven wrong).

My idea is that we can change __str__ without breaking code.
The reason is that no one should be calling tp_str
directly.  Instead they use PyObject_Str.

I don't know what he meant by "string that's compatible with
Python's text model".  With my change, Python can only deal
with str or unicode instances.  I have no idea how we could
support other string implementations.

I don't want to introduce a text() builtin that calls
__str__ and then later realize that __text__ would be
useful.  Perhaps this change is big enough to require a PEP.
msg47908 - Author: Neil Schemenauer (nascheme) Date: 2005-08-22 20:57
Closing in favor of patch 1266570.
History
Date User Action Args
2005-03-09 01:43:16 nascheme create