classification
Title: str.join([ str-subtype-instance ]) misbehaves
Type: Stage:
Components: Interpreter Core Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: lemburg, mwh, ncoghlan, niemeyer, rhettinger, terry.reedy, tim.peters, twouters
Priority: normal Keywords:

Created on 2004-07-31 00:08 by twouters, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (14)
msg21859 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2004-07-31 00:08
Joining a list of string subtype instances usually
results in a single string instance:

  >>> class mystr(str): pass
  >>> type("".join([mystr("a"), mystr("b")]))
  <type 'str'>

But if the list only contains one object that is a
string subtype instance, that instance is returned
unchanged:

  >>> type("".join([mystr("a")]))
  <class '__main__.mystr'>

This can have odd effects, for instance when the result
of "".join(lst) is used as the return value of a __str__
hook. "".join should perhaps return the type of the
joining string, but it definitely should not vary its type
based on the *number* of items it's joining.
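
For readers on a current interpreter, the report can be re-checked with a short script. This is a minimal sketch, assuming CPython 3 (where str is the text type and the fix from this issue is already in place); the helper name join_result_types is made up for illustration.

```python
class mystr(str):
    pass


def join_result_types(max_items=3):
    """Return the result type of "".join for lists of 1..max_items mystr items."""
    return [type("".join([mystr("a")] * n)) for n in range(1, max_items + 1)]


# On an interpreter with the fix, every entry should be the plain str type,
# regardless of how many items were joined.
for count, result_type in enumerate(join_result_types(), start=1):
    print(count, result_type)
```

If the one-item case reports a different type from the others, the interpreter still has the behaviour described above.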

msg21860 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2004-08-02 14:25
Logged In: YES 
user_id=6656

What are you asking?  I agree it's a bug.  I'm sure you're 
competent to write a patch :-)
msg21861 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2004-08-04 19:39
Logged In: YES 
user_id=593130

This behavior does not, to me, clearly violate the current doc:
"Return a string which is the concatenation of the strings in 
the sequence seq"
where string is bytestring or Unicodestring.  If one takes
'string' narrowly, then your subclass instances should be 
rejected as input.  If one takes 'string' more broadly as 
isinstance(s,basestring) then your subclass should be equally 
acceptable as input or output.  If neither consistent 
interpretation of 'string' is meant, then there is a doc bug, or 
at least an underspecification.

Workaround 0: if len(seq) == 1: ...
Workaround 1: map(str, seq) to force str out.

*However*, in playing around (in 2.2), I discovered:

>>> type(''.join((a)))
<type 'str'>
>>> type(''.join([a]))
<class '__main__.ms'>
>>> type(''.join({a:None}))
<class '__main__.ms'>

Having the type of the join of a singleton depend on the type 
(mutability?) of the singleton wrapper is definitely disquieting.

Workaround 2: tuple(seq)
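
Workaround 1 above might look like this in practice. It is only a sketch, written for Python 3; the wrapper name join_as_str is hypothetical, and it assumes the subclass does not override __str__.

```python
class ms(str):
    pass


def join_as_str(sep, seq):
    # Convert every item to an exact str first, so join never sees
    # a subclass instance, whatever the length of seq.
    return sep.join(map(str, seq))


single = join_as_str("", [ms("a")])
several = join_as_str("-", [ms("a"), ms("b")])
print(type(single), type(several))
```

Note that str(item) calls the item's __str__, so a subclass that overrides __str__ would change the joined text, not just its type.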
msg21862 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2004-08-04 20:28
Logged In: YES 
user_id=38388

I agree with Terry. The result type is defined by the
semantics of the list elements and the length of the list:

len(list) > 1:
sep.join(list) := list[0] + sep + ... + sep + list[n]

len(list) == 1:
sep.join(list) := list[0]

len(list) == 0:
sep.join(list) := sep[:0]
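
The three cases above can be written out as a pure-Python model. This is an illustrative sketch of the definition as stated in this message, not the interpreter's implementation; join_by_definition is a made-up name, and the code is written for Python 3.

```python
from functools import reduce


class ms(str):
    pass


def join_by_definition(sep, items):
    items = list(items)
    if len(items) == 0:
        # Empty input: an empty string derived from the separator.
        return sep[:0]
    if len(items) == 1:
        # Single item: returned unchanged, subclass and all.
        return items[0]
    # Several items: list[0] + sep + ... + sep + last item.
    return reduce(lambda acc, item: acc + sep + item, items)


print(type(join_by_definition("", [ms("a")])))
print(type(join_by_definition("", [ms("a"), ms("b")])))
```

Under this definition the single-item result keeps the subclass while the concatenated result does not, which is exactly the length-dependent behaviour the original report objects to.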
msg21863 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2004-08-05 12:04
Logged In: YES 
user_id=6656

A clue for Terry: think about what "(a)" isn't :-)

I initially agreed that this was a bug because, e.g.
str_subclass()[:] returns a str.  Isn't this the same sort
of thing?

msg21864 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2004-08-05 16:10
Logged In: YES 
user_id=593130

Duh, my turn to forget. For any beginners reading this ...
>>> class ms(str): pass
...
>>> a=ms('a')
>>> type(''.join((a,)))
<class '__main__.ms'>

Expanding mwh's second point:

>>> e=ms()
>>> type(e)
<class '__main__.ms'>
>>> import copy
>>> e2=copy.copy(e)
>>> type(e2)
<class '__main__.ms'>
>>> e3=e[:]
>>> type(e3)
<type 'str'>
>>> id(e),id(e2),id(e3)
(9494608, 9009936, 8577440)

so [:] is not exactly an abbreviated synonym for copy().  Is 
this a bug?  (I haven't rechecked the respective docs yet.)

One reason I hesitate to call the OP's original observation a 
bug is that the whole subject of operations on subtype 
instances seems not completely baked.  Knowing the result 
types in all cases may require experiments as well as doc 
reading.
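
The copy-versus-slice comparison above can be repeated as a script. This is a small sketch assuming CPython 3; it uses a non-empty value, which is a change from the session above, so that the copies can also be compared by content.

```python
import copy


class ms(str):
    pass


e = ms("abc")
e2 = copy.copy(e)  # the copy module reconstructs the object through its own class
e3 = e[:]          # slicing goes through str's own implementation

print(type(e2), type(e3))
```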
msg21865 - (view) Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2004-08-07 15:48
Logged In: YES 
user_id=7887

If this was considered a bug: 
 
>>> type(ms("a")+ms("b")) 
<type 'str'> 
 
>>> type(ms("a")[:]) 
<type 'str'> 
 
Are these bugs as well? 
 
I believe this is how the implementation was intended to be, even if not 
optimal for subclasses. 
 
I suggest closing this bug as invalid, and writing a PEP about the possible new 
subclass support change (for all classes), if there's enough interest. 
 
msg21866 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2004-08-07 22:17
Logged In: YES 
user_id=34209

The point of the original bugreport is not that some
operations return strings instead of subtypes. The point is
that *one* operation *sometimes* returns subtypes. It's
inconsistent and unexpected behaviour, and since you clearly
don't write 'sep.join(seq)' for the common case of 'seq'
being a single item, it is something you will only occasionally
trigger. I don't have an emotional investment in this bug,
it's just something that came up on #python. I also don't
care which way it's fixed -- but treating the
single-element-sequence case the same as the
multiple-element-sequence seems logical to me. Regardless of
how the multiple-element-sequence is handled exactly :)

As for why I didn't write a patch myself, Michael, if I had
time for that, I would've spent it writing a good decorator
proposal >:-)
msg21867 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2004-08-11 07:00
Logged In: YES 
user_id=1038590

Don't know about anyone else, but the shortcut in str.join
that returns a reference to the *original* string in the
case of a single item sequence strikes me as very bad ju-ju:

>>> class mystr(str): pass
...
>>> s1 = mystr('fred')
>>> s1
'fred'
>>> s1.mutable = 42
>>> s1.mutable
42
>>> s2 = ''.join([s1])
>>> s2.mutable
42

When I call join, I expect to get a *new* object back, not a
reference to an old one (this is safe for standard strings,
but not for subclasses which may have mutable state).
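
The same check can be written as a script for a current interpreter. This is a sketch assuming CPython 3, where the fix is in place; on such an interpreter the joined value is expected to be a separate object with no access to the attribute set on s1.

```python
class mystr(str):
    pass


s1 = mystr("fred")
s1.mutable = 42  # instance state carried by the subclass instance

s2 = "".join([s1])

# Report whether join handed back the original object, and whether
# the extra attribute is still reachable through the result.
print(s2 is s1, hasattr(s2, "mutable"))
```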
msg21868 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2004-08-11 08:04
Logged In: YES 
user_id=1038590

New patch (#1007087) created with a test for this bug, as
well as a fix for it (the fix simply removes the 'sequence
of 1' shortcut).

Checks for the unicode case as well, although unicode didn't
have this bug (due to a different join implementation).
msg21869 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-08-19 17:46
Logged In: YES 
user_id=80475

Was this one fixed?
msg21870 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-08-19 18:22
Logged In: YES 
user_id=31435

I think the patch is still awaiting review.
msg21871 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2004-08-19 21:13
Logged In: YES 
user_id=1038590

I've just assigned the relevant patch to Tim for review. The
latest version should address his concerns with the original
fix (which didn't use the optimised path, even when it was
safe).
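
The idea described here, keeping the single-item shortcut only where it is safe, can be modelled in pure Python. This is a sketch of the idea and not the C code that was committed; guarded_join is a hypothetical name, and reading "safe" as "the item is exactly a str" is an assumption based on the discussion above.

```python
class mystr(str):
    pass


def guarded_join(sep, seq):
    items = list(seq)
    if len(items) == 1 and type(items[0]) is str:
        # Exact str: immutable and stateless, so returning it as-is is safe.
        return items[0]
    # General path: build a fresh, exact str piece by piece.
    result = ""
    for index, item in enumerate(items):
        if not isinstance(item, str):
            raise TypeError("sequence item %d: expected str" % index)
        if index:
            result += sep
        result += item
    return result


plain = "fred"
print(guarded_join("", [plain]) is plain)
print(type(guarded_join("", [mystr("a")])))
```

The shortcut is still taken for plain strings, while a lone subclass instance goes through the general path and comes back as a plain str.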
msg21872 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-08-23 23:25
Logged In: YES 
user_id=80475

Fixed.
See:
   Objects/stringobject.c 2.225
   Lib/test/test_string.py 1.26

History
Date User Action Args
2022-04-11 14:56:06  admin  set  github: 40662
2004-07-31 00:08:09  twouters  create