Issue460020
Created on 2001-09-09 15:41 by doerwalter, last changed 2022-04-10 16:04 by admin. This issue is now closed.
Messages (23) | |||
---|---|---|---|
msg6461 | Author: Walter Dörwald (doerwalter) |
Date: 2001-09-09 15:41 | |
The unicode constructor returns the object passed in when that object is an instance of a subclass of unicode:
--
class U(unicode):
    pass
u1 = U(u"foo")
print type(u1)
u2 = unicode(u1)
print type(u2)
--
This gives
--
<type '__main__.U'>
<type '__main__.U'>
--
instead of
--
<type '__main__.U'>
<type 'unicode'>
--
as it probably should (the unicode constructor should construct unicode objects).
With the current behaviour it is nearly impossible to construct a unicode object with the value of an instance of a unicode subclass, because most methods are optimized to return the original object if possible, e.g.
--
print type(unicode.__getslice__(u1, 0, 3))
print type(unicode.__getslice__(u1, 0, 2))
--
gives
--
<type '__main__.U'>
<type 'unicode'>
--
This should be made consistent, so that either a unicode object is always returned, or all methods use a "virtual constructor", i.e. create an object of the type passed in. The latter would simplify deriving classes from unicode, as far fewer methods would have to be overridden. But first of all the constructor should be fixed, so that the argument is returned unmodified only when it is an instance of unicode itself and not of a unicode subclass. |
|||
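For readers on a current interpreter, the report can be approximated as follows. This is only a sketch, assuming that Python 3's `str` stands in for the Python 2 `unicode` type. The class name `U` comes from the report and the variable names are illustrative. It checks the behaviour the report asks for: the constructor and both slices yield exact base-type objects.

```python
# Python 3 sketch: `str` plays the role of the Python 2 `unicode` type (assumption).
class U(str):
    pass

u1 = U("foo")
u2 = str(u1)      # constructor applied to a subclass instance
full = u1[0:3]    # slice covering the whole string
part = u1[0:2]    # proper sub-slice

# Print the concrete type of each result.
for label, value in [("u1", u1), ("str(u1)", u2), ("u1[0:3]", full), ("u1[0:2]", part)]:
    print(label, type(value).__name__)
```

The interesting case is the full slice: it has the same value as `u1`, so an implementation is tempted to hand back `u1` itself, which is the inconsistency the report describes.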
msg6462 | Author: Guido van Rossum (gvanrossum) |
Date: 2001-09-10 14:48 | |
Good catch! Other types also suffer from this, e.g. int. Added to my to-do list. |
|||
msg6463 | Author: Tim Peters (tim.peters) |
Date: 2001-09-10 20:45 | |
Reassigned to me. |
|||
msg6464 | Author: Tim Peters (tim.peters) |
Date: 2001-09-10 20:57 | |
Partially repaired (for int and long) in:
Include/intobject.h; new revision: 2.24
Include/longintrepr.h; new revision: 2.12
Include/longobject.h; new revision: 2.24
Lib/test/test_descr.py; new revision: 1.33
Objects/abstract.c; new revision: 2.75
Objects/longobject.c; new revision: 1.104 |
|||
msg6465 | Author: Tim Peters (tim.peters) |
Date: 2001-09-10 21:29 | |
float() also repaired, in
Include/floatobject.h; new revision: 2.20
Lib/test/test_descr.py; new revision: 1.34
Objects/abstract.c; new revision: 2.76 |
|||
msg6466 | Author: Tim Peters (tim.peters) |
Date: 2001-09-10 23:39 | |
tuple() repaired, in
Include/tupleobject.h; new revision: 2.27
Lib/test/test_descr.py; new revision: 1.36
Objects/abstract.c; new revision: 2.77 |
|||
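The constructor repairs reported for int, float and tuple can be checked along the following lines. This is a hedged sketch for Python 3 (which has no separate `long` type, so that case is left out). The subclass names `I`, `F` and `T` are made up for the example.

```python
# Hypothetical subclasses of the built-in types named in the messages above.
class I(int):
    pass

class F(float):
    pass

class T(tuple):
    pass

# Each built-in constructor is applied to an instance of its own subclass.
i = int(I(7))
f = float(F(1.5))
t = tuple(T((1, 2)))

for value in (i, f, t):
    print(type(value).__name__, value)
```

In each case the value is preserved while the result is an instance of the exact built-in type, not of the subclass.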
msg6467 | Author: Tim Peters (tim.peters) |
Date: 2001-09-11 01:43 | |
str() repaired (yes, unicode is next <wink>), in
Include/stringobject.h; new revision: 2.31
Lib/test/test_descr.py; new revision: 1.37
Objects/object.c; new revision: 2.146
Objects/stringobject.c; new revision: 2.130 |
|||
msg6468 | Author: Tim Peters (tim.peters) |
Date: 2001-09-11 03:09 | |
unicode() repaired in
Include/unicodeobject.h; new revision: 2.33
Lib/test/test_descr.py; new revision: 1.39
Objects/unicodeobject.c; new revision: 2.111 |
|||
msg6469 | Author: Walter Dörwald (doerwalter) |
Date: 2001-09-11 11:31 | |
Thanks for the quick fix, but the second problem still remains:
---
class U(unicode):
    pass
u = U(u"foo")
print type(u[0:3])
print type(u[0:2])
---
This gives:
---
<type '__main__.U'>
<type 'unicode'>
---
I think this should be changed either to always return a unicode object, or to always return an instance of the real class passed in. (This should be done for all unicode methods that return a new unicode object.) The second solution would simplify creating derived classes, because all the methods that return unicode objects would automatically return the derived type, so these methods would not have to be overridden. |
|||
msg6470 | Author: Guido van Rossum (gvanrossum) |
Date: 2001-09-11 12:01 | |
You're asking for the impossible though. I don't think any other OO language supports this automatically (although I could be wrong). The problem is what to do with a subclass of unicode like this:
class U(unicode):
    def __init__(self, arg):
        self.orig = arg
How is U("foobar")[0:3] going to know what argument to pass in to __init__? The base class simply can't know what additional invariants the subclass imposes. |
|||
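The objection can be made concrete with a small Python 3 sketch, again assuming `str` in place of `unicode`. The class mirrors the one in the message and the variable names are illustrative. The slice cannot carry the subclass's extra state, because the base type has no way to know what to pass to `__init__`.

```python
# Python 3 sketch of the subclass from the message (`str` instead of `unicode`).
class U(str):
    def __init__(self, arg):
        # Extra per-instance state that the base type knows nothing about.
        self.orig = arg

u = U("foobar")
piece = u[0:3]

print(type(u).__name__, u.orig)          # the subclass instance keeps its state
print(type(piece).__name__, piece)       # the slice is a plain base-type string
print(hasattr(piece, "orig"))            # and has no `orig` attribute
```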
msg6471 | Author: Guido van Rossum (gvanrossum) |
Date: 2001-09-11 12:04 | |
Apologies. I missed half of what you were asking. It's impossible for U(...)[0:2] to return a U instance, but I agree that it should then at least *always* return a unicode instance. So this is still open.
For Tim: the problem is that a slice (or other) operation may decide to return the original object unchanged; this should (probably?) only be done when the original object is exactly a unicode instance. I'm afraid that we'll have to systematically look through all 144 Unicode methods to see where they exhibit this behavior. |
|||
msg6472 | Author: Walter Dörwald (doerwalter) |
Date: 2001-09-11 14:03 | |
> You're asking for the impossible though.
> I don't think any other OO language supports
> this automatically (although I
> could be wrong).
Python uses it, e.g. in Lib/UserString.py:
def rstrip(self): return self.__class__(self.data.rstrip())
So if someone derives a new class X from UserString, calling X("y ").rstrip() returns an X object. The only assumption that UserString makes is that the derived class has a constructor that can handle at least the same arguments as UserString.__init__. This "virtual constructor" is used in several places:
grep -l "self.__class__(" `find -name '*.py' | grep -v Mac`
returns:
./dist/src/Lib/UserString.py
./dist/src/Lib/copy.py
./dist/src/Lib/MimeWriter.py
./dist/src/Lib/test/test_descr.py
./dist/src/Lib/xml/sax/xmlreader.py
./dist/src/Lib/UserList.py
./dist/src/Demo/pdist/rcvs.py |
|||
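The "virtual constructor" described above can be tried with `collections.UserString`, the Python 3 home of the class from `Lib/UserString.py`. The sketch below shows both halves of the claim: a subclass that keeps the base constructor signature gets instances of itself back, and a subclass that changes the signature breaks. The class names `X` and `Tagged` and the `tag` parameter are hypothetical.

```python
from collections import UserString

class X(UserString):
    pass

# rstrip() rebuilds its result through self.__class__, so the subclass survives.
stripped = X("y ").rstrip()
print(type(stripped).__name__, repr(stripped.data))

class Tagged(UserString):
    # Hypothetical subclass whose constructor needs an extra argument.
    def __init__(self, seq, tag):
        super().__init__(seq)
        self.tag = tag

# The inherited rstrip() calls Tagged(<stripped text>) without `tag`.
try:
    Tagged("y ", "note").rstrip()
    failure = None
except TypeError as exc:
    failure = exc
print("rstrip on Tagged failed:", failure is not None)
```

The second half is the constructor assumption stated in the message: the scheme works only as long as every subclass accepts the base class's arguments.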
msg6473 | Author: Guido van Rossum (gvanrossum) |
Date: 2001-09-11 14:49 | |
> Python uses it, e.g. in Lib/UserString.py:
[and other cases]
Yes, and I'm no longer comfortable with such code, for exactly the reason I mentioned, unless it's an explicit and intentional part of the class specification. :-(
Doing this consistently for all built-in types would cause too much upheaval -- we'd have to change every single built-in operation.
But the other interpretation stands: unicode (and other) operations should only optimize by returning "self" when self is a strict instance of the type. |
|||
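The rule in the last sentence (reuse `self` only for a strict instance of the type) can be sketched in pure Python. The `Word` class below is a toy invented for this illustration and is not how CPython implements its built-in types, which perform the equivalent exact-type check in C.

```python
class Word:
    """Hypothetical immutable value type, used only for illustration."""

    def __init__(self, chars):
        self._chars = str(chars)

    def strip(self):
        stripped = self._chars.strip()
        # Shortcut: reuse self only if nothing changed AND self is exactly a Word.
        if stripped == self._chars and type(self) is Word:
            return self
        # Otherwise always build a plain Word, never a subclass instance.
        return Word(stripped)

class Loud(Word):
    pass

plain = Word("abc")
loud = Loud("abc")

print(plain.strip() is plain)            # exact instance: shortcut taken
print(type(loud.strip()).__name__)       # subclass instance: fresh base object
```

With the exact-type test in place, the result type no longer depends on whether the operation happened to change the value.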
msg6474 | Author: Tim Peters (tim.peters) |
Date: 2001-09-11 16:56 | |
Trying to change Resolution to something sensible ("Accepted" doesn't make sense). |
|||
msg6475 | Author: Tim Peters (tim.peters) |
Date: 2001-09-11 16:59 | |
Oh well -- it's stuck at "Accepted". |
|||
msg6476 | Author: Tim Peters (tim.peters) |
Date: 2001-09-11 19:50 | |
Here we go again. For tuples, hunted down and disabled t[:], t*0 and t*1 optimizations when t is of a tuple subclass type:
Lib/test/test_descr.py; new revision: 1.41
Objects/tupleobject.c; new revision: 2.60
More later (this is time-consuming work). |
|||
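The three tuple expressions named in the message can be exercised as below. This is a small Python 3 sketch with a made-up subclass `T`. Each expression could return `t` itself (or share an existing object) as a shortcut, so the result types are what matters.

```python
class T(tuple):
    pass

t = T((1, 2, 3))

# The expressions listed in the message, keyed by how they are written.
results = {"t[:]": t[:], "t*0": t * 0, "t*1": t * 1}

for expr, value in results.items():
    print(expr, type(value).__name__, value)
```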
msg6477 | Author: Tim Peters (tim.peters) |
Date: 2001-09-11 21:45 | |
For I a subclass of int, disabled the
    +I(whatever)
    I(0) << whatever
    I(0) >> whatever
    I(whatever) << 0
    I(whatever) >> 0
optimizations, in
Lib/test/test_descr.py; new revision: 1.42
Objects/intobject.c; new revision: 2.74 |
|||
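A Python 3 sketch of the same five int expressions follows. The subclass name `I` is taken from the message. The concrete operands (5, 0 and a shift count of 3) are arbitrary stand-ins for "whatever".

```python
class I(int):
    pass

x = I(5)
zero = I(0)

# Each expression yields a value equal to one of its operands,
# which is what makes returning that operand unchanged tempting.
results = {
    "+x": +x,
    "zero << 3": zero << 3,
    "zero >> 3": zero >> 3,
    "x << 0": x << 0,
    "x >> 0": x >> 0,
}

for expr, value in results.items():
    print(expr, type(value).__name__, value)
```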
msg6478 | Author: Tim Peters (tim.peters) |
Date: 2001-09-11 21:55 | |
For F a subclass of float, disabled the +F(whatever) optimization, in
Lib/test/test_descr.py; new revision: 1.43
Objects/floatobject.c; new revision: 2.98 |
|||
msg6479 | Author: Tim Peters (tim.peters) |
Date: 2001-09-11 22:32 | |
A number of similar long optimizations were disabled for long subclasses, in
Lib/test/test_descr.py; new revision: 1.44
Objects/longobject.c; new revision: 1.105 |
|||
msg6480 | Author: Tim Peters (tim.peters) |
Date: 2001-09-12 02:19 | |
Lots of str optimizations inhibited ("the usual", + replace, translate, ljust, rjust, center, strip), in
Lib/test/test_descr.py; new revision: 1.45
Objects/stringobject.c; new revision: 2.131 |
|||
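Several of the string methods listed above can be called so that the result equals the receiver, which is the situation where returning the receiver unchanged is tempting. The Python 3 sketch below does that for a made-up subclass `S`. The message does not spell out what "the usual" covers, so the slice and repeat entries are a guess, and `translate` is left out.

```python
class S(str):
    pass

s = S("12345")

# Every call below produces a value equal to s.
results = {
    "replace": s.replace("xy", "xy"),
    "ljust": s.ljust(len(s)),
    "rjust": s.rjust(len(s)),
    "center": s.center(len(s)),
    "strip": s.strip(),
    "s[:]": s[:],      # assumed to be part of "the usual"
    "s * 1": s * 1,    # assumed to be part of "the usual"
}

for name, value in results.items():
    print(name, type(value).__name__)
```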
msg6481 | Author: Tim Peters (tim.peters) |
Date: 2001-09-12 03:04 | |
And lots of unicode optimizations (on subclass instances) were disabled in
Lib/test/test_descr.py; new revision: 1.46
Objects/unicodeobject.c; new revision: 2.112 |
|||
msg6482 | Author: Tim Peters (tim.peters) |
Date: 2001-09-12 08:11 | |
Additional patches:
+ Repaired hash() applied to str and unicode subclass instances (was always returning 0, with baffling consequences for dict operations).
+ Ensured that interning an object of a str subclass interned a genuine string (w/ the same value) instead.
The complex type got overlooked in all this, so keeping this open until that's done too. |
|||
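The dict consequences mentioned for hash() follow from the requirement that equal keys hash equally. A minimal Python 3 sketch of the repaired behaviour is below, with a made-up subclass `S` and a made-up key. The interning half of the message is not covered here.

```python
class S(str):
    pass

key = S("spam")
table = {"spam": 1}

# The subclass instance compares equal to "spam" and must hash the same way,
# so this assignment updates the existing entry instead of adding a new one.
table[key] = 2

print(hash(key) == hash("spam"))
print(len(table), table["spam"])
```

If the subclass instance hashed to a different value than the equal plain string, the assignment would create a second entry and lookups by either key would disagree.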
msg6483 | Author: Tim Peters (tim.peters) |
Date: 2001-09-12 19:14 | |
Similar changes also completed for the complex type, and closing this bug report as Fixed again:
Include/complexobject.h; new revision: 2.9
Lib/test/test_descr.py; new revision: 1.49
Objects/complexobject.c; new revision: 2.45
Objects/floatobject.c; new revision: 2.99 |
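For completeness, a Python 3 sketch of the complex case. The message does not list which operations were changed, so the constructor and unary plus below are assumed by analogy with the int and float messages. The subclass name `C` is made up.

```python
class C(complex):
    pass

c = C(1, 2)

converted = complex(c)   # constructor applied to a subclass instance
positive = +c            # unary plus, which leaves the value unchanged

print(type(converted).__name__, converted)
print(type(positive).__name__, positive)
```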
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:04:25 | admin | set | github: 35142 |
2001-09-09 15:41:14 | doerwalter | create |