
Author kristjan.jonsson
Recipients kristjan.jonsson, pitrou
Date 2012-11-16.15:39:37
Message-id <1353080377.76.0.102275796947.issue16475@psf.upfronthosting.co.za>
In-reply-to
Content
Basically, reuse of strings (and preservation of their interned status) fell by the wayside somewhere in the 3.x transition.  Strings have been reused, and interned strings re-interned, since protocol version 1 in 2.x.  This patch adds that feature back, and uses the same mechanism to reuse not only strings, but also any other multiply-referenced object.

It is not desirable to simply intern all strings that are read from marshaled data.  Only selected strings are interned by Python during compilation, and we want to keep it that way.  Also, 2.x reuses not only interned strings but other strings as well.
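A minimal sketch of the interning behaviour described above, assuming a CPython build where marshal version 3 is available (3.4 or later). The string values and variable names are made up for illustration. An interned string should come back as the very same object, while an ordinary string should come back as an equal but separate copy that was not interned on load.

```python
import marshal
import sys

# Build both strings at run time so the compiler does not intern them for us.
interned = sys.intern(" ".join(["marshal", "reuse", "demo"]))
plain = " ".join(["not", "interned", "demo"])

# Round-trip each string through marshal format version 3.
interned_copy = marshal.loads(marshal.dumps(interned, 3))
plain_copy = marshal.loads(marshal.dumps(plain, 3))

# Identity, not just equality, is what shows whether interning was preserved.
print("interned string is the same object:", interned_copy is interned)
print("plain string is the same object:", plain_copy is plain)
print("plain string still compares equal:", plain_copy == plain)
```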

Generalizing reuse of strings to other objects is trivial, and a logical step forward.  This allows optimizations to be made on code objects where common data are identified and instanced, and those code objects to be saved and reloaded with that instancing intact.
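To illustrate reuse of objects other than strings, here is a small sketch, again assuming a CPython where marshal version 3 exists (3.4 or later). The data is invented for the example: one tuple referenced 100 times from a list. With version 2 every reference is written out in full and loaded as a separate object; with version 3 the later references should be written as back-references, so the output is smaller and the loaded list shares a single tuple.

```python
import marshal

# One tuple object that is referenced many times from the same list.
shared = ("a reasonably long string", "another reasonably long string")
data = [shared] * 100

v2_bytes = marshal.dumps(data, 2)  # no object references in this format
v3_bytes = marshal.dumps(data, 3)  # repeated objects become back-references

from_v2 = marshal.loads(v2_bytes)
from_v3 = marshal.loads(v3_bytes)

print("version 2 size:", len(v2_bytes))
print("version 3 size:", len(v3_bytes))
print("sharing kept by version 2:", from_v2[0] is from_v2[1])
print("sharing kept by version 3:", from_v3[0] is from_v3[1])
```

Both round trips produce lists that compare equal to the original; only the sharing between elements differs.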

But even without such code-object optimization, the changes are significant:
The size of the marshaled code object of lib/test/test_marshal drops from 24093 bytes in version 2 to 17841 bytes with version 3, without any additional massaging of the module code object.
History
Date User Action Args
2012-11-16 15:39:37  kristjan.jonsson  set  recipients: + kristjan.jonsson, pitrou
2012-11-16 15:39:37  kristjan.jonsson  set  messageid: <1353080377.76.0.102275796947.issue16475@psf.upfronthosting.co.za>
2012-11-16 15:39:37  kristjan.jonsson  link  issue16475 messages
2012-11-16 15:39:37  kristjan.jonsson  create