Author mrabarnett
Recipients Arfrever, ezio.melotti, jkloth, lemburg, mrabarnett, pitrou, r.david.murray, tchrist, terry.reedy
Date 2011-08-15.11:30:37
Message-id <1313407838.22.0.985651825404.issue12729@psf.upfronthosting.co.za>
Content
For what it's worth, I've had an idea about string storage, roughly based on how *nix filesystems store file data on disk (direct and indirect blocks).

If a string is small, point to a block of codepoints.

If a string is medium-sized, point to a block of pointers to codepoint blocks.

If a string is large, point to a block of pointers to pointer blocks.

This means that a large string doesn't need a single large allocation.

The level of indirection can be increased as necessary.
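A minimal Python sketch of this layout, assuming illustrative sizes that the proposal does not specify: `BLOCK_SIZE` codepoints per codepoint block and `FANOUT` pointers per pointer block are both hypothetical. Codepoint blocks are modelled as `str` slices and pointer blocks as lists. `build` (an illustrative name) adds one level of pointer blocks at a time until a single root remains, so the depth grows only as the string gets longer.

```python
# Hypothetical parameters; the proposal does not fix these values.
BLOCK_SIZE = 4  # codepoints per codepoint block
FANOUT = 4      # pointers per pointer block


def build(text):
    """Return (depth, root) for text.

    depth 0: root is a single codepoint block (a small string).
    depth 1: root is a pointer block whose entries are codepoint blocks.
    depth 2: root points to pointer blocks, and so on.
    """
    # Split into fixed-size codepoint blocks; only the last one may be short.
    nodes = [text[i:i + BLOCK_SIZE] for i in range(0, len(text), BLOCK_SIZE)] or [""]
    depth = 0
    # Add one level of indirection at a time until a single root remains.
    while len(nodes) > 1:
        nodes = [nodes[i:i + FANOUT] for i in range(0, len(nodes), FANOUT)]
        depth += 1
    return depth, nodes[0]
```

With these values a 3-codepoint string has depth 0, an 8-codepoint string depth 1 and a 20-codepoint string depth 2; no level ever holds more than `FANOUT` pointers or `BLOCK_SIZE` codepoints in one allocation.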

For simplicity, all codepoint blocks contain the same number of codepoints, except the final codepoint block, which may contain fewer.
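Because every codepoint block except the last is full, finding a codepoint needs only integer arithmetic: split the index into a leaf number and an offset, then read the leaf number as base-`FANOUT` digits to get the slot to follow at each pointer level. A sketch under the same hypothetical `BLOCK_SIZE`/`FANOUT` model (`build`, `path_to` and `char_at` are illustrative names, not an existing API):

```python
BLOCK_SIZE = 4  # hypothetical codepoints per codepoint block
FANOUT = 4      # hypothetical pointers per pointer block


def build(text):
    """Build the block tree as (depth, root); blocks are str slices / lists."""
    nodes = [text[i:i + BLOCK_SIZE] for i in range(0, len(text), BLOCK_SIZE)] or [""]
    depth = 0
    while len(nodes) > 1:
        nodes = [nodes[i:i + FANOUT] for i in range(0, len(nodes), FANOUT)]
        depth += 1
    return depth, nodes[0]


def path_to(index, depth):
    """Return the slot to follow at each pointer level and the offset in the leaf."""
    leaf_no, offset = divmod(index, BLOCK_SIZE)
    slots = []
    for _ in range(depth):
        leaf_no, slot = divmod(leaf_no, FANOUT)
        slots.append(slot)
    slots.reverse()  # the root's slot comes first
    return slots, offset


def char_at(tree, index):
    """Walk from the root to the codepoint block, then index into it."""
    depth, node = tree
    slots, offset = path_to(index, depth)
    for slot in slots:
        node = node[slot]
    return node[offset]
```

For example, in a 20-codepoint string index 17 lies in leaf 4 at offset 1, which is slot 1 of the root and then slot 0 of that pointer block. The uniform block size is what makes this a direct calculation rather than a search.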

A codepoint block may use the minimum width necessary (1, 2 or 4 bytes) to store all of its codepoints.
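A sketch of choosing the minimum width for one codepoint block, assuming each block is stored as fixed-width little-endian integers (the byte order and the helper names `min_width`, `pack` and `unpack_at` are assumptions). A 4-byte unit holds any codepoint up to U+10FFFF directly, so astral characters never need to be split.

```python
def min_width(chunk):
    """Smallest width (1, 2 or 4 bytes) that can hold every codepoint in chunk."""
    top = max(map(ord, chunk), default=0)
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def pack(chunk):
    """Store a codepoint block as fixed-width little-endian integers."""
    width = min_width(chunk)
    data = b"".join(ord(c).to_bytes(width, "little") for c in chunk)
    return width, data


def unpack_at(width, data, offset):
    """Read one codepoint from a packed block without decoding the rest."""
    start = offset * width
    return chr(int.from_bytes(data[start:start + width], "little"))
```

Since each block records its own width, the offset of a codepoint within a block is still `offset * width`, so random access stays constant-time inside a block.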

This means that surrogate pairs are never needed, and that different sections of the string can be stored in different widths to reduce memory usage.
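To make the memory argument concrete, the sketch below compares the codepoint payload under per-block widths with a hypothetical flat layout that uses one width for the whole string. `BLOCK_SIZE = 64` is an arbitrary assumption, and pointer-block overhead is ignored.

```python
BLOCK_SIZE = 64  # hypothetical codepoints per codepoint block


def width_of(chunk):
    """Minimum width in bytes (1, 2 or 4) for every codepoint in chunk."""
    top = max(map(ord, chunk), default=0)
    return 1 if top <= 0xFF else 2 if top <= 0xFFFF else 4


def per_block_bytes(text):
    """Payload size when each codepoint block picks its own width."""
    blocks = [text[i:i + BLOCK_SIZE] for i in range(0, len(text), BLOCK_SIZE)]
    return sum(len(b) * width_of(b) for b in blocks)


def uniform_bytes(text):
    """Payload size when the whole string uses the widest width it needs."""
    return len(text) * width_of(text)


text = "a" * 1000 + "\U0001F600"
print(uniform_bytes(text), per_block_bytes(text))  # 4004 1124
```

For 1,000 ASCII characters followed by one emoji, the flat layout needs 4,004 bytes while per-block widths need 1,124: only the final block, which holds the emoji, is widened to 4 bytes.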
History
Date                 User        Action  Args
2011-08-15 11:30:38  mrabarnett  set     recipients: + mrabarnett, lemburg, terry.reedy, pitrou, jkloth, ezio.melotti, Arfrever, r.david.murray, tchrist
2011-08-15 11:30:38  mrabarnett  set     messageid: <1313407838.22.0.985651825404.issue12729@psf.upfronthosting.co.za>
2011-08-15 11:30:37  mrabarnett  link    issue12729 messages
2011-08-15 11:30:37  mrabarnett  create