Message255240
On Mon, Nov 23, 2015 at 09:48:46PM +0000, STINNER Victor wrote:
> * the string has a cached UTF-8 byte string (ex: int(s) was called before the resize)
Why do strings cache their UTF-8 encoding?
I presume that some of Python's internals rely on the UTF-8 encoding
rather than the internal Latin-1/UCS-2/UTF-32 representation (PEP 393).
E.g. I infer from the above that int(s) parses the UTF-8 representation
of s rather than the internal representation. Is that right?
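For reference, here is a small CPython-specific sketch of the PEP 393 representations mentioned above. It assumes `sys.getsizeof` reports the compact string layout, which is an implementation detail and not a language guarantee. Adding one character to a string grows its reported size by the width of one code unit: 1 byte for ASCII and Latin-1, 2 for UCS-2, and 4 for UCS-4. The helper name `bytes_per_char` is hypothetical.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Hypothetical helper: per-character storage width of a PEP 393 string.

    Compares the reported sizes of two strings that differ by one character.
    In CPython the difference is the code-unit width (1, 2 or 4 bytes).
    The strings are built at run time, so neither has a UTF-8 cache yet.
    """
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


for sample in ["a", "\xe9", "\u20ac", "\U0001F600"]:
    print(repr(sample), bytes_per_char(sample))
```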
Nevertheless, I wonder why the UTF-8 representation is cached. Is it
that expensive to generate that it can't be done on the fly, as needed?
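The trade-off in this question can be sketched with a toy pure-Python model. The class `LazyUtf8Str` and its methods are hypothetical and do not mirror CPython's C structures. Encoding on the fly repeats the O(n) work on every request. Caching does the work once, but it keeps an extra copy alive and must be invalidated whenever the text changes.

```python
class LazyUtf8Str:
    """Hypothetical toy model of a string that caches its UTF-8 encoding."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._utf8 = None        # cached UTF-8 bytes, filled on first use
        self.encode_calls = 0    # how many times we actually encoded

    def utf8_on_the_fly(self) -> bytes:
        # No cache: pay the encoding cost on every call.
        self.encode_calls += 1
        return self._text.encode("utf-8")

    def utf8_cached(self) -> bytes:
        # Encode once, then hand back the stored bytes.
        if self._utf8 is None:
            self.encode_calls += 1
            self._utf8 = self._text.encode("utf-8")
        return self._utf8

    def resize(self, new_len: int, fill: str = " ") -> None:
        # Any change to the text must drop the cache. Without this line,
        # utf8_cached() would keep returning the bytes of the old text.
        self._text = self._text[:new_len].ljust(new_len, fill)
        self._utf8 = None


demo = LazyUtf8Str("h\xe9llo")
for _ in range(3):
    demo.utf8_cached()
print(demo.encode_calls)      # encoded only once despite three requests
demo.resize(2)
print(demo.utf8_cached())     # re-encoded from the resized text
```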
As it stands now, non-ASCII strings may be up to twice as big as they
need be, once you include the UTF-8 cache. And, as this bug painfully
shows, the problem with caches is that you run the risk of the cache
being out of date.
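The memory cost of the cache can be observed in CPython with a sketch like the following. It assumes CPython 3.3 or later and uses `ctypes.pythonapi` to call the C-API function `PyUnicode_AsUTF8`, which fills a non-ASCII string's UTF-8 cache. That `sys.getsizeof` counts the cached buffer is an implementation detail of current CPython versions and may change. For pure-ASCII strings the UTF-8 form is the string's own data, so no extra memory is expected. The helper name `utf8_cache_overhead` is hypothetical.

```python
import ctypes
import sys

# C-API: const char *PyUnicode_AsUTF8(PyObject *unicode)
# Returns the UTF-8 form and stores it on the string object.
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
_as_utf8.argtypes = [ctypes.py_object]
_as_utf8.restype = ctypes.c_char_p


def utf8_cache_overhead(s: str) -> int:
    """Hypothetical helper: bytes added to sys.getsizeof(s) by the UTF-8 cache."""
    before = sys.getsizeof(s)
    _as_utf8(s)  # forces the cache to be filled (no-op if already present)
    return sys.getsizeof(s) - before


for ch in ["a", "\xe9", "\u20ac"]:
    s = ch * 100  # built at run time, so its UTF-8 cache starts empty
    size_before = sys.getsizeof(s)
    overhead = utf8_cache_overhead(s)
    print(f"{ch!r}: {size_before} bytes, +{overhead} bytes after caching UTF-8")
```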

Date                | User           | Action | Args
2015-11-24 01:30:49 | steven.daprano | set    | recipients: + steven.daprano, lemburg, terry.reedy, pitrou, vstinner, larry, benjamin.peterson, ezio.melotti, serhiy.storchaka, eryksun, random832, Árpád Kósa
2015-11-24 01:30:48 | steven.daprano | link   | issue25709 messages
2015-11-24 01:30:47 | steven.daprano | create |