
Author: vstinner
Recipients: loewis, serhiy.storchaka, vstinner
Date: 2014-09-03.07:10:35
Content
> Won't this cause a performance regression? When we rarely use the wchar_t-based API, it seems good to cache the encoded value.

Yes, it will be slower. But I prefer slower code with a lower memory footprint. On UNIX, I don't think that anyone will notice the difference.

My concern is that the cache is never released: if the conversion is only needed once at startup, the memory stays allocated until Python exits, which is wasteful.
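
Here is a minimal sketch (not from the original message) contrasting the two conversion styles being compared; the helper name use_wide_string is made up for illustration:

#include <Python.h>

/* Hypothetical helper: contrast the cached and the caller-freed conversions. */
static int
use_wide_string(PyObject *unicode)
{
    /* Cached: the wchar_t* buffer is stored in the str object itself and is
       only released when the object is destroyed (deprecated since 3.3). */
    wchar_t *cached = PyUnicode_AsUnicode(unicode);
    if (cached == NULL)
        return -1;

    /* Uncached: a fresh buffer is allocated on every call; the caller frees
       it, so no memory stays around until Python exits. */
    wchar_t *copy = PyUnicode_AsWideCharString(unicode, NULL);
    if (copy == NULL)
        return -1;
    /* ... use the buffer ... */
    PyMem_Free(copy);
    return 0;
}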

On Windows, converting to wchar_t* is common because Python uses the Windows wide character API (the "W" functions, as opposed to the "A" ANSI code page functions). For example, most filesystem accesses use the wchar_t* type.
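
For illustration, a sketch of that usual pattern (assuming a Windows build; the function name get_file_attributes is hypothetical):

#include <Python.h>
#include <windows.h>

/* Hypothetical module function: pass a Python str to a "W" Win32 call. */
static PyObject *
get_file_attributes(PyObject *self, PyObject *arg)
{
    wchar_t *path = PyUnicode_AsWideCharString(arg, NULL);
    if (path == NULL)
        return NULL;
    /* GetFileAttributesW() is the wide ("W") variant: it takes wchar_t*. */
    DWORD attrs = GetFileAttributesW(path);
    PyMem_Free(path);
    if (attrs == INVALID_FILE_ATTRIBUTES)
        return PyErr_SetFromWindowsErr(0);
    return PyLong_FromUnsignedLong(attrs);
}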

In Python < 3.3, Python was compiled in narrow mode (on Windows), so Unicode strings already stored their characters as wchar_t* internally. Since Python 3.3, Python uses a more compact representation (PEP 393). The wchar_t* representation can share the Unicode data only if sizeof(wchar_t) == kind, where the kind is 1, 2 or 4 bytes per character. Examples: "\u20ac" on Windows (16-bit wchar_t) or "\U0010ffff" on Linux (32-bit wchar_t).
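
A small sketch of that sharing condition (the helper name can_share_wstr is hypothetical):

#include <Python.h>

/* Hypothetical helper: return 1 if a wchar_t* view could reuse the string's
   own PEP 393 buffer, 0 if a converted copy would be needed, -1 on error. */
static int
can_share_wstr(PyObject *unicode)
{
    if (PyUnicode_READY(unicode) < 0)
        return -1;
    int kind = PyUnicode_KIND(unicode);      /* 1, 2 or 4 bytes per character */
    return (size_t)kind == sizeof(wchar_t);  /* 2 on Windows, 4 on most Unix */
}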