Message 84170 - Python tracker

Author	loewis
Recipients	loewis, pitrou
Date	2009-03-26.00:43:20
Message-id	<1238028203.93.0.357743797777.issue5562@psf.upfronthosting.co.za>
In-reply-to

Content

I think the problem is that creation of the Unicode string defaults to 
UTF-8. It should instead use the locale's encoding.

You are right that it could be an issue that there is no Python codec 
for the locale's encoding. To be robust against this case, I think the 
locale's mbcs->wcs routines should be used (i.e. mbstowcs). Better yet, 
use wcsftime in the first place. AFAICT, wcsftime is C99, so not all 
systems might support it. However, it appears that MSVC has it, so we 
could assume it exists and wait until someone complains. One issue 
apparently is that some implementations of wcsftime expect the format as 
char* (and again, I would defer dealing with that until somebody 
complains).
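
To make the two options concrete, here is a minimal sketch in C. The helper 
names format_via_mbstowcs and format_via_wcsftime and the 256-byte scratch 
buffer are assumptions for illustration only, not the actual timemodule 
code. The first helper formats with strftime and decodes the bytes with 
mbstowcs, so no Python codec is involved; the second calls wcsftime 
directly and assumes the standard wchar_t* format signature.

```c
#include <stdlib.h>
#include <time.h>
#include <wchar.h>

/* Option 1 (hypothetical helper): format into locale-encoded bytes with
   strftime, then decode them with the locale's own multibyte-to-wide
   conversion. Returns the number of wide characters written, or
   (size_t)-1 on failure. */
static size_t format_via_mbstowcs(wchar_t *out, size_t outlen,
                                  const char *fmt, const struct tm *tm)
{
    char buf[256];  /* assumed scratch size; real code would grow it */
    size_t n;

    /* strftime returns 0 for a too-small buffer and also for an empty
       result; this sketch treats both as failure. */
    if (strftime(buf, sizeof buf, fmt, tm) == 0)
        return (size_t)-1;

    n = mbstowcs(out, buf, outlen);
    /* (size_t)-1 signals an invalid multibyte sequence; n == outlen
       means the output was truncated and is not null-terminated. */
    if (n == (size_t)-1 || n >= outlen)
        return (size_t)-1;
    return n;
}

/* Option 2 (hypothetical helper): let the C library produce wide
   characters directly. Assumes wcsftime takes the format as wchar_t*. */
static size_t format_via_wcsftime(wchar_t *out, size_t outlen,
                                  const wchar_t *fmt, const struct tm *tm)
{
    size_t n = wcsftime(out, outlen, fmt, tm);
    return n == 0 ? (size_t)-1 : n;
}
```

Both helpers honour whatever LC_TIME and LC_CTYPE the caller has set with 
setlocale; in the default "C" locale they should produce identical output 
for purely numeric formats such as "%Y-%m-%d".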

In either case, you end up with a wchar_t string. In principle, the 
locale might use a non-Unicode wide charset for wchar_t, but these fell 
out of use some time ago, and Python has always assumed that wchar_t is 
Unicode.
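
If someone wanted to check that assumption at compile time, C99 provides 
the __STDC_ISO_10646__ macro, which an implementation defines when wchar_t 
values are ISO 10646 code points. A small sketch (the function name is 
made up for illustration); note that a missing macro does not prove 
wchar_t is non-Unicode, since some compilers simply do not define it:

```c
#include <wchar.h>

/* Returns 1 if the implementation promises that wchar_t values are
   ISO 10646 (Unicode) code points, 0 if it makes no such promise. */
static int wchar_is_documented_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```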

History
Date	User	Action	Args
2009-03-26 00:43:24	loewis	set	recipients: + loewis, pitrou
2009-03-26 00:43:23	loewis	set	messageid: <1238028203.93.0.357743797777.issue5562@psf.upfronthosting.co.za>
2009-03-26 00:43:22	loewis	link	issue5562 messages
2009-03-26 00:43:21	loewis	create