Message 123623 - Python tracker

Author	ocean-city
Recipients	belopolsky, brian.curtin, ocean-city
Date	2010-12-08.17:46:23
Message-id	<1291830389.83.0.471441034195.issue10653@psf.upfronthosting.co.za>

Content
I think this is a locale problem. With the "C" locale on Windows,
wcsftime doesn't return UTF-16 when the result contains
non-ASCII characters.

It behaves just like this:
char cbuf[] = "...."; /* contains non-ASCII chars in MBCS */
wchar_t wbuf[sizeof(cbuf)];
for (size_t i = 0; i < sizeof(cbuf); ++i)
    wbuf[i] = cbuf[i];
/* Each byte is just copied. A non-ASCII char in MBCS uses two bytes
   and should use one wchar_t in UTF-16, but in this case it
   uses two wchar_t (some strange encoding). */

In Japanese, wcsftime returns the non-ASCII characters of the
timezone name in this strange encoding. Python converts this
with

#ifdef HAVE_WCSFTIME
            ret = PyUnicode_FromWideChar(outbuf, buflen);
#else

so the Unicode object will contain data in this strange encoding.
This is the cause of the problem.
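
A minimal sketch of the effect described above and one possible way to undo it, assuming every element of the wide buffer really holds a single MBCS byte. The helper names widen_bytes and narrow_to_bytes are hypothetical and are not taken from Python's source. Once the bytes are recovered they could be decoded with the locale's multibyte decoder (for example mbstowcs), which is not shown here.

```c
#include <stddef.h>
#include <wchar.h>

/* Copy each byte into its own wchar_t, as the loop above does.
   Going through unsigned char avoids sign extension for bytes >= 0x80. */
void widen_bytes(const char *src, size_t n, wchar_t *dst)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (wchar_t)(unsigned char)src[i];
}

/* Reverse the copy: put each wchar_t back into one byte.
   Returns 0 on success, or -1 if some element does not fit in a byte,
   which means the buffer was not in the byte-per-wchar_t form. */
int narrow_to_bytes(const wchar_t *src, size_t n, char *dst)
{
    for (size_t i = 0; i < n; ++i) {
        if ((unsigned long)src[i] > 0xFFUL)
            return -1;
        dst[i] = (char)(unsigned char)src[i];
    }
    return 0;
}
```

With a four-byte input that encodes two double-byte characters, widen_bytes produces four wchar_t values instead of two, and narrow_to_bytes gives back the original four bytes.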

I investigated locales a little, and I learned that the C
standard does not guarantee that wchar_t is always UTF-16.
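
As a rough illustration of that point, a program can ask the implementation what it promises about wchar_t. This is only a sketch: the function names wchar_is_iso10646 and wchar_size are hypothetical. The macro __STDC_ISO_10646__ comes from C99 and, when defined, indicates that wchar_t values are ISO/IEC 10646 code points. When it is not defined, the encoding of wchar_t is left to the implementation and may depend on the locale.

```c
#include <stddef.h>
#include <wchar.h>

/* Returns 1 if the implementation states that wchar_t values are
   ISO/IEC 10646 (Unicode) code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Width of wchar_t in bytes; it differs between platforms. */
size_t wchar_size(void)
{
    return sizeof(wchar_t);
}
```

A result of 0 from wchar_is_iso10646 does not prove that the encoding is something else. It only means that the implementation gives no guarantee.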

History
Date	User	Action	Args
2010-12-08 17:46:29	ocean-city	set	recipients: + ocean-city, belopolsky, brian.curtin
2010-12-08 17:46:29	ocean-city	set	messageid: <1291830389.83.0.471441034195.issue10653@psf.upfronthosting.co.za>
2010-12-08 17:46:24	ocean-city	link	issue10653 messages
2010-12-08 17:46:23	ocean-city	create