Author mark.dickinson
Date 2008-11-30.17:32:51
I'm now very confused.

In trying to follow things of type wchar_t* around the Python source, I 
discovered PyUnicode_FromWideChar in unicodeobject.c.  On OS X, the 
conversion ends up in the following code, where w is the incoming WideChar 
array, declared as wchar_t *.

	register Py_UNICODE *u;
	register Py_ssize_t i;
	u = PyUnicode_AS_UNICODE(unicode);
	for (i = size; i > 0; i--)
	    *u++ = *w++;

But this looks wrong:  on OS X, sizeof(wchar_t) is 4 and I think w is 
encoded in UTF-32.  So I was expecting to see some kind of explicit 
conversion from UTF-32 to UCS-2 here.  Instead, it looks as though the 
assignment *u++ = *w++ implicitly truncates each incoming value from 32 
bits to 16.  Doesn't this do the wrong thing for characters outside the BMP?
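
To make the concern concrete, here is a minimal standalone sketch, not taken 
from the Python source.  It assumes a 16-bit destination type (standing in for 
Py_UNICODE) and uses UTF-16 surrogate pairs as one possible explicit 
conversion; the helper names utf32_to_utf16 and truncate_to_16 are 
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UTF-32 code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2).
   Assumes cp <= 0x10FFFF and that cp is not itself a surrogate. */
static size_t utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        /* Inside the BMP: the value fits in one 16-bit unit. */
        out[0] = (uint16_t)cp;
        return 1;
    }
    /* Outside the BMP: split the 20-bit offset into two surrogates. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* What a plain assignment from a 32-bit value to a 16-bit type does:
   the upper 16 bits are silently dropped. */
static uint16_t truncate_to_16(uint32_t cp)
{
    return (uint16_t)cp;
}
```

For U+1D11E, truncate_to_16 gives 0xD11E, an unrelated BMP code point, 
whereas utf32_to_utf16 gives the pair 0xD834 0xDD1E.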

Should I open an issue for this, or am I simply misunderstanding?
