Author certik
Recipients Arfrever, alex, certik, dmalcolm, haypo, loewis, ncoghlan, pitrou, skrah, teoliphant
Date 2012-08-03.00:21:50
Message-id <1343953313.37.0.330986246841.issue15540@psf.upfronthosting.co.za>
In-reply-to
Content
I wrote this initial patch for the issue last week:

https://github.com/numpy/numpy/pull/366

with huge help from Stefan and others.

As for the unicode issue: Travis and I just talked about this, and I think I now understand what is going on. The unicode type itself (as returned by the PyArray_Scalar() function in NumPy) should *never* have byte-swapped internals.

In other words, byte swapping comes into play when NumPy happens to be pointing at memory that holds byte-swapped data (for example, you save some data on a big-endian machine and load it on a little-endian one). Say that data contains some strings (unicode). Inside NumPy they are always UCS4, possibly byte-swapped. When the user actually calls something like my_array[1], PyArray_Scalar() looks at the memory, does any swapping (if necessary) and returns a valid unicode object for the current platform (with the correct endianness). The returned unicode can use any internal representation (UCS1, UCS2 or UCS4 -- whatever Python likes); that doesn't really matter.
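To make the idea concrete, here is a rough pure-Python sketch of that read path. It is not NumPy's implementation: make_buffer() and read_scalar() are hypothetical helpers, and the fixed-width, NUL-padded UCS4 layout is an assumption chosen to resemble how a unicode array could be laid out in memory.

```python
import sys


def _codec(byteorder):
    # Pick the BOM-less UTF-32 codec matching the stored byte order.
    return "utf-32-le" if byteorder == "little" else "utf-32-be"


def make_buffer(strings, width, byteorder):
    """Pack strings as fixed-width, NUL-padded UCS4 fields (hypothetical layout)."""
    out = bytearray()
    for s in strings:
        if len(s) > width:
            raise ValueError("string longer than field width")
        out += s.ljust(width, "\0").encode(_codec(byteorder))
    return bytes(out)


def read_scalar(buf, index, width, byteorder):
    """Return element `index` as an ordinary str.

    The swap happens here, while reading the memory; the returned
    object is a normal, valid str with no trace of the stored order.
    """
    itemsize = 4 * width  # 4 bytes per UCS4 code unit
    raw = buf[index * itemsize:(index + 1) * itemsize]
    return raw.decode(_codec(byteorder)).rstrip("\0")


# Simulate data written on a machine with the opposite endianness.
foreign = "big" if sys.byteorder == "little" else "little"
buf = make_buffer(["abc", "h\u00e9llo", "\U0001F600"], 5, foreign)
item = read_scalar(buf, 1, 5, foreign)
```

Here item is a plain str equal to "h\u00e9llo"; how Python stores it internally is up to Python and is unrelated to the byte order of buf.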

So no changes are necessary to Python itself. As for NumPy, the tests are obviously wrong, because they happen to create unicode objects that are invalid. So the NumPy tests need to be fixed.
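As an illustration of how swapped contents can end up invalid (an assumption about the mechanism, not a description of the actual NumPy tests), consider what happens to a single code point when its four bytes are reversed. The helper swapped_code_point() is hypothetical.

```python
import struct


def swapped_code_point(ch):
    """Reinterpret the UCS4 value of `ch` with its four bytes reversed."""
    little = struct.pack("<I", ord(ch))    # bytes as stored little-endian
    return struct.unpack(">I", little)[0]  # same bytes read as big-endian


cp = swapped_code_point("a")  # U+0061 becomes 0x61000000

# Valid code points stop at 0x10FFFF, so the swapped value cannot be
# turned into a character.
try:
    chr(cp)
    valid = True
except (ValueError, OverflowError):
    valid = False
```

The swapped value is far above 0x10FFFF, so a unicode object holding it would not be a valid string.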

Otherwise there is no problem. I am now working on a better version of my patch, one that doesn't need to force the unicode to be UCS4 just so that it can swap its contents.
History
Date                 User    Action  Args
2012-08-03 00:21:53  certik  set     recipients: + certik, loewis, teoliphant, ncoghlan, pitrou, haypo, Arfrever, alex, skrah, dmalcolm
2012-08-03 00:21:53  certik  set     messageid: <1343953313.37.0.330986246841.issue15540@psf.upfronthosting.co.za>
2012-08-03 00:21:52  certik  link    issue15540 messages
2012-08-03 00:21:50  certik  create