Author loewis
Recipients Arfrever, alex, certik, dmalcolm, loewis, ncoghlan, pitrou, skrah, teoliphant, vstinner
Date 2012-08-03.06:35:23
Message-id <20120803083522.Horde.FdzmXqGZi1VQG3EqdU-WNkA@webmail.df.eu>
In-reply-to <2BF1B7E4-F115-4C3C-8011-822965DF8A98@gmail.com>
Content
> This is a mis-understanding of what NumPy does and why.    There is  
> a need to byte-swap only when the data is stored on disk in the  
> reverse order from the native machine

So is there ever a need to byte-swap Unicode strings? I can see how *numeric*
data are stored on disk using the internal representation; this is a common
technique. For strings, there is the notion of encodings, which defines the
relationship between internal and disk representations. So if NumPy applies
the numeric concept to string data, then this is a flaw.

It may be that people really do store text data in the same memory blob
as numeric data and dump it to a file, but they really should think of this
data as "UTF-16-BE" or "UTF-32-LE" and the like, not in terms of byte
swapping. You can use PyUnicode_Decode to create a Unicode object given a
void*, a length, and a codec name. The concept of a "native Unicode
representation" does not exist: people use two-byte, four-byte and UTF-8
representations in memory, all on a single processor architecture and
operating system.
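
To illustrate the point about naming the encoding instead of byte-swapping, here is a minimal Python-level sketch. The helper decode_text_blob and its parameters are hypothetical (they are not part of CPython or NumPy); it only assumes that the stored text is fixed-width UTF-16 or UTF-32 with a known byte order, and picks the matching codec name.

```python
def decode_text_blob(blob: bytes, width: int, big_endian: bool) -> str:
    """Decode fixed-width text by choosing a codec, not by swapping bytes.

    Hypothetical helper for illustration only: `width` is the code unit
    size in bytes (2 or 4) and `big_endian` is the byte order on disk.
    """
    codecs_by_width = {2: "utf-16", 4: "utf-32"}
    codec = codecs_by_width[width] + ("-be" if big_endian else "-le")
    return blob.decode(codec)


# The same text stored in both byte orders.
blob_be = "Ondřej".encode("utf-32-be")
blob_le = "Ondřej".encode("utf-32-le")

# The byte sequences differ, but each decodes to the same string once
# the correct codec is named; no explicit byte swap is involved.
text_from_be = decode_text_blob(blob_be, width=4, big_endian=True)
text_from_le = decode_text_blob(blob_le, width=4, big_endian=False)
```

At the C level the same idea would presumably be expressed by passing the codec name to PyUnicode_Decode, as described above.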

> The byte-swapping must be done prior to conversion to a Python  
> Unicode-Object when selecting data out of the array.

So if the byte swapping is done before the Unicode object is created:
why did Dave and Ondřej run into problems then?
History
Date                 User    Action  Args
2012-08-03 06:35:24  loewis  set     recipients: + loewis, teoliphant, ncoghlan, pitrou, vstinner, Arfrever, certik, alex, skrah, dmalcolm
2012-08-03 06:35:23  loewis  link    issue15540 messages
2012-08-03 06:35:23  loewis  create