sys.sizeof test fails with wide unicode #47348
Comments
test test_sys failed -- Traceback (most recent call last):
  File "/temp/python/trunk/Lib/test/test_sys.py", line 549, in test_specialtypes
    size2=basicsize + sys.getsizeof(str(s)))
  File "/temp/python/trunk/Lib/test/test_sys.py", line 429, in check_sizeof
    self.assertEqual(result, size2, msg + str(size2))
AssertionError: wrong size for <type 'unicode'>: got 28, expected 50.5109328552
It was recommended by Georg that you expose Py_UNICODE_SIZE in the [...]
Are there any buildbots running with the "--enable-unicode=ucs4" option?
I'm sure there weren't any a few months ago.
Do you really need to expose Py_UNICODE_SIZE? There is already [...]
It is true that sys.maxunicode reflects whether the build is using UCS-2 [...] (Though I don't think we have platforms that actually *do* use sizes [...]
sys.maxunicode is well defined to be either 0xFFFF for UCS-2 [...] Py_UNICODE_SIZE is set in pyconfig.h to be either 2 or 4 during [...] Thus, it currently is possible to derive Py_UNICODE_SIZE from [...] So here are 2 possible patches, one which exposes Py_UNICODE_SIZE via [...]
Personally, I prefer the one with _testcapi.Py_UNICODE_SIZE because it [...]
It's actually very easy: Py_UNICODE is a 2-byte value for UCS-2 builds and a 4-byte value for UCS-4 [...]
    print ((sys.maxunicode < 66000) and 'UCS2' or 'UCS4')
tells you which one you have. Note that you should *not* use the exact value of 0x10FFFF for UCS-4 - [...] The above comparison is good enough to detect the number of bytes in a [...]
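The threshold comparison from the comment above can be wrapped in a small helper. This is only a sketch: the name `py_unicode_width` is hypothetical, and the result is the nominal width implied by `sys.maxunicode`, which is not guaranteed to equal `sizeof(Py_UNICODE)`. On Python 3.3 and later `sys.maxunicode` is always 0x10FFFF, so the helper says nothing about internal storage there.

```python
import sys

def py_unicode_width(maxunicode=None):
    """Guess the bytes per code unit from sys.maxunicode (hypothetical helper)."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Use a threshold rather than an equality test against 0x10FFFF,
    # following the advice given in the thread.
    return 2 if maxunicode < 66000 else 4

print('UCS2' if py_unicode_width() == 2 else 'UCS4')
```

Passing the value explicitly, for example `py_unicode_width(0xFFFF)`, makes it possible to exercise both branches on a single build.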
BTW: Here's another trick you can use:
    print 'sizeof(Py_UNICODE) =', len(u'\0'.encode('unicode-internal'))
(for Py2.x)
Hmm, so it seems that in some UCS4 builds, sizeof(Py_UNICODE) could end [...] However, Py_UNICODE.patch is wrong in that it uses Py_UNICODE_SIZE [...]
On 2008-06-13 21:56, Antoine Pitrou wrote:
AFAIK, only Crays have this problem, but apart from that: I'd consider [...]
On Friday 13 June 2008 at 20:18 +0000, Marc-Andre Lemburg wrote:
Perhaps a #error can be added to that effect?
    #if SIZEOF_INT == 4
    typedef unsigned int Py_UCS4;
    #elif SIZEOF_LONG == 4
    typedef unsigned long Py_UCS4;
    #else
    #error Could not find a 4-byte integer type for Py_UCS4, aborting
    #endif
(of course we could also try harder to find an appropriate type, but I'm [...]
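The same selection logic can be tried out from Python with ctypes, as a rough analogue of the preprocessor check above. This is a sketch under assumptions: `find_4_byte_unsigned` is a hypothetical name, the candidate list simply mirrors the two C types in the snippet, and it inspects the platform's C types at run time rather than at build time.

```python
import ctypes

def find_4_byte_unsigned():
    """Return the first candidate C type that is exactly 4 bytes wide."""
    # Same order as the #if / #elif chain: unsigned int, then unsigned long.
    for candidate in (ctypes.c_uint, ctypes.c_ulong):
        if ctypes.sizeof(candidate) == 4:
            return candidate
    # Counterpart of the #error branch.
    raise RuntimeError("Could not find a 4-byte integer type")

ucs4_type = find_4_byte_unsigned()
print(ucs4_type.__name__, ctypes.sizeof(ucs4_type))
```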
I think you're right that sizeof(Py_UNICODE) is the correct value to [...] Also, len(u'\0'.encode('unicode-internal')) does not work for Py3.0.
I believe PY_UNICODE_TYPE is set by configure in pyconfig.h.
Found it, thanks. Wrong use of grep :|
If I understand configure correctly, PY_UNICODE_TYPE is only set when [...]
On Sunday 15 June 2008 at 13:18 +0000, Robert Schuppenies wrote:
But if PY_UNICODE_TYPE is not set in configure, unicodeobject.h tries to [...] And Py_UCS4 itself will be larger than 4 bytes if the platform's int [...] So if you want to be 100% correct, you should use [...]
Correct is good, so here is a patch which exposes the size of [...]
Looks good to me.
On 2008-06-13 22:32, Antoine Pitrou wrote:
Sounds good !
Python should really try to use uint32_t as fallback solution for [...] We'd have to add an AC_TYPE_INT32_T and AC_TYPE_INT16_T check to [...]
http://www.gnu.org/software/autoconf/manual/html_node/Particular-Types.html#Particular-Types
and could then use
    typedef uint32_t Py_UCS4
and
    typedef uint16_t Py_UCS2
Note that the code for supporting UCS2/UCS4 is not really all that [...]
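The motivation for fixed-width types can be illustrated from Python with ctypes, which exposes both fixed-width and platform-dependent C integer types. This is an illustrative sketch only and is not part of any patch discussed here. The variable names are made up for the example.

```python
import ctypes

# Fixed-width types: the size is the same on every platform.
ucs4_size = ctypes.sizeof(ctypes.c_uint32)   # counterpart of uint32_t
ucs2_size = ctypes.sizeof(ctypes.c_uint16)   # counterpart of uint16_t

# Native types: the size depends on the platform's C compiler.
native_int_size = ctypes.sizeof(ctypes.c_uint)
native_long_size = ctypes.sizeof(ctypes.c_ulong)

print(ucs2_size, ucs4_size, native_int_size, native_long_size)
```

The first two values are always 2 and 4, while `unsigned long` is commonly 4 or 8 bytes depending on the platform, which is why a typedef based on it needs the size check.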
On 2008-06-13 21:54, Marc-Andre Lemburg wrote:
... and for Py3.x:
    print(len(u'\0'.encode('unicode-internal')))
There's really no need to drop to C to get at sizeof(Py_UNICODE).
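A defensive variant of this trick might look like the sketch below. It assumes the 'unicode-internal' codec may be missing, since the codec was removed in Python 3.8. The function name `sizeof_py_unicode` is hypothetical, and a `None` result simply means the interpreter no longer offers this codec.

```python
import warnings

def sizeof_py_unicode():
    """Return the code unit width via 'unicode-internal', or None if unavailable."""
    try:
        with warnings.catch_warnings():
            # Later 3.x releases that still ship the codec warn about its deprecation.
            warnings.simplefilter("ignore", DeprecationWarning)
            return len('\0'.encode('unicode-internal'))
    except LookupError:
        # The codec is not registered on this interpreter.
        return None

print('sizeof(Py_UNICODE) =', sizeof_py_unicode())
```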
I followed Marc's advice and checked in a corrected test. Besides, I opened a new issue to address the fallback solution for [...]