classification
Title: ctypes.cast(obj,ctypes.c_void_p) invalid return in linux_x64
Type: behavior Stage: resolved
Components: ctypes Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, fooofei
Priority: normal Keywords:

Created on 2017-06-12 08:56 by fooofei, last changed 2017-06-12 10:58 by eryksun. This issue is now closed.

Messages (2)
msg295758 - Author: fooofei (fooofei) Date: 2017-06-12 08:56
module: ctypes
pyversion: 2.7.13
python platform: win32, linux_x86_x64

I use ctypes.cast(v, ctypes.c_void_p).value to get the address of the internal buffer of 'helloworld' and of u'helloworld'.

On win32 the result is correct for both strings, but not on Linux.

On Linux, the address for 'helloworld' is correct, but the address for u'helloworld' is invalid.

Please see:
https://github.com/fooofei/py_string_address
https://github.com/fooofei/py_string_address/blob/master/issue.py
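A minimal sketch of the half of the report that works (this is not the reporter's issue.py), assuming CPython: when a byte string is cast, ctypes passes the string object's own character buffer, so the address stays readable for as long as the string is referenced. The helper name `byte_string_address` is hypothetical. The unicode half is deliberately not dereferenced, since on Linux that address may point at freed memory.

```python
import ctypes

def byte_string_address(s):
    """Hypothetical helper: address of a byte string's buffer obtained via cast()."""
    # In CPython the pointer passed for a byte string is the object's own
    # character buffer, so the address is valid only while the caller keeps `s`.
    return ctypes.cast(s, ctypes.c_void_p).value

s = b'helloworld'
addr = byte_string_address(s)
print(ctypes.string_at(addr, len(s)))  # b'helloworld' on Python 3
```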
msg295766 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-06-12 10:58
It's undocumented that cast() should work to directly convert Python strings to pointers. Even when it seems to work, it's a risky thing to depend on because there's no source ctypes data object to reference. Thus there's neither _b_base_ nor anything in _objects to support the reference. If the string has since been deallocated, the pointer is invalid.
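The missing reference can be observed through the documented `_objects` attribute of ctypes instances. A small sketch, assuming CPython's ctypes (the helper name `cast_keeps_reference` is hypothetical): casting a plain Python string leaves `_objects` as None, while casting a ctypes instance shares whatever that instance keeps alive.

```python
import ctypes

def cast_keeps_reference(source):
    """Hypothetical helper: does cast(source, c_void_p) keep anything alive?"""
    result = ctypes.cast(source, ctypes.c_void_p)
    # _objects holds the Python objects ctypes keeps alive for this instance.
    # It stays None when `source` is a plain str/bytes rather than a ctypes object.
    return result._objects is not None

print(cast_keeps_reference(u'helloworld'))                    # False
print(cast_keeps_reference(b'helloworld'))                    # False
print(cast_keeps_reference(ctypes.c_wchar_p(u'helloworld')))  # True
```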

What you've uncovered is an implementation detail. Windows has a 16-bit unsigned wchar_t type, so HAVE_USABLE_WCHAR_T is defined when building the default narrow build in Python 2. In this case ctypes can use PyUnicode_AS_UNICODE, which is why you can get the base address of the unicode object's internal buffer on Windows. 
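To see which case applies on a given interpreter, a rough check (the helper name is hypothetical) is to look at the size of ctypes.c_wchar, which matches the platform's C wchar_t, and at sys.maxunicode, which distinguishes Python 2 narrow and wide builds. It cannot tell whether wchar_t is signed, so it only narrows things down.

```python
import ctypes
import sys

def describe_wchar_platform():
    """Hypothetical helper: (sizeof(wchar_t), 'narrow' or 'wide') for this interpreter."""
    # ctypes.c_wchar is the platform's C wchar_t: 2 bytes on Windows, 4 on Linux.
    wchar_size = ctypes.sizeof(ctypes.c_wchar)
    # sys.maxunicode is 0xFFFF on a Python 2 narrow (UCS2) build and 0x10FFFF on a
    # wide (UCS4) build; Python 3.3+ always reports 0x10FFFF.
    build = "narrow" if sys.maxunicode == 0xFFFF else "wide"
    return wchar_size, build

print(describe_wchar_platform())
```

On Windows with a default Python 2.7 build this should report (2, 'narrow').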

Linux systems define wchar_t as a 4-byte signed value. IIRC it's a typedef for int. Because wchar_t is signed in this case, HAVE_USABLE_WCHAR_T is not defined even for a wide build. ctypes has to temporarily copy the string via PyUnicode_AsWideChar. It references the memory in a capsule object. You can see this by constructing a c_wchar_p instance, for example:

    >>> import ctypes
    >>> p = ctypes.c_wchar_p(u'helloworld')
    >>> p._objects
    <capsule object "_ctypes/cfield.c wchar_t buffer from unicode" at 0x7fedb67d5f90>
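Because each c_wchar_p built from a unicode string gets its own copy in this case, two instances created from the same string should point at different buffers while both are alive. A sketch with a hypothetical helper name: I would expect False on a Windows Python 2 narrow build, where ctypes points at the unicode object's own buffer, and True where ctypes copies (Linux here, and as far as I know any Python 3.3+).

```python
import ctypes

def wchar_copies_are_distinct(text):
    """Hypothetical helper: do two c_wchar_p instances of `text` use separate buffers?"""
    first = ctypes.c_wchar_p(text)
    second = ctypes.c_wchar_p(text)
    # Both instances are alive here, so if each owns its own copy
    # (referenced from its _objects), the two addresses cannot coincide.
    addr_first = ctypes.cast(first, ctypes.c_void_p).value
    addr_second = ctypes.cast(second, ctypes.c_void_p).value
    return addr_first != addr_second

print(wchar_copies_are_distinct(u'helloworld'))
```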

In your case, by the time you actually look at the address, the capsule has been deallocated, and the memory is no longer valid. For example:

    >>> addr = ctypes.cast(u'helloworld', ctypes.c_void_p).value
    >>> ctypes.wstring_at(addr, 10)
    u'\U0150ccf0\x00\U0150cc00\x00oworld'

It works as expected if one instead casts a c_wchar_p instance, which references the capsule to keep the memory alive:

    >>> addr = ctypes.cast(p, ctypes.c_void_p).value
    >>> ctypes.wstring_at(addr, 10)
    u'helloworld'
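If a copy is acceptable, the pattern above can be wrapped so the caller gets the object that owns the copy together with its address. A sketch with a hypothetical helper name; the address is only meaningful while the returned keepalive object is referenced.

```python
import ctypes

def wchar_copy_address(text):
    """Hypothetical helper: return (keepalive, address) of a wchar_t copy of text.

    The address points into memory owned by `keepalive`; once that object is
    dropped the address dangles, much like the cast(u'...') case on Linux.
    """
    keepalive = ctypes.c_wchar_p(text)
    # Cast the ctypes instance (not the str) to read the pointer it holds.
    address = ctypes.cast(keepalive, ctypes.c_void_p).value
    return keepalive, address

keep, addr = wchar_copy_address(u'helloworld')
print(ctypes.wstring_at(addr, 10))  # helloworld
```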

However, that's not what you want since we know it's a copy. I think your only option is to use the C API via ctypes.pythonapi. For example:

    import ctypes

    ctypes.pythonapi.PyUnicodeUCS4_AsUnicode.argtypes = (ctypes.py_object,)
    ctypes.pythonapi.PyUnicodeUCS4_AsUnicode.restype = ctypes.c_void_p

    s = u'helloworld'
    # Address of s's own internal buffer; valid only while s is alive.
    addr = ctypes.pythonapi.PyUnicodeUCS4_AsUnicode(s)

    >>> ctypes.wstring_at(addr, 10)
    u'helloworld'

On narrow builds this function is exported as PyUnicodeUCS2_AsUnicode.
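The export name can be chosen from sys.maxunicode so the same snippet works on either build. A Python 2 only sketch (both helper names are hypothetical): the UCS2/UCS4-suffixed names are a Python 2 detail and I would not expect Python 3 to export them, so only the name selection is meaningful there.

```python
import ctypes
import sys

def as_unicode_symbol(maxunicode=None):
    """Hypothetical helper: Python 2 export name of PyUnicode_AsUnicode."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS2) builds report 0xFFFF; wide (UCS4) builds report 0x10FFFF.
    if maxunicode == 0xFFFF:
        return "PyUnicodeUCS2_AsUnicode"
    return "PyUnicodeUCS4_AsUnicode"

def unicode_buffer_address(s):
    """Hypothetical, Python 2 only: address of a unicode object's internal buffer."""
    func = getattr(ctypes.pythonapi, as_unicode_symbol())
    func.argtypes = (ctypes.py_object,)
    func.restype = ctypes.c_void_p
    # The buffer belongs to `s` itself, so the address is valid only while `s` lives.
    return func(s)
```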
History
Date                 User     Action  Args
2017-06-12 10:58:22  eryksun  set     status: open -> closed
                                      nosy: + eryksun
                                      messages: + msg295766
                                      resolution: not a bug
                                      stage: resolved
2017-06-12 08:56:44  fooofei  create