Classification
Title: Support binary symbol names
Type: enhancement
Stage: resolved
Components: ctypes
Versions:

Process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, belopolsky, eryksun, meador.inge, smurfix
Priority: normal
Keywords:

Created on 2018-04-08 21:19 by smurfix, last changed 2018-04-08 23:15 by eryksun. This issue is now closed.

Messages (4)
msg315096 - Author: Matthias Urlichs (smurfix) Date: 2018-04-08 21:19
ctypes should support binary symbols.

Rationale: There's no requirement that the symbol name in question is encoded as ASCII or UTF-8.

    >>> import ctypes
    >>> t = type('iface', (ctypes.Structure,), {'_fields_': [(b'c_string_symbol', ctypes.CFUNCTYPE(ctypes.c_uint32))]})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: '_fields_' must be a sequence of (name, C type) pairs
msg315097 - Author: Eryk Sun (eryksun) (Python triager) Date: 2018-04-08 21:51
Field names define CField descriptor attributes on the class. Attribute names should be strings, not bytes. There's no syntactically clean way to use a bytes name. Consider the example of a generic property on a class:

    >>> T = type('T', (), {b'p': property(lambda s: 0)})
    >>> t = T()
    >>> t.p
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'T' object has no attribute 'p'

    >>> getattr(t, b'p')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: getattr(): attribute name must be string

We'd have to dig into the class dict and manually bind the property:

    >>> vars(T)[b'p'].__get__(t)
    0
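
As an illustrative sketch of the point about descriptors (an editorial addition, not part of the original message): each str field name in `_fields_` ends up as a `CField` descriptor in the class namespace, while a bytes name is rejected when the class is created. The `Point` class and its field names are hypothetical.

```python
import ctypes

# Hypothetical structure; the names "x" and "y" are for illustration only.
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

# Each field name is stored on the class as a CField descriptor.
descriptor = vars(Point)["x"]
descriptor_type = type(descriptor).__name__

# The descriptor records where the field lives inside the structure.
offsets = (Point.x.offset, Point.y.offset)

# A bytes field name is refused at class-creation time.
try:
    type("Bad", (ctypes.Structure,), {"_fields_": [(b"x", ctypes.c_int)]})
except TypeError as exc:
    bytes_error = exc
else:
    bytes_error = None

print(descriptor_type, offsets, type(bytes_error).__name__)
```

Because the field name doubles as an attribute name, it is subject to the same str-only rule as any other attribute.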
msg315098 - Author: Matthias Urlichs (smurfix) Date: 2018-04-08 22:27
Well, the original problem remains: symbol names aren't constrained to UTF-8. So if I happen to stumble onto one of those (maybe generated by a code obfuscator), the answer is "don't use Python 3, then"?
msg315099 - Author: Eryk Sun (eryksun) (Python triager) Date: 2018-04-08 23:15
If you're automatically wrapping a C source file and don't know the source encoding, you could naively decode it as Latin-1. You're still faced with the problem of characters that Python doesn't allow in identifiers. For example, gcc allows "$" in C identifiers (e.g. a field named "egg$"), but Python doesn't allow this character. At least you can use getattr() to access such names. For example:

    >>> s = bytes(range(256)).decode('latin-1')
    >>> T = type('T', (), {s: 0})
    >>> t = T()
    >>> getattr(t, s)
    0
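
The same approach can be sketched with ctypes itself (an editorial addition, not part of the original message). This minimal example assumes a made-up raw symbol name, `egg$` followed by byte 0xE9, which is neither valid UTF-8 nor a valid Python identifier. The `Iface` class and the `c_uint32` field type are likewise hypothetical. Latin-1 maps every byte to the code point of the same value, so the original bytes can be recovered by encoding the name back.

```python
import ctypes

# Hypothetical raw symbol name: contains "$" and a byte that is invalid UTF-8.
raw_name = b"egg$\xe9"

# Latin-1 decoding is lossless: byte N becomes code point N.
field_name = raw_name.decode("latin-1")

class Iface(ctypes.Structure):
    # The decoded str is accepted as a field name even though it is not
    # a valid identifier; it just cannot be used with dotted access.
    _fields_ = [(field_name, ctypes.c_uint32)]

obj = Iface()

# Dotted access (obj.egg$...) is a syntax error, so use setattr/getattr.
setattr(obj, field_name, 42)
value = getattr(obj, field_name)

# Round-trip back to the original bytes when the raw symbol is needed again.
recovered = field_name.encode("latin-1")

print(value, recovered == raw_name, field_name.isidentifier())
```

The trade-off is that every access goes through `getattr()` and `setattr()`, but no information about the original byte sequence is lost.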
History
Date                 User       Action  Args
2018-04-08 23:15:38  eryksun    set     messages: + msg315099
2018-04-08 22:27:31  smurfix    set     messages: + msg315098
2018-04-08 21:54:15  eryksun    set     resolution: not a bug -> rejected
2018-04-08 21:51:45  eryksun    set     status: open -> closed; nosy: + eryksun; messages: + msg315097; resolution: not a bug; stage: resolved
2018-04-08 21:32:52  ned.deily  set     nosy: + amaury.forgeotdarc, belopolsky, meador.inge
2018-04-08 21:19:25  smurfix    create