classification
Title: Switch suitable constants in the socket module to IntEnum
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: eli.bendersky Nosy List: barry, christian.heimes, eli.bendersky, ethan.furman, gvanrossum, ncoghlan, neologix, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-08-12 21:10 by eli.bendersky, last changed 2013-08-31 22:21 by eli.bendersky. This issue is now closed.

Files
File name Uploaded Description Edit
socket-intenum-af.1.patch eli.bendersky, 2013-08-12 21:10 review
socket-intenum-af.2.patch eli.bendersky, 2013-08-12 21:55 review
socket-intenum-af.3.patch eli.bendersky, 2013-08-14 13:40 review
socket-intenum-af.4.patch eli.bendersky, 2013-08-16 13:28 review
socket-intenum-af.5.patch eli.bendersky, 2013-08-26 02:32 review
socket-intenum-af-type.6.patch eli.bendersky, 2013-08-30 13:55 review
Messages (30)
msg195018 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-12 21:10
As part of original plan and since issue #18264 has been resolved, it's time to dust off some old patches I have for the socket.* module. The socket.AF_* and socket.SOCK_* constants are good candidates for IntEnum conversion.

I'm attaching an initial patch that handles socket.AF_* (as a proof-of-concept; socket.SOCK_* should get identical treatment); it only touches Lib/socket.py and all regrtest tests pass without changes. Result:

>>> import socket
>>> socket.AF_INET
<AddressFamily.AF_INET: 2>
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.family
<AddressFamily.AF_INET: 2>
>>> '{}'.format(s.family)
'AddressFamily.AF_INET'

The code in the patch is pretty well commented, and I also want to point out a couple of decision points that came up:

1. The underlying socketmodule.c has no idea of IntEnums, and neither IMHO it should. These constants are essentially one-way, going from Python into C. They are only exposed back through read-only accessors (e.g. s.family above), at which point the Python code can wrap them back into the enum. The alternative, making socketmodule.c use enums is probably way more complicated than really necessary.
2. As a followup to (1), the constants are actually wrapped into an IntEnum at the Python level and exposed back to the user. My hacking of globals() there may be a bit rough - suggestions for a better way are welcome.
3. A bunch of AF_* constants exported by Python built on my x64 Ubuntu are not documented in socket.rst; I'm still wrapping them all in enums; I basically went over all PyModule_AddIntMacro(m, AF_*) in PyInit__socket.

PTAL
msg195027 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-12 21:55
Attaching updated patch answering Ethan's review. Also added a tiny sanity test that makes sure that AF_INET is indeed an enum. A more comprehensive test probably won't hurt for the final patch.
msg195126 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-14 07:12
What about SOCK_STREAM, SOCK_DGRAM, and SOCK_RAW?
msg195129 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-14 09:54
I'm not really thrilled by the current implementation. The way it is done leads to duplication among socket constants. Also, if I create a socket from a numeric value (because a well-known socket family is known yet known by Python), then socket.family will fail.
msg195145 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-14 12:45
> I'm not really thrilled by the current implementation. The way it is done
> leads to duplication among socket constants.

Can you clarify which duplication you mean specifically? In the code review
tool, Serhiy proposed that instead of listing all the constants explicitly
in the Python code, we can just do:

AddressFamily = IntEnum('AddressFamily',
           {name: value for name, value in _moduledict.items()
            if name.startswith('AF_')})

Is this less duplication?

> Also, if I create a socket from a numeric value (because a well-known
> socket family is known yet known by Python), then socket.family will fail.
>

Well, the documentation of socket.socket says it should be one of the AF_*
constants. One thing we can do is check in the socket.socket constructor
whether it's a known AF_* constants and fail early with a meaningful error
message. Would that be better?
msg195146 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-14 12:46
> What about SOCK_STREAM, SOCK_DGRAM, and SOCK_RAW?
>

This is a proof of concept. As I mentioned in the original message starting
this issue - the SOCK_* constants will get the same treatment as AF_*
constants, once we decide what the right treatment is.
msg195149 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-14 13:11
> Can you clarify which duplication you mean specifically? In the code
> review
> tool, Serhiy proposed that instead of listing all the constants
> explicitly
> in the Python code, we can just do:
> 
> AddressFamily = IntEnum('AddressFamily',
>            {name: value for name, value in _moduledict.items()
>             if name.startswith('AF_')})

Yes, that's better.

> > Also, if I create a socket from a numeric value (because a
> > well-known
> > socket family is known yet known by Python), then socket.family
> > will fail.
> >
> 
> Well, the documentation of socket.socket says it should be one of the
> AF_*
> constants. One thing we can do is check in the socket.socket
> constructor
> whether it's a known AF_* constants and fail early with a meaningful
> error
> message. Would that be better?

Well, no. We should still allow creating sockets with unknown numeric
values. The .family attribute should then return the raw integer.
msg195153 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-14 13:24
I'm surprised that test_repr is not failed. '%i' % socket.AddressFamily.AF_INET returns 'AddressFamily.AF_INET' instead of '2' as I expected. Is not this a bug in IntEnum implementation?
msg195156 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-14 13:40
Attaching a new patch that uses Serhiy's suggestion to avoid the duplication, and uses Antoine's suggestion to keep socket.family working even if the family is an unknown numeric constant.

Note that the latter is challenging to test - I need families the OS knows and socket doesn't. Any suggestions are welcome. For the time being the test I cooked up is an abomination ;-)
msg195157 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-14 13:43
On Wed, Aug 14, 2013 at 6:24 AM, Serhiy Storchaka <report@bugs.python.org>wrote:

>
> Serhiy Storchaka added the comment:
>
> I'm surprised that test_repr is not failed. '%i' %
> socket.AddressFamily.AF_INET returns 'AddressFamily.AF_INET' instead of '2'
> as I expected. Is not this a bug in IntEnum implementation?
>

I did notice it when playing with the implementation. Perhaps %i doesn't
call enum? Need to look into it more carefully...
msg195161 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013-08-14 13:49
Issue18738 created.
msg195163 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-08-14 13:56
The manual list of AF_ prefixes looks ugly. We are going to run into the same problem for every module. How about an additional constructor that takes a name, dict and string prefix?

class IntEnum(int, Enum):
    @classmethod
    def from_moduledict(cls, name, moddict, prefix):
        members = {name: value for name, value in moddict.items()
                   if name.startswith(prefix)}
        return cls(name, members)

>>> import socket, enum, pprint
>>> AddressFamily = enum.IntEnum.from_moduledict("AddressFamily", socket.__dict__, "AF_")
>>> pprint.pprint(dict(AddressFamily.__members__))
{'AF_APPLETALK': <AddressFamily.AF_APPLETALK: 5>,
 'AF_ASH': <AddressFamily.AF_ASH: 18>,
 'AF_ATMPVC': <AddressFamily.AF_ATMPVC: 8>,
 'AF_ATMSVC': <AddressFamily.AF_ATMSVC: 20>,
 'AF_AX25': <AddressFamily.AF_AX25: 3>,
 'AF_BLUETOOTH': <AddressFamily.AF_BLUETOOTH: 31>,
 'AF_BRIDGE': <AddressFamily.AF_BRIDGE: 7>,
 'AF_CAN': <AddressFamily.AF_CAN: 29>,
 'AF_DECnet': <AddressFamily.AF_DECnet: 12>,
 'AF_ECONET': <AddressFamily.AF_ECONET: 19>,
 'AF_INET': <AddressFamily.AF_INET: 2>,
 'AF_INET6': <AddressFamily.AF_INET6: 10>,
 'AF_IPX': <AddressFamily.AF_IPX: 4>,
 'AF_IRDA': <AddressFamily.AF_IRDA: 23>,
 'AF_KEY': <AddressFamily.AF_KEY: 15>,
 'AF_LLC': <AddressFamily.AF_LLC: 26>,
 'AF_NETBEUI': <AddressFamily.AF_NETBEUI: 13>,
 'AF_NETLINK': <AddressFamily.AF_NETLINK: 16>,
 'AF_NETROM': <AddressFamily.AF_NETROM: 6>,
 'AF_PACKET': <AddressFamily.AF_PACKET: 17>,
 'AF_PPPOX': <AddressFamily.AF_PPPOX: 24>,
 'AF_RDS': <AddressFamily.AF_RDS: 21>,
 'AF_ROSE': <AddressFamily.AF_ROSE: 11>,
 'AF_ROUTE': <AddressFamily.AF_NETLINK: 16>,
 'AF_SECURITY': <AddressFamily.AF_SECURITY: 14>,
 'AF_SNA': <AddressFamily.AF_SNA: 22>,
 'AF_TIPC': <AddressFamily.AF_TIPC: 30>,
 'AF_UNIX': <AddressFamily.AF_UNIX: 1>,
 'AF_UNSPEC': <AddressFamily.AF_UNSPEC: 0>,
 'AF_WANPIPE': <AddressFamily.AF_WANPIPE: 25>,
 'AF_X25': <AddressFamily.AF_X25: 9>}
msg195165 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-14 14:03
Christian, this is an interesting idea - but I suggest we review it once we actually get to convert "many modules". For the time being the manual listing is gone in the recent patch, and for a number of constant families in the same module (i.e. SOCK_* that will be added here) we can combine the lookup into a single loop - so no need walking the module dict multiple times.
msg195166 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-14 14:03
I thing the AddressFamily enum class and the family property should be documented.
msg195167 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-14 14:07
> I thing the AddressFamily enum class and the family property should be
> documented.
>

I'm leaving the documentation to the end when we decide on the
implementation strategy. Once that happens I'll post a patch with .rst
changes, etc.
msg195169 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-14 14:12
Nitpick: If we have grouped this constants in an enum class perhaps it will be good if they will be enumerated in the order of its values. Or no?
msg195316 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-16 10:38
I've just had a quick look at the patch, but from what I can see, socket.getaddrinfo() will return a raw integer for the family, and not an AddressFamily instance. That's unfortunate.
msg195328 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-16 13:28
Great catch, Charles-François, thanks. It's actually an interesting example in favor of the approach - it's very nice to see descriptive family names in the data returned by getaddrinfo, instead of obtuse numeric values.

The attached patch fixes it in a similar vein to what was done for the family accessor: getaddrinfo is overridden in Python and does the necessary conversion. User code is unaffected. It has to rebuild the list (since those are tuples) but I don't think it matters in terms of performance for this specific function. As a bonus, getaddrinfo got a much nicer docstring :)

----

Note again that if this is acceptable, I'm going to similarly convert SOCK_* and post a full patch (with some documentation too).
msg195330 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-16 13:36
Looks nice:

>>> for t in socket.getaddrinfo('www.google.com', 'http'):
...   t
... 
(<AddressFamily.AF_INET: 2>, 1, 6, '', ('74.125.239.112', 80))
(<AddressFamily.AF_INET: 2>, 2, 17, '', ('74.125.239.112', 80))
(<AddressFamily.AF_INET: 2>, 1, 6, '', ('74.125.239.116', 80))
(<AddressFamily.AF_INET: 2>, 2, 17, '', ('74.125.239.116', 80))
(<AddressFamily.AF_INET: 2>, 1, 6, '', ('74.125.239.113', 80))
(<AddressFamily.AF_INET: 2>, 2, 17, '', ('74.125.239.113', 80))
(<AddressFamily.AF_INET: 2>, 1, 6, '', ('74.125.239.115', 80))
(<AddressFamily.AF_INET: 2>, 2, 17, '', ('74.125.239.115', 80))
(<AddressFamily.AF_INET: 2>, 1, 6, '', ('74.125.239.114', 80))
(<AddressFamily.AF_INET: 2>, 2, 17, '', ('74.125.239.114', 80))
(<AddressFamily.AF_INET6: 10>, 1, 6, '', ('2607:f8b0:4005:800::1011', 80, 0, 0))
(<AddressFamily.AF_INET6: 10>, 2, 17, '', ('2607:f8b0:4005:800::1011', 80, 0, 0))
msg195397 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-08-16 19:09
Besides some nitpicks which I have left on Rietveld the patch LGTM.

I guess the _family_converter idiom will be popular enough. Perhaps we will add some helper just to the Enum class.
msg195425 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-16 21:37
> The attached patch fixes it in a similar vein to what was done for the
> family accessor: getaddrinfo is overridden in Python and does the necessary
> conversion.

Am I the only one thinking that wrapping every C function with a
Python counterpart to perform enum conversions is tedious, ugly and
error-prone :-\
msg195426 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-08-16 21:44
I wonder how much of an performance issue these wrappers are going to be.
msg195441 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-16 23:36
Charles-François: unfortunately, the alternative seems to be even more tedious and error prone. Internally, socketmodule.c really uses ints all over the place. Changing that to use IntEnum objects from Python would be very tedious and touch a lot more code.

Christian - I think that practically none. None of the touched places seems like a performance concern (even without realizing that these are sockets and *network operations* we're talking about here)
msg196173 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-26 02:32
Serhiy, patch 5 addresses your latest review. Thanks. As for _family_converter, let's see how common this really becomes. Based on one use-case I wouldn't add it to Enum itself. But if we see it gets repeated in many places, we can.
msg196241 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-26 19:50
Looks good.  Go ahead and convert the rest of the socket constants.  Then we can consider other modules, e.g. select and os.
msg196539 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-30 13:55
[Thanks, Guido]

Attaching patch with SOCK_* constants converted as well. The module globals are currently walked twice, once to extract AF_* and once SOCK_*. This is not a real performance problem, but should we fold it into a single loop that collects two dicts? The code will be less pretty but more efficient.

Another issue is _intenum_converter. Ethan - do you think it belongs in the enum module as a helper function or something of the sort?
msg196554 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-30 14:58
I prefer two prettier loops over one less pretty one in this case. No
opinion about _intenum_converter yet. (IMO refactoring can always be lazy,
i.e. after you have multiple copies of the same code there's time to
consider whether you should unify them and what the pros and cons are.)

On Fri, Aug 30, 2013 at 6:55 AM, Eli Bendersky <report@bugs.python.org>wrote:

>
> Eli Bendersky added the comment:
>
> [Thanks, Guido]
>
> Attaching patch with SOCK_* constants converted as well. The module
> globals are currently walked twice, once to extract AF_* and once SOCK_*.
> This is not a real performance problem, but should we fold it into a single
> loop that collects two dicts? The code will be less pretty but more
> efficient.
>
> Another issue is _intenum_converter. Ethan - do you think it belongs in
> the enum module as a helper function or something of the sort?
>
> ----------
> Added file:
> http://bugs.python.org/file31521/socket-intenum-af-type.6.patch
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue18720>
> _______________________________________
>
msg196678 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-08-31 22:13
New changeset 038543d34166 by Eli Bendersky in branch 'default':
Switch the AF_* and SOCK_* constants in the socket module to IntEnum.
http://hg.python.org/cpython/rev/038543d34166
msg196680 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013-08-31 22:18
Eli Bendersky added the comment:
>
> Another issue is _intenum_converter. Ethan - do you think it belongs in the enum module as a helper function or something of the sort?

I'm fine with it being a non-public member of enum, although if we had a place for miscellaneous tools that might be a 
better spot for it.
msg196682 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-08-31 22:21
On Sat, Aug 31, 2013 at 3:18 PM, Ethan Furman <report@bugs.python.org>wrote:

>
> Ethan Furman added the comment:
>
> Eli Bendersky added the comment:
> >
> > Another issue is _intenum_converter. Ethan - do you think it belongs in
> the enum module as a helper function or something of the sort?
>
> I'm fine with it being a non-public member of enum, although if we had a
> place for miscellaneous tools that might be a
> better spot for it.
>

For the time being, I'm OK with Guido's suggestion to keep it lazy. As long
as it's just being used in a single module, there's not much to think
about. If we end up needing it when switching other modules, we'll see what
to do.
History
Date User Action Args
2013-08-31 22:21:55eli.benderskysetmessages: + msg196682
2013-08-31 22:18:38ethan.furmansetmessages: + msg196680
2013-08-31 22:13:55python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg196678

resolution: fixed
stage: patch review -> resolved
2013-08-30 14:58:46gvanrossumsetmessages: + msg196554
2013-08-30 13:55:42eli.benderskysetfiles: + socket-intenum-af-type.6.patch

messages: + msg196539
2013-08-26 19:50:23gvanrossumsetmessages: + msg196241
2013-08-26 02:32:37eli.benderskysetfiles: + socket-intenum-af.5.patch

messages: + msg196173
2013-08-16 23:36:25eli.benderskysetmessages: + msg195441
2013-08-16 21:44:58christian.heimessetmessages: + msg195426
2013-08-16 21:37:29neologixsetmessages: + msg195425
2013-08-16 19:09:01serhiy.storchakasetmessages: + msg195397
2013-08-16 13:36:35eli.benderskysetmessages: + msg195330
2013-08-16 13:28:34eli.benderskysetfiles: + socket-intenum-af.4.patch

messages: + msg195328
2013-08-16 10:38:26neologixsetmessages: + msg195316
2013-08-14 14:12:58serhiy.storchakasetmessages: + msg195169
2013-08-14 14:07:01eli.benderskysetmessages: + msg195167
2013-08-14 14:03:27serhiy.storchakasetmessages: + msg195166
2013-08-14 14:03:12eli.benderskysetmessages: + msg195165
2013-08-14 13:56:04christian.heimessetnosy: + christian.heimes
messages: + msg195163
2013-08-14 13:49:59ethan.furmansetmessages: + msg195161
2013-08-14 13:43:11eli.benderskysetmessages: + msg195157
2013-08-14 13:40:47eli.benderskysetfiles: + socket-intenum-af.3.patch

messages: + msg195156
2013-08-14 13:24:13serhiy.storchakasetmessages: + msg195153
2013-08-14 13:11:51pitrousetmessages: + msg195149
2013-08-14 12:46:24eli.benderskysetmessages: + msg195146
2013-08-14 12:45:27eli.benderskysetmessages: + msg195145
2013-08-14 09:58:34pitrousetnosy: + neologix
2013-08-14 09:54:59pitrousetmessages: + msg195129
2013-08-14 07:12:04serhiy.storchakasetmessages: + msg195126
2013-08-12 21:55:41eli.benderskysetfiles: + socket-intenum-af.2.patch

messages: + msg195027
2013-08-12 21:38:42serhiy.storchakasetnosy: + serhiy.storchaka
2013-08-12 21:10:49eli.benderskycreate