New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
speed up ipaddress __contain__ method #69616
Comments
Current check "address in network" is seems a bit odd: import ipaddress
import timeit
class IPv6Network2(ipaddress.IPv6Network):
def __contains__(self, other):
# always false if one is v4 and the other is v6.
if self._version != other._version:
return False
# dealing with another network.
if isinstance(other, ipaddress._BaseNetwork):
return False
else:
# address
return other._ip & self.netmask._ip == self.network_address._ip
class IPv4Network2(ipaddress.IPv4Network):
def __contains__(self, other):
# always false if one is v4 and the other is v6.
if self._version != other._version:
return False
# dealing with another network.
if isinstance(other, ipaddress._BaseNetwork):
return False
# dealing with another address
else:
# address
return other._ip & self.netmask._ip == self.network_address._ip
ipv6_test_net = ipaddress.IPv6Network("::/0")
ipv6_test_net2 = IPv6Network2("::/0")
ipv4_test_net = ipaddress.IPv4Network("0.0.0.0/0")
ipv4_test_net2 = IPv4Network2("0.0.0.0/0")
dataipv6 = list()
dataipv4 = list()
for x in range(2000000):
dataipv6.append(ipaddress.IPv6Address(x))
dataipv4.append(ipaddress.IPv4Address(x))
def test():
for d in dataipv6:
d in ipv6_test_net
def test2():
for d in dataipv6:
d in ipv6_test_net2
def test3():
for d in dataipv4:
d in ipv4_test_net
def test4():
for d in dataipv4:
d in ipv4_test_net2
t = timeit.Timer("test()", "from __main__ import test")
print("ipv6 test origin __contains__", t.timeit(number=1))
t = timeit.Timer("test2()", "from __main__ import test2")
print("ipv6 test new __contains__", t.timeit(number=1))
t = timeit.Timer("test3()", "from __main__ import test3")
print("ipv4 test origin __contains__", t.timeit(number=1))
t = timeit.Timer("test4()", "from __main__ import test4")
print("ipv4 test new __contains__", t.timeit(number=1)) Output: ipv6 test origin __contains__ 4.265904285013676 |
I think this patch can be easily applied. There are no any breaking changes. My colleagues and I assume that ipaddress module would work fast because it is easy to imagine tasks, operate with ton of networks. |
Hi Aleksandr, well, sorry for the detail. We now use GitHub for patch submission, would you like to submit a PR at https://github.com/python/cpython/ (see devguide at https://cpython-devguide.readthedocs.io/ for more information). If you don't want to do so, that's fine, too. A core developer may take care of it for you. |
s/sorry for the detail/sorry for the delay/ |
I confirm that the proposed code is equivalent to the existing code and is faster. The largest part (~70%) of the speed up is caused by replacing int() calls with direct attribute access to _ip. I'm wondering whether it is worth to get rid of int() calls in the rest of the code. The overhead of using int():
|
Though this is out of the scope of the issue I tried converting num_addresses, __hash__, __getitem__ and __eq__ as per Serhiy's idea for IPv6Network replacing the stdlib implementation's int calls with _ip in a custom class. I can see up to 50% speedups as below and no test case failures converting rest of the call sites to use _ip instead of int in stdlib. But some of the methods may not be used as frequently as in this benchmark like thus being not worthy enough of change. Shall I open a new issue for further discussion? $ python3.7 bpo25430_1.py ipv6 test num_addresses with int 1.54065761 $ bpo25430_1.py import ipaddress
import timeit
class IPv6Network2(ipaddress.IPv6Network):
@property
def num_addresses(self):
return self.broadcast_address._ip - self.network_address._ip + 1
def __hash__(self):
return hash(self.network_address._ip ^ self.netmask._ip)
def __eq__(self, other):
try:
return (self._version == other._version and
self.network_address == other.network_address and
self.netmask._ip == other.netmask._ip)
except AttributeError:
return NotImplemented
def __getitem__(self, n):
network = self.network_address._ip
broadcast = self.broadcast_address._ip
if n >= 0:
if network + n > broadcast:
raise IndexError('address out of range')
return self._address_class(network + n)
else:
n += 1
if broadcast + n < network:
raise IndexError('address out of range')
return self._address_class(broadcast + n)
ipv6_test_net = ipaddress.IPv6Network("::/0")
ipv6_test_net2 = IPv6Network2("::/0")
def test1_num_address():
return ipv6_test_net.num_addresses
def test2_num_address():
return ipv6_test_net2.num_addresses
def test1_hash_address():
return hash(ipv6_test_net)
def test2_hash_address():
return hash(ipv6_test_net2)
if __name__ == "__main__":
t = timeit.Timer("test1_num_address()", "from __main__ import test1_num_address")
print("ipv6 test num_addresses with int", t.timeit(number=1000000))
t = timeit.Timer("test2_num_address()", "from __main__ import test2_num_address")
print("ipv6 test num_addresses without int", t.timeit(number=1000000))
t = timeit.Timer("test1_hash_address()", "from __main__ import test1_hash_address")
print("ipv6 test hash with int", t.timeit(number=1000000))
t = timeit.Timer("test2_hash_address()", "from __main__ import test2_hash_address")
print("ipv6 test hash without int", t.timeit(number=1000000))
t = timeit.Timer("ipv6_test_net == ipv6_test_net", "from __main__ import ipv6_test_net")
print("ipv6 test equality with int", t.timeit(number=1000000))
t = timeit.Timer("ipv6_test_net2 == ipv6_test_net2", "from __main__ import ipv6_test_net2")
print("ipv6 test equality without int", t.timeit(number=1000000))
t = timeit.Timer("ipv6_test_net[10000]", "from __main__ import ipv6_test_net")
print("ipv6 test get item with int", t.timeit(number=1000000))
t = timeit.Timer("ipv6_test_net2[10000]", "from __main__ import ipv6_test_net2")
print("ipv6 test get item without int", t.timeit(number=1000000))
|
Closing it as resolved since the optimization was merged in 3.8.0 . Thanks @gescheit for the report and patch. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: