classification
Title: speed up ipaddress __contains__ method
Type: performance Stage: patch review
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: gescheit, inada.naoki, ncoghlan, pitrou, pmoody, serhiy.storchaka, xtreak
Priority: normal Keywords: patch

Created on 2015-10-17 11:57 by gescheit, last changed 2019-04-30 07:54 by inada.naoki.

Files
File name Uploaded Description
ipaddress_contains.patch gescheit, 2015-10-17 11:57 review
Pull Requests
URL Status Linked
PR 1785 merged python-dev, 2017-05-24 10:56
Messages (7)
msg253126 - (view) Author: Aleksandr Balezin (gescheit) * Date: 2015-10-17 11:57
The current check for "address in network" seems a bit odd:
int(self.network_address) <= int(other._ip) < int(self.broadcast_address)
This patch performs the check with bit operations instead. Performance test:

import ipaddress
import timeit


class IPv6Network2(ipaddress.IPv6Network):
    def __contains__(self, other):
        # always false if one is v4 and the other is v6.
        if self._version != other._version:
            return False
        # dealing with another network.
        if isinstance(other, ipaddress._BaseNetwork):
            return False
        else:
            # address
            return other._ip & self.netmask._ip == self.network_address._ip

class IPv4Network2(ipaddress.IPv4Network):
    def __contains__(self, other):
        # always false if one is v4 and the other is v6.
        if self._version != other._version:
            return False
        # dealing with another network.
        if isinstance(other, ipaddress._BaseNetwork):
            return False
        # dealing with another address
        else:
            # address
            return other._ip & self.netmask._ip == self.network_address._ip

ipv6_test_net = ipaddress.IPv6Network("::/0")
ipv6_test_net2 = IPv6Network2("::/0")
ipv4_test_net = ipaddress.IPv4Network("0.0.0.0/0")
ipv4_test_net2 = IPv4Network2("0.0.0.0/0")

dataipv6 = list()
dataipv4 = list()
for x in range(2000000):
    dataipv6.append(ipaddress.IPv6Address(x))
    dataipv4.append(ipaddress.IPv4Address(x))

def test():
    for d in dataipv6:
        d in ipv6_test_net

def test2():
    for d in dataipv6:
        d in ipv6_test_net2

def test3():
    for d in dataipv4:
        d in ipv4_test_net

def test4():
    for d in dataipv4:
        d in ipv4_test_net2

t = timeit.Timer("test()", "from __main__ import test")
print("ipv6 test origin __contains__", t.timeit(number=1))

t = timeit.Timer("test2()", "from __main__ import test2")
print("ipv6 test new __contains__", t.timeit(number=1))

t = timeit.Timer("test3()", "from __main__ import test3")
print("ipv4 test origin __contains__", t.timeit(number=1))

t = timeit.Timer("test4()", "from __main__ import test4")
print("ipv4 test new __contains__", t.timeit(number=1))

Output:

ipv6 test origin __contains__ 4.265904285013676
ipv6 test new __contains__ 1.551749340025708
ipv4 test origin __contains__ 3.689626455074176
ipv4 test new __contains__ 2.0175559649942443
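
As a rough illustration of why the two checks agree, the sketch below compares a range comparison with a mask comparison for every address in a small block. The helper names contains_by_range and contains_by_mask are hypothetical, the range check assumes inclusive bounds, and the public int() conversion is used instead of the private _ip attribute so the example does not depend on implementation details.

```python
import ipaddress


def contains_by_range(net, addr):
    # Range comparison: the address must lie between the network and
    # broadcast addresses (inclusive bounds are assumed here).
    return int(net.network_address) <= int(addr) <= int(net.broadcast_address)


def contains_by_mask(net, addr):
    # Mask comparison: clearing the host bits must yield the network address.
    return int(addr) & int(net.netmask) == int(net.network_address)


net = ipaddress.ip_network("192.0.2.64/26")
# Probe the whole enclosing /24 so both sides of each boundary are covered.
probes = list(ipaddress.ip_network("192.0.2.0/24"))

mismatches = [a for a in probes
              if contains_by_range(net, a) != contains_by_mask(net, a)]
inside = sum(contains_by_mask(net, a) for a in probes)
print("mismatches:", len(mismatches), "addresses inside:", inside)
```

The equivalence relies on the network address having no host bits set, which ipaddress enforces by default (strict=True).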
msg294242 - (view) Author: Aleksandr Balezin (gescheit) * Date: 2017-05-23 09:46
I think this patch can be applied easily. It introduces no breaking changes. My colleagues and I expect the ipaddress module to be fast, because it is easy to imagine tasks that operate on a large number of networks.
Also, the current comparison implementation sets a poor example for other developers.
msg294252 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-05-23 14:58
Hi Aleksandr,

well, sorry for the detail.  We now use GitHub for patch submission. Would you like to submit a PR at https://github.com/python/cpython/ (see the devguide at https://cpython-devguide.readthedocs.io/ for more information)?

If you don't want to do so, that's fine, too. A core developer may take care of it for you.
msg294253 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-05-23 14:58
s/sorry for the detail/sorry for the delay/
msg294442 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-25 07:20
I confirm that the proposed code is equivalent to the existing code and is faster. The largest part (~70%) of the speedup comes from replacing int() calls with direct attribute access to _ip. I wonder whether it is worth getting rid of int() calls in the rest of the code.

The overhead of using int():

* Looking up the "int" name in globals (which fails).
* Looking up the "int" name in builtins.
* Calling the C function (the int constructor).
* Calling the Python function (the address's __int__ method).
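
The steps listed above can be observed with a small timing sketch. It assumes CPython's ipaddress implementation, where addresses store their value in the private _ip attribute. The iteration count is arbitrary and the timings will vary between machines, so no particular ratio should be read into the output.

```python
import ipaddress
import timeit

addr = ipaddress.IPv6Address("2001:db8::1")

# Both expressions produce the same integer; only the lookup and call
# path differs (name lookups plus __int__ versus one attribute load).
same_value = int(addr) == addr._ip

# _ip is a private attribute: an implementation detail, not a public API.
t_int = timeit.timeit("int(addr)", globals={"addr": addr}, number=200_000)
t_attr = timeit.timeit("addr._ip", globals={"addr": addr}, number=200_000)
print(f"int(addr): {t_int:.3f}s  addr._ip: {t_attr:.3f}s")
```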
msg331931 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-12-16 18:00
Though this is outside the scope of this issue, I tried Serhiy's idea on num_addresses, __hash__, __getitem__ and __eq__ for IPv6Network, replacing the stdlib implementation's int() calls with _ip in a custom class. I see speedups of up to 50%, as shown below, and no test failures after converting the rest of the call sites in the stdlib to use _ip instead of int(). However, some of these methods may not be called as frequently in practice as in this benchmark, so the change may not be worthwhile for them. Shall I open a new issue for further discussion?

$ python3.7 bpo25430_1.py

ipv6 test num_addresses with int 1.54065761
ipv6 test num_addresses without int 0.8266360819999998
ipv6 test hash with int 1.320016881
ipv6 test hash without int 0.6266323200000001
ipv6 test equality with int 1.6104001990000008
ipv6 test equality without int 1.0374885390000008
ipv6 test get item with int 2.092343390000001
ipv6 test get item without int 1.5606673410000003

$ cat bpo25430_1.py

import ipaddress
import timeit

class IPv6Network2(ipaddress.IPv6Network):

    @property
    def num_addresses(self):
        return self.broadcast_address._ip - self.network_address._ip + 1

    def __hash__(self):
        return hash(self.network_address._ip ^ self.netmask._ip)

    def __eq__(self, other):
        try:
            return (self._version == other._version and
                    self.network_address == other.network_address and
                    self.netmask._ip == other.netmask._ip)
        except AttributeError:
            return NotImplemented

    def __getitem__(self, n):
        network = self.network_address._ip
        broadcast = self.broadcast_address._ip
        if n >= 0:
            if network + n > broadcast:
                raise IndexError('address out of range')
            return self._address_class(network + n)
        else:
            n += 1
            if broadcast + n < network:
                raise IndexError('address out of range')
            return self._address_class(broadcast + n)


ipv6_test_net = ipaddress.IPv6Network("::/0")
ipv6_test_net2 = IPv6Network2("::/0")

def test1_num_address():
    return ipv6_test_net.num_addresses

def test2_num_address():
    return ipv6_test_net2.num_addresses

def test1_hash_address():
    return hash(ipv6_test_net)

def test2_hash_address():
    return hash(ipv6_test_net2)

if __name__ == "__main__":
    t = timeit.Timer("test1_num_address()", "from __main__ import test1_num_address")
    print("ipv6 test num_addresses with int", t.timeit(number=1000000))

    t = timeit.Timer("test2_num_address()", "from __main__ import test2_num_address")
    print("ipv6 test num_addresses without int", t.timeit(number=1000000))

    t = timeit.Timer("test1_hash_address()", "from __main__ import test1_hash_address")
    print("ipv6 test hash with int", t.timeit(number=1000000))

    t = timeit.Timer("test2_hash_address()", "from __main__ import test2_hash_address")
    print("ipv6 test hash without int", t.timeit(number=1000000))

    t = timeit.Timer("ipv6_test_net == ipv6_test_net", "from __main__ import ipv6_test_net")
    print("ipv6 test equality with int", t.timeit(number=1000000))

    t = timeit.Timer("ipv6_test_net2 == ipv6_test_net2", "from __main__ import ipv6_test_net2")
    print("ipv6 test equality without int", t.timeit(number=1000000))

    t = timeit.Timer("ipv6_test_net[10000]", "from __main__ import ipv6_test_net")
    print("ipv6 test get item with int", t.timeit(number=1000000))

    t = timeit.Timer("ipv6_test_net2[10000]", "from __main__ import ipv6_test_net2")
    print("ipv6 test get item without int", t.timeit(number=1000000))

    assert test1_num_address() == test2_num_address()
    assert hash(ipv6_test_net2) == hash(ipv6_test_net)
    assert ipv6_test_net2 == ipv6_test_net
    assert ipv6_test_net[10000] == ipv6_test_net2[10000]
msg341138 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-04-30 07:54
New changeset 3bbcc92577f8e616bc94c679040043bacd00ebf1 by Inada Naoki (gescheit) in branch 'master':
bpo-25430: improve performance of IPNetwork.__contains__ (GH-1785)
https://github.com/python/cpython/commit/3bbcc92577f8e616bc94c679040043bacd00ebf1
History
Date User Action Args
2019-04-30 07:54:50 inada.naoki set nosy: + inada.naoki; messages: + msg341138
2018-12-16 18:00:02 xtreak set nosy: + xtreak; messages: + msg331931
2018-12-12 20:54:14 pitrou set versions: + Python 3.8, - Python 3.7
2017-08-03 06:02:06 serhiy.storchaka set versions: + Python 3.7, - Python 3.6
2017-05-25 07:20:02 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg294442
2017-05-24 10:56:22 python-dev set pull_requests: + pull_request1866
2017-05-23 14:58:31 pitrou set messages: + msg294253
2017-05-23 14:58:18 pitrou set nosy: + pitrou; messages: + msg294252
2017-05-23 09:46:50 gescheit set messages: + msg294242
2016-07-26 19:00:29 berker.peksag unlink issue25431 dependencies
2015-10-17 18:00:45 serhiy.storchaka set nosy: + ncoghlan, pmoody; stage: patch review; versions: - Python 3.4, Python 3.5
2015-10-17 18:00:13 serhiy.storchaka link issue25431 dependencies
2015-10-17 16:44:35 gescheit set versions: + Python 3.4, Python 3.6
2015-10-17 11:57:18 gescheit create