Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up ipaddress __contain__ method #69616

Closed
gescheit mannequin opened this issue Oct 17, 2015 · 8 comments
Closed

speed up ipaddress __contain__ method #69616

gescheit mannequin opened this issue Oct 17, 2015 · 8 comments
Labels
3.8 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@gescheit
Copy link
Mannequin

gescheit mannequin commented Oct 17, 2015

BPO 25430
Nosy @ncoghlan, @pitrou, @methane, @gescheit, @serhiy-storchaka, @tirkarthi
PRs
  • bpo-25430: improve performance of IPNetwork.__contains__ #1785
  • Files
  • ipaddress_contains.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-10-24.08:45:46.603>
    created_at = <Date 2015-10-17.11:57:18.087>
    labels = ['3.8', 'library', 'performance']
    title = 'speed up ipaddress __contain__ method'
    updated_at = <Date 2019-10-24.08:45:46.601>
    user = 'https://github.com/gescheit'

    bugs.python.org fields:

    activity = <Date 2019-10-24.08:45:46.601>
    actor = 'xtreak'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-10-24.08:45:46.603>
    closer = 'xtreak'
    components = ['Library (Lib)']
    creation = <Date 2015-10-17.11:57:18.087>
    creator = 'gescheit'
    dependencies = []
    files = ['40801']
    hgrepos = []
    issue_num = 25430
    keywords = ['patch']
    message_count = 8.0
    messages = ['253126', '294242', '294252', '294253', '294442', '331931', '341138', '355303']
    nosy_count = 7.0
    nosy_names = ['ncoghlan', 'pitrou', 'pmoody', 'methane', 'gescheit', 'serhiy.storchaka', 'xtreak']
    pr_nums = ['1785']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue25430'
    versions = ['Python 3.8']

    @gescheit
    Copy link
    Mannequin Author

    gescheit mannequin commented Oct 17, 2015

    Current check "address in network" is seems a bit odd:
    int(self.network_address) <= int(other._ip) < int(self.broadcast_address)
    This patch make this in bit-operation manner. Perfomace test:

    import ipaddress
    import timeit
    
    
    class IPv6Network2(ipaddress.IPv6Network):
        def __contains__(self, other):
            # always false if one is v4 and the other is v6.
            if self._version != other._version:
                return False
            # dealing with another network.
            if isinstance(other, ipaddress._BaseNetwork):
                return False
            else:
                # address
                return other._ip & self.netmask._ip == self.network_address._ip
    
    class IPv4Network2(ipaddress.IPv4Network):
        def __contains__(self, other):
            # always false if one is v4 and the other is v6.
            if self._version != other._version:
                return False
            # dealing with another network.
            if isinstance(other, ipaddress._BaseNetwork):
                return False
            # dealing with another address
            else:
                # address
                return other._ip & self.netmask._ip == self.network_address._ip
    
    ipv6_test_net = ipaddress.IPv6Network("::/0")
    ipv6_test_net2 = IPv6Network2("::/0")
    ipv4_test_net = ipaddress.IPv4Network("0.0.0.0/0")
    ipv4_test_net2 = IPv4Network2("0.0.0.0/0")
    
    dataipv6 = list()
    dataipv4 = list()
    for x in range(2000000):
        dataipv6.append(ipaddress.IPv6Address(x))
        dataipv4.append(ipaddress.IPv4Address(x))
    
    def test():
        for d in dataipv6:
            d in ipv6_test_net
    
    def test2():
        for d in dataipv6:
            d in ipv6_test_net2
    
    def test3():
        for d in dataipv4:
            d in ipv4_test_net
    
    def test4():
        for d in dataipv4:
            d in ipv4_test_net2
    
    t = timeit.Timer("test()", "from __main__ import test")
    print("ipv6 test origin __contains__", t.timeit(number=1))
    
    t = timeit.Timer("test2()", "from __main__ import test2")
    print("ipv6 test new __contains__", t.timeit(number=1))
    
    t = timeit.Timer("test3()", "from __main__ import test3")
    print("ipv4 test origin __contains__", t.timeit(number=1))
    
    t = timeit.Timer("test4()", "from __main__ import test4")
    print("ipv4 test new __contains__", t.timeit(number=1))

    Output:

    ipv6 test origin __contains__ 4.265904285013676
    ipv6 test new __contains__ 1.551749340025708
    ipv4 test origin __contains__ 3.689626455074176
    ipv4 test new __contains__ 2.0175559649942443

    @gescheit gescheit mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Oct 17, 2015
    @gescheit
    Copy link
    Mannequin Author

    gescheit mannequin commented May 23, 2017

    I think this patch can be easily applied. There are no any breaking changes. My colleagues and I assume that ipaddress module would work fast because it is easy to imagine tasks, operate with ton of networks.
    Also, current comparison implementation works as not so good pattern for other developers.

    @pitrou
    Copy link
    Member

    pitrou commented May 23, 2017

    Hi Aleksandr,

    well, sorry for the detail. We now use GitHub for patch submission, would you like to submit a PR at https://github.com/python/cpython/ (see devguide at https://cpython-devguide.readthedocs.io/ for more information).

    If you don't want to do so, that's fine, too. A core developer may take care of it for you.

    @pitrou
    Copy link
    Member

    pitrou commented May 23, 2017

    s/sorry for the detail/sorry for the delay/

    @serhiy-storchaka
    Copy link
    Member

    I confirm that the proposed code is equivalent to the existing code and is faster. The largest part (~70%) of the speed up is caused by replacing int() calls with direct attribute access to _ip. I'm wondering whether it is worth to get rid of int() calls in the rest of the code.

    The overhead of using int():

    • Looking up the "int" name in globals (failed).
    • Looking up the "int" name in builtins.
    • Calling the C function.
    • Calling the Python function.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Aug 3, 2017
    @pitrou pitrou added 3.8 only security fixes and removed 3.7 (EOL) end of life labels Dec 12, 2018
    @tirkarthi
    Copy link
    Member

    Though this is out of the scope of the issue I tried converting num_addresses, __hash__, __getitem__ and __eq__ as per Serhiy's idea for IPv6Network replacing the stdlib implementation's int calls with _ip in a custom class. I can see up to 50% speedups as below and no test case failures converting rest of the call sites to use _ip instead of int in stdlib. But some of the methods may not be used as frequently as in this benchmark like thus being not worthy enough of change. Shall I open a new issue for further discussion?

    $ python3.7 bpo25430_1.py

    ipv6 test num_addresses with int 1.54065761
    ipv6 test num_addresses without int 0.8266360819999998
    ipv6 test hash with int 1.320016881
    ipv6 test hash without int 0.6266323200000001
    ipv6 test equality with int 1.6104001990000008
    ipv6 test equality without int 1.0374885390000008
    ipv6 test get item with int 2.092343390000001
    ipv6 test get item without int 1.5606673410000003

    $ bpo25430_1.py
    import ipaddress
    import timeit
    
    class IPv6Network2(ipaddress.IPv6Network):
    
        @property
        def num_addresses(self):
            return self.broadcast_address._ip - self.network_address._ip + 1
    
        def __hash__(self):
            return hash(self.network_address._ip ^ self.netmask._ip)
    
        def __eq__(self, other):
            try:
                return (self._version == other._version and
                        self.network_address == other.network_address and
                        self.netmask._ip == other.netmask._ip)
            except AttributeError:
                return NotImplemented
    
        def __getitem__(self, n):
            network = self.network_address._ip
            broadcast = self.broadcast_address._ip
            if n >= 0:
                if network + n > broadcast:
                    raise IndexError('address out of range')
                return self._address_class(network + n)
            else:
                n += 1
                if broadcast + n < network:
                    raise IndexError('address out of range')
                return self._address_class(broadcast + n)
    
    
    ipv6_test_net = ipaddress.IPv6Network("::/0")
    ipv6_test_net2 = IPv6Network2("::/0")
    
    def test1_num_address():
        return ipv6_test_net.num_addresses
    
    def test2_num_address():
        return ipv6_test_net2.num_addresses
    
    def test1_hash_address():
        return hash(ipv6_test_net)
    
    def test2_hash_address():
        return hash(ipv6_test_net2)
    
    if __name__ == "__main__":
        t = timeit.Timer("test1_num_address()", "from __main__ import test1_num_address")
        print("ipv6 test num_addresses with int", t.timeit(number=1000000))
    
        t = timeit.Timer("test2_num_address()", "from __main__ import test2_num_address")
        print("ipv6 test num_addresses without int", t.timeit(number=1000000))
    
        t = timeit.Timer("test1_hash_address()", "from __main__ import test1_hash_address")
        print("ipv6 test hash with int", t.timeit(number=1000000))
    
        t = timeit.Timer("test2_hash_address()", "from __main__ import test2_hash_address")
        print("ipv6 test hash without int", t.timeit(number=1000000))
    
        t = timeit.Timer("ipv6_test_net == ipv6_test_net", "from __main__ import ipv6_test_net")
        print("ipv6 test equality with int", t.timeit(number=1000000))
    
        t = timeit.Timer("ipv6_test_net2 == ipv6_test_net2", "from __main__ import ipv6_test_net2")
        print("ipv6 test equality without int", t.timeit(number=1000000))
    
        t = timeit.Timer("ipv6_test_net[10000]", "from __main__ import ipv6_test_net")
        print("ipv6 test get item with int", t.timeit(number=1000000))
    
        t = timeit.Timer("ipv6_test_net2[10000]", "from __main__ import ipv6_test_net2")
        print("ipv6 test get item without int", t.timeit(number=1000000))
    assert test1_num_address() == test2_num_address()
    assert hash(ipv6_test_net2) == hash(ipv6_test_net)
    assert ipv6_test_net2 == ipv6_test_net
    assert ipv6_test_net[10000] == ipv6_test_net2[10000]
    

    @methane
    Copy link
    Member

    methane commented Apr 30, 2019

    New changeset 3bbcc92 by Inada Naoki (gescheit) in branch 'master':
    bpo-25430: improve performance of IPNetwork.__contains__ (GH-1785)
    3bbcc92

    @tirkarthi
    Copy link
    Member

    Closing it as resolved since the optimization was merged in 3.8.0 . Thanks @gescheit for the report and patch.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants