Message 322004 - Python tracker


Author	DemGiran
Recipients	DemGiran
Date	2018-07-20.12:53:50
Message-id	<1532091230.61.0.56676864532.issue34168@psf.upfronthosting.co.za>
In-reply-to

Content

I have a list of 30 million strings, and I want to run a DNS query against each of them. I do not understand how this operation can become memory intensive. I would assume that the threads would exit after the job is done, and there is also a timeout of 1 second as well ({'dns_request_timeout': 1}).

Here is a sneak peek of the machine's resources while running the script:

[![enter image description here][1]][1]

My code is as follows:

    # -*- coding: utf-8 -*-
    import dns.resolver
    import concurrent.futures
    from pprint import pprint
    import json

    
    with open('30_million_strings.json', 'r') as f:
        bucket = json.load(f)
    
    
    def _dns_query(target, **kwargs):
        global bucket
        resolv = dns.resolver.Resolver()
        resolv.timeout = kwargs['function']['dns_request_timeout']
        try:
            resolv.query(target + '.com', kwargs['function']['query_type'])
            with open('out.txt', 'a') as f:
                f.write(target + '\n')
        except Exception:
            pass
    
    
    def run(**kwargs):
        global bucket
        temp_locals = locals()
        pprint({k: v for k, v in temp_locals.items()})
    
        with concurrent.futures.ThreadPoolExecutor(max_workers=kwargs['concurrency']['threads']) as executor:
            future_to_element = dict()
    
            for element in bucket:
                future = executor.submit(kwargs['function']['name'], element, **kwargs)
                future_to_element[future] = element
    
            for future in concurrent.futures.as_completed(future_to_element):
                result = future_to_element[future]
    
    
    run(function={'name': _dns_query, 'dns_request_timeout': 1, 'query_type': 'MX'},
        concurrency={'threads': 15})
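
For comparison, here is a minimal sketch of a bounded variant, on the assumption that the memory growth comes from the loop above creating one `Future` (plus one dictionary entry) for each of the 30 million elements before any result is consumed. The names `run_bounded` and `max_pending` are hypothetical and not part of my script or of `concurrent.futures`; only `ThreadPoolExecutor`, `submit` and `wait` are library calls.

```python
import concurrent.futures
import itertools


def run_bounded(func, items, max_workers=15, max_pending=1000):
    """Apply func to every item while keeping at most max_pending futures alive.

    Hypothetical helper: returns the number of completed tasks.
    """
    items = iter(items)
    completed = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit only the first batch, not the whole input.
        pending = {executor.submit(func, item)
                   for item in itertools.islice(items, max_pending)}
        while pending:
            # Block until at least one task finishes; finished futures are
            # dropped here so they can be garbage collected.
            done, pending = concurrent.futures.wait(
                pending, return_when=concurrent.futures.FIRST_COMPLETED)
            completed += len(done)
            # Top the set back up with one new task per finished one.
            for item in itertools.islice(items, len(done)):
                pending.add(executor.submit(func, item))
    return completed
```

With this shape, the number of live `Future` objects should stay near `max_pending` regardless of how long the input is. Note that the input list itself still has to fit in memory unless it is also read incrementally.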


  [1]: https://i.stack.imgur.com/686SW.png

History
Date	User	Action	Args
2018-07-20 12:53:50	DemGiran	set	recipients: + DemGiran
2018-07-20 12:53:50	DemGiran	set	messageid: <1532091230.61.0.56676864532.issue34168@psf.upfronthosting.co.za>
2018-07-20 12:53:50	DemGiran	link	issue34168 messages
2018-07-20 12:53:50	DemGiran	create