Title: http.client leaks from
Type: crash Stage:
Components: Versions: Python 3.9
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: HynekPetrak
Priority: normal Keywords:

Created on 2021-04-06 07:43 by HynekPetrak, last changed 2021-04-07 07:51 by HynekPetrak.

Messages (3)
msg390287 - Author: Hynek Petrak (HynekPetrak) Date: 2021-04-06 07:43
Hi, I wrote a web crawler that uses ThreadPoolExecutor to spawn multiple worker threads, retrieves the content of web pages via http.client, and saves it to a file.
After a couple of thousand requests have been processed, the crawler starts to consume memory rapidly until all available memory is exhausted.
tracemalloc shows the memory is not collected from:
/usr/lib/python3.9/http/ size=47.6 MiB, count=6078, average=8221 B
  File "/usr/lib/python3.9/http/", line 468
    s =
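
For reference, a minimal sketch of how per-traceback statistics like the ones above can be collected with tracemalloc. The helper name top_allocations and the bytearray workload are illustrative assumptions, not part of the crawler.

```python
import tracemalloc


def top_allocations(snapshot, limit=5, key_type="traceback"):
    """Return the `limit` largest allocation sites as formatted lines."""
    lines = []
    # statistics() returns entries sorted from largest to smallest.
    for stat in snapshot.statistics(key_type)[:limit]:
        lines.append(f"size={stat.size / 1024:.1f} KiB, count={stat.count}")
        # traceback.format() yields the 'File "...", line N' lines.
        lines.extend(stat.traceback.format())
    return lines


if __name__ == "__main__":
    tracemalloc.start(25)  # keep up to 25 frames per allocation
    # Stand-in for the crawler's work: hold on to some buffers.
    retained = [bytearray(8192) for _ in range(1000)]
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    print("\n".join(top_allocations(snapshot)))
```

Taking two snapshots some thousand requests apart and comparing them with snapshot.compare_to() is another way to see which allocation sites keep growing.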

I have also tested with requests and urllib3; since they use http.client underneath, the result is always the same.

My code around that:
def get_html3(session, url, timeout=10):
    o = urlparse(url)
    if o.scheme == 'http':
        cn = http.client.HTTPConnection(o.netloc, timeout=timeout)
    else:
        cn = http.client.HTTPSConnection(o.netloc, context=ctx, timeout=timeout)
    cn.request('GET', o.path, headers=headers)
    r = cn.getresponse()
    log.debug(f'[*] [{url}] Status: {r.status} {r.reason}')
    if r.status not in [400, 403, 404]:
        ret ='utf-8')
    else:
        ret = ""
    del r
    del cn
    return ret
msg390290 - Author: Hynek Petrak (HynekPetrak) Date: 2021-04-06 07:44
Python 3.9.2 on Kali Linux.
msg390403 - Author: Hynek Petrak (HynekPetrak) Date: 2021-04-07 07:51
The leak does not seem to occur when I use

 ret ='utf-8')

Date                 User         Action  Args
2021-04-07 07:51:49  HynekPetrak  set     messages: + msg390403
2021-04-06 07:44:34  HynekPetrak  set     messages: + msg390290
2021-04-06 07:43:38  HynekPetrak  create