Author aeros
Recipients aeros, asvetlov, benjamin.peterson, vstinner, yselivanov
Date 2019-11-14.08:30:01
Content
> I understand that there's *some* overhead associated with spawning a new thread, but from my impression it's not substantial enough to make a significant impact in most cases.

Although I think this still stands to some degree, I will have to rescind the following:

> Each individual instance of threading.Thread is only 64 bytes.

The 64 bytes was measured by `sys.getsizeof(threading.Thread())`, which only provides a surface-level assessment: it reports the shallow size of the Thread instance itself, not the objects it references.

In order to get a better estimate, I implemented a custom get_size() function that recursively adds the size of the object and of every unique object reachable through gc.get_referents() (ignoring several redundant and/or unnecessary types). For more details, see https://gist.github.com/aeros/632bd035b6f95e89cdf4bb29df970a2a. Feel free to critique it if there are any apparent issues (for the purpose of measuring the size of threads).
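
For reference, here is a minimal sketch of that approach; the actual implementation in the gist skips a somewhat different set of types and handles more edge cases:

import gc
import sys
from types import ModuleType

def get_size(obj):
    """Sum sys.getsizeof() over obj and every unique object
    reachable from it via gc.get_referents()."""
    seen = set()   # ids of objects already counted
    stack = [obj]
    total = 0
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        # Skip classes and modules: traversing them pulls in large
        # amounts of shared interpreter state that isn't per-object.
        if isinstance(current, (type, ModuleType)):
            continue
        total += sys.getsizeof(current)
        stack.extend(gc.get_referents(current))
    return total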

Then, I used this function on three different threads to figure out how much memory was needed for each one:

Python 3.8.0+ (heads/3.8:1d2862a323, Nov  4 2019, 06:59:53) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import threading
>>> from get_size import get_size
>>> a = threading.Thread()
>>> b = threading.Thread()
>>> c = threading.Thread()
>>> get_size(a)
3995
>>> get_size(b)
1469
>>> get_size(c)
1469

1469 bytes seems to be roughly the amount of additional memory required for each new thread, at least on Linux kernel 5.3.8 and Python 3.8. (The first Thread measures larger, presumably because instantiating it triggers some one-time, module-level allocations.) I don't know if this is 100% accurate, but it at least provides an improved estimate over sys.getsizeof().

> But it spawns a new Python thread per process which can be a blocker issue if a server memory is limited. What if you want to spawn 100 processes? Or 1000 processes? What is the memory usage?

From my understanding, ~1.5KB/thread seems to be quite negligible for most modern equipment. The server's memory would have to be very limited for spawning an additional 1000 threads to be a bottleneck/blocker issue:

Python 3.8.0+ (heads/3.8:1d2862a323, Nov  4 2019, 06:59:53) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import threading
>>> from get_size import get_size
>>> threads = []
>>> for _ in range(1000):
...     th = threading.Thread()
...     threads.append(th)
... 
>>> get_size(threads)
1482435

(~1.5MB total, or ~1.48KB per thread, consistent with the earlier per-thread estimate)

Victor (or anyone else), in your experience, would the additional ~1.5KB per process be an issue for 99% of production servers? If not, it seems to me like the additional maintenance cost of keeping SafeChildWatcher and FastChildWatcher in asyncio's API wouldn't be worthwhile.
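
For context, the per-process thread under discussion comes from ThreadedChildWatcher, which starts one short-lived waiter thread for each spawned child. A rough sketch of the pattern (Python 3.8+ on Unix; /bin/true is just a stand-in command, and the number of children is arbitrary):

import asyncio

async def spawn(n):
    # ThreadedChildWatcher starts one waiter thread per child process,
    # so this creates n short-lived threads alongside n children.
    procs = [await asyncio.create_subprocess_exec("/bin/true")
             for _ in range(n)]
    await asyncio.gather(*(proc.wait() for proc in procs))

# ThreadedChildWatcher is already the default in 3.8; setting it
# explicitly here just makes the choice visible.
asyncio.set_child_watcher(asyncio.ThreadedChildWatcher())
asyncio.run(spawn(100))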