Author Greg Price
Recipients Greg Price, benjamin.peterson, ezio.melotti, mcepl, serhiy.storchaka, vstinner
Date 2019-08-15.04:09:42
Message-id <1565842183.59.0.500809540632.issue32771@roundup.psfhosted.org>
Content
> About the RSS memory, I'm not sure how Linux accounts the Unicode databases before they are accessed. Is it like read-only memory loaded on demand when accessed?

RSS stands for "resident set size", as in "resident in memory"; it only counts pages of real physical memory. The intention is to count the pages that the process is actually using.

Where the definition potentially gets fuzzy is if this process and another are sharing some memory.  I don't know much about how that kind of edge case is handled.  But one thing I think RSS is pretty consistently good at is not counting pages that you've nominally mapped from a file but haven't actually forced into physical memory by looking at them.

That is: say you ask for a file (or some range of it) to be mapped into memory for you.  This means it's now there in the address space, and if the process does a load instruction from any of those addresses, the kernel will ensure the load instruction works seamlessly.  But: most of it won't be eagerly read from disk or loaded physically into RAM.  Rather, the kernel's counting on that load instruction causing a page fault; and its page-fault handler will take care of reading from the disk and sticking the data physically into RAM.  So until you actually execute some loads from those addresses, the data in that mapping doesn't contribute to the genuine demand for scarce physical RAM on the machine; and it also isn't counted in the RSS number.
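For reference, the RSS number discussed here can be read directly from /proc rather than by shelling out to grep. This is a minimal sketch, assuming the Linux /proc/<pid>/status format; the helper names parse_vmrss_kib and read_rss_kib are made up for illustration.

```python
import re

def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None

def read_rss_kib():
    """Return this process's RSS in KiB, or None where /proc is unavailable."""
    try:
        with open("/proc/self/status") as f:
            return parse_vmrss_kib(f.read())
    except OSError:
        # Assumption: no Linux-style /proc here (e.g. macOS, Windows).
        return None
```

Calling read_rss_kib() before and after touching a mapping gives the same kind of readings as the grep commands in the session.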


Here's a demo!  This 262392 KiB (269 MB) Git packfile is the biggest file lying around in my CPython directory:

$ du -k .git/objects/pack/pack-0e4acf3b2d8c21849bb11d875bc14b4d62dc7ab1.pack
262392	.git/objects/pack/pack-0e4acf3b2d8c21849bb11d875bc14b4d62dc7ab1.pack


Open it for reading -- this adds 100 KiB, not sure why:

$ python
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, mmap
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:	    9968 kB
>>> fd = os.open('.git/objects/pack/pack-0e4acf3b2d8c21849bb11d875bc14b4d62dc7ab1.pack', os.O_RDONLY)
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:	   10068 kB


Map it into our address space -- RSS doesn't budge:

>>> m = mmap.mmap(fd, 0, prot=mmap.PROT_READ)
>>> m
<mmap.mmap object at 0x7f185b5379c0>
>>> len(m)
268684419
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:	   10068 kB


Cause the process to actually look at all the data (this takes about 10 s, too)...

>>> sum(len(l) for l in m)
268684419
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:	  271576 kB

RSS goes way up, by 261508 KiB!  Oddly, that's slightly less (by ~1 MB) than the file's size.
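The arithmetic behind those figures can be checked from the numbers in the session. This sketch assumes 4 KiB pages (the usual x86-64 default; the session doesn't state the page size). Note that du reports allocated disk blocks, so its 262392 figure can differ slightly from the mapped length.

```python
PAGE_BYTES = 4096             # assumption: 4 KiB pages

file_bytes = 268684419        # len(m) from the session
rss_before_kib = 10068        # VmRSS after mmap, before reading
rss_after_kib = 271576        # VmRSS after reading everything

# Growth in RSS caused by reading the whole mapping.
rss_delta_kib = rss_after_kib - rss_before_kib

# Size of the mapping rounded up to whole pages, in KiB.
pages = (file_bytes + PAGE_BYTES - 1) // PAGE_BYTES
mapped_kib = pages * PAGE_BYTES // 1024

# How far the RSS growth falls short of the mapped size.
shortfall_kib = mapped_kib - rss_delta_kib

print(rss_delta_kib)   # 261508
print(mapped_kib)      # 262388
print(shortfall_kib)   # 880
```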


But wait, there's more. Drop that mapping, and RSS goes right back down (OK, keeps 8 KiB extra):

>>> del m
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:	   10076 kB

... and then map the exact same file again, and it's *still* down:

>>> m = mmap.mmap(fd, 0, prot=mmap.PROT_READ)
>>> os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
VmRSS:	   10076 kB

This last step is interesting because it's a certainty that the data is still physically in memory -- this is my desktop, with plenty of free RAM.  And it's even in our address space.  But because we haven't actually loaded from those addresses, it's still in memory only at the kernel's caching whim, and so apparently our process doesn't get "charged" or "blamed" for its presence there.
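The whole experiment can be repeated as a script against a scratch file instead of a packfile. This is a rough sketch, not the exact session: run_demo and rss_kib are hypothetical names, it uses access=mmap.ACCESS_READ rather than prot=mmap.PROT_READ for portability, and it touches one byte per page instead of every byte, which should be enough to fault each page in. The readings are None where /proc is unavailable.

```python
import mmap
import tempfile

def rss_kib():
    """This process's VmRSS in KiB, or None if /proc is unavailable."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None

def run_demo(size_bytes=64 * 1024 * 1024):
    """Map a scratch file, touch it, unmap, remap; record RSS at each step.

    size_bytes is assumed to be a multiple of 1 MiB.
    """
    readings = {}
    chunk = b"x" * (1024 * 1024)
    with tempfile.NamedTemporaryFile() as tmp:
        # Write in 1 MiB chunks so the data never sits in our heap at once.
        for _ in range(size_bytes // len(chunk)):
            tmp.write(chunk)
        tmp.flush()

        readings["start"] = rss_kib()
        m = mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_READ)
        readings["mapped"] = rss_kib()

        # Load one byte from every page so each page gets faulted in.
        checksum = 0
        for offset in range(0, len(m), mmap.PAGESIZE):
            checksum += m[offset]
        readings["touched"] = rss_kib()

        m.close()
        readings["unmapped"] = rss_kib()

        # Map the same file again without touching it.
        m = mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_READ)
        readings["remapped"] = rss_kib()
        m.close()
    return readings, checksum

if __name__ == "__main__":
    for step, value in run_demo()[0].items():
        print(step, value)
```

On Linux one would expect "touched" to stand well above the other four readings, matching the session, though the exact numbers will vary from run to run.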


In the case of running an executable with a bunch of data in it, I expect that the bulk of the data (and of the code for that matter) winds up treated very much like the file contents we mmap'd in.  It's mapped but not eagerly physically loaded; so it doesn't contribute to the RSS number, nor to the genuine demand for scarce physical RAM on the machine.
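One way to check that expectation is to compare how much of each file-backed mapping is mapped against how much is resident, using the per-mapping Size and Rss fields in /proc/<pid>/smaps. This is a sketch under the assumption of the usual Linux smaps layout; mapped_vs_resident is a made-up helper name.

```python
import re

# A mapping header looks like:
#   55d0e8a00000-55d0e8a6c000 r--p 00000000 08:01 1234   /usr/bin/python3
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$")

def mapped_vs_resident(smaps_text):
    """Sum Size and Rss (in KiB) per backing path from smaps text."""
    totals = {}
    path = None
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Mappings with no backing file have an empty path field.
            path = header.group(1) or "[anonymous]"
            totals.setdefault(path, {"Size": 0, "Rss": 0})
        elif path is not None:
            key, _, rest = line.partition(":")
            if key in ("Size", "Rss"):
                totals[path][key] += int(rest.split()[0])
    return totals

if __name__ == "__main__":
    try:
        with open("/proc/self/smaps") as f:
            table = mapped_vs_resident(f.read())
    except OSError:
        table = {}  # no Linux-style /proc on this system
    for name, t in sorted(table.items()):
        print(f"{t['Size']:>10} {t['Rss']:>10}  {name}")
```

If the expectation holds, the entries for the interpreter binary and its shared libraries should show Rss noticeably smaller than Size.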


That's a bit long :-), but hopefully informative.  In short, I think for us RSS should work well as a pretty faithful measure of the real memory consumption that we want to be frugal with.
History
Date                 User        Action  Args
2019-08-15 04:09:43  Greg Price  set     recipients: + Greg Price, vstinner, benjamin.peterson, mcepl, ezio.melotti, serhiy.storchaka
2019-08-15 04:09:43  Greg Price  set     messageid: <1565842183.59.0.500809540632.issue32771@roundup.psfhosted.org>
2019-08-15 04:09:43  Greg Price  link    issue32771 messages
2019-08-15 04:09:42  Greg Price  create