Message 344494 - Python tracker

Author	dino.viehland
Recipients	brett.cannon, dino.viehland, eric.snow, methane, serhiy.storchaka, skrah
Date	2019-06-03.23:45:51
Message-id	<1559605551.66.0.732780590111.issue36839@roundup.psfhosted.org>

Content
The 20MB of savings is actually the amount of byte code that exists in the IG code base.  I was only measuring the web site code, not the other Python code in the process (no stdlib code, no 3rd party libraries, etc.).  The IG code base is pretty monolithic, and starting up the site requires about half of the code to get imported.  So I think 20MB per process is a pretty realistic number.

I've also created a C extension and the object implementing the buffer protocol looks like:

typedef struct {
    PyObject_HEAD
    const char* data;
    size_t size;
    Py_ssize_t hash;
    CIceBreaker *breaker;
    size_t exports;
    PyObject* code_obj; /* borrowed reference, the code object keeps us alive */
} CIceBreakerCode;
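
As a rough size estimate for this struct, the sketch below mirrors it with ctypes and compares it to the fixed cost of a bytes object and of a memoryview. It assumes a default CPython build (PyObject_HEAD is a refcount plus a type pointer) and 2-byte instructions; the ctypes class and the variable names are illustrative only and are not part of the extension.

```python
import ctypes
import sys


class CIceBreakerCode(ctypes.Structure):
    """ctypes mirror of the C struct above, used only to measure its size."""

    _fields_ = [
        # Assumption: PyObject_HEAD on a default (non-debug, non-free-threaded)
        # CPython build is a refcount followed by a type pointer.
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("data", ctypes.c_char_p),
        ("size", ctypes.c_size_t),
        ("hash", ctypes.c_ssize_t),
        ("breaker", ctypes.c_void_p),
        ("exports", ctypes.c_size_t),
        ("code_obj", ctypes.c_void_p),
    ]


OPCODE_SIZE = 2  # bytes per instruction (wordcode, CPython 3.6+)

wrapper_size = ctypes.sizeof(CIceBreakerCode)
bytes_overhead = sys.getsizeof(b"")  # fixed cost of a bytes object, excluding payload
memoryview_size = sys.getsizeof(memoryview(b""))

# Extra fixed cost of the wrapper over a plain bytes object, and how many
# instructions of shared byte code it takes to pay that back (rounded up).
extra = max(wrapper_size - bytes_overhead, 0)
break_even_opcodes = -(-extra // OPCODE_SIZE)

print(f"wrapper struct:    {wrapper_size} bytes")
print(f"empty bytes:       {bytes_overhead} bytes")
print(f"empty memoryview:  {memoryview_size} bytes")
print(f"break-even:        {break_even_opcodes} opcodes")
```

On a typical 64-bit build the struct should come out at 64 bytes against 33 bytes for an empty bytes object, which is consistent with a break-even point of around 16 opcodes.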

All of the modules are currently compiled into a single memory-mapped file, and then one of these objects, which implements the buffer protocol, is created for each function.  With that overhead it only takes byte code with 16 opcodes before it breaks even, so it is significantly lighter weight than using a memoryview object.
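
The general idea of one mapped file with a buffer per function can be sketched in pure Python. This is only an illustration: it uses memoryview slices as a stand-in for the custom C object, and the file layout, the `functions` dict and the `index` table are made up for the example.

```python
import mmap
import os
import tempfile

# Hypothetical stand-ins for the byte code of individual functions.
functions = {
    "add": compile("a + b", "<add>", "eval").co_code,
    "mul": compile("a * b", "<mul>", "eval").co_code,
}

# Pack all byte code into a single file and remember (offset, length) per function.
index = {}
with tempfile.NamedTemporaryFile(delete=False) as packed:
    path = packed.name
    offset = 0
    for name, raw in functions.items():
        packed.write(raw)
        index[name] = (offset, len(raw))
        offset += len(raw)

# Map the file read-only; the pages can then be shared between processes.
with open(path, "rb") as packed:
    mapped = mmap.mmap(packed.fileno(), 0, access=mmap.ACCESS_READ)

# One zero-copy view per function, all backed by the same mapping.
whole = memoryview(mapped)
views = {name: whole[start:start + length] for name, (start, length) in index.items()}

# Copy out only to check that each view sees the right bytes.
round_trip = {name: bytes(view) for name, view in views.items()}
print(round_trip == functions)

# Views must be released before the mapping can be closed.
for view in views.values():
    view.release()
whole.release()
mapped.close()
os.remove(path)
```

Each memoryview here carries far more fixed overhead than the struct shown earlier, which is the motivation for a dedicated, smaller buffer object.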

It's certainly true that the byte code isn't the #1 source of memory here (the code objects themselves are pretty big), but in the serialized state it ends up representing 25% of the serialized data.  I would expect that once you add in ref counts and type information the in-memory share is not quite as high, but reducing the overhead of code by 20% is still a pretty nice win.
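
One way to get a comparable figure for other code is to measure what fraction of a serialized code object is raw byte code. The sketch below uses the standard marshal format (the one used for .pyc files) as a stand-in for the serialized state; the format behind the 25% figure may differ, and `SAMPLE`, `iter_code` and `bytecode_share` are hypothetical names for this example.

```python
import marshal
import types


def iter_code(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code(const)


def bytecode_share(source, filename="<example>"):
    """Return (byte code bytes, marshalled bytes, ratio) for a source string."""
    top = compile(source, filename, "exec")
    code_bytes = sum(len(c.co_code) for c in iter_code(top))
    total_bytes = len(marshal.dumps(top))
    return code_bytes, total_bytes, code_bytes / total_bytes


SAMPLE = '''
def add(a, b):
    return a + b

class Greeter:
    def greet(self, name):
        return "hello " + name
'''

code_bytes, total_bytes, share = bytecode_share(SAMPLE)
print(f"{code_bytes} of {total_bytes} serialized bytes are byte code ({share:.0%})")
```

The exact ratio depends on the Python version and on the code being measured, so a small sample like this only shows the method, not a representative number.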

I can't make any promises about open sourcing the import system, but I can certainly look into that as well.
