Author dino.viehland
Recipients brett.cannon, dino.viehland, eric.snow, methane, serhiy.storchaka, skrah
Date 2019-06-03.23:45:51
The 20MB of savings is actually the amount of byte code that exists in the IG code base.  I was just measuring the web site code, not the other various Python code in the process (e.g. no std lib code, no 3rd party libraries, etc.).  The IG code base is pretty monolithic, and starting up the site requires about half of that code to get imported, so I think 20MB per process is a pretty realistic number.

I've also created a C extension and the object implementing the buffer protocol looks like:

typedef struct {
    const char* data;   /* points into the shared memory-mapped file */
    size_t size;        /* length of the byte code in bytes */
    Py_ssize_t hash;    /* cached hash of the byte code */
    CIceBreaker *breaker;
    size_t exports;     /* number of live buffer exports */
    PyObject* code_obj; /* borrowed reference, the code object keeps us alive */
} CIceBreakerCode;

All of the modules are currently getting compiled into a single memory-mapped file, and then these objects, which implement the buffer protocol, get created for each function.  The overhead is small enough that it breaks even once a function's byte code reaches just 16 opcodes, so it is significantly lighter weight than using a memoryview object.

It's certainly true that the byte code isn't the #1 source of memory here (the code objects themselves are pretty big), but it ends up representing 25% of the serialized data.  I would expect that once you add in ref counts and typing information it's not quite as good, but reducing the overhead of code by 20% is still a pretty nice win.

I can't make any promises about open sourcing the import system, but I can certainly look into that as well.