
base64 module of Python 3.4 uses 920 kB of memory #65078

Closed
vstinner opened this issue Mar 10, 2014 · 18 comments
Labels
performance Performance or resource usage

Comments

@vstinner
Member

BPO 20879
Nosy @loewis, @pitrou, @vstinner, @bitdancer, @serhiy-storchaka
Files
  • base64_mem.py
  • base64_on_demand.patch
  • base64_lazy_init.patch
  • urlparse_lazy_init.patch
  • base64_urlparse_lazy_init-2.patch
  • base64_urlparse_lazy_init-3.patch
  • base64_urlparse_lazy_init-4.patch
  • base64_urlparse_lazy_init-4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2014-03-17.21:41:31.816>
    created_at = <Date 2014-03-10.09:35:27.787>
    labels = ['performance']
    title = 'base64 module of Python 3.4 uses 920 kB of memory'
    updated_at = <Date 2014-03-17.22:18:31.574>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2014-03-17.22:18:31.574>
    actor = 'Arfrever'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-03-17.21:41:31.816>
    closer = 'vstinner'
    components = []
    creation = <Date 2014-03-10.09:35:27.787>
    creator = 'vstinner'
    dependencies = []
    files = ['34325', '34327', '34328', '34329', '34418', '34421', '34422', '34423']
    hgrepos = []
    issue_num = 20879
    keywords = ['patch']
    message_count = 18.0
    messages = ['213018', '213023', '213024', '213025', '213026', '213027', '213028', '213029', '213030', '213075', '213564', '213566', '213569', '213574', '213577', '213578', '213902', '213903']
    nosy_count = 7.0
    nosy_names = ['loewis', 'pitrou', 'vstinner', 'Arfrever', 'r.david.murray', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue20879'
    versions = ['Python 3.5']

    @vstinner
    Member Author

    According to tracemalloc, "import base64" allocates 920.6 kB of memory. The top 3 locations are:

    Lib/base64.py:414: size=420 KiB, count=7226, average=59 B
    _b85chars2 = [(a + b) for a in _b85chars for b in _b85chars]
    Lib/base64.py:306: size=420 KiB, count=7226, average=59 B
    _a85chars2 = [(a + b) for a in _a85chars for b in _a85chars]
    Lib/base64.py:142: size=59.8 KiB, count=1025, average=60 B
    _b32tab2 = [a + b for a in _b32tab for b in _b32tab]

    Compare it to Python 3.3: the base64 module of Python 3.3 allocates only 10.3 kB. (I installed tracemalloc manually on Python 3.3 to get this number.)
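A measurement along these lines reproduces the numbers above (a sketch, not the exact script from the attached base64_mem.py; on current Python versions, where this issue's fix is in place, the reported sizes will be far smaller than 920 kB):

```python
import tracemalloc

# Trace allocations made while importing base64, then report the source
# lines responsible for the largest blocks. Run in a fresh interpreter,
# otherwise "import base64" is a cache hit and allocates nothing.
tracemalloc.start()
import base64  # noqa: E402 -- imported under tracing on purpose

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics('lineno')
for stat in top[:3]:
    print(stat)
```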

    I suggest initializing these precomputed tables to None and computing them at the first call to b32encode() / b85encode() / a85encode().

    The new Base85 and Ascii85 codecs were added by issue bpo-17618.

    _b32tab2 comes from changeset 1b5ef05d6ced of issue bpo-17812: "quadratic complexity of base64.b32encode(). Optimize base64.b32encode() and base64.b32decode() (speed up to 3x)." So the 3.3 branch is also affected (for the base32 table).

    @vstinner vstinner added the performance Performance or resource usage label Mar 10, 2014
    @loewis
    Mannequin

    loewis mannequin commented Mar 10, 2014

    It's a classical time-space trade-off. I'm not sure this is a bug; if it is, the reasonable change (IMO) is to just revert the introduction of these *2 tables, and loop through the output character-by-character.

    I'd personally be in favor of such a change (drop the tables). If people think that these functions are too slow, they should propose an extension of the binascii module, rather than micro-optimizing the Python code.

    @loewis
    Mannequin

    loewis mannequin commented Mar 10, 2014

    I'm also -0 on delayed creation of the tables. If these encodings are used, the memory overhead will still occur.

    @vstinner
    Member Author

    I'd personally be in favor of such a change (drop the tables).

    I didn't propose to drop the tables, but to create each table the first time it is needed.

    I have never used ascii85, base85, or base32 in my applications. Maybe because ascii85 and base85 are new in Python 3.4? ;-) Python should not waste memory on a function that is never called.

    I only import the base64 module for the base64 codec :) (I don't care how many functions there are, as long as they don't waste my memory.)

    @vstinner
    Member Author

    base64_on_demand.patch: create tables the first time they are needed.

    b85decode() already uses such a trick to build _b85dec.

    Note: a85decode() doesn't use a precomputed table.

    If people think that these functions are too slow, they should propose an extension of the binascii module, rather than micro-optimizing the Python code.

    Yes, a C extension might be even faster. But I'm not interested in working on such an enhancement; I just want to fix the memory usage regression right now.

    @serhiy-storchaka
    Member

    I was more interested in the import time, and it has increased slightly. Memory consumption ((32**2 + 2*85**2)*sys.getsizeof(b'xx') + sys.getsizeof(dict.fromkeys(range(32**2))) + 2*sys.getsizeof(dict.fromkeys(range(85**2))) + 2*(85**2 - 256)*sys.getsizeof(85**2), i.e. less than 1 MB) seemed small compared with the overall memory usage of Python (20-25 MB), so I ignored it.
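The estimate can be recomputed directly (the exact byte counts depend on the CPython version and platform; on a modern 64-bit build the total comes out in the vicinity of 1-1.5 MB):

```python
import sys

# Rough reproduction of the back-of-the-envelope estimate: the three
# pair tables' bytes objects, the decoding dicts, and the non-cached
# small integers stored as dict values.
table_bytes = (
    (32**2 + 2 * 85**2) * sys.getsizeof(b'xx')          # items of the *2 lists
    + sys.getsizeof(dict.fromkeys(range(32**2)))        # b32 decode dict
    + 2 * sys.getsizeof(dict.fromkeys(range(85**2)))    # a85/b85 decode dicts
    + 2 * (85**2 - 256) * sys.getsizeof(85**2)          # ints above the cache
)
print(f"approx. {table_bytes / 2**20:.2f} MiB")
```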

    No need for lazy initialization of the small tables, and the overhead of calling _init_*85_tables() on every encoding/decoding operation may be too large. Here is a simpler patch.

    @serhiy-storchaka
    Member

    And here is a similar patch for urllib.parse.

    @vstinner
    Member Author

    Here is a simpler patch.

    Your patch only changes the _*tab2 tables, not the smaller ones.

    @serhiy-storchaka
    Member

    Yes, because the small tables consume two orders of magnitude less memory.

    @pitrou
    Member

    pitrou commented Mar 10, 2014

    Victor's approach looks fine to me.

    @pitrou pitrou added performance Performance or resource usage and removed performance Performance or resource usage labels Mar 10, 2014
    @vstinner
    Member Author

    urlparse_lazy_init.patch looks good, but you should add a comment explaining your change. See my review:
    http://bugs.python.org/review/20879/

    @vstinner
    Member Author

    I combined the two patches and I tried to address all comments: base64_urlparse_lazy_init-2.patch.

    @vstinner
    Member Author

    Serhiy wrote on Rietveld:
    "Since this constant is repeated twice, it wastes memory and is error-prone."

    Oh, I expected marshal to be smart and use references, but it does not.

    Here is a simpler patch which initializes all base85 tables at once using a "b85alphabet" variable; this also simplifies the initialization of the decoder table.

    @vstinner
    Member Author

    Sorry, it took me many tries to write the perfect patch :-) The current code has artificial dependencies between variables and references variables that are not in fact needed.

    base64_urlparse_lazy_init-4.patch should be even simpler.

    @serhiy-storchaka
    Member

    I don't like base64_urlparse_lazy_init-3.patch at all. It wastes memory and time initializing unneeded tables when only encoding or only decoding is used.

    Here is a new patch based on base64_urlparse_lazy_init-2.patch. Unlike my patch, it delays initialization of the small tables too. Unlike Victor's patches, it initializes only the needed tables and delays initialization of the b32 tables.

    @serhiy-storchaka
    Member

    Well, our patches are almost the same, so your patch LGTM (and it has more comments). But note that my initialization of _b32tab2 is a little faster (bytes() is called 32 times instead of 1024).
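The difference between the two initialization idioms can be seen in an illustrative sketch (not either patch verbatim; `_b32alphabet` matches Lib/base64.py, the `tab2_*` names are invented here):

```python
# Two ways to build the 1024-entry base32 pair table.
_b32alphabet = b'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'

# Variant 1: bytes() invoked inside the double loop -> 1024 calls.
tab2_slow = [bytes((a,)) + bytes((b,))
             for a in _b32alphabet for b in _b32alphabet]

# Variant 2: build the 32 one-byte entries once, then concatenate them
# -> bytes() runs only 32 times.
_b32tab = [bytes((i,)) for i in _b32alphabet]
tab2_fast = [a + b for a in _b32tab for b in _b32tab]

assert tab2_slow == tab2_fast  # same table, fewer constructor calls
```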

    @python-dev
    Mannequin

    python-dev mannequin commented Mar 17, 2014

    New changeset 7093d5758954 by Victor Stinner in branch '3.4':
    Issue bpo-20879: Delay the initialization of encoding and decoding tables for
    http://hg.python.org/cpython/rev/7093d5758954

    New changeset 06d646935c9a by Victor Stinner in branch 'default':
    (Merge 3.4) Issue bpo-20879: Delay the initialization of encoding and decoding
    http://hg.python.org/cpython/rev/06d646935c9a

    @vstinner
    Member Author

    Thanks Serhiy, I merged your last patch with mine. This issue should now be fixed.

    As Martin wrote, an enhancement would be to reimplement these functions in C without such table.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022