Message 387721 - Python tracker


Author	Paweł Miech
Recipients	Paweł Miech, alex, georg.brandl, giampaolo.rodola, gregory.p.smith, rhettinger, santoso.wijaya, serhiy.storchaka, terry.reedy, tshepang, uwinx
Date	2021-02-26.12:05:11
Message-id	<1614341112.29.0.0138622697197.issue17343@roundup.psfhosted.org>
In-reply-to

Content
Making str.split return an iterator sounds like an interesting task. I found this issue because we recently discussed in our project that str.split returns a list, which can increase memory usage for some tasks when there is a large response to parse.

Here is a small script, created by my friend Juancarlo Anez, with an iterator version of str.split. Compared with the default str.split, it uses much less memory.

When run with the memory-profiler tool (https://pypi.org/project/memory-profiler/), it produces this output:
3299999
Filename: main.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    24   39.020 MiB   39.020 MiB           1   @profile
    25                                         def generate_string():
    26   39.020 MiB    0.000 MiB           1       n = 100000
    27   49.648 MiB    4.281 MiB      100003       long_string = " ".join([uuid.uuid4().hex.upper() for _ in range(n)])
    28   43.301 MiB   -6.348 MiB           1       print(len(long_string))
    29                                         
    30   43.301 MiB    0.000 MiB           1       z = isplit(long_string)
    31   43.301 MiB    0.000 MiB      100001       for line in z:
    32   43.301 MiB    0.000 MiB      100000           continue
    33                                         
    34   52.281 MiB    0.297 MiB      100001       for line in long_string.split():
    35   52.281 MiB    0.000 MiB      100000           continue


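You can see that the default str.split uses much more memory: reported usage stays at 43.301 MiB while iterating over isplit(long_string), but rises to 52.281 MiB (about 9 MiB more) for the loop over long_string.split().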
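
The isplit function profiled above is not included in this message. As a rough illustration only, here is a minimal sketch of what an iterator-based split could look like. The implementation below is an assumption (whitespace mode via re.finditer, explicit separators via str.find) and is not the script that produced the numbers above; only the name isplit is taken from the profiler output.

```python
import re
from typing import Iterator, Optional

# Hypothetical helper: matches one run of non-whitespace characters.
_NON_WS = re.compile(r"\S+")


def isplit(text: str, sep: Optional[str] = None) -> Iterator[str]:
    """Lazily yield the pieces that text.split(sep) would return.

    Illustrative sketch only; maxsplit is not supported.
    """
    if sep is None:
        # Whitespace mode: yield each run of non-whitespace, so leading,
        # trailing, and repeated whitespace produce no empty strings.
        for match in _NON_WS.finditer(text):
            yield match.group()
        return
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        end = text.find(sep, start)
        if end == -1:
            # No more separators: the remainder is the last piece.
            yield text[start:]
            return
        yield text[start:end]
        start = end + len(sep)


if __name__ == "__main__":
    for word in isplit("  lazy   split example "):
        print(word)
```

Because each piece is produced on demand, only one substring needs to be alive at a time if the caller discards it after use, whereas str.split builds the full list before the loop starts.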
