Issue 14422: Pack PyASCIIObject fields to reduce memory consumption of pure ASCII strings


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/58630

classification

Title:	Pack PyASCIIObject fields to reduce memory consumption of pure ASCII strings
Type:	enhancement	Stage:
Components:	Interpreter Core	Versions:	Python 3.3

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	jcea, loewis, pitrou, r.david.murray, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2012-03-27 11:14 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
pack_pyasciiobject.patch	vstinner, 2012-03-27 11:14		review

Messages (6)
msg156905 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-03-27 11:14
It is possible to reduce PyASCIIObject.state to 8 bits instead of 32, move it to the end of the structure (exchanging wstr and state), and pack the structure. As a result, the structure size is reduced by 3 bytes (the type of state changes from int to char). I expect little or no performance overhead, because only the PyASCIIObject.state field is affected and this field is 8 bits wide.

See also issue #14419, which relies on memory alignment (of the ASCII string data) to optimize the ASCII decoder. If I understand correctly, my patch rules out that optimization.

--

Example on 32-bit Linux:

$ cat x.c
#include <Python.h>

int main()
{
    printf("sizeof(PyASCIIObject)=%zu bytes\n", sizeof(PyASCIIObject));
    printf("sizeof(PyCompactUnicodeObject)=%zu bytes\n", sizeof(PyCompactUnicodeObject));
    printf("sizeof(PyUnicodeObject)=%zu bytes\n", sizeof(PyUnicodeObject));
    return 0;
}

# unpatched
$ gcc -I Include/ -I . x.c -o x && ./x
sizeof(PyASCIIObject)=24 bytes
sizeof(PyCompactUnicodeObject)=36 bytes
sizeof(PyUnicodeObject)=40 bytes

# pack the 3 structures
$ gcc -I Include/ -I . x.c -o x && ./x
sizeof(PyASCIIObject)=21 bytes
sizeof(PyCompactUnicodeObject)=33 bytes
sizeof(PyUnicodeObject)=37 bytes

--

We might also pack PyCompactUnicodeObject and PyUnicodeObject, but that would hurt performance because utf8_length, utf8, wstr_length and data would no longer be aligned.
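The layout change described above can be approximated from Python with ctypes. This is only a sketch under stated assumptions: the two classes below are hypothetical stand-ins whose field names follow the issue, not the real CPython definitions, and the exact sizes depend on the platform's pointer width.

```python
import ctypes

# Hypothetical stand-ins for the PyASCIIObject layout discussed in this issue.
# Field names follow the issue; this is NOT the real CPython definition.
HEADER = [
    ("ob_refcnt", ctypes.c_ssize_t),
    ("ob_type", ctypes.c_void_p),
    ("length", ctypes.c_ssize_t),
    ("hash", ctypes.c_ssize_t),
]


class UnpatchedASCII(ctypes.Structure):
    # 32-bit 'state' stored before 'wstr', with natural alignment.
    _fields_ = HEADER + [("state", ctypes.c_uint32), ("wstr", ctypes.c_void_p)]


class PackedASCII(ctypes.Structure):
    # 8-bit 'state' moved after 'wstr'; _pack_ = 1 removes all padding.
    _pack_ = 1
    _fields_ = HEADER + [("wstr", ctypes.c_void_p), ("state", ctypes.c_uint8)]


unpatched_size = ctypes.sizeof(UnpatchedASCII)
packed_size = ctypes.sizeof(PackedASCII)
print("unpatched:", unpatched_size, "bytes")
print("packed:   ", packed_size, "bytes")
```

On a 32-bit build this stand-in should give 24 and 21 bytes, matching the figures quoted in the message. On a 64-bit build both values are larger, but the packed size is still odd, so anything stored directly after the header starts at an odd offset.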
msg156908 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-03-27 11:23
iobench and stringbench results on unpatched Python:

$ ./python Tools/iobench/iobench.py -t
Preparing files...
Python 3.3.0a1+ (default:51016ff7f8c9, Mar 27 2012, 13:19:52)
[GCC 4.6.1]
Unicode: PEP 393
Linux-3.0.0-16-generic-pae-i686-with-debian-wheezy-sid
Text unit = one character (utf8-decoded)

Text input
[ 400KB ] read one unit at a time... 5.4 MB/s
[ 400KB ] read 20 units at a time... 68 MB/s
[ 400KB ] read one line at a time... 174 MB/s
[ 400KB ] read 4096 units at a time... 289 MB/s
[ 20KB ] read whole contents at once... 315 MB/s
[ 400KB ] read whole contents at once... 332 MB/s
[ 10MB ] read whole contents at once... 292 MB/s
[ 400KB ] seek forward one unit at a time... 0.304 MB/s
[ 400KB ] seek forward 1000 units at a time... 312 MB/s

Text append
[ 20KB ] write one unit at a time... 3.05 MB/s
[ 400KB ] write 20 units at a time... 43 MB/s
[ 400KB ] write 4096 units at a time... 554 MB/s
[ 10MB ] write 1e6 units at a time... 450 MB/s

Text overwrite
[ 20KB ] modify one unit at a time... 1.18 MB/s
[ 400KB ] modify 20 units at a time... 18.9 MB/s
[ 400KB ] modify 4096 units at a time... 400 MB/s

$ ./python stringbench/stringbench.py
stringbench v2.0
3.3.0a1+ (default:51016ff7f8c9, Mar 27 2012, 13:19:52)
[GCC 4.6.1]
2012-03-27 13:21:01.217823
bytes	unicode
(in ms)	(in ms)	%	comment
========== case conversion -- dense
0.37	0.38	97.9	("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (1000)
0.38	0.38	99.3	("where in the world is carmen san deigo?"*10).upper() (1000)
========== case conversion -- rare
0.38	0.38	99.9	("Where in the world is Carmen San Deigo?"*10).lower() (1000)
0.43	0.38	113.6	("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (1000)
========== concat 20 strings of words length 4 to 15
1.76	1.69	104.2	s1+s2+s3+s4+...+s20 (1000)
========== concat two strings
0.08	0.07	107.7	"Andrew"+"Dalke" (1000)
========== count AACT substrings in DNA example
2.15	2.13	100.7	dna.count("AACT") (10)
========== count newlines
0.65	0.58	110.8	...text.with.2000.newlines.count("\n") (10)
========== early match, single character
0.20	0.19	107.9	("A"*1000).find("A") (1000)
0.36	0.05	745.8	"A" in "A"*1000 (1000)
0.18	0.19	96.4	("A"*1000).index("A") (1000)
0.18	0.21	85.5	("A"*1000).partition("A") (1000)
0.21	0.20	103.6	("A"*1000).rfind("A") (1000)
0.21	0.30	69.8	("A"*1000).rindex("A") (1000)
0.37	0.21	171.7	("A"*1000).rpartition("A") (1000)
0.38	0.39	98.4	("A"*1000).rsplit("A", 1) (1000)
0.37	0.37	100.7	("A"*1000).split("A", 1) (1000)
========== early match, two characters
0.20	0.19	107.7	("AB"*1000).find("AB") (1000)
0.36	0.05	702.1	"AB" in "AB"*1000 (1000)
0.18	0.19	96.9	("AB"*1000).index("AB") (1000)
0.20	0.24	83.9	("AB"*1000).partition("AB") (1000)
0.20	0.20	103.6	("AB"*1000).rfind("AB") (1000)
0.20	0.19	102.9	("AB"*1000).rindex("AB") (1000)
0.20	0.23	86.7	("AB"*1000).rpartition("AB") (1000)
0.39	0.40	97.7	("AB"*1000).rsplit("AB", 1) (1000)
0.40	0.42	94.4	("AB"*1000).split("AB", 1) (1000)
========== endswith multiple characters
0.17	0.19	92.6	"Andrew".endswith("Andrew") (1000)
========== endswith multiple characters - not!
0.17	0.18	95.2	"Andrew".endswith("Anders") (1000)
========== endswith single character
0.17	0.18	92.3	"Andrew".endswith("w") (1000)
========== formatting a string type with a dict
N/A	0.91	0.0	"The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (1000)
========== join empty string, with 1 character sep
N/A	0.04	0.0	"A".join("") (100)
========== join empty string, with 5 character sep
N/A	0.04	0.0	"ABCDE".join("") (100)
========== join list of 100 words, with 1 character sep
1.37	1.71	80.0	"A".join(["Bob"]*100)) (1000)
========== join list of 100 words, with 5 character sep
1.50	1.86	80.8	"ABCDE".join(["Bob"]*100)) (1000)
========== join list of 26 characters, with 1 character sep
0.48	0.49	99.6	"A".join(list("ABC..Z")) (1000)
========== join list of 26 characters, with 5 character sep
0.49	0.54	91.3	"ABCDE".join(list("ABC..Z")) (1000)
========== join string with 26 characters, with 1 character sep
N/A	1.17	0.0	"A".join("ABC..Z") (1000)
========== join string with 26 characters, with 5 character sep
N/A	1.22	0.0	"ABCDE".join("ABC..Z") (1000)
========== late match, 100 characters
8.48	8.46	100.2	s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (100)
4.19	3.50	119.9	s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (100)
5.30	5.11	103.7	s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (100)
8.47	8.45	100.2	s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (100)
8.68	8.68	100.0	s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (100)
6.36	6.37	99.8	s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (100)
2.33	2.27	102.4	s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (100)
6.58	6.58	100.1	s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (100)
7.34	6.56	111.9	s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (100)
6.69	7.65	87.5	s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (100)
8.47	8.87	95.4	s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (100)
========== late match, two characters
1.30	1.26	102.7	("AB"*300+"C").find("BC") (1000)
1.30	1.27	102.0	("AB"*300+"CA").find("CA") (1000)
1.42	1.10	129.6	"BC" in ("AB"*300+"C") (1000)
1.20	1.20	100.2	("AB"*300+"C").index("BC") (1000)
1.16	1.26	92.3	("AB"*300+"C").partition("BC") (1000)
0.95	0.94	101.0	("C"+"AB"*300).rfind("CA") (1000)
0.90	0.69	131.2	("BC"+"AB"*300).rfind("BC") (1000)
0.94	0.94	100.1	("C"+"AB"*300).rindex("CA") (1000)
1.02	0.94	108.6	("C"+"AB"*300).rpartition("CA") (1000)
1.12	1.08	103.7	("C"+"AB"*300).rsplit("CA", 1) (1000)
1.27	1.38	91.8	("AB"*300+"C").split("BC", 1) (1000)
========== no match, single character
0.45	0.41	111.1	("A"*1000).find("B") (1000)
0.59	0.29	205.4	"B" in "A"*1000 (1000)
0.30	0.31	97.4	("A"*1000).partition("B") (1000)
0.49	0.48	102.5	("A"*1000).rfind("B") (1000)
0.36	0.37	96.5	("A"*1000).rpartition("B") (1000)
0.77	0.76	101.4	("A"*1000).rsplit("B", 1) (1000)
0.83	0.81	101.6	("A"*1000).split("B", 1) (1000)
========== no match, two characters
3.80	3.78	100.6	("AB"*1000).find("BC") (1000)
4.08	3.68	111.0	("AB"*1000).find("CA") (1000)
3.71	3.40	109.2	"BC" in "AB"*1000 (1000)
3.44	3.42	100.8	("AB"*1000).partition("BC") (1000)
2.56	1.86	137.9	("AB"*1000).rfind("BC") (1000)
2.69	2.69	100.2	("AB"*1000).rfind("CA") (1000)
2.50	1.84	135.6	("AB"*1000).rpartition("BC") (1000)
2.03	1.94	104.7	("AB"*1000).rsplit("BC", 1) (1000)
3.27	3.56	91.8	("AB"*1000).split("BC", 1) (1000)
========== quick replace multiple character match
0.08	0.08	99.7	("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (10)
========== quick replace single character match
0.08	0.09	89.5	("A" + ("Z"*128*1024)).replace("A", "BB", 1) (10)
========== repeat 1 character 10 times
0.06	0.07	87.0	"A"*10 (1000)
========== repeat 1 character 1000 times
0.13	0.15	89.3	"A"*1000 (1000)
========== repeat 5 characters 10 times
0.12	0.09	128.8	"ABCDE"*10 (1000)
========== repeat 5 characters 1000 times
0.33	0.34	94.8	"ABCDE"*1000 (1000)
========== replace and expand multiple characters, big string
1.83	2.11	86.4	"...text.with.2000.newlines...replace("\n", "\r\n") (10)
========== replace multiple characters, dna
3.21	3.23	99.5	dna.replace("ATC", "ATT") (10)
========== replace single character
0.18	0.25	70.9	"This is a test".replace(" ", "\t") (1000)
========== replace single character, big string
0.65	0.92	70.1	"...text.with.2000.lines...replace("\n", " ") (10)
========== replace/remove multiple characters
0.27	0.34	78.7	"When shall we three meet again?".replace("ee", "") (1000)
========== split 1 whitespace
0.12	0.14	82.7	("Here are some words. "*2).partition(" ") (1000)
0.08	0.11	75.9	("Here are some words. "*2).rpartition(" ") (1000)
0.23	0.26	87.4	("Here are some words. "*2).rsplit(None, 1) (1000)
0.24	0.25	95.9	("Here are some words. "*2).split(None, 1) (1000)
========== split 2000 newlines
1.59	1.75	90.8	"...text...".rsplit("\n") (10)
1.64	1.68	97.5	"...text...".split("\n") (10)
1.83	2.03	90.1	"...text...".splitlines() (10)
========== split newlines
0.26	0.29	88.8	"this\nis\na\ntest\n".rsplit("\n") (1000)
0.27	0.29	92.2	"this\nis\na\ntest\n".split("\n") (1000)
0.26	0.30	85.8	"this\nis\na\ntest\n".splitlines() (1000)
========== split on multicharacter separator (dna)
2.18	1.86	117.5	dna.rsplit("ACTAT") (10)
2.53	2.48	102.0	dna.split("ACTAT") (10)
========== split on multicharacter separator (small)
0.53	0.59	88.8	"this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (1000)
0.59	0.57	102.6	"this--is--a--test--of--the--emergency--broadcast--system".split("--") (1000)
========== split whitespace (huge)
1.50	1.73	86.9	human_text.rsplit() (10)
1.49	1.75	85.5	human_text.split() (10)
========== split whitespace (small)
0.43	0.50	87.0	("Here are some words. "*2).rsplit() (1000)
0.40	0.50	79.4	("Here are some words. "*2).split() (1000)
========== startswith multiple characters
0.17	0.18	92.0	"Andrew".startswith("Andrew") (1000)
========== startswith multiple characters - not!
0.17	0.17	99.5	"Andrew".startswith("Anders") (1000)
========== startswith single character
0.17	0.18	94.0	"Andrew".startswith("A") (1000)
========== strip terminal newline
0.07	0.15	46.9	s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (1000)
0.06	0.07	78.1	"\nHello!".rstrip() (1000)
0.05	0.13	42.1	"Hello!\n".rstrip() (1000)
0.06	0.07	77.1	"\nHello!\n".strip() (1000)
0.06	0.07	77.6	"\nHello!".strip() (1000)
0.05	0.07	75.0	"Hello!\n".strip() (1000)
========== strip terminal spaces and tabs
0.06	0.08	74.2	"\t \tHello".rstrip() (1000)
0.06	0.07	79.4	"Hello\t \t".rstrip() (1000)
0.04	0.05	87.1	"Hello\t \t".strip() (1000)
========== tab split
0.44	0.51	87.2	GFF3_example.rsplit("\t", 8) (1000)
0.42	0.47	89.9	GFF3_example.rsplit("\t") (1000)
0.39	0.44	88.7	GFF3_example.split("\t", 8) (1000)
0.41	0.47	86.1	GFF3_example.split("\t") (1000)
158.46	160.84	98.5	TOTAL

***************

iobench and stringbench results on patched Python (pack the 3 structures):

$ ./python Tools/iobench/iobench.py -t
Preparing files...
Python 3.3.0a1+ (default:51016ff7f8c9+, Mar 27 2012, 13:11:28)
[GCC 4.6.1]
Unicode: PEP 393
Linux-3.0.0-16-generic-pae-i686-with-debian-wheezy-sid
Text unit = one character (utf8-decoded)

Text input
[ 400KB ] read one unit at a time... 5.4 MB/s
[ 400KB ] read 20 units at a time... 68.5 MB/s
[ 400KB ] read one line at a time... 163 MB/s
[ 400KB ] read 4096 units at a time... 295 MB/s
[ 20KB ] read whole contents at once... 322 MB/s
[ 400KB ] read whole contents at once... 336 MB/s
[ 10MB ] read whole contents at once... 289 MB/s
[ 400KB ] seek forward one unit at a time... 0.32 MB/s
[ 400KB ] seek forward 1000 units at a time... 325 MB/s

Text append
[ 20KB ] write one unit at a time... 2.99 MB/s
[ 400KB ] write 20 units at a time... 44 MB/s
[ 400KB ] write 4096 units at a time... 556 MB/s
[ 10MB ] write 1e6 units at a time... 456 MB/s

Text overwrite
[ 20KB ] modify one unit at a time... 1.16 MB/s
[ 400KB ] modify 20 units at a time... 19.5 MB/s
[ 400KB ] modify 4096 units at a time... 401 MB/s

$ ./python stringbench/stringbench.py
stringbench v2.0
3.3.0a1+ (default:51016ff7f8c9+, Mar 27 2012, 13:11:28)
[GCC 4.6.1]
2012-03-27 13:17:42.363789
bytes	unicode
(in ms)	(in ms)	%	comment
========== case conversion -- dense
0.37	0.38	98.6	("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (1000)
0.37	0.38	98.4	("where in the world is carmen san deigo?"*10).upper() (1000)
========== case conversion -- rare
0.37	0.38	98.6	("Where in the world is Carmen San Deigo?"*10).lower() (1000)
0.37	0.38	98.4	("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (1000)
========== concat 20 strings of words length 4 to 15
1.86	1.85	100.9	s1+s2+s3+s4+...+s20 (1000)
========== concat two strings
0.08	0.07	108.0	"Andrew"+"Dalke" (1000)
========== count AACT substrings in DNA example
2.16	2.12	101.8	dna.count("AACT") (10)
========== count newlines
0.59	0.58	101.3	...text.with.2000.newlines.count("\n") (10)
========== early match, single character
0.18	0.17	103.7	("A"*1000).find("A") (1000)
0.36	0.05	775.5	"A" in "A"*1000 (1000)
0.17	0.17	102.0	("A"*1000).index("A") (1000)
0.17	0.20	84.7	("A"*1000).partition("A") (1000)
0.19	0.19	102.2	("A"*1000).rfind("A") (1000)
0.19	0.38	50.7	("A"*1000).rindex("A") (1000)
0.18	0.20	90.0	("A"*1000).rpartition("A") (1000)
0.59	0.36	166.9	("A"*1000).rsplit("A", 1) (1000)
0.34	0.36	93.5	("A"*1000).split("A", 1) (1000)
========== early match, two characters
0.18	0.19	95.8	("AB"*1000).find("AB") (1000)
0.44	0.05	891.0	"AB" in "AB"*1000 (1000)
0.23	0.31	73.4	("AB"*1000).index("AB") (1000)
0.22	0.31	70.7	("AB"*1000).partition("AB") (1000)
0.19	0.19	101.2	("AB"*1000).rfind("AB") (1000)
0.19	0.19	102.0	("AB"*1000).rindex("AB") (1000)
0.17	0.21	78.7	("AB"*1000).rpartition("AB") (1000)
0.35	0.38	93.0	("AB"*1000).rsplit("AB", 1) (1000)
0.39	0.42	93.0	("AB"*1000).split("AB", 1) (1000)
========== endswith multiple characters
0.16	0.17	93.0	"Andrew".endswith("Andrew") (1000)
========== endswith multiple characters - not!
0.16	0.16	101.4	"Andrew".endswith("Anders") (1000)
========== endswith single character
0.16	0.17	93.7	"Andrew".endswith("w") (1000)
========== formatting a string type with a dict
N/A	0.86	0.0	"The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (1000)
========== join empty string, with 1 character sep
N/A	0.04	0.0	"A".join("") (100)
========== join empty string, with 5 character sep
N/A	0.04	0.0	"ABCDE".join("") (100)
========== join list of 100 words, with 1 character sep
1.42	1.74	81.3	"A".join(["Bob"]*100)) (1000)
========== join list of 100 words, with 5 character sep
1.62	1.95	83.3	"ABCDE".join(["Bob"]*100)) (1000)
========== join list of 26 characters, with 1 character sep
0.51	0.57	89.7	"A".join(list("ABC..Z")) (1000)
========== join list of 26 characters, with 5 character sep
0.58	0.53	108.1	"ABCDE".join(list("ABC..Z")) (1000)
========== join string with 26 characters, with 1 character sep
N/A	1.30	0.0	"A".join("ABC..Z") (1000)
========== join string with 26 characters, with 5 character sep
N/A	1.22	0.0	"ABCDE".join("ABC..Z") (1000)
========== late match, 100 characters
8.50	8.45	100.6	s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (100)
3.70	3.46	107.0	s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (100)
5.11	5.08	100.6	s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (100)
8.62	8.47	101.7	s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (100)
8.80	8.67	101.5	s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (100)
6.39	6.46	99.0	s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (100)
2.31	2.18	105.9	s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (100)
6.41	6.35	100.9	s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (100)
7.41	6.56	112.9	s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (100)
6.59	6.59	100.0	s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (100)
8.00	8.69	92.0	s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (100)
========== late match, two characters
1.20	1.21	99.6	("AB"*300+"C").find("BC") (1000)
1.29	1.25	103.1	("AB"*300+"CA").find("CA") (1000)
1.41	1.07	130.9	"BC" in ("AB"*300+"C") (1000)
1.20	1.21	99.3	("AB"*300+"C").index("BC") (1000)
1.17	1.20	97.5	("AB"*300+"C").partition("BC") (1000)
0.95	0.93	101.4	("C"+"AB"*300).rfind("CA") (1000)
0.90	0.69	129.3	("BC"+"AB"*300).rfind("BC") (1000)
0.95	0.94	101.2	("C"+"AB"*300).rindex("CA") (1000)
1.01	0.94	106.8	("C"+"AB"*300).rpartition("CA") (1000)
1.11	1.10	101.5	("C"+"AB"*300).rsplit("CA", 1) (1000)
1.28	1.37	93.6	("AB"*300+"C").split("BC", 1) (1000)
========== no match, single character
0.41	0.40	101.2	("A"*1000).find("B") (1000)
0.59	0.29	203.8	"B" in "A"*1000 (1000)
0.29	0.30	95.7	("A"*1000).partition("B") (1000)
0.49	0.48	101.4	("A"*1000).rfind("B") (1000)
0.37	0.38	97.3	("A"*1000).rpartition("B") (1000)
0.76	0.75	101.1	("A"*1000).rsplit("B", 1) (1000)
0.76	0.75	100.9	("A"*1000).split("B", 1) (1000)
========== no match, two characters
3.53	3.52	100.2	("AB"*1000).find("BC") (1000)
3.92	3.67	106.9	("AB"*1000).find("CA") (1000)
3.71	3.39	109.6	"BC" in "AB"*1000 (1000)
3.40	3.42	99.5	("AB"*1000).partition("BC") (1000)
2.55	1.90	134.2	("AB"*1000).rfind("BC") (1000)
2.69	2.68	100.1	("AB"*1000).rfind("CA") (1000)
2.43	1.81	133.9	("AB"*1000).rpartition("BC") (1000)
2.02	1.92	104.8	("AB"*1000).rsplit("BC", 1) (1000)
3.27	3.54	92.4	("AB"*1000).split("BC", 1) (1000)
========== quick replace multiple character match
0.09	0.08	107.7	("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (10)
========== quick replace single character match
0.09	0.08	108.7	("A" + ("Z"*128*1024)).replace("A", "BB", 1) (10)
========== repeat 1 character 10 times
0.06	0.07	87.5	"A"*10 (1000)
========== repeat 1 character 1000 times
0.16	0.12	135.0	"A"*1000 (1000)
========== repeat 5 characters 10 times
0.11	0.10	104.9	"ABCDE"*10 (1000)
========== repeat 5 characters 1000 times
0.35	0.37	93.7	"ABCDE"*1000 (1000)
========== replace and expand multiple characters, big string
1.78	2.04	87.3	"...text.with.2000.newlines...replace("\n", "\r\n") (10)
========== replace multiple characters, dna
3.20	3.25	98.5	dna.replace("ATC", "ATT") (10)
========== replace single character
0.17	0.24	73.0	"This is a test".replace(" ", "\t") (1000)
========== replace single character, big string
0.62	0.88	69.7	"...text.with.2000.lines...replace("\n", " ") (10)
========== replace/remove multiple characters
0.25	0.32	78.3	"When shall we three meet again?".replace("ee", "") (1000)
========== split 1 whitespace
0.10	0.13	78.9	("Here are some words. "*2).partition(" ") (1000)
0.08	0.11	76.8	("Here are some words. "*2).rpartition(" ") (1000)
0.23	0.25	91.7	("Here are some words. "*2).rsplit(None, 1) (1000)
0.23	0.26	87.1	("Here are some words. "*2).split(None, 1) (1000)
========== split 2000 newlines
1.60	1.75	91.7	"...text...".rsplit("\n") (10)
1.56	1.65	94.3	"...text...".split("\n") (10)
1.78	2.04	87.0	"...text...".splitlines() (10)
========== split newlines
0.27	0.29	92.6	"this\nis\na\ntest\n".rsplit("\n") (1000)
0.27	0.29	94.2	"this\nis\na\ntest\n".split("\n") (1000)
0.26	0.29	90.4	"this\nis\na\ntest\n".splitlines() (1000)
========== split on multicharacter separator (dna)
2.09	1.92	108.5	dna.rsplit("ACTAT") (10)
2.56	2.64	96.9	dna.split("ACTAT") (10)
========== split on multicharacter separator (small)
0.72	0.89	81.1	"this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (1000)
0.75	0.65	114.5	"this--is--a--test--of--the--emergency--broadcast--system".split("--") (1000)
========== split whitespace (huge)
1.50	1.73	86.3	human_text.rsplit() (10)
2.25	2.68	83.8	human_text.split() (10)
========== split whitespace (small)
0.42	0.51	82.0	("Here are some words. "*2).rsplit() (1000)
0.41	0.48	86.7	("Here are some words. "*2).split() (1000)
========== startswith multiple characters
0.16	0.18	88.9	"Andrew".startswith("Andrew") (1000)
========== startswith multiple characters - not!
0.19	0.17	112.0	"Andrew".startswith("Anders") (1000)
========== startswith single character
0.16	0.18	88.2	"Andrew".startswith("A") (1000)
========== strip terminal newline
0.07	0.16	45.5	s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (1000)
0.05	0.07	79.2	"\nHello!".rstrip() (1000)
0.05	0.07	76.5	"Hello!\n".rstrip() (1000)
0.06	0.07	80.9	"\nHello!\n".strip() (1000)
0.06	0.07	80.7	"\nHello!".strip() (1000)
0.05	0.07	77.4	"Hello!\n".strip() (1000)
========== strip terminal spaces and tabs
0.06	0.08	77.6	"\t \tHello".rstrip() (1000)
0.06	0.07	81.8	"Hello\t \t".rstrip() (1000)
0.04	0.05	77.5	"Hello\t \t".strip() (1000)
========== tab split
0.47	0.50	94.5	GFF3_example.rsplit("\t", 8) (1000)
0.43	0.47	91.3	GFF3_example.rsplit("\t") (1000)
0.38	0.43	88.7	GFF3_example.split("\t", 8) (1000)
0.40	0.46	87.4	GFF3_example.split("\t") (1000)
157.65	160.53	98.2	TOTAL
msg156910 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-03-27 11:29
Compare the stringbench totals (unicode column): 160.84 ms (unpatched) vs 160.53 ms (patched). I don't see any difference in the benchmark results. The small differences are just benchmark noise.
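To put the totals in proportion, the relative change can be worked out directly. The snippet below is only illustrative arithmetic on the TOTAL rows of the two stringbench runs; the helper name relative_change is made up for this example.

```python
# TOTAL rows copied from the two stringbench runs, in milliseconds.
totals = {
    "bytes": {"unpatched": 158.46, "patched": 157.65},
    "unicode": {"unpatched": 160.84, "patched": 160.53},
}


def relative_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


for column, run in totals.items():
    change = relative_change(run["unpatched"], run["patched"])
    print(f"{column}: {change:+.2f}%")
```

Both columns move by well under one percent, while individual rows in the two runs differ by far more than that in either direction, which fits reading the difference as noise.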
msg156930 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2012-03-27 14:43
-1. Using packed structures may violate all kinds of expectations in extension modules. I consider it important that the data block of a string is well-aligned.
msg157149 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2012-03-30 21:26
Looks like this should be closed as rejected?
msg157150 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-03-30 21:36
> I consider it important that the data block of a string is well-aligned.

I suppose that it doesn't matter for latin1, but it can be a problem for UCS-2 and UCS-4.

There are more drawbacks than advantages, so I agree to close this issue. And let's focus on enabling optimizations based on memory alignment like #14419 :-)
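The alignment concern can be checked against the sizes reported in the first message for the 32-bit build. The sketch below assumes that objects are allocated at addresses that are multiples of 8 (an assumption about the allocator, not something stated in this issue) and that the character data of a non-ASCII compact string starts right after PyCompactUnicodeObject, as in PEP 393. The names are illustrative only.

```python
# sizeof(PyCompactUnicodeObject) on the 32-bit build, from the first message.
HEADER_SIZES = {"unpatched": 36, "packed": 33}

# Bytes per character for each string kind.
CHAR_SIZES = {"latin1": 1, "UCS-2": 2, "UCS-4": 4}

# Assumption (not stated in the issue): the allocator returns object
# addresses that are multiples of 8.
BASE_ALIGNMENT = 8


def data_is_aligned(header_size, char_size, base_alignment=BASE_ALIGNMENT):
    """Return True if data stored right after the header is char_size-aligned."""
    return base_alignment % char_size == 0 and header_size % char_size == 0


for layout, header_size in HEADER_SIZES.items():
    for kind, char_size in CHAR_SIZES.items():
        status = "aligned" if data_is_aligned(header_size, char_size) else "misaligned"
        print(f"{layout:9} {kind:6} data at offset {header_size}: {status}")
```

With the packed header the data starts at an odd offset, so only 1-byte characters remain aligned, which matches the remark that latin1 is unaffected while UCS-2 and UCS-4 are.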

History
Date	User	Action	Args
2022-04-11 14:57:28	admin	set	github: 58630
2012-03-30 21:36:51	vstinner	set	status: open -> closed resolution: wont fix messages: + msg157150
2012-03-30 21:26:09	r.david.murray	set	type: enhancement messages: + msg157149 nosy: + r.david.murray
2012-03-30 16:49:31	jcea	set	nosy: + jcea
2012-03-27 14:43:17	loewis	set	messages: + msg156930
2012-03-27 11:29:43	vstinner	set	messages: + msg156910
2012-03-27 11:23:02	vstinner	set	messages: + msg156908
2012-03-27 11:14:17	vstinner	create