Pack PyASCIIObject fields to reduce memory consumption of pure ASCII strings #58630
Comments
It is possible to reduce PyASCIIObject.state from 32 bits to 8 bits, move it to the end of the structure (swapping wstr and state), and pack the structure. As a result, the structure size is reduced by 3 bytes (the type of state changes from int to char). I expect little or no performance overhead, because only the PyASCIIObject.state field is affected and that field is 8 bits wide.

See also issue bpo-14419, which relies on memory alignment (of the ASCII string data) to optimize the ASCII decoder. If I understand correctly, my patch makes that optimization impossible.

Example on Linux 32 bits:

$ cat x.c
#include <Python.h>
#include <stdio.h>

int main(void)
{
    /* sizeof yields a size_t, so print it with %zu rather than %u */
    printf("sizeof(PyASCIIObject)=%zu bytes\n", sizeof(PyASCIIObject));
    printf("sizeof(PyCompactUnicodeObject)=%zu bytes\n", sizeof(PyCompactUnicodeObject));
    printf("sizeof(PyUnicodeObject)=%zu bytes\n", sizeof(PyUnicodeObject));
    return 0;
}

# unpatched
# pack the 3 structures

We might also pack PyCompactUnicodeObject and PyUnicodeObject, but that would hurt performance because utf8_length, utf8, wstr_length and data would no longer be aligned.
iobench and stringbench results on unpatched Python:

$ ./python Tools/iobench/iobench.py -t
Preparing files...
Python 3.3.0a1+ (default:51016ff7f8c9, Mar 27 2012, 13:19:52)
[GCC 4.6.1]
Unicode: PEP 393
Linux-3.0.0-16-generic-pae-i686-with-debian-wheezy-sid
Text unit = one character (utf8-decoded)

** Text input **
[ 400KB ] read one unit at a time... 5.4 MB/s
[ 20KB ] read whole contents at once... 315 MB/s
[ 400KB ] seek forward one unit at a time... 0.304 MB/s

** Text append **
[ 20KB ] write one unit at a time... 3.05 MB/s

** Text overwrite **
[ 20KB ] modify one unit at a time... 1.18 MB/s

$ ./python stringbench/stringbench.py
stringbench v2.0
3.3.0a1+ (default:51016ff7f8c9, Mar 27 2012, 13:19:52)
[GCC 4.6.1]
2012-03-27 13:21:01.217823
bytes unicode
(in ms) (in ms) % comment
========== case conversion -- dense
0.37 0.38 97.9 ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.38 0.38 99.3 ("where in the world is carmen san deigo?"*10).upper() (*1000)
========== case conversion -- rare
0.38 0.38 99.9 ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.43 0.38 113.6 ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
========== concat 20 strings of words length 4 to 15
1.76 1.69 104.2 s1+s2+s3+s4+...+s20 (*1000)
========== concat two strings
0.08 0.07 107.7 "Andrew"+"Dalke" (*1000)
========== count AACT substrings in DNA example
2.15 2.13 100.7 dna.count("AACT") (*10)
========== count newlines
0.65 0.58 110.8 ...text.with.2000.newlines.count("\n") (*10)
========== early match, single character
0.20 0.19 107.9 ("A"*1000).find("A") (*1000)
0.36 0.05 745.8 "A" in "A"*1000 (*1000)
0.18 0.19 96.4 ("A"*1000).index("A") (*1000)
0.18 0.21 85.5 ("A"*1000).partition("A") (*1000)
0.21 0.20 103.6 ("A"*1000).rfind("A") (*1000)
0.21 0.30 69.8 ("A"*1000).rindex("A") (*1000)
0.37 0.21 171.7 ("A"*1000).rpartition("A") (*1000)
0.38 0.39 98.4 ("A"*1000).rsplit("A", 1) (*1000)
0.37 0.37 100.7 ("A"*1000).split("A", 1) (*1000)
========== early match, two characters
0.20 0.19 107.7 ("AB"*1000).find("AB") (*1000)
0.36 0.05 702.1 "AB" in "AB"*1000 (*1000)
0.18 0.19 96.9 ("AB"*1000).index("AB") (*1000)
0.20 0.24 83.9 ("AB"*1000).partition("AB") (*1000)
0.20 0.20 103.6 ("AB"*1000).rfind("AB") (*1000)
0.20 0.19 102.9 ("AB"*1000).rindex("AB") (*1000)
0.20 0.23 86.7 ("AB"*1000).rpartition("AB") (*1000)
0.39 0.40 97.7 ("AB"*1000).rsplit("AB", 1) (*1000)
0.40 0.42 94.4 ("AB"*1000).split("AB", 1) (*1000)
========== endswith multiple characters
0.17 0.19 92.6 "Andrew".endswith("Andrew") (*1000)
========== endswith multiple characters - not!
0.17 0.18 95.2 "Andrew".endswith("Anders") (*1000)
========== endswith single character
0.17 0.18 92.3 "Andrew".endswith("w") (*1000)
========== formatting a string type with a dict
N/A 0.91 0.0 "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
========== join empty string, with 1 character sep
N/A 0.04 0.0 "A".join("") (*100)
========== join empty string, with 5 character sep
N/A 0.04 0.0 "ABCDE".join("") (*100)
========== join list of 100 words, with 1 character sep
1.37 1.71 80.0 "A".join(["Bob"]*100)) (*1000)
========== join list of 100 words, with 5 character sep
1.50 1.86 80.8 "ABCDE".join(["Bob"]*100)) (*1000)
========== join list of 26 characters, with 1 character sep
0.48 0.49 99.6 "A".join(list("ABC..Z")) (*1000)
========== join list of 26 characters, with 5 character sep
0.49 0.54 91.3 "ABCDE".join(list("ABC..Z")) (*1000)
========== join string with 26 characters, with 1 character sep
N/A 1.17 0.0 "A".join("ABC..Z") (*1000)
========== join string with 26 characters, with 5 character sep
N/A 1.22 0.0 "ABCDE".join("ABC..Z") (*1000)
========== late match, 100 characters
8.48 8.46 100.2 s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
4.19 3.50 119.9 s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
5.30 5.11 103.7 s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
8.47 8.45 100.2 s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
8.68 8.68 100.0 s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
6.36 6.37 99.8 s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
2.33 2.27 102.4 s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
6.58 6.58 100.1 s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
7.34 6.56 111.9 s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
6.69 7.65 87.5 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
8.47 8.87 95.4 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
========== late match, two characters
1.30 1.26 102.7 ("AB"*300+"C").find("BC") (*1000)
1.30 1.27 102.0 ("AB"*300+"CA").find("CA") (*1000)
1.42 1.10 129.6 "BC" in ("AB"*300+"C") (*1000)
1.20 1.20 100.2 ("AB"*300+"C").index("BC") (*1000)
1.16 1.26 92.3 ("AB"*300+"C").partition("BC") (*1000)
0.95 0.94 101.0 ("C"+"AB"*300).rfind("CA") (*1000)
0.90 0.69 131.2 ("BC"+"AB"*300).rfind("BC") (*1000)
0.94 0.94 100.1 ("C"+"AB"*300).rindex("CA") (*1000)
1.02 0.94 108.6 ("C"+"AB"*300).rpartition("CA") (*1000)
1.12 1.08 103.7 ("C"+"AB"*300).rsplit("CA", 1) (*1000)
1.27 1.38 91.8 ("AB"*300+"C").split("BC", 1) (*1000)
========== no match, single character
0.45 0.41 111.1 ("A"*1000).find("B") (*1000)
0.59 0.29 205.4 "B" in "A"*1000 (*1000)
0.30 0.31 97.4 ("A"*1000).partition("B") (*1000)
0.49 0.48 102.5 ("A"*1000).rfind("B") (*1000)
0.36 0.37 96.5 ("A"*1000).rpartition("B") (*1000)
0.77 0.76 101.4 ("A"*1000).rsplit("B", 1) (*1000)
0.83 0.81 101.6 ("A"*1000).split("B", 1) (*1000)
========== no match, two characters
3.80 3.78 100.6 ("AB"*1000).find("BC") (*1000)
4.08 3.68 111.0 ("AB"*1000).find("CA") (*1000)
3.71 3.40 109.2 "BC" in "AB"*1000 (*1000)
3.44 3.42 100.8 ("AB"*1000).partition("BC") (*1000)
2.56 1.86 137.9 ("AB"*1000).rfind("BC") (*1000)
2.69 2.69 100.2 ("AB"*1000).rfind("CA") (*1000)
2.50 1.84 135.6 ("AB"*1000).rpartition("BC") (*1000)
2.03 1.94 104.7 ("AB"*1000).rsplit("BC", 1) (*1000)
3.27 3.56 91.8 ("AB"*1000).split("BC", 1) (*1000)
========== quick replace multiple character match
0.08 0.08 99.7 ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
========== quick replace single character match
0.08 0.09 89.5 ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
========== repeat 1 character 10 times
0.06 0.07 87.0 "A"*10 (*1000)
========== repeat 1 character 1000 times
0.13 0.15 89.3 "A"*1000 (*1000)
========== repeat 5 characters 10 times
0.12 0.09 128.8 "ABCDE"*10 (*1000)
========== repeat 5 characters 1000 times
0.33 0.34 94.8 "ABCDE"*1000 (*1000)
========== replace and expand multiple characters, big string
1.83 2.11 86.4 "...text.with.2000.newlines...replace("\n", "\r\n") (*10)
========== replace multiple characters, dna
3.21 3.23 99.5 dna.replace("ATC", "ATT") (*10)
========== replace single character
0.18 0.25 70.9 "This is a test".replace(" ", "\t") (*1000)
========== replace single character, big string
0.65 0.92 70.1 "...text.with.2000.lines...replace("\n", " ") (*10)
========== replace/remove multiple characters
0.27 0.34 78.7 "When shall we three meet again?".replace("ee", "") (*1000)
========== split 1 whitespace
0.12 0.14 82.7 ("Here are some words. "*2).partition(" ") (*1000)
0.08 0.11 75.9 ("Here are some words. "*2).rpartition(" ") (*1000)
0.23 0.26 87.4 ("Here are some words. "*2).rsplit(None, 1) (*1000)
0.24 0.25 95.9 ("Here are some words. "*2).split(None, 1) (*1000)
========== split 2000 newlines
1.59 1.75 90.8 "...text...".rsplit("\n") (*10)
1.64 1.68 97.5 "...text...".split("\n") (*10)
1.83 2.03 90.1 "...text...".splitlines() (*10)
========== split newlines
0.26 0.29 88.8 "this\nis\na\ntest\n".rsplit("\n") (*1000)
0.27 0.29 92.2 "this\nis\na\ntest\n".split("\n") (*1000)
0.26 0.30 85.8 "this\nis\na\ntest\n".splitlines() (*1000)
========== split on multicharacter separator (dna)
2.18 1.86 117.5 dna.rsplit("ACTAT") (*10)
2.53 2.48 102.0 dna.split("ACTAT") (*10)
========== split on multicharacter separator (small)
0.53 0.59 88.8 "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
0.59 0.57 102.6 "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
========== split whitespace (huge)
1.50 1.73 86.9 human_text.rsplit() (*10)
1.49 1.75 85.5 human_text.split() (*10)
========== split whitespace (small)
0.43 0.50 87.0 ("Here are some words. "*2).rsplit() (*1000)
0.40 0.50 79.4 ("Here are some words. "*2).split() (*1000)
========== startswith multiple characters
0.17 0.18 92.0 "Andrew".startswith("Andrew") (*1000)
========== startswith multiple characters - not!
0.17 0.17 99.5 "Andrew".startswith("Anders") (*1000)
========== startswith single character
0.17 0.18 94.0 "Andrew".startswith("A") (*1000)
========== strip terminal newline
0.07 0.15 46.9 s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
0.06 0.07 78.1 "\nHello!".rstrip() (*1000)
0.05 0.13 42.1 "Hello!\n".rstrip() (*1000)
0.06 0.07 77.1 "\nHello!\n".strip() (*1000)
0.06 0.07 77.6 "\nHello!".strip() (*1000)
0.05 0.07 75.0 "Hello!\n".strip() (*1000)
========== strip terminal spaces and tabs
0.06 0.08 74.2 "\t \tHello".rstrip() (*1000)
0.06 0.07 79.4 "Hello\t \t".rstrip() (*1000)
0.04 0.05 87.1 "Hello\t \t".strip() (*1000)
========== tab split
0.44 0.51 87.2 GFF3_example.rsplit("\t", 8) (*1000)
0.42 0.47 89.9 GFF3_example.rsplit("\t") (*1000)
0.39 0.44 88.7 GFF3_example.split("\t", 8) (*1000)
0.41 0.47 86.1 GFF3_example.split("\t") (*1000)
158.46 160.84 98.5 TOTAL

iobench and stringbench results on patched Python (pack the 3 structures):

$ ./python Tools/iobench/iobench.py -t
Preparing files...
Python 3.3.0a1+ (default:51016ff7f8c9+, Mar 27 2012, 13:11:28)
[GCC 4.6.1]
Unicode: PEP 393
Linux-3.0.0-16-generic-pae-i686-with-debian-wheezy-sid
Text unit = one character (utf8-decoded)

** Text input **
[ 400KB ] read one unit at a time... 5.4 MB/s
[ 20KB ] read whole contents at once... 322 MB/s
[ 400KB ] seek forward one unit at a time... 0.32 MB/s

** Text append **
[ 20KB ] write one unit at a time... 2.99 MB/s

** Text overwrite **
[ 20KB ] modify one unit at a time... 1.16 MB/s

$ ./python stringbench/stringbench.py
stringbench v2.0
3.3.0a1+ (default:51016ff7f8c9+, Mar 27 2012, 13:11:28)
[GCC 4.6.1]
2012-03-27 13:17:42.363789
bytes unicode
(in ms) (in ms) % comment
========== case conversion -- dense
0.37 0.38 98.6 ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.37 0.38 98.4 ("where in the world is carmen san deigo?"*10).upper() (*1000)
========== case conversion -- rare
0.37 0.38 98.6 ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.37 0.38 98.4 ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
========== concat 20 strings of words length 4 to 15
1.86 1.85 100.9 s1+s2+s3+s4+...+s20 (*1000)
========== concat two strings
0.08 0.07 108.0 "Andrew"+"Dalke" (*1000)
========== count AACT substrings in DNA example
2.16 2.12 101.8 dna.count("AACT") (*10)
========== count newlines
0.59 0.58 101.3 ...text.with.2000.newlines.count("\n") (*10)
========== early match, single character
0.18 0.17 103.7 ("A"*1000).find("A") (*1000)
0.36 0.05 775.5 "A" in "A"*1000 (*1000)
0.17 0.17 102.0 ("A"*1000).index("A") (*1000)
0.17 0.20 84.7 ("A"*1000).partition("A") (*1000)
0.19 0.19 102.2 ("A"*1000).rfind("A") (*1000)
0.19 0.38 50.7 ("A"*1000).rindex("A") (*1000)
0.18 0.20 90.0 ("A"*1000).rpartition("A") (*1000)
0.59 0.36 166.9 ("A"*1000).rsplit("A", 1) (*1000)
0.34 0.36 93.5 ("A"*1000).split("A", 1) (*1000)
========== early match, two characters
0.18 0.19 95.8 ("AB"*1000).find("AB") (*1000)
0.44 0.05 891.0 "AB" in "AB"*1000 (*1000)
0.23 0.31 73.4 ("AB"*1000).index("AB") (*1000)
0.22 0.31 70.7 ("AB"*1000).partition("AB") (*1000)
0.19 0.19 101.2 ("AB"*1000).rfind("AB") (*1000)
0.19 0.19 102.0 ("AB"*1000).rindex("AB") (*1000)
0.17 0.21 78.7 ("AB"*1000).rpartition("AB") (*1000)
0.35 0.38 93.0 ("AB"*1000).rsplit("AB", 1) (*1000)
0.39 0.42 93.0 ("AB"*1000).split("AB", 1) (*1000)
========== endswith multiple characters
0.16 0.17 93.0 "Andrew".endswith("Andrew") (*1000)
========== endswith multiple characters - not!
0.16 0.16 101.4 "Andrew".endswith("Anders") (*1000)
========== endswith single character
0.16 0.17 93.7 "Andrew".endswith("w") (*1000)
========== formatting a string type with a dict
N/A 0.86 0.0 "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
========== join empty string, with 1 character sep
N/A 0.04 0.0 "A".join("") (*100)
========== join empty string, with 5 character sep
N/A 0.04 0.0 "ABCDE".join("") (*100)
========== join list of 100 words, with 1 character sep
1.42 1.74 81.3 "A".join(["Bob"]*100)) (*1000)
========== join list of 100 words, with 5 character sep
1.62 1.95 83.3 "ABCDE".join(["Bob"]*100)) (*1000)
========== join list of 26 characters, with 1 character sep
0.51 0.57 89.7 "A".join(list("ABC..Z")) (*1000)
========== join list of 26 characters, with 5 character sep
0.58 0.53 108.1 "ABCDE".join(list("ABC..Z")) (*1000)
========== join string with 26 characters, with 1 character sep
N/A 1.30 0.0 "A".join("ABC..Z") (*1000)
========== join string with 26 characters, with 5 character sep
N/A 1.22 0.0 "ABCDE".join("ABC..Z") (*1000)
========== late match, 100 characters
8.50 8.45 100.6 s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
3.70 3.46 107.0 s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
5.11 5.08 100.6 s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
8.62 8.47 101.7 s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
8.80 8.67 101.5 s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
6.39 6.46 99.0 s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
2.31 2.18 105.9 s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
6.41 6.35 100.9 s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
7.41 6.56 112.9 s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
6.59 6.59 100.0 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
8.00 8.69 92.0 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
========== late match, two characters
1.20 1.21 99.6 ("AB"*300+"C").find("BC") (*1000)
1.29 1.25 103.1 ("AB"*300+"CA").find("CA") (*1000)
1.41 1.07 130.9 "BC" in ("AB"*300+"C") (*1000)
1.20 1.21 99.3 ("AB"*300+"C").index("BC") (*1000)
1.17 1.20 97.5 ("AB"*300+"C").partition("BC") (*1000)
0.95 0.93 101.4 ("C"+"AB"*300).rfind("CA") (*1000)
0.90 0.69 129.3 ("BC"+"AB"*300).rfind("BC") (*1000)
0.95 0.94 101.2 ("C"+"AB"*300).rindex("CA") (*1000)
1.01 0.94 106.8 ("C"+"AB"*300).rpartition("CA") (*1000)
1.11 1.10 101.5 ("C"+"AB"*300).rsplit("CA", 1) (*1000)
1.28 1.37 93.6 ("AB"*300+"C").split("BC", 1) (*1000)
========== no match, single character
0.41 0.40 101.2 ("A"*1000).find("B") (*1000)
0.59 0.29 203.8 "B" in "A"*1000 (*1000)
0.29 0.30 95.7 ("A"*1000).partition("B") (*1000)
0.49 0.48 101.4 ("A"*1000).rfind("B") (*1000)
0.37 0.38 97.3 ("A"*1000).rpartition("B") (*1000)
0.76 0.75 101.1 ("A"*1000).rsplit("B", 1) (*1000)
0.76 0.75 100.9 ("A"*1000).split("B", 1) (*1000)
========== no match, two characters
3.53 3.52 100.2 ("AB"*1000).find("BC") (*1000)
3.92 3.67 106.9 ("AB"*1000).find("CA") (*1000)
3.71 3.39 109.6 "BC" in "AB"*1000 (*1000)
3.40 3.42 99.5 ("AB"*1000).partition("BC") (*1000)
2.55 1.90 134.2 ("AB"*1000).rfind("BC") (*1000)
2.69 2.68 100.1 ("AB"*1000).rfind("CA") (*1000)
2.43 1.81 133.9 ("AB"*1000).rpartition("BC") (*1000)
2.02 1.92 104.8 ("AB"*1000).rsplit("BC", 1) (*1000)
3.27 3.54 92.4 ("AB"*1000).split("BC", 1) (*1000)
========== quick replace multiple character match
0.09 0.08 107.7 ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
========== quick replace single character match
0.09 0.08 108.7 ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
========== repeat 1 character 10 times
0.06 0.07 87.5 "A"*10 (*1000)
========== repeat 1 character 1000 times
0.16 0.12 135.0 "A"*1000 (*1000)
========== repeat 5 characters 10 times
0.11 0.10 104.9 "ABCDE"*10 (*1000)
========== repeat 5 characters 1000 times
0.35 0.37 93.7 "ABCDE"*1000 (*1000)
========== replace and expand multiple characters, big string
1.78 2.04 87.3 "...text.with.2000.newlines...replace("\n", "\r\n") (*10)
========== replace multiple characters, dna
3.20 3.25 98.5 dna.replace("ATC", "ATT") (*10)
========== replace single character
0.17 0.24 73.0 "This is a test".replace(" ", "\t") (*1000)
========== replace single character, big string
0.62 0.88 69.7 "...text.with.2000.lines...replace("\n", " ") (*10)
========== replace/remove multiple characters
0.25 0.32 78.3 "When shall we three meet again?".replace("ee", "") (*1000)
========== split 1 whitespace
0.10 0.13 78.9 ("Here are some words. "*2).partition(" ") (*1000)
0.08 0.11 76.8 ("Here are some words. "*2).rpartition(" ") (*1000)
0.23 0.25 91.7 ("Here are some words. "*2).rsplit(None, 1) (*1000)
0.23 0.26 87.1 ("Here are some words. "*2).split(None, 1) (*1000)
========== split 2000 newlines
1.60 1.75 91.7 "...text...".rsplit("\n") (*10)
1.56 1.65 94.3 "...text...".split("\n") (*10)
1.78 2.04 87.0 "...text...".splitlines() (*10)
========== split newlines
0.27 0.29 92.6 "this\nis\na\ntest\n".rsplit("\n") (*1000)
0.27 0.29 94.2 "this\nis\na\ntest\n".split("\n") (*1000)
0.26 0.29 90.4 "this\nis\na\ntest\n".splitlines() (*1000)
========== split on multicharacter separator (dna)
2.09 1.92 108.5 dna.rsplit("ACTAT") (*10)
2.56 2.64 96.9 dna.split("ACTAT") (*10)
========== split on multicharacter separator (small)
0.72 0.89 81.1 "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
0.75 0.65 114.5 "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
========== split whitespace (huge)
1.50 1.73 86.3 human_text.rsplit() (*10)
2.25 2.68 83.8 human_text.split() (*10)
========== split whitespace (small)
0.42 0.51 82.0 ("Here are some words. "*2).rsplit() (*1000)
0.41 0.48 86.7 ("Here are some words. "*2).split() (*1000)
========== startswith multiple characters
0.16 0.18 88.9 "Andrew".startswith("Andrew") (*1000)
========== startswith multiple characters - not!
0.19 0.17 112.0 "Andrew".startswith("Anders") (*1000)
========== startswith single character
0.16 0.18 88.2 "Andrew".startswith("A") (*1000)
========== strip terminal newline
0.07 0.16 45.5 s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
0.05 0.07 79.2 "\nHello!".rstrip() (*1000)
0.05 0.07 76.5 "Hello!\n".rstrip() (*1000)
0.06 0.07 80.9 "\nHello!\n".strip() (*1000)
0.06 0.07 80.7 "\nHello!".strip() (*1000)
0.05 0.07 77.4 "Hello!\n".strip() (*1000)
========== strip terminal spaces and tabs
0.06 0.08 77.6 "\t \tHello".rstrip() (*1000)
0.06 0.07 81.8 "Hello\t \t".rstrip() (*1000)
0.04 0.05 77.5 "Hello\t \t".strip() (*1000)
========== tab split
0.47 0.50 94.5 GFF3_example.rsplit("\t", 8) (*1000)
0.43 0.47 91.3 GFF3_example.rsplit("\t") (*1000)
0.38 0.43 88.7 GFF3_example.split("\t", 8) (*1000)
0.40 0.46 87.4 GFF3_example.split("\t") (*1000)
157.65 160.53 98.2 TOTAL
Compare the stringbench totals (unicode column): 160.84 ms (unpatched) vs 160.53 ms (patched). I don't see any difference in the benchmark results. The small differences are just benchmark noise.
-1. Using packed structures may violate all kinds of expectations in extension modules. I consider it important that the data block of a string is well-aligned.
Looks like this should be closed as rejected?
I suppose that it doesn't matter for latin1, but it can be a problem for UCS-2 and UCS-4. There are more drawbacks than advantages, so I agree to close this issue. And let's focus on enabling optimizations based on memory alignment like bpo-14419 :-)