Issue14422
Created on 2012-03-27 11:14 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
pack_pyasciiobject.patch | vstinner, 2012-03-27 11:14 | review |
Messages (6) | |||
---|---|---|---|
msg156905 | Author: STINNER Victor (vstinner) * |
Date: 2012-03-27 11:14 | |
It is possible to reduce PyASCIIObject.state from 32 bits to 8, move it to the end of the structure (swapping wstr and state), and pack the structure. As a result, the structure size is reduced by 3 bytes (the type of state changes from int to char). I expect little or no performance overhead, because only the PyASCIIObject.state field is affected and this field is 8 bits wide.

See also issue #14419, which relies on memory alignment (of the ASCII string data) to optimize the ASCII decoder. If I understand correctly, my patch rules out this optimization.

--

Example on 32-bit Linux:

    $ cat x.c
    #include <Python.h>

    int main()
    {
        printf("sizeof(PyASCIIObject)=%zu bytes\n", sizeof(PyASCIIObject));
        printf("sizeof(PyCompactUnicodeObject)=%zu bytes\n", sizeof(PyCompactUnicodeObject));
        printf("sizeof(PyUnicodeObject)=%zu bytes\n", sizeof(PyUnicodeObject));
        return 0;
    }

    # unpatched
    $ gcc -I Include/ -I . x.c -o x && ./x
    sizeof(PyASCIIObject)=24 bytes
    sizeof(PyCompactUnicodeObject)=36 bytes
    sizeof(PyUnicodeObject)=40 bytes

    # pack the 3 structures
    $ gcc -I Include/ -I . x.c -o x && ./x
    sizeof(PyASCIIObject)=21 bytes
    sizeof(PyCompactUnicodeObject)=33 bytes
    sizeof(PyUnicodeObject)=37 bytes

--

We might also pack PyCompactUnicodeObject and PyUnicodeObject, but that would hurt performance because utf8_length, utf8, wstr_length and data would no longer be aligned. |
|||
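The size change described in msg156905 can be reproduced without building CPython. The following is a minimal sketch using `ctypes`, not the actual patch: the three classes are hypothetical mirrors of the field order discussed above (the real `state` member is a bit-field struct, modelled here as a plain integer), and the `_layout_` line is an assumption about newer Python versions.

```python
import ctypes

# Hypothetical mirror of the unpatched field order: a 32-bit state before wstr.
class Original(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("length", ctypes.c_ssize_t),
        ("hash", ctypes.c_ssize_t),
        ("state", ctypes.c_uint32),
        ("wstr", ctypes.c_void_p),
    ]

# Proposed order: wstr and state swapped, state shrunk to 8 bits.
_REORDERED = [
    ("ob_refcnt", ctypes.c_ssize_t),
    ("ob_type", ctypes.c_void_p),
    ("length", ctypes.c_ssize_t),
    ("hash", ctypes.c_ssize_t),
    ("wstr", ctypes.c_void_p),
    ("state", ctypes.c_uint8),
]

class Reordered(ctypes.Structure):
    # Same fields, natural alignment: trailing padding is still added.
    _fields_ = list(_REORDERED)

class Packed(ctypes.Structure):
    _pack_ = 1        # no padding between or after fields
    _layout_ = "ms"   # assumption: newer Pythons want an explicit layout with _pack_; older ones ignore it
    _fields_ = list(_REORDERED)

word = ctypes.sizeof(ctypes.c_void_p)
for cls in (Original, Reordered, Packed):
    print(f"{cls.__name__:<10} sizeof={ctypes.sizeof(cls):>2} "
          f"alignment={ctypes.alignment(cls)}")
```

Reordering alone does not shrink the structure, because the compiler pads it back to a multiple of the word size; only packing removes those trailing bytes. On a platform with 4-byte pointers the three sizes should correspond to 24, 24 and 21 bytes, which matches the figures quoted in the message. The packed variant also drops to an alignment of 1, which is the property the later messages object to.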
msg156908 | Author: STINNER Victor (vstinner) * |
Date: 2012-03-27 11:23 | |
iobench and stringbench results on unpatched Python: $ ./python Tools/iobench/iobench.py -t Preparing files... Python 3.3.0a1+ (default:51016ff7f8c9, Mar 27 2012, 13:19:52) [GCC 4.6.1] Unicode: PEP 393 Linux-3.0.0-16-generic-pae-i686-with-debian-wheezy-sid Text unit = one character (utf8-decoded) ** Text input ** [ 400KB ] read one unit at a time... 5.4 MB/s [ 400KB ] read 20 units at a time... 68 MB/s [ 400KB ] read one line at a time... 174 MB/s [ 400KB ] read 4096 units at a time... 289 MB/s [ 20KB ] read whole contents at once... 315 MB/s [ 400KB ] read whole contents at once... 332 MB/s [ 10MB ] read whole contents at once... 292 MB/s [ 400KB ] seek forward one unit at a time... 0.304 MB/s [ 400KB ] seek forward 1000 units at a time... 312 MB/s ** Text append ** [ 20KB ] write one unit at a time... 3.05 MB/s [ 400KB ] write 20 units at a time... 43 MB/s [ 400KB ] write 4096 units at a time... 554 MB/s [ 10MB ] write 1e6 units at a time... 450 MB/s ** Text overwrite ** [ 20KB ] modify one unit at a time... 1.18 MB/s [ 400KB ] modify 20 units at a time... 18.9 MB/s [ 400KB ] modify 4096 units at a time... 
400 MB/s $ ./python stringbench/stringbench.py stringbench v2.0 3.3.0a1+ (default:51016ff7f8c9, Mar 27 2012, 13:19:52) [GCC 4.6.1] 2012-03-27 13:21:01.217823 bytes unicode (in ms) (in ms) % comment ========== case conversion -- dense 0.37 0.38 97.9 ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000) 0.38 0.38 99.3 ("where in the world is carmen san deigo?"*10).upper() (*1000) ========== case conversion -- rare 0.38 0.38 99.9 ("Where in the world is Carmen San Deigo?"*10).lower() (*1000) 0.43 0.38 113.6 ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000) ========== concat 20 strings of words length 4 to 15 1.76 1.69 104.2 s1+s2+s3+s4+...+s20 (*1000) ========== concat two strings 0.08 0.07 107.7 "Andrew"+"Dalke" (*1000) ========== count AACT substrings in DNA example 2.15 2.13 100.7 dna.count("AACT") (*10) ========== count newlines 0.65 0.58 110.8 ...text.with.2000.newlines.count("\n") (*10) ========== early match, single character 0.20 0.19 107.9 ("A"*1000).find("A") (*1000) 0.36 0.05 745.8 "A" in "A"*1000 (*1000) 0.18 0.19 96.4 ("A"*1000).index("A") (*1000) 0.18 0.21 85.5 ("A"*1000).partition("A") (*1000) 0.21 0.20 103.6 ("A"*1000).rfind("A") (*1000) 0.21 0.30 69.8 ("A"*1000).rindex("A") (*1000) 0.37 0.21 171.7 ("A"*1000).rpartition("A") (*1000) 0.38 0.39 98.4 ("A"*1000).rsplit("A", 1) (*1000) 0.37 0.37 100.7 ("A"*1000).split("A", 1) (*1000) ========== early match, two characters 0.20 0.19 107.7 ("AB"*1000).find("AB") (*1000) 0.36 0.05 702.1 "AB" in "AB"*1000 (*1000) 0.18 0.19 96.9 ("AB"*1000).index("AB") (*1000) 0.20 0.24 83.9 ("AB"*1000).partition("AB") (*1000) 0.20 0.20 103.6 ("AB"*1000).rfind("AB") (*1000) 0.20 0.19 102.9 ("AB"*1000).rindex("AB") (*1000) 0.20 0.23 86.7 ("AB"*1000).rpartition("AB") (*1000) 0.39 0.40 97.7 ("AB"*1000).rsplit("AB", 1) (*1000) 0.40 0.42 94.4 ("AB"*1000).split("AB", 1) (*1000) ========== endswith multiple characters 0.17 0.19 92.6 "Andrew".endswith("Andrew") (*1000) ========== endswith multiple characters - not! 
0.17 0.18 95.2 "Andrew".endswith("Anders") (*1000) ========== endswith single character 0.17 0.18 92.3 "Andrew".endswith("w") (*1000) ========== formatting a string type with a dict N/A 0.91 0.0 "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000) ========== join empty string, with 1 character sep N/A 0.04 0.0 "A".join("") (*100) ========== join empty string, with 5 character sep N/A 0.04 0.0 "ABCDE".join("") (*100) ========== join list of 100 words, with 1 character sep 1.37 1.71 80.0 "A".join(["Bob"]*100)) (*1000) ========== join list of 100 words, with 5 character sep 1.50 1.86 80.8 "ABCDE".join(["Bob"]*100)) (*1000) ========== join list of 26 characters, with 1 character sep 0.48 0.49 99.6 "A".join(list("ABC..Z")) (*1000) ========== join list of 26 characters, with 5 character sep 0.49 0.54 91.3 "ABCDE".join(list("ABC..Z")) (*1000) ========== join string with 26 characters, with 1 character sep N/A 1.17 0.0 "A".join("ABC..Z") (*1000) ========== join string with 26 characters, with 5 character sep N/A 1.22 0.0 "ABCDE".join("ABC..Z") (*1000) ========== late match, 100 characters 8.48 8.46 100.2 s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100) 4.19 3.50 119.9 s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100) 5.30 5.11 103.7 s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100) 8.47 8.45 100.2 s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100) 8.68 8.68 100.0 s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100) 6.36 6.37 99.8 s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100) 2.33 2.27 102.4 s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100) 6.58 6.58 100.1 s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100) 7.34 6.56 111.9 s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100) 6.69 7.65 87.5 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100) 8.47 8.87 95.4 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100) ========== late match, two characters 1.30 1.26 102.7 ("AB"*300+"C").find("BC") (*1000) 1.30 1.27 102.0 
("AB"*300+"CA").find("CA") (*1000) 1.42 1.10 129.6 "BC" in ("AB"*300+"C") (*1000) 1.20 1.20 100.2 ("AB"*300+"C").index("BC") (*1000) 1.16 1.26 92.3 ("AB"*300+"C").partition("BC") (*1000) 0.95 0.94 101.0 ("C"+"AB"*300).rfind("CA") (*1000) 0.90 0.69 131.2 ("BC"+"AB"*300).rfind("BC") (*1000) 0.94 0.94 100.1 ("C"+"AB"*300).rindex("CA") (*1000) 1.02 0.94 108.6 ("C"+"AB"*300).rpartition("CA") (*1000) 1.12 1.08 103.7 ("C"+"AB"*300).rsplit("CA", 1) (*1000) 1.27 1.38 91.8 ("AB"*300+"C").split("BC", 1) (*1000) ========== no match, single character 0.45 0.41 111.1 ("A"*1000).find("B") (*1000) 0.59 0.29 205.4 "B" in "A"*1000 (*1000) 0.30 0.31 97.4 ("A"*1000).partition("B") (*1000) 0.49 0.48 102.5 ("A"*1000).rfind("B") (*1000) 0.36 0.37 96.5 ("A"*1000).rpartition("B") (*1000) 0.77 0.76 101.4 ("A"*1000).rsplit("B", 1) (*1000) 0.83 0.81 101.6 ("A"*1000).split("B", 1) (*1000) ========== no match, two characters 3.80 3.78 100.6 ("AB"*1000).find("BC") (*1000) 4.08 3.68 111.0 ("AB"*1000).find("CA") (*1000) 3.71 3.40 109.2 "BC" in "AB"*1000 (*1000) 3.44 3.42 100.8 ("AB"*1000).partition("BC") (*1000) 2.56 1.86 137.9 ("AB"*1000).rfind("BC") (*1000) 2.69 2.69 100.2 ("AB"*1000).rfind("CA") (*1000) 2.50 1.84 135.6 ("AB"*1000).rpartition("BC") (*1000) 2.03 1.94 104.7 ("AB"*1000).rsplit("BC", 1) (*1000) 3.27 3.56 91.8 ("AB"*1000).split("BC", 1) (*1000) ========== quick replace multiple character match 0.08 0.08 99.7 ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10) ========== quick replace single character match 0.08 0.09 89.5 ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10) ========== repeat 1 character 10 times 0.06 0.07 87.0 "A"*10 (*1000) ========== repeat 1 character 1000 times 0.13 0.15 89.3 "A"*1000 (*1000) ========== repeat 5 characters 10 times 0.12 0.09 128.8 "ABCDE"*10 (*1000) ========== repeat 5 characters 1000 times 0.33 0.34 94.8 "ABCDE"*1000 (*1000) ========== replace and expand multiple characters, big string 1.83 2.11 86.4 "...text.with.2000.newlines...replace("\n", 
"\r\n") (*10) ========== replace multiple characters, dna 3.21 3.23 99.5 dna.replace("ATC", "ATT") (*10) ========== replace single character 0.18 0.25 70.9 "This is a test".replace(" ", "\t") (*1000) ========== replace single character, big string 0.65 0.92 70.1 "...text.with.2000.lines...replace("\n", " ") (*10) ========== replace/remove multiple characters 0.27 0.34 78.7 "When shall we three meet again?".replace("ee", "") (*1000) ========== split 1 whitespace 0.12 0.14 82.7 ("Here are some words. "*2).partition(" ") (*1000) 0.08 0.11 75.9 ("Here are some words. "*2).rpartition(" ") (*1000) 0.23 0.26 87.4 ("Here are some words. "*2).rsplit(None, 1) (*1000) 0.24 0.25 95.9 ("Here are some words. "*2).split(None, 1) (*1000) ========== split 2000 newlines 1.59 1.75 90.8 "...text...".rsplit("\n") (*10) 1.64 1.68 97.5 "...text...".split("\n") (*10) 1.83 2.03 90.1 "...text...".splitlines() (*10) ========== split newlines 0.26 0.29 88.8 "this\nis\na\ntest\n".rsplit("\n") (*1000) 0.27 0.29 92.2 "this\nis\na\ntest\n".split("\n") (*1000) 0.26 0.30 85.8 "this\nis\na\ntest\n".splitlines() (*1000) ========== split on multicharacter separator (dna) 2.18 1.86 117.5 dna.rsplit("ACTAT") (*10) 2.53 2.48 102.0 dna.split("ACTAT") (*10) ========== split on multicharacter separator (small) 0.53 0.59 88.8 "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000) 0.59 0.57 102.6 "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000) ========== split whitespace (huge) 1.50 1.73 86.9 human_text.rsplit() (*10) 1.49 1.75 85.5 human_text.split() (*10) ========== split whitespace (small) 0.43 0.50 87.0 ("Here are some words. "*2).rsplit() (*1000) 0.40 0.50 79.4 ("Here are some words. "*2).split() (*1000) ========== startswith multiple characters 0.17 0.18 92.0 "Andrew".startswith("Andrew") (*1000) ========== startswith multiple characters - not! 
0.17 0.17 99.5 "Andrew".startswith("Anders") (*1000) ========== startswith single character 0.17 0.18 94.0 "Andrew".startswith("A") (*1000) ========== strip terminal newline 0.07 0.15 46.9 s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000) 0.06 0.07 78.1 "\nHello!".rstrip() (*1000) 0.05 0.13 42.1 "Hello!\n".rstrip() (*1000) 0.06 0.07 77.1 "\nHello!\n".strip() (*1000) 0.06 0.07 77.6 "\nHello!".strip() (*1000) 0.05 0.07 75.0 "Hello!\n".strip() (*1000) ========== strip terminal spaces and tabs 0.06 0.08 74.2 "\t \tHello".rstrip() (*1000) 0.06 0.07 79.4 "Hello\t \t".rstrip() (*1000) 0.04 0.05 87.1 "Hello\t \t".strip() (*1000) ========== tab split 0.44 0.51 87.2 GFF3_example.rsplit("\t", 8) (*1000) 0.42 0.47 89.9 GFF3_example.rsplit("\t") (*1000) 0.39 0.44 88.7 GFF3_example.split("\t", 8) (*1000) 0.41 0.47 86.1 GFF3_example.split("\t") (*1000) 158.46 160.84 98.5 TOTAL ***************** iobench and stringbench results on patched Python (pack the 3 structures): $ ./python Tools/iobench/iobench.py -t Preparing files... Python 3.3.0a1+ (default:51016ff7f8c9+, Mar 27 2012, 13:11:28) [GCC 4.6.1] Unicode: PEP 393 Linux-3.0.0-16-generic-pae-i686-with-debian-wheezy-sid Text unit = one character (utf8-decoded) ** Text input ** [ 400KB ] read one unit at a time... 5.4 MB/s [ 400KB ] read 20 units at a time... 68.5 MB/s [ 400KB ] read one line at a time... 163 MB/s [ 400KB ] read 4096 units at a time... 295 MB/s [ 20KB ] read whole contents at once... 322 MB/s [ 400KB ] read whole contents at once... 336 MB/s [ 10MB ] read whole contents at once... 289 MB/s [ 400KB ] seek forward one unit at a time... 0.32 MB/s [ 400KB ] seek forward 1000 units at a time... 325 MB/s ** Text append ** [ 20KB ] write one unit at a time... 2.99 MB/s [ 400KB ] write 20 units at a time... 44 MB/s [ 400KB ] write 4096 units at a time... 556 MB/s [ 10MB ] write 1e6 units at a time... 456 MB/s ** Text overwrite ** [ 20KB ] modify one unit at a time... 1.16 MB/s [ 400KB ] modify 20 units at a time... 
19.5 MB/s [ 400KB ] modify 4096 units at a time... 401 MB/s $ ./python stringbench/stringbench.py stringbench v2.0 3.3.0a1+ (default:51016ff7f8c9+, Mar 27 2012, 13:11:28) [GCC 4.6.1] 2012-03-27 13:17:42.363789 bytes unicode (in ms) (in ms) % comment ========== case conversion -- dense 0.37 0.38 98.6 ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000) 0.37 0.38 98.4 ("where in the world is carmen san deigo?"*10).upper() (*1000) ========== case conversion -- rare 0.37 0.38 98.6 ("Where in the world is Carmen San Deigo?"*10).lower() (*1000) 0.37 0.38 98.4 ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000) ========== concat 20 strings of words length 4 to 15 1.86 1.85 100.9 s1+s2+s3+s4+...+s20 (*1000) ========== concat two strings 0.08 0.07 108.0 "Andrew"+"Dalke" (*1000) ========== count AACT substrings in DNA example 2.16 2.12 101.8 dna.count("AACT") (*10) ========== count newlines 0.59 0.58 101.3 ...text.with.2000.newlines.count("\n") (*10) ========== early match, single character 0.18 0.17 103.7 ("A"*1000).find("A") (*1000) 0.36 0.05 775.5 "A" in "A"*1000 (*1000) 0.17 0.17 102.0 ("A"*1000).index("A") (*1000) 0.17 0.20 84.7 ("A"*1000).partition("A") (*1000) 0.19 0.19 102.2 ("A"*1000).rfind("A") (*1000) 0.19 0.38 50.7 ("A"*1000).rindex("A") (*1000) 0.18 0.20 90.0 ("A"*1000).rpartition("A") (*1000) 0.59 0.36 166.9 ("A"*1000).rsplit("A", 1) (*1000) 0.34 0.36 93.5 ("A"*1000).split("A", 1) (*1000) ========== early match, two characters 0.18 0.19 95.8 ("AB"*1000).find("AB") (*1000) 0.44 0.05 891.0 "AB" in "AB"*1000 (*1000) 0.23 0.31 73.4 ("AB"*1000).index("AB") (*1000) 0.22 0.31 70.7 ("AB"*1000).partition("AB") (*1000) 0.19 0.19 101.2 ("AB"*1000).rfind("AB") (*1000) 0.19 0.19 102.0 ("AB"*1000).rindex("AB") (*1000) 0.17 0.21 78.7 ("AB"*1000).rpartition("AB") (*1000) 0.35 0.38 93.0 ("AB"*1000).rsplit("AB", 1) (*1000) 0.39 0.42 93.0 ("AB"*1000).split("AB", 1) (*1000) ========== endswith multiple characters 0.16 0.17 93.0 "Andrew".endswith("Andrew") 
(*1000) ========== endswith multiple characters - not! 0.16 0.16 101.4 "Andrew".endswith("Anders") (*1000) ========== endswith single character 0.16 0.17 93.7 "Andrew".endswith("w") (*1000) ========== formatting a string type with a dict N/A 0.86 0.0 "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000) ========== join empty string, with 1 character sep N/A 0.04 0.0 "A".join("") (*100) ========== join empty string, with 5 character sep N/A 0.04 0.0 "ABCDE".join("") (*100) ========== join list of 100 words, with 1 character sep 1.42 1.74 81.3 "A".join(["Bob"]*100)) (*1000) ========== join list of 100 words, with 5 character sep 1.62 1.95 83.3 "ABCDE".join(["Bob"]*100)) (*1000) ========== join list of 26 characters, with 1 character sep 0.51 0.57 89.7 "A".join(list("ABC..Z")) (*1000) ========== join list of 26 characters, with 5 character sep 0.58 0.53 108.1 "ABCDE".join(list("ABC..Z")) (*1000) ========== join string with 26 characters, with 1 character sep N/A 1.30 0.0 "A".join("ABC..Z") (*1000) ========== join string with 26 characters, with 5 character sep N/A 1.22 0.0 "ABCDE".join("ABC..Z") (*1000) ========== late match, 100 characters 8.50 8.45 100.6 s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100) 3.70 3.46 107.0 s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100) 5.11 5.08 100.6 s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100) 8.62 8.47 101.7 s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100) 8.80 8.67 101.5 s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100) 6.39 6.46 99.0 s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100) 2.31 2.18 105.9 s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100) 6.41 6.35 100.9 s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100) 7.41 6.56 112.9 s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100) 6.59 6.59 100.0 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100) 8.00 8.69 92.0 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100) ========== late match, two characters 1.20 1.21 99.6 
("AB"*300+"C").find("BC") (*1000) 1.29 1.25 103.1 ("AB"*300+"CA").find("CA") (*1000) 1.41 1.07 130.9 "BC" in ("AB"*300+"C") (*1000) 1.20 1.21 99.3 ("AB"*300+"C").index("BC") (*1000) 1.17 1.20 97.5 ("AB"*300+"C").partition("BC") (*1000) 0.95 0.93 101.4 ("C"+"AB"*300).rfind("CA") (*1000) 0.90 0.69 129.3 ("BC"+"AB"*300).rfind("BC") (*1000) 0.95 0.94 101.2 ("C"+"AB"*300).rindex("CA") (*1000) 1.01 0.94 106.8 ("C"+"AB"*300).rpartition("CA") (*1000) 1.11 1.10 101.5 ("C"+"AB"*300).rsplit("CA", 1) (*1000) 1.28 1.37 93.6 ("AB"*300+"C").split("BC", 1) (*1000) ========== no match, single character 0.41 0.40 101.2 ("A"*1000).find("B") (*1000) 0.59 0.29 203.8 "B" in "A"*1000 (*1000) 0.29 0.30 95.7 ("A"*1000).partition("B") (*1000) 0.49 0.48 101.4 ("A"*1000).rfind("B") (*1000) 0.37 0.38 97.3 ("A"*1000).rpartition("B") (*1000) 0.76 0.75 101.1 ("A"*1000).rsplit("B", 1) (*1000) 0.76 0.75 100.9 ("A"*1000).split("B", 1) (*1000) ========== no match, two characters 3.53 3.52 100.2 ("AB"*1000).find("BC") (*1000) 3.92 3.67 106.9 ("AB"*1000).find("CA") (*1000) 3.71 3.39 109.6 "BC" in "AB"*1000 (*1000) 3.40 3.42 99.5 ("AB"*1000).partition("BC") (*1000) 2.55 1.90 134.2 ("AB"*1000).rfind("BC") (*1000) 2.69 2.68 100.1 ("AB"*1000).rfind("CA") (*1000) 2.43 1.81 133.9 ("AB"*1000).rpartition("BC") (*1000) 2.02 1.92 104.8 ("AB"*1000).rsplit("BC", 1) (*1000) 3.27 3.54 92.4 ("AB"*1000).split("BC", 1) (*1000) ========== quick replace multiple character match 0.09 0.08 107.7 ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10) ========== quick replace single character match 0.09 0.08 108.7 ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10) ========== repeat 1 character 10 times 0.06 0.07 87.5 "A"*10 (*1000) ========== repeat 1 character 1000 times 0.16 0.12 135.0 "A"*1000 (*1000) ========== repeat 5 characters 10 times 0.11 0.10 104.9 "ABCDE"*10 (*1000) ========== repeat 5 characters 1000 times 0.35 0.37 93.7 "ABCDE"*1000 (*1000) ========== replace and expand multiple characters, big string 1.78 2.04 
87.3 "...text.with.2000.newlines...replace("\n", "\r\n") (*10) ========== replace multiple characters, dna 3.20 3.25 98.5 dna.replace("ATC", "ATT") (*10) ========== replace single character 0.17 0.24 73.0 "This is a test".replace(" ", "\t") (*1000) ========== replace single character, big string 0.62 0.88 69.7 "...text.with.2000.lines...replace("\n", " ") (*10) ========== replace/remove multiple characters 0.25 0.32 78.3 "When shall we three meet again?".replace("ee", "") (*1000) ========== split 1 whitespace 0.10 0.13 78.9 ("Here are some words. "*2).partition(" ") (*1000) 0.08 0.11 76.8 ("Here are some words. "*2).rpartition(" ") (*1000) 0.23 0.25 91.7 ("Here are some words. "*2).rsplit(None, 1) (*1000) 0.23 0.26 87.1 ("Here are some words. "*2).split(None, 1) (*1000) ========== split 2000 newlines 1.60 1.75 91.7 "...text...".rsplit("\n") (*10) 1.56 1.65 94.3 "...text...".split("\n") (*10) 1.78 2.04 87.0 "...text...".splitlines() (*10) ========== split newlines 0.27 0.29 92.6 "this\nis\na\ntest\n".rsplit("\n") (*1000) 0.27 0.29 94.2 "this\nis\na\ntest\n".split("\n") (*1000) 0.26 0.29 90.4 "this\nis\na\ntest\n".splitlines() (*1000) ========== split on multicharacter separator (dna) 2.09 1.92 108.5 dna.rsplit("ACTAT") (*10) 2.56 2.64 96.9 dna.split("ACTAT") (*10) ========== split on multicharacter separator (small) 0.72 0.89 81.1 "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000) 0.75 0.65 114.5 "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000) ========== split whitespace (huge) 1.50 1.73 86.3 human_text.rsplit() (*10) 2.25 2.68 83.8 human_text.split() (*10) ========== split whitespace (small) 0.42 0.51 82.0 ("Here are some words. "*2).rsplit() (*1000) 0.41 0.48 86.7 ("Here are some words. "*2).split() (*1000) ========== startswith multiple characters 0.16 0.18 88.9 "Andrew".startswith("Andrew") (*1000) ========== startswith multiple characters - not! 
0.19 0.17 112.0 "Andrew".startswith("Anders") (*1000) ========== startswith single character 0.16 0.18 88.2 "Andrew".startswith("A") (*1000) ========== strip terminal newline 0.07 0.16 45.5 s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000) 0.05 0.07 79.2 "\nHello!".rstrip() (*1000) 0.05 0.07 76.5 "Hello!\n".rstrip() (*1000) 0.06 0.07 80.9 "\nHello!\n".strip() (*1000) 0.06 0.07 80.7 "\nHello!".strip() (*1000) 0.05 0.07 77.4 "Hello!\n".strip() (*1000) ========== strip terminal spaces and tabs 0.06 0.08 77.6 "\t \tHello".rstrip() (*1000) 0.06 0.07 81.8 "Hello\t \t".rstrip() (*1000) 0.04 0.05 77.5 "Hello\t \t".strip() (*1000) ========== tab split 0.47 0.50 94.5 GFF3_example.rsplit("\t", 8) (*1000) 0.43 0.47 91.3 GFF3_example.rsplit("\t") (*1000) 0.38 0.43 88.7 GFF3_example.split("\t", 8) (*1000) 0.40 0.46 87.4 GFF3_example.split("\t") (*1000) 157.65 160.53 98.2 TOTAL |
|||
msg156910 | Author: STINNER Victor (vstinner) * |
Date: 2012-03-27 11:29 | |
Compare the stringbench totals (unicode column): 160.84 (unpatched) vs 160.53 (patched). I don't see any difference in the benchmark results. The small differences are just benchmark noise. |
|||
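To put the "noise" remark in numbers, here is a small sketch that computes the relative change between the unpatched and patched runs. The figures are copied from the stringbench output in msg156908; the helper name `relative_change` is only illustrative.

```python
# (unpatched, patched) timings in ms, copied from the stringbench TOTAL rows.
totals = {
    "bytes": (158.46, 157.65),
    "unicode": (160.84, 160.53),
}

# One individual row, for contrast: human_text.split(), unicode column.
single_row = (1.75, 2.68)


def relative_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


for name, (unpatched, patched) in totals.items():
    print(f"{name:<8} total: {relative_change(unpatched, patched):+.2f}%")

print(f"human_text.split(): {relative_change(*single_row):+.2f}%")
```

Both totals move by well under one percent, while a single row can swing by tens of percent between two runs. That spread on individual rows is consistent with reading the difference in the totals as measurement noise rather than an effect of the patch.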
msg156930 | Author: Martin v. Löwis (loewis) * |
Date: 2012-03-27 14:43 | |
-1. Using packed structures may violate all kinds of expectations in extension modules. I consider it important that the data block of a string is well-aligned. |
|||
msg157149 | Author: R. David Murray (r.david.murray) * |
Date: 2012-03-30 21:26 | |
Looks like this should be closed as rejected? |
|||
msg157150 | Author: STINNER Victor (vstinner) * |
Date: 2012-03-30 21:36 | |
> I consider it important that the data block of a string is well-aligned.

I suppose that it doesn't matter for latin1, but it can be a problem for UCS-2 and UCS-4. There are more drawbacks than advantages, so I agree to close this issue. Let's focus instead on enabling optimizations based on memory alignment, like #14419 :-) |
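The alignment concern can be checked with simple arithmetic. The sketch below assumes that the character data of a compact string starts immediately after its header structure (as PEP 393 describes) and that the object itself is allocated at a suitably aligned address. The header sizes are the 32-bit Linux figures from the first message, and the 4-byte word read for ASCII data is an assumption standing in for the kind of optimization mentioned for #14419.

```python
# Header sizes in bytes on 32-bit Linux, taken from msg156905.
HEADER_SIZES = {
    "PyASCIIObject": {"unpatched": 24, "packed": 21},
    "PyCompactUnicodeObject": {"unpatched": 36, "packed": 33},
}

# (description, header the data follows, access size in bytes)
CASES = [
    ("ASCII, byte-wise access", "PyASCIIObject", 1),
    ("ASCII, 4-byte word reads", "PyASCIIObject", 4),
    ("UCS-2 characters", "PyCompactUnicodeObject", 2),
    ("UCS-4 characters", "PyCompactUnicodeObject", 4),
]


def is_aligned(offset, unit):
    """True if data starting at offset can be accessed in unit-sized steps."""
    return offset % unit == 0


for description, struct_name, unit in CASES:
    for variant, size in HEADER_SIZES[struct_name].items():
        status = "aligned" if is_aligned(size, unit) else "MISALIGNED"
        print(f"{description:<26} {variant:<10} offset={size:>2} {status}")
```

Under these assumptions, byte-wise access is unaffected by packing, which matches the remark about latin1, while 2-byte and 4-byte accesses start at an odd offset once the headers are packed.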
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:28 | admin | set | github: 58630 |
2012-03-30 21:36:51 | vstinner | set | status: open -> closed resolution: wont fix messages: + msg157150 |
2012-03-30 21:26:09 | r.david.murray | set | type: enhancement messages: + msg157149 nosy: + r.david.murray |
2012-03-30 16:49:31 | jcea | set | nosy: + jcea |
2012-03-27 14:43:17 | loewis | set | messages: + msg156930 |
2012-03-27 11:29:43 | vstinner | set | messages: + msg156910 |
2012-03-27 11:23:02 | vstinner | set | messages: + msg156908 |
2012-03-27 11:14:17 | vstinner | create |