pickle: Faster serialization of Unicode strings #59801

Closed
vstinner opened this issue Aug 8, 2012 · 15 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@vstinner
Member

vstinner commented Aug 8, 2012

BPO 15596
Nosy @jcea, @pitrou, @vstinner, @avassalotti, @serhiy-storchaka
Files
  • pickle_unicode.patch
  • pickleutf8.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-04-07.15:50:10.726>
    created_at = <Date 2012-08-08.22:38:41.790>
    labels = ['library', 'performance']
    title = 'pickle: Faster serialization of Unicode strings'
    updated_at = <Date 2013-04-07.21:34:52.757>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2013-04-07.21:34:52.757>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-04-07.15:50:10.726>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2012-08-08.22:38:41.790>
    creator = 'vstinner'
    dependencies = []
    files = ['26730', '29690']
    hgrepos = []
    issue_num = 15596
    keywords = ['patch']
    message_count = 15.0
    messages = ['167730', '167731', '167796', '167839', '167842', '167847', '167848', '178872', '178934', '186115', '186126', '186139', '186218', '186219', '186247']
    nosy_count = 6.0
    nosy_names = ['jcea', 'pitrou', 'vstinner', 'alexandre.vassalotti', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue15596'
    versions = ['Python 3.4']

    @vstinner
    Member Author

    vstinner commented Aug 8, 2012

    Serialization of Unicode strings in the pickle module is suboptimal, especially for long strings.

    The attached patch optimizes the serialization by taking advantage of the new properties of Unicode strings (PEP 393):

    • text (protocol 0): avoid any temporary buffer if the string is an ASCII or latin1 string without any "\\" or "\n" character; otherwise use a single small buffer of 64 KB (instead of two buffers)
    • binary (protocols 1 and 2): avoid any temporary buffer if the string is an ASCII string or if its UTF-8 encoding is already available

    The current code for protocol 0 uses raw_unicode_escape(), which is really suboptimal: it writes the escaped string into a first buffer, and then allocates a new temporary buffer to store the result with the right size (instead of just calling _PyBytes_Resize).
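For illustration only (this is not part of the patch): a minimal Python sketch of what the two code paths produce, assuming a current CPython 3 where protocol 0 writes a raw-unicode-escape line after the `V` opcode and the binary protocols write a length-prefixed UTF-8 payload after the `X` (BINUNICODE) opcode. The helper name `show` and the sample strings are made up for the example.

```python
import pickle
import struct


def show(text):
    """Return the protocol 0 and protocol 2 pickles of a str."""
    return pickle.dumps(text, protocol=0), pickle.dumps(text, protocol=2)


# Latin-1 text without backslash or newline: protocol 0 can write the
# characters almost verbatim, which is the case the patch wants to speed up.
text_p0, bin_p2 = show("h\xe9llo")
print(text_p0)
print(bin_p2)

# A backslash or a newline forces the escaping path in protocol 0.
escaped_p0, _ = show("a\\b\n")
print(escaped_p0)

# Binary protocols: opcode b"X", 4-byte little-endian length, UTF-8 bytes.
utf8 = "h\xe9llo".encode("utf-8")
binunicode_record = b"X" + struct.pack("<I", len(utf8)) + utf8
print(binunicode_record in bin_p2)
```

The binary record contains the UTF-8 bytes unchanged, so a string whose UTF-8 form already exists in memory can in principle be written without an intermediate copy.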

    @vstinner vstinner added stdlib Python modules in the Lib dir performance Performance or resource usage labels Aug 8, 2012
    @vstinner
    Member Author

    vstinner commented Aug 8, 2012

    Oh, I forgot to explain that I initially wrote the patch to fix the following failure on our "bigmem" buildbot.

    http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%20bigmem%203.x/builds/165/steps/test/logs/stdio

    ======================================================================
    ERROR: test_huge_str_32b (test.test_pickle.InMemoryPickleTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/support.py", line 1281, in wrapper
        return f(self, maxsize)
      File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/pickletester.py", line 1267, in test_huge_str_32b
        pickled = self.dumps(data, protocol=proto)
      File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/test_pickle.py", line 49, in dumps
        return pickle.dumps(arg, protocol)
    MemoryError
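A rough way to observe the extra allocations at a much smaller scale than the bigmem test is sketched below. It assumes `tracemalloc` sees the buffers allocated by the C pickler, and the helper name `peak_bytes_while_pickling` and the 4 MiB size are arbitrary choices for the example, not values taken from the test suite.

```python
import pickle
import tracemalloc


def peak_bytes_while_pickling(data, protocol):
    """Return (peak traced bytes, pickle size) for one pickle.dumps call."""
    tracemalloc.start()
    try:
        payload = pickle.dumps(data, protocol=protocol)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, len(payload)


# 4 MiB ASCII string, created before tracing starts so it is not counted.
data = "a" * (4 * 1024 * 1024)
for proto in (0, 2):
    peak, size = peak_bytes_while_pickling(data, proto)
    print(f"protocol {proto}: pickle is {size} bytes, peak traced {peak} bytes")
```

A peak well above the pickle size would point at temporary buffers of the kind described above; the exact numbers depend on the Python version.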

    @pitrou
    Member

    pitrou commented Aug 9, 2012

    Looks interesting. Can you post benchmark numbers?
    (you can use the pickle tests from http://hg.python.org/benchmarks )

    @vstinner
    Member Author

    vstinner commented Aug 9, 2012

    Here is a benchmark comparing Python 3.3 without and with my patch:

    ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../default/python ../fasterpickle/python
    Running fastpickle...
    INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle
    INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle
    Running pickle_dict...
    INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
    INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
    Running pickle_list...
    INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
    INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
    Running slowpickle...
    INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 pickle
    INFO:root:Running ../default/python performance/bm_pickle.py -n 50 pickle

    Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
    Total CPU cores: 8

    ### fastpickle ###
    Min: 0.530622 -> 0.332841: 1.59x faster
    Avg: 0.539450 -> 0.336833: 1.60x faster
    Significant (t=232.04)
    Stddev: 0.00552 -> 0.00276: 2.0032x smaller
    Timeline: b'http://tinyurl.com/dyu3vap'

    The following not significant results are hidden, use -v to show them:
    pickle_dict, pickle_list, slowpickle.
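The "x faster" and "x slower" figures in these reports appear consistent with a plain ratio of the two timings. The sketch below reproduces them under that assumption; `describe_change` is a hypothetical helper, not a function from perf.py.

```python
def describe_change(base, changed):
    """Format the relative change between two timings, e.g. '1.59x faster'."""
    if changed <= base:
        return f"{base / changed:.2f}x faster"
    return f"{changed / base:.2f}x slower"


# Min and Avg lines of the fastpickle report above.
print(describe_change(0.530622, 0.332841))  # 1.59x faster
print(describe_change(0.539450, 0.336833))  # 1.60x faster
```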

    @vstinner
    Member Author

    vstinner commented Aug 9, 2012

    For your information, results of benchmark comparing Python 3.2 to 3.3:

    ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../default/python
    Running fastpickle...
    INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle
    INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle
    Running pickle_dict...
    INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
    INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
    Running pickle_list...
    INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
    INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
    Running slowpickle...
    INFO:root:Running ../default/python performance/bm_pickle.py -n 50 pickle
    INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 pickle

    Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
    Total CPU cores: 8

    ### fastpickle ###
    Min: 0.455842 -> 0.542103: 1.19x slower
    Avg: 0.462334 -> 0.547271: 1.18x slower
    Significant (t=-101.15)
    Stddev: 0.00362 -> 0.00471: 1.3028x larger
    Timeline: b'http://tinyurl.com/btr644x'

    ### pickle_dict ###
    Min: 0.360125 -> 0.345850: 1.04x faster
    Avg: 0.364019 -> 0.348431: 1.04x faster
    Significant (t=30.84)
    Stddev: 0.00308 -> 0.00181: 1.6973x smaller
    Timeline: b'http://tinyurl.com/cd3ashu'

    ### pickle_list ###
    Min: 0.803941 -> 0.584800: 1.37x faster
    Avg: 0.811115 -> 0.589200: 1.38x faster
    Significant (t=455.00)
    Stddev: 0.00261 -> 0.00225: 1.1612x smaller
    Timeline: b'http://tinyurl.com/8u4m2wf'

    ### slowpickle ###
    Min: 0.409008 -> 0.461257: 1.13x slower
    Avg: 0.413668 -> 0.466201: 1.13x slower
    Significant (t=-115.31)
    Stddev: 0.00236 -> 0.00219: 1.0772x smaller
    Timeline: b'http://tinyurl.com/czrg5kf'

    @avassalotti
    Member

    Amazing! Though it would probably be a good idea to benchmark non-ASCII strings as well.
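One possible shape for such a benchmark is sketched below with `timeit`. The sample strings, their sizes, and the helper name `time_dumps` are assumptions chosen to cover the PEP 393 storage kinds (ASCII, latin1, BMP, and astral characters); they are not taken from the benchmark suite.

```python
import pickle
import timeit

# One sample per internal string representation introduced by PEP 393.
SAMPLES = {
    "ascii": "a" * 100_000,
    "latin1": "\xe9" * 100_000,
    "bmp": "\u20ac" * 100_000,
    "astral": "\U0001f600" * 100_000,
}


def time_dumps(text, protocol, number=50):
    """Return the best-of-3 time in seconds for `number` pickle.dumps calls."""
    timer = timeit.Timer(lambda: pickle.dumps(text, protocol=protocol))
    return min(timer.repeat(repeat=3, number=number))


results = {}
for name, text in SAMPLES.items():
    for protocol in (0, 2):
        results[name, protocol] = time_dumps(text, protocol)
        print(f"{name:>6} protocol {protocol}: {results[name, protocol]:.4f}s")
```

Running the same script with the unpatched and patched interpreters would give a per-kind comparison.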

    @vstinner
    Member Author

    vstinner commented Aug 9, 2012

    Last one: Python 3.2 vs patched Python 3.3.

    ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../fasterpickle/python
    Running fastpickle...
    INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle
    INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle
    Running pickle_dict...
    INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
    INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
    Running pickle_list...
    INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
    INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
    Running slowpickle...
    INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 pickle
    INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 pickle

    Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
    Total CPU cores: 8

    ### fastpickle ###
    Min: 0.470211 -> 0.322453: 1.46x faster
    Avg: 0.475718 -> 0.328496: 1.45x faster
    Significant (t=205.65)
    Stddev: 0.00317 -> 0.00395: 1.2456x larger
    Timeline: b'http://tinyurl.com/9qpphzp'

    ### pickle_dict ###
    Min: 0.353965 -> 0.347959: 1.02x faster
    Avg: 0.358980 -> 0.350596: 1.02x faster
    Significant (t=10.44)
    Stddev: 0.00545 -> 0.00160: 3.3956x smaller
    Timeline: b'http://tinyurl.com/9pfeqf9'

    ### pickle_list ###
    Min: 0.838222 -> 0.593497: 1.41x faster
    Avg: 0.844636 -> 0.599491: 1.41x faster
    Significant (t=296.53)
    Stddev: 0.00520 -> 0.00267: 1.9521x smaller
    Timeline: b'http://tinyurl.com/9rynvnv'

    ### slowpickle ###
    Min: 0.408205 -> 0.458309: 1.12x slower
    Avg: 0.413738 -> 0.463916: 1.12x slower
    Significant (t=-53.85)
    Stddev: 0.00263 -> 0.00604: 2.3019x larger
    Timeline: b'http://tinyurl.com/coffkbg'

    @vstinner
    Member Author

    vstinner commented Jan 3, 2013

    serhiy: I'm not really motivated to finish the work on this issue (especially "... it would probably be a good idea to benchmark non-ASCII strings as well."). Would you like to work on this?

    @serhiy-storchaka
    Member

    Well, I'll take care of this. I have my own patch for optimizing raw_unicode_escape(), but microbenchmarks don't show any speedup. Maybe your approach will be better.

    @pitrou
    Member

    pitrou commented Apr 5, 2013

    Ping?

    @pitrou
    Member

    pitrou commented Apr 6, 2013

    Since protocol 0 is essentially dead in Python 3, I would like to propose something simpler and safer: only optimize the binary protocols. If no one beats me to it, I'll adapt Victor's patch for that.
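To see which string opcode each protocol uses, and that the default protocol in Python 3 is a binary one, a small sketch with `pickletools` can help. The helper name `opcode_names` is made up for the example, and the exact opcode lists vary with the Python version.

```python
import pickle
import pickletools


def opcode_names(obj, protocol):
    """List the opcode names found in the pickle of obj for a protocol."""
    payload = pickle.dumps(obj, protocol=protocol)
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(payload)]


print("default protocol:", pickle.DEFAULT_PROTOCOL)
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    print(protocol, opcode_names("abc", protocol))
```

Only protocol 0 emits the text-mode UNICODE opcode; every later protocol writes the string as length-prefixed UTF-8, which is the path the simplified patch targets.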

    @pitrou
    Member

    pitrou commented Apr 6, 2013

    Here is a new patch. Benchmark:

    ### fastpickle ###
    Min: 0.631457 -> 0.399104: 1.58x faster
    Avg: 0.631868 -> 0.399519: 1.58x faster
    Significant (t=701.85)
    Stddev: 0.00037 -> 0.00064: 1.7604x larger
    Timeline: http://tinyurl.com/c6n8h5g

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 7, 2013

    New changeset 09a84091ae96 by Antoine Pitrou in branch 'default':
    Issue bpo-15596: Faster pickling of unicode strings.
    http://hg.python.org/cpython/rev/09a84091ae96

    @pitrou
    Member

    pitrou commented Apr 7, 2013

    I've applied the review comments and committed the patch. Thank you!

    @pitrou pitrou closed this as completed Apr 7, 2013
    @vstinner
    Member Author

    vstinner commented Apr 7, 2013

    Hi Antoine, I prefer your patch. Great job!

    2013/4/7 Antoine Pitrou <report@bugs.python.org>:

    Antoine Pitrou added the comment:

    I've applied the review comments and committed the patch. Thank you!

    ----------
    resolution: -> fixed
    stage: patch review -> committed/rejected
    status: open -> closed


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue15596>


    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022