Optimize Fraction pickling #88320

Closed
skirpichev mannequin opened this issue May 17, 2021 · 8 comments
Assignees: rhettinger
Labels: 3.11 (only security fixes), performance (Performance or resource usage), stdlib (Python modules in the Lib dir)

Comments

skirpichev mannequin commented May 17, 2021

BPO 44154
Nosy @tim-one, @rhettinger, @skirpichev
PRs
  • bpo-44154: optimize Fraction pickling #26186
Files
  • fractions-pickle.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2021-05-17.09:53:44.962>
    created_at = <Date 2021-05-17.03:58:34.076>
    labels = ['3.11', 'library', 'performance']
    title = 'Optimize Fraction pickling'
    updated_at = <Date 2021-05-24.01:36:28.869>
    user = 'https://github.com/skirpichev'

    bugs.python.org fields:

    activity = <Date 2021-05-24.01:36:28.869>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2021-05-17.09:53:44.962>
    closer = 'Sergey.Kirpichev'
    components = ['Library (Lib)']
    creation = <Date 2021-05-17.03:58:34.076>
    creator = 'Sergey.Kirpichev'
    dependencies = []
    files = ['50047']
    hgrepos = []
    issue_num = 44154
    keywords = ['patch']
    message_count = 8.0
    messages = ['393781', '393782', '393783', '393784', '393803', '393988', '394177', '394231']
    nosy_count = 3.0
    nosy_names = ['tim.peters', 'rhettinger', 'Sergey.Kirpichev']
    pr_nums = ['26186']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue44154'
    versions = ['Python 3.11']


    skirpichev mannequin commented May 17, 2021

The current version of the Fraction.__reduce__() method uses str(), which produces larger dumps, especially for large components.

Cf.:
    >>> import random, pickle
    >>> from fractions import Fraction as F
    >>> random.seed(1); a = F(*random.random().as_integer_ratio())
    >>> for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    ...     print(len(pickle.dumps(a, proto)))
    ... 
    71
    70
    71
    71
    77
    77
    >>> b = a**13
    >>> for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    ...     print(len(pickle.dumps(b, proto)))
    ... 
    444
    443
    444
    444
    453
    453
    
    vs the attached patch:
    >>> for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    ...     print(len(pickle.dumps(a, proto)))
    ... 
    71
    68
    49
    49
    59
    59
    >>> for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    ...     print(len(pickle.dumps(b, proto)))
    ... 
    444
    441
    204
    204
    214
    214

Testing for non-default protocols was also added. Let me know if all this makes sense as a PR.
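
For illustration, a minimal sketch of the idea behind the patch (assuming reconstruction through the public Fraction(numerator, denominator) constructor; this is a sketch, not necessarily the exact code in PR #26186):

import pickle
from fractions import Fraction

class IntPickleFraction(Fraction):
    """Hypothetical subclass: pickle the components as ints, not as an "n/d" string."""
    def __reduce__(self):
        # Protocol 2+ stores ints in binary, so large numerators and
        # denominators stay compact and round-trip without str().
        return (self.__class__, (self.numerator, self.denominator))

n, d = (Fraction(2, 3) ** 500).as_integer_ratio()
data = pickle.dumps(IntPickleFraction(n, d), pickle.HIGHEST_PROTOCOL)
assert pickle.loads(data) == Fraction(n, d)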

    @skirpichev skirpichev mannequin added 3.11 only security fixes stdlib Python modules in the Lib dir labels May 17, 2021
@rhettinger
Contributor

Yes, this looks reasonable. Go ahead with a PR.

@rhettinger rhettinger self-assigned this May 17, 2021
@rhettinger rhettinger added the performance Performance or resource usage label May 17, 2021

    tim-one commented May 17, 2021

    Oh yes - please do. It's not just pickle size - going through str() makes (un)pickling quadratic time in both directions if components are large. Pickle the component ints instead, and the more recent pickle protocol(s) can do both directions in linear time instead.
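
A rough way to see the quadratic cost described above (a sketch only; timings are illustrative and depend on the interpreter and machine, and the sizes are kept small enough to stay under the default int/str digit limit of newer CPython versions):

import pickle, time
from fractions import Fraction

def str_roundtrip(nbits):
    # Roughly what a str()-based __reduce__ costs: int -> "n/d" string -> int.
    f = Fraction(1, (1 << nbits) - 1)
    t0 = time.perf_counter()
    s = f"{f.numerator}/{f.denominator}"
    Fraction(s)
    return time.perf_counter() - t0

def int_roundtrip(nbits):
    # Binary int pickling with protocol 2+ scales roughly linearly.
    f = Fraction(1, (1 << nbits) - 1)
    t0 = time.perf_counter()
    pickle.loads(pickle.dumps((f.numerator, f.denominator), 5))
    return time.perf_counter() - t0

for nbits in (2_000, 4_000, 8_000):
    print(nbits, str_roundtrip(nbits), int_roundtrip(nbits))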


    skirpichev mannequin commented May 17, 2021

    Oh yes - please do.

    Ok, I did.

    It's not just pickle size - going through str() makes (un)pickling quadratic time in both directions if components are large.

Yeah, I noticed the speedup too, but size was much more important for my application.

BTW, the same issue affects some other stdlib modules; e.g., for Decimal() it would be more efficient to use the tuple (sign, digit_tuple, exponent) instead of dumping strings. Maybe more; a simple fgrep also points me at the ipaddress module, but I think it's OK there ;-)
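
As a quick illustration of the tuple form mentioned above: the Decimal constructor already accepts a (sign, digits, exponent) tuple, which is exactly what Decimal.as_tuple() returns:

from decimal import Decimal

d = Decimal("3.14")
print(d.as_tuple())                  # DecimalTuple(sign=0, digits=(3, 1, 4), exponent=-2)
assert Decimal(d.as_tuple()) == d    # round-trips through (sign, digit_tuple, exponent)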


    skirpichev mannequin commented May 17, 2021

Not sure why this wasn't closed after the PR was merged. If that was intentional, let me know and reopen.

I'm less sure whether something like this will work for Decimal(). Perhaps, if the constructor accepted an integer as value[1], not just a tuple of digits.

    @skirpichev skirpichev mannequin closed this as completed May 17, 2021
    @rhettinger
    Contributor

    You're right that this won't work for decimal because it takes a string constructor. A fancier reduce might do the trick but it would involve modifying the C code (no fun) as well as the Python code. Also, the conversion from decimal to string and back isn't quadratic, so we don't have the same worries. Lastly, really large fractions happen naturally as they interoperate, but oversized decimals are uncommon.


    skirpichev mannequin commented May 22, 2021

    On Thu, May 20, 2021 at 12:03:38AM +0000, Raymond Hettinger wrote:

    You're right that this won't work for decimal because it takes a
    string constructor. A fancier reduce might do the trick but it would
    involve modifying the C code (no fun) as well as the Python code.

Yes, it will be harder. But I think it's possible.

    E.g. with this trivial patch:
    $ git diff
    diff --git a/Lib/_pydecimal.py b/Lib/_pydecimal.py
    index ff23322ed5..473fb86770 100644
    --- a/Lib/_pydecimal.py
    +++ b/Lib/_pydecimal.py
    @@ -627,6 +627,9 @@ def __new__(cls, value="0", context=None):
                     self._exp = value[2]
                     self._is_special = True
                 else:
    +                value = list(value)
    +                if isinstance(value[1], int):
    +                    value[1] = tuple(map(int, str(value[1])))
                     # process and validate the digits in value[1]
                     digits = []
                     for digit in value[1]:
    @@ -3731,7 +3734,7 @@ def shift(self, other, context=None):
    
         # Support for pickling, copy, and deepcopy
         def __reduce__(self):
    -        return (self.__class__, (str(self),))
    +        return (self.__class__, ((self._sign, int(self._int), self._exp),))
         def __copy__(self):
             if type(self) is Decimal:

A simple test suggests that a 2x size difference is possible:
    >>> import pickle
    >>> from test.support.import_helper import import_fresh_module
    >>> P = import_fresh_module('decimal', blocked=['_decimal'])
    >>> P.getcontext().prec = 1000
    >>> d = P.Decimal('101').exp()
    >>> len(pickle.dumps(d))
    1045
    
    vs
    >>> len(pickle.dumps(d))
    468

with the above diff. (There will be some size reduction even if we don't
convert self._int back and forth, due to the size of self._exp.
That's a less interesting case, but it comes for free: no speed penalty.)

    Also, the conversion from decimal to string and back isn't quadratic,
    so we don't have the same worries.

Yes, for a speed bonus we would need to do something more clever.

    Lastly, really large fractions happen naturally as they interoperate,
    but oversized decimals are uncommon.

For financial calculations this is probably true. But computing mathematical
functions with arbitrary precision (as mpmath does with mpmath.mpf) is also
a perfectly legitimate use of this module.

Let me know if it's worth opening an issue for the above improvement.

    @rhettinger
    Contributor

Let me know if it's worth opening an issue for the above improvement

    I don't think so.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022