Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for int(string, base) wrong answers (take 2) #42513

Closed
alanmcintyre mannequin opened this issue Oct 24, 2005 · 8 comments
Closed

Fix for int(string, base) wrong answers (take 2) #42513

alanmcintyre mannequin opened this issue Oct 24, 2005 · 8 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@alanmcintyre
Copy link
Mannequin

alanmcintyre mannequin commented Oct 24, 2005

BPO 1335972
Nosy @tim-one, @birkenfeld
Files
  • python-mystrtoul3.tgz: Fix for int(string, base) errors and added tests
  • python-mystrtoul4.tgz: Corrected patch to fix problem with literals
  • python-mystrtoul5.tgz: Cleaned up digit lookup vector & added tests for base 10 and base 2**x
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2006-05-23.18:49:07.000>
    created_at = <Date 2005-10-24.05:33:10.000>
    labels = ['interpreter-core']
    title = 'Fix for int(string, base) wrong answers (take 2)'
    updated_at = <Date 2006-05-23.18:49:07.000>
    user = 'https://bugs.python.org/alanmcintyre'

    bugs.python.org fields:

    activity = <Date 2006-05-23.18:49:07.000>
    actor = 'tim.peters'
    assignee = 'none'
    closed = True
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2005-10-24.05:33:10.000>
    creator = 'alanmcintyre'
    dependencies = []
    files = ['6833', '6834', '6835']
    hgrepos = []
    issue_num = 1335972
    keywords = ['patch']
    message_count = 8.0
    messages = ['48895', '48896', '48897', '48898', '48899', '48900', '48901', '48902']
    nosy_count = 4.0
    nosy_names = ['tim.peters', 'georg.brandl', 'alanmcintyre', 'funny_falcon']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1335972'
    versions = []

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Oct 24, 2005

    This incorporates patch bpo-1334979, adds test cases for
    all the cases listed in bug bpo-1334662, and adds test
    cases for evaluation of 2**32+1. There seem to be some
    minor speed improvements (simplistic stats shown
    below). Some simple performance test scripts have been
    included in the attached file as well.

    A lookup table was added for the maximum number of
    digits that can never overflow on a 32-bit ulong for
    each base. Overflow is only checked when this limit is
    exceeded by 1, and once the input string is determined
    to be too long (2 over the limit), the evaluation is
    halted and an overflow indication is returned. This
    appears to help reduce the evaluation time for very
    long strings (no time is wasted trying to evaluate all
    of it into a 32-bit ulong).

    Evaluation of each character has also been replaced by
    a lookup table. I'm not certain of the amount of speed
    benefit obtained from this; I added it early on and
    haven't had time to go back and test. It may be that
    it's not worth the extra static table.

    Baseline Python from CVS:
    alan@tarantula:/python/dist/src# ./python -m timeit
    'int("9")'
    100000 loops, best of 3: 4 usec per loop
    alan@tarantula:
    /python/dist/src# ./python -m timeit
    'int("999999999")'
    100000 loops, best of 3: 5.49 usec per loop
    alan@tarantula:/python/dist/src# ./python -m timeit
    'int("999999999999")'
    100000 loops, best of 3: 11.8 usec per loop
    alan@tarantula:
    /python/dist/src# ./python -m timeit
    'int("999999999999999")'
    100000 loops, best of 3: 13.4 usec per loop
    alan@tarantula:~/python/dist/src# ./python -m timeit
    'int("1"*600)'
    1000 loops, best of 3: 997 usec per loop

    Modified:
    alan@tarantula:/python_testint/dist/src# ./python -m
    timeit 'int("9")'
    100000 loops, best of 3: 3.63 usec per loop
    alan@tarantula:
    /python_testint/dist/src# ./python -m
    timeit 'int("999999999")'
    100000 loops, best of 3: 3.93 usec per loop
    alan@tarantula:/python_testint/dist/src# ./python -m
    timeit 'int("999999999999")'
    100000 loops, best of 3: 9.79 usec per loop
    alan@tarantula:
    /python_testint/dist/src# ./python -m
    timeit 'int("999999999999999")'
    100000 loops, best of 3: 11 usec per loop
    alan@tarantula:~/python_testint/dist/src# ./python -m
    timeit 'int("1"*600)'
    1000 loops, best of 3: 905 usec per loop

    10.2% faster for 1-digit int
    39.7% faster for 9-digit int
    20.5% faster for 12-digit int
    21.8% faster for 15-digit int
    10.2% faster for 600-digit int

    Test program that takes 750k ints from [0, 2**32)
    through stdin:
    Baseline: 8.114 sec (best of 5 consecutive runs)
    Modified: 6.774 sec (best of 5 consecutive runs)

    19.8% faster

    NOTE: This patch causes new errors in test_array and
    test_compile, but it seems that these *should* be
    failing given the input string for long(), unless I'm
    missing something:

    ======================================================================
    ERROR: test_repr (main.FloatTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "Lib/test/test_array.py", line 187, in test_repr
        self.assertEqual(a, eval(repr(a), {"array":
    array.array}))
    ValueError: invalid literal for long(): 10000000000.0
     

    ======================================================================
    ERROR: test_repr (main.DoubleTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "Lib/test/test_array.py", line 187, in test_repr
        self.assertEqual(a, eval(repr(a), {"array":
    array.array}))
    ValueError: invalid literal for long(): 10000000000.0
     

    test test_compile crashed -- exceptions.ValueError:
    invalid literal for long():
    90000000000000.

    @alanmcintyre alanmcintyre mannequin closed this as completed Oct 24, 2005
    @alanmcintyre alanmcintyre mannequin closed this as completed Oct 24, 2005
    @alanmcintyre alanmcintyre mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Oct 24, 2005
    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Oct 24, 2005

    Logged In: YES
    user_id=1115903

    I forgot to add that these results were obtained on a PIIIm
    833MHz running Linux 2.4.2, GCC 3.2.2, with the Python 2.5a0
    CVS source from about 8pm EST Oct 23, 2005.

    @funnyfalcon
    Copy link
    Mannequin

    funnyfalcon mannequin commented Oct 24, 2005

    Logged In: YES
    user_id=1290388

    Instead of:
    overflowed:

    /* spool through remaining characters \*/
    
    while ((c = Py_CHARMASK(*str)) != '\\0')
    
    	str ++;
    
    Shoold be
    	while ((c = Py_CHARMASK(*str)) != '\0') {
    		c = digitlookup[c];
    	if (c \< 0 || c \>= base) /* non-"digit" character \*/
    
    		break;
    
    	str++;
    }
    

    And why not
    static int digitlookup[] = {

    37, 37, 37 ......
    

    };
    and
    if (c >= base) break;

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Oct 24, 2005

    Logged In: YES
    user_id=1115903

    Thanks, funny_falcon - that corrected the problem with the
    literals. I also included the change to the digit lookup
    table.

    The new patch is attached as python-mystrtoul4.tgz; it
    passes all tests now on my machine.

    @tim-one
    Copy link
    Member

    tim-one commented Nov 1, 2005

    Logged In: YES
    user_id=31435

    This looks pretty good to me -- talk about every trick in the
    book <wink>.

    Note that the digitlimit vector is too cautious for bases that
    are powers of 2. For example, it's obvious that any string of
    32 bits can't overflow an unsigned long, but the table cuts
    base 2 off at 31 instead. The formula should use log(2**32,
    base) instead:

    "N digits can't overflow" iff
    base**N-1 < 2**32 iff
    base**N < 2**32+1
    base**N <= 2**32 iff
    N <= log(2**32, base)

    Assuming exact calculation of log(2**32, base) then
    (dubious, really), the floor of that is exactly the maximum
    safe N.

    The power-of-2 bases, and base 10, should be added to the
    tests. We really want to check that _all_ supported bases
    work, right?

    @alanmcintyre
    Copy link
    Mannequin Author

    alanmcintyre mannequin commented Dec 21, 2005

    Logged In: YES
    user_id=1115903

    I cleaned up the digitlimit vector and test cases now
    include all bases on [2, 36]. Uploaded as
    python-mystrtoul5.tgz.

    @birkenfeld
    Copy link
    Member

    Logged In: YES
    user_id=1188172

    Tim, if it looks good can it be applied?

    @tim-one
    Copy link
    Member

    tim-one commented May 23, 2006

    Logged In: YES
    user_id=31435

    Thank you, Alan! A variant of the patch was checked in as
    revision 46133 for Python 2.5, affecting

    Lib/test/test_builtin.py
    Misc/ACKS
    Misc/NEWS
    Python/mystrtoul.c

    For future reference, note that C doesn't define whether an
    unqualified "char" is signed or unsigned, and your
    assumption that it was signed doesn't actually work
    everywhere ;-) Fixing that was the only real pain remaining
    here: thank you for the careful work and testing!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants