Use shorter float repr when possible #45921

Closed
noam mannequin opened this issue Dec 10, 2007 · 109 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@noam
Mannequin

noam mannequin commented Dec 10, 2007

BPO 1580
Nosy @gvanrossum, @tim-one, @nascheme, @rhettinger, @amauryfa, @mdickinson, @pitrou, @ericvsmith, @tiran, @avassalotti, @ericsnowcurrently
Files
  • short_float_repr.diff
  • ieee_dbl.c
  • float.diff: Tentative patch for py3k
  • float2.diff: Improved tentative patch for py3k
  • unnamed
  • unnamed
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2009-04-17.11:19:21.231>
    created_at = <Date 2007-12-10.19:13:26.972>
    labels = ['interpreter-core', 'type-feature']
    title = 'Use shorter float repr when possible'
    updated_at = <Date 2014-03-07.21:30:36.354>
    user = 'https://bugs.python.org/noam'

    bugs.python.org fields:

    activity = <Date 2014-03-07.21:30:36.354>
    actor = 'eric.snow'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-04-17.11:19:21.231>
    closer = 'mark.dickinson'
    components = ['Interpreter Core']
    creation = <Date 2007-12-10.19:13:26.972>
    creator = 'noam'
    dependencies = []
    files = ['8910', '8920', '10808', '10840', '13176', '13199']
    hgrepos = []
    issue_num = 1580
    keywords = ['patch']
    message_count = 109.0
    messages = ['58359', '58364', '58374', '58394', '58398', '58399', '58401', '58403', '58405', '58407', '58411', '58416', '58419', '58422', '58424', '58427', '58429', '58430', '58432', '58433', '58436', '58443', '58445', '58449', '58452', '58454', '58456', '58457', '58466', '58467', '58473', '58474', '58477', '58478', '58480', '58509', '58510', '58524', '58707', '58710', '58726', '58727', '58728', '58731', '58733', '58736', '58748', '58753', '58787', '58790', '58966', '59053', '59054', '69232', '69235', '69242', '69243', '69368', '69421', '69422', '69429', '69552', '69554', '82708', '82710', '82711', '82718', '82724', '82727', '82728', '82730', '82752', '82774', '82776', '82777', '82801', '82807', '82817', '82832', '82994', '82995', '83049', '83050', '83189', '83199', '83200', '84666', '85690', '85692', '85693', '85696', '85697', '85698', '85705', '85733', '85739', '85741', '85747', '86048', '86069', '86127', '86129', '86131', '86141', '86144', '86146', '86149', '86158', '117723']
    nosy_count = 15.0
    nosy_names = ['gvanrossum', 'tim.peters', 'nascheme', 'rhettinger', 'amaury.forgeotdarc', 'mark.dickinson', 'pitrou', 'eric.smith', 'christian.heimes', 'alexandre.vassalotti', 'noam', 'jaredgrubb', 'preston', 'olau', 'eric.snow']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1580'
    versions = ['Python 3.1']

    @noam
    Mannequin Author

    noam mannequin commented Dec 10, 2007

    The current float repr() always calculates the 17 first digits of the
    decimal representation of a float, and displays all of them (discarding
    trailing zeros). This is a simple method for making sure that
    eval(repr(f)) == f. However, many times it results in a long string,
    where a shorter string would have sufficed. For example, currently
    repr(1.1) is '1.1000000000000001', where obviously '1.1' would have been
    good enough, and much easier to read.

    This patch implements an algorithm for finding the shortest string that
    will evaluate to the right number. It is based on the code from
    http://www.cs.indiana.edu/~burger/fp/index.html, and also on the
    floating-point formatting function of TCL, Tcl_PrintDouble.

    The patch also adds a test case, which takes a long list of floating
    point numbers, created especially for testing binary to decimal
    conversion, and makes sure that eval(repr(f)) == f. See
    floating_points.txt for the source of the list.
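
    (A minimal sketch of the goal in Python, for readers skimming this issue.
    This is not the Burger/Dybvig algorithm the patch actually uses, and it
    quietly assumes that float() rounds its input correctly -- exactly the
    point Tim raises further down.)

    def shortest_repr(x):
        # Return the shortest %g-style string that evaluates back to exactly x.
        for ndigits in range(1, 18):        # 17 significant digits always suffice
            s = '%.*g' % (ndigits, x)
            if float(s) == x:               # relies on correctly rounded input
                return s
        return '%.17g' % x                  # not reached for IEEE 754 doubles

    shortest_repr(1.1)        # -> '1.1'
    shortest_repr(0.1 + 0.2)  # -> '0.30000000000000004'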

    @noam noam mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Dec 10, 2007
    @gvanrossum
    Member

    I like this; but I don't have time for a complete, thorough review.
    Maybe Tim can lend a hand?

    If Tim has no time, I propose that if it works correctly without leaks
    on at least Windows, OSX and Linux, we check it in, and worry about more
    review later.

    Crys, can you test this on Windows and see if it needs any project file
    updates? I'll check OSX. Linux works and I see no leaks.

    @tiran
    Member

    tiran commented Dec 10, 2007

    Applied in r59457

    I had to add checks for _M_X64 and _M_IA64 to doubledigits.c. The rest
    was fine on Windows.

    @tiran tiran closed this as completed Dec 10, 2007
    @gvanrossum
    Member

    Noam, perhaps you can help with this? We checked this in but found a
    problem: repr(1e5) suddenly returns '1.0'. Can you think of a cause for
    this?

    @gvanrossum gvanrossum reopened this Dec 11, 2007
    @noam
    Mannequin Author

    noam mannequin commented Dec 11, 2007

    I don't know, for me it works fine, even after downloading a fresh SVN
    copy. On what platform does it happen?

    @tiran
    Member

    tiran commented Dec 11, 2007

    I've disabled the new repr() in trunk and py3k until somebody has sorted
    out the build problems. I've removed doubledigits.c from Makefile.in and
    disabled the code with #ifdef Py_BROKEN_REPR in floatobject.c.

    @tiran tiran removed their assignment Dec 11, 2007
    @gvanrossum
    Member

    > On what platform does it happen?

    Linux on x86.

    It seems fine on PPC OSX.

    This suggests it could be a byte order bug.

    @noam
    Mannequin Author

    noam mannequin commented Dec 11, 2007

    I also use linux on x86. I think that byte order would cause different
    results (the repr of a random float shouldn't be "1.0".)
    Does the test case run ok? Because if it does, it's really strange.

    @tim-one
    Member

    tim-one commented Dec 11, 2007

    There is nothing you can do to repr() that's sufficient by itself to
    ensure eval(repr(x)) == x.

    The other necessary part is correctly rounded float /input/ routines.

    The 754 standard does not require correctly rounding input or output
    routines. It does require that eval(repr(x)) == x provided that repr(x)
    produce at least 17 significant digits (not necessarily correctly
    rounded, but "good enough" so that eval(repr(x)) == x). That's why
    Python produces 17 significant digits. While there's no guarantee that
    all platform I/O routines are good enough to meet the 754 requirement,
    most do, and what Python's repr() actually does is the strongest that
    /can/ be done assuming no more than full 754 conformance.

    Again, this cannot be improved cross-platform without Python supplying
    both output and input routines. Burger's output routines supply only
    half of what's needed. An efficient way to do the other half (correctly
    rounding input) was first written up by Clinger. Note that neither the
    Burger/Dybvig output code nor the Clinger input code are used much in
    practice anymore; David Gay's code is usually used instead for "go fast"
    reasons:

    http://mail.python.org/pipermail/python-list/2004-July/272167.html

    Clinger also endorses Gay's code:

    ftp://ftp.ccs.neu.edu/pub/people/will/retrospective.pdf

    However, as the Python list link says, Gay's code is "mind-numbingly
    complicated".

    There is no easy cross-platform solution, where "cross-platform" means
    just "limited to 754 platforms".

    @tiran
    Member

    tiran commented Dec 11, 2007

    I've written a small C program for auto config that checks the
    endianness of IEEE doubles. Neil has promised to add something to
    configure.in when I come up with a small C program. It *should* do the
    right thing on a BE platform but I can't test it.

    Tim, does Python run on any non-IEEE 754 platforms? Should the new repr()
    be ripped out? I hate to break Python for scientists just to silence
    some ignorant and inexperienced newbies.

    @tim-one
    Member

    tim-one commented Dec 11, 2007

    Again, without replacing float input routines too, this is /not/ good
    enough to ensure eval(repr(x)) == x even across just 100%-conforming 754
    platforms.

    It is good enough to ensure (well, assuming the code is 100% correct)
    eval(repr(x)) == x across conforming 754 platforms that go /beyond/ the
    standard by also supplying correctly rounding input routines. Some
    platform C libraries do, some don't. For example, I believe glibc does
    (and builds its I/O code on David Gay's, mentioned before), but that
    Microsoft's does not (but that Microsoft's do meet the 754 standard,
    which-- again --isn't strong enough for "shortest output" to round-trip
    correctly in all cases).

    @noam
    Mannequin Author

    noam mannequin commented Dec 11, 2007

    Oh, this is sad. Now I know why Tcl also implemented a decimal-to-binary
    routine.

    Perhaps we can simply use both their routines? If I am not mistaken,
    their only real dependency is on a library providing arbitrary-precision
    integers, called tommath, from which they use a few basic functions.
    We can use instead the functions from longobject.c. It will probably
    be somewhat slower, since longobject.c wasn't created to allow
    in-place operations, but I don't think it should be that bad -- we are
    mostly talking about compile time.

    @noam
    Mannequin Author

    noam mannequin commented Dec 11, 2007

    The Tcl code can be found here:
    http://tcl.cvs.sourceforge.net/tcl/tcl/generic/tclStrToD.c?view=markup

    What Tim says gives another reason for using that code - it means that
    currently, compiling the same source code on two platforms can result in
    code which does different things.

    Just to make sure - IEEE does require that operations on doubles will do
    the same thing on different platforms, right?

    @tiran
    Member

    tiran commented Dec 11, 2007

    It's really a shame. It was a nice idea ...

    Could we at least use the new formatting for str(float) and the display
    of floats? In Python 2.6 floats are not displayed with repr(). They seem
    to use yet another hook.

    >>> repr(11./5)
    '2.2'
    >>> 11./5
    2.2000000000000002
    >>> str(11./5)
    '2.2'

    @noam
    Mannequin Author

    noam mannequin commented Dec 11, 2007

    I think that for str(), the current method is better - using the new
    repr() method will make str(1.1*3) == '3.3000000000000003', instead of
    '3.3'. (The repr is right - you can check, and 1.1*3 != 3.3. But for
    str() purposes it's fine.)
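
    The claim is easy to check; the two values really are different doubles,
    one ulp apart:

    >>> 1.1 * 3 == 3.3
    False
    >>> '%.17g' % (1.1 * 3), '%.17g' % 3.3
    ('3.3000000000000003', '3.2999999999999998')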

    But I actually think that we should also use Tcl's decimal to binary
    conversion - otherwise, a .pyc file created by python compiled with
    Microsoft will cause a different behaviour from a .pyc file created by
    python compiled with Gnu, which is quite strange.

    @noam
    Mannequin Author

    noam mannequin commented Dec 11, 2007

    If I think about it some more, why not get rid of all the float
    platform-dependencies and define how +inf, -inf and nan behave?

    I think that it means:

    • inf and -inf are legitimate floats just like any other float.
      Perhaps there should be a builtin Inf, or at least math.inf.
    • nan is an object of type float, which behaves like None, that is:
      "nan == nan" is true, but "nan < nan" and "nan < 3" will raise an
      exception. Mathematical operations which used to return nan will raise
      an exception (division by zero does this already, but "inf + -inf"
      will do that too, instead of returning nan.) Again, there should be a
      builtin NaN, or math.nan. The reason for having a special nan object
      is compatibility with IEEE floats - I want to be able to pass around
      IEEE floats easily even if they happen to be nan.

    This is basically what Tcl did, if I understand correctly - see item 6
    in http://www.tcl.tk/cgi-bin/tct/tip/132.html .

    @tiran
    Member

    tiran commented Dec 11, 2007

    Noam Raphael wrote:

    • nan is an object of type float, which behaves like None, that is:
      "nan == nan" is true, but "nan < nan" and "nan < 3" will raise an
      exception.

    No, that's not correct. The standard defines that nan is always unequal
    to nan.

    >>> float("nan") == float("nan")
    False
    >>> float("inf") == float("inf")
    True
    >>> float("inf") == -1*float("-inf")
    True

    The float module could gain three singletons nan, inf and neginf so you
    can do

    True

    Christian

    @noam
    Mannequin Author

    noam mannequin commented Dec 11, 2007

    That's right, but the standard also defines that 0.0/0 -> nan, and
    1.0/0 -> inf, but instead we raise an exception. It's just that in
    Python, every object is expected to be equal to itself. Otherwise, how
    can I check if a number is nan?
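
    (For what it's worth, the conventional check relies on exactly that IEEE
    property: a NaN is the only value unequal to itself. A small sketch,
    assuming an IEEE 754 platform where overflow produces an infinity rather
    than raising:)

    def is_nan(x):
        return x != x          # only a NaN compares unequal to itself

    inf = 1e308 * 10           # overflows to +inf
    nan = inf - inf            # inf - inf yields a NaN
    assert is_nan(nan) and not is_nan(1.5)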

    @tiran
    Member

    tiran commented Dec 11, 2007

    I propose that we add three singletons to the float implementation:

    PyFloat_NaN
    PyFloat_Inf
    PyFloat_NegInf

    The singletons are returned from PyFloat_FromString() for "nan", "inf"
    and "-inf". The other PyFloat_ method must return the singletons, too.
    It's easy to check the signum, special and nan/inf status of an IEEE
    double with a struct.

    I've already started to work on a way to create nan, inf and -inf in a
    cross-platform way. Right now float("nan"), float("inf") and float("-inf")
    work on Linux but don't work on Windows. We could also add four
    methods to math:

    signum(float) -> 1/-1, finity(float) -> bool, infinite(float), nan(float)

    Here is my work in progress:
    #define Py_IEEE_DBL *(double*)&(ieee_double)
    #if defined(LITTLE_ENDIAN_IEEE_DOUBLE) && !defined(BIG_ENDIAN_IEEE_DOUBLE)
    typedef struct {
            unsigned int    m4 : 16;
            unsigned int    m3 : 16;
            unsigned int    m2 : 16;
            unsigned int    m1 :  4;
            unsigned int   exp : 11;
            unsigned int  sign :  1;
    } ieee_double;
    #define Py_IEEE_NAN Py_IEEE_DBL{0xffff, 0xffff, 0xffff, 0xf, 0x7ff, 0}
    #define Py_IEEE_INF Py_IEEE_DBL{0, 0, 0, 0, 0x7ff, 0}
    #define Py_IEEE_NEGINF Py_IEEE_DBL{0, 0, 0, 0, 0x7ff, 1}
    #elif !defined(LITTLE_ENDIAN_IEEE_DOUBLE) && defined(BIG_ENDIAN_IEEE_DOUBLE)
    typedef struct {
            unsigned int  sign :  1;
            unsigned int   exp : 11;
            unsigned int    m1 :  4;
            unsigned int    m2 : 16;
            unsigned int    m3 : 16;
            unsigned int    m4 : 16;
    } ieee_double;
    #define Py_IEEE_NAN Py_IEEE_DBL{0, 0x7ff, 0xf, 0xffff, 0xffff, 0xffff}
    #define Py_IEEE_INF Py_IEEE_DBL{0, 0x7ff, 0, 0, 0, 0}
    #define Py_IEEE_NEGINF Py_IEEE_DBL{1, 0x7ff, 0, 0, 0, 0}
    #else
    #error Unknown or no IEEE double implementation
    #endif

    @gvanrossum
    Member

    (1) Despite Tim's grave language, I don't think we'll need to write our
    own correctly-rounding float input routine. We can just say that Python
    won't work correctly unless your float input routine is rounding
    correctly; a unittest should detect whether this is the case. I expect
    that more recent C runtimes for Windows are correct.

    (1a) Perhaps it's better to only do this for Python 3.0, which has a
    smaller set of platforms to support.

    (2) Have you two (Christian and Noam) figured out yet why repr(1e5) is
    '1.0' on some platforms? If it's not byte order (and I no longer
    believe it is) what could it be? An uninitialized variable? It doesn't
    seem to vary with the processor, but it could vary with the C runtime or OS.

    (3) Detecting NaNs and infs in a platform-independent way is tricky,
    probably beyond you (I know it's beyond me). (Of course, producing
    standardized output for them and recognizing these as input is easy, and
    that should definitely be done.)

    (4) Looks like we've been a bit too hasty checking this in. Let's be
    extra careful for round #2.

    @tiran
    Member

    tiran commented Dec 11, 2007

    Guido van Rossum wrote:

    (1a) Perhaps it's better to only do this for Python 3.0, which has a
    smaller set of platforms to support.

    +1
    Does Python depend on working, valid, non-broken IEEE 754 floating-point
    arithmetic? Could we state that Python's float type depends on the IEEE
    754-1985 conforming 64-bit double precision format? Developers for a
    platform with no IEEE 754 conforming double must roll their own
    floatobject implementation.

    (2) Have you two (Christian and Noam) figured out yet why repr(1e5) is
    '1.0' on some platforms? If it's not byte order (and I no longer
    believe it is) what could it be? An uninitialized variable? It doesn't
    seem to vary with the processor, but it could vary with the C runtime or OS.

    I wasn't able to reproduce the problem on my workstations. Can you give
    us more information about the computer with failing tests? CPU, OS,
    Kernel, libc etc.

    (3) Detecting NaNs and infs in a platform-independent way is tricky,
    probably beyond you (I know it's beyond me). (Of course, producing
    standardized output for them and recognizing these as input is easy, and
    that should definitely be done.)

    In fact, detecting NaNs and Infs in a platform-independent way is dead
    easy IFF the platform has IEEE 754 support. A double uses 64 bits to
    store the data: 1 sign bit, an 11-bit exponent and a 52-bit fraction part
    (the significand, aka mantissa, has 53 bits including the implicit
    leading bit). When all bits of the exponent are set (0x7ff) the number is
    either a NaN or +/-Inf. For Infs the fraction part has no bits set (0)
    and the sign bit signals +Inf (sign: 0) or -Inf (sign: 1). If one or more
    bits of the fraction are set, it's a NaN.

    Whether the number is normalized or denormalized is irrelevant for the
    detection of NaN and Inf. An autoconf macro can easily detect the
    endianness of the double implementation.

    We can also use the much slower version and check if x > DBL_MAX for
    +Inf, x < -DBL_MAX for -Inf and x != x for NaN.

    Christian
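
    (The same bit test sketched in Python with the struct module instead of a
    C bitfield; the '<d' format sidesteps the endianness question by always
    giving little-endian IEEE 754 bytes. The names are illustrative only.)

    import struct

    def classify_double(x):
        bits, = struct.unpack('<Q', struct.pack('<d', x))   # the raw 64 bits
        sign = bits >> 63
        exponent = (bits >> 52) & 0x7ff
        fraction = bits & ((1 << 52) - 1)
        if exponent != 0x7ff:
            return 'finite'
        if fraction == 0:
            return '-inf' if sign else '+inf'
        return 'nan'

    assert classify_double(1.5) == 'finite'
    assert classify_double(1e308 * 10) == '+inf'
    assert classify_double(-1e308 * 10) == '-inf'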

    @gvanrossum
    Member

    > > (1a) Perhaps it's better to only do this for Python 3.0, which has a
    > > smaller set of platforms to support.
    >
    > +1
    > Does Python depend on working, valid, non-broken IEEE 754 floating-point
    > arithmetic? Could we state that Python's float type depends on the IEEE
    > 754-1985 conforming 64-bit double precision format? Developers for a
    > platform with no IEEE 754 conforming double must roll their own
    > floatobject implementation.

    No, traditionally Python has just used whatever C's double provides.

    There are some places that benefit from IEEE 754, but few that require
    it (dunno about optional extension modules).

    > > (2) Have you two (Christian and Noam) figured out yet why repr(1e5) is
    > > '1.0' on some platforms? If it's not byte order (and I no longer
    > > believe it is) what could it be? An uninitialized variable? It doesn't
    > > seem to vary with the processor, but it could vary with the C runtime or OS.
    >
    > I wasn't able to reproduce the problem on my workstations. Can you give
    > us more information about the computer with failing tests? CPU, OS,
    > Kernel, libc etc.

    So far I have only one box where it is broken (even after make clobber
    and ./configure). I cannot reproduce it with a debug build.

    It's an x86 Linux box running in mixed 64-32 bit mode.

    From /etc/lsb-release: "Ubuntu 6.06 LTS"
    uname -a: Linux pythonic.corp.google.com 2.6.18.5-gg16-mixed64-32 #1
    SMP Wed Oct 10 17:45:08 PDT 2007 x86_64 GNU/Linux
    gcc -v: gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)

    I'm afraid I'll have to debug this myself, but not today.

    > > (3) Detecting NaNs and infs in a platform-independent way is tricky,
    > > probably beyond you (I know it's beyond me). (Of course, producing
    > > standardized output for them and recognizing these as input is easy, and
    > > that should definitely be done.)
    >
    > In fact, detecting NaNs and Infs in a platform-independent way is dead
    > easy IFF the platform has IEEE 754 support. A double uses 64 bits to
    > store the data: 1 sign bit, an 11-bit exponent and a 52-bit fraction
    > part (the significand, aka mantissa, has 53 bits including the implicit
    > leading bit). When all bits of the exponent are set (0x7ff) the number
    > is either a NaN or +/-Inf. For Infs the fraction part has no bits set
    > (0) and the sign bit signals +Inf (sign: 0) or -Inf (sign: 1). If one or
    > more bits of the fraction are set, it's a NaN.
    >
    > Whether the number is normalized or denormalized is irrelevant for the
    > detection of NaN and Inf. An autoconf macro can easily detect the
    > endianness of the double implementation.

    Only for IEEE 754 though.

    > We can also use the much slower version and check if x > DBL_MAX for
    > +Inf, x < -DBL_MAX for -Inf and x != x for NaN.

    Of course the latter isn't guaranteed to help for non-IEEE-754
    platforms -- some platforms don't have NaNs at all!

    @rhettinger
    Contributor

    > Of course the latter isn't guaranteed to help for
    > non-IEEE-754 platforms -- some platforms don't have
    > NaNs at all!

    ISTM, that years of toying with Infs and Nans has not
    yielded a portable, workable solution. I'm concerned
    that further efforts will contort and slow-down our
    code without benefit. NaNs in particular are a really
    difficult case because our equality testing routines
    have a fast path where identity implies equality. I don't
    think NaNs can be correctly implemented without breaking
    that fast path which appears throughout the code base
    (even inside the code for dictobject.c; otherwise, how
    could you lookup a NaN value).

    I recommend punting on the subject of NaNs. Attempting
    to "fix" it will necessarily entail introducing a
    number of little atrocities that just aren't worth it.
    IMO, Practicality beats purity on this one.

    The decimal module provides full support for NaNs.
    While it is slow, it may be sufficient for someone who
    isn't happy with the way float handles them.

    @tim-one
    Member

    tim-one commented Dec 11, 2007

    [Raymond]

    ...
    NaNs in particular are a really
    difficult case because our equality testing routines
    have a fast path where identity implies equality.

    Works as intended in 2.5; this is Windows output:

    1.#INF
    >>> nan = inf - inf
    >>> nan  # really is a NaN
    -1.#IND
    >>> nan is nan  # of course this is true
    True
    >>> nan == nan  # but not equal anyway
    False

    @mdickinson
    Member

    Would it be acceptable to use shorter float repr only on big-endian and
    little-endian IEEE 754 platforms, and use the full 17-digit repr on other
    platforms? This would greatly simplify the adaptation and testing of
    Gay's code.

    Notable platforms that would still have long repr under this scheme, in
    order of decreasing significance:

    • IBM zSeries mainframes when using their hexadecimal FPU (recent
      models also have IEEE 754 floating-point, and Linux on zSeries
      uses that in preference to the hex floating-point).
    • ARM, Old ABI (uses a mixed-endian IEEE 754 format; a newer set
      of ABIs appeared recently that use big or little-endian IEEE 754)
    • Cray SV1 (introduced 1998) and earlier Cray machines. Gay's code
      doesn't support the Cray floating-point format anyway. I believe
      that newer Crays (e.g., the X1, introduced in 2003) use IEEE
      floating-point.
    • VAX machines, using VAX D and G floating-point formats. More recent
      Alpha machines support IEEE 754 floats as well.
    • many ancient machines with weird and wonderful floating-point
      schemes

    As far as I can determine, apart from the IBM machines described above all
    new platforms nowadays use an IEEE 754 double format. Sometimes IEEE
    arithmetic is only emulated in software; often there are shortcuts taken
    with respect to NaNs and underflow, but in all cases the underlying format
    is still IEEE 754, so Gay's code should work. Gay's careful to avoid
    problems with machines that flush underflow to zero, for example.

    Gay's code only actually supports 3 floating-point formats: IEEE 754 (big
    and little-endian), IBM hex format and VAX D floating-point format. There
    are actually (at least?) two variants of D floating-point: the version
    used on VAX, which has a 56-bit mantissa and an 8-bit exponent, and the
    Alpha version, which only has a 53-bit mantissa (but still only an 8-bit
    exponent); Gay's code only supports the former.

    I'm quite convinced that the VAX stuff in Gay's code is not worth much to
    us, so really the only question is whether it's worth keeping the code to
    support IBM hexadecimal floating-point. Given the difficulty of testing
    this code and the (significant but not overwhelming) extra complexity it
    adds, I'd like to remove it.

    @gvanrossum
    Member

    Sounds good to me.

    @rhettinger
    Contributor

    +1 on the fallback strategy for platforms we don't know how to handle.

    @mdickinson
    Member

    Eric and I have set up a branch of py3k for work on this issue. URL for
    (read-only) checkout is:

    http://svn.python.org/projects/python/branches/py3k-short-float-repr

    @ericvsmith
    Member

    My changes on the py3k-short-float-repr branch include:

    • Create a new function PyOS_double_to_string. This will replace
      PyOS_ascii_formatd. All existing internal uses of PyOS_ascii_formatd
      follow this pattern: printf into a buffer to build up the format string,
      call PyOS_ascii_formatd, then parse the format string to determine what
      to do. The changes to the API are:
      • discrete parameters instead of a printf-like format string. No
        parsing is required.
      • returns PyMem_Malloc'd memory instead of writing into a supplied
        buffer.
    • Modify all callers of PyOS_ascii_formatd to call PyOS_double_to_string
      instead. These are:

    Remaining to be done:

    • Re-enable 'n' (locale-aware) formatting in float.__format__().
    • Decide what to do about the change in meaning of "alt" formatting with
      format code 'g' when an exponent is present, also in float.__format__().
    • Have marshal.c deal with potential memory failures returned from
      PyOS_double_to_string.
    • Deprecate PyOS_ascii_formatd.

    @mdickinson
    Member

    So work on the py3k-short-float-repr branch is nearing completion, and
    we (Eric and I) would like to get approval for merging these changes
    into the py3k branch before this month's beta.

    A proposal: I propose that the short float representation should be
    considered an implementation detail for CPython, not a requirement for
    Python the language. This leaves Jython and IronPython and friends free
    to do as they wish. All that should be required for Python itself is
    that float(repr(x)) == x for (non-nan, non-infinite) floats x. Does
    this seem reasonable to people following this issue? If it's
    controversial I'll take it to python-dev.

    Eric's summarized his changes above. Here are mine (mostly---some
    of this has bits of Eric's work in too):

    • change repr(my_float) to produce output with minimal number of
      digits, on IEEE 754 big- and little-endian systems (which includes all
      systems we can actually get our hands on right now).

    • there's a new file Python/dtoa.c (and a corresponding file
      Include/dtoa.h) with a cut-down version of Gay's dtoa and strtod in it.
      There are comments at the top of that file indicating the changes that
      have been made from Gay's original code.

    • one minor semantic change: repr(x) changes to exponential format at
      1e16 instead of 1e17. This avoids a strange situation where Gay's code
      produces a 16-digit 'shortest' integer string for a particular float and
      Python's formatting code then pads with an incorrect '0':

    >>> x = 2e16+8.  # 2e16+8. is exactly representable as a float
    >>> x
    20000000000000010.0

    There's no way that this padding with bogus digits can happen for
    numbers less than 1e16.

    @mdickinson
    Member

    Changing target Python versions.

    I'll upload a patchset to Rietveld sometime soon (later today, I hope).

    @mdickinson
    Member

    I've uploaded the current version to Rietveld:

    http://codereview.appspot.com/33084/show

    @mdickinson
    Member

    The Rietveld patch set doesn't show the three new files, which are:

    Python/dtoa.c
    Include/dtoa.h
    Lib/test/formatfloat_testcases.txt

    @mdickinson
    Member

    Those three missing files have now been added to Rietveld.

    Just for reference, in case anyone else encounters this: the reason those
    files were missing from the initial upload was that after I svn merge'd
    from py3k-short-float-repr to py3k, those files were being treated (by
    svn) as *copies* from py3k-short-float-repr, rather than new files, so svn
    diff didn't show any output for them. Doing:

    svn revert Python/dtoa.c
    svn add Python/dtoa.c

    (and similarly for the other two files) fixed this.

    @gvanrossum
    Member

    On Tue, Apr 7, 2009 at 3:10 AM, Mark Dickinson <report@bugs.python.org> wrote:

    > A proposal: I propose that the short float representation should be
    > considered an implementation detail for CPython, not a requirement for
    > Python the language.  This leaves Jython and IronPython and friends free
    > to do as they wish.

    In principle that's fine with me.

    > All that should be required for Python itself is
    > that float(repr(x)) == x for (non-nan, non-infinite) floats x.  Does
    > this seem reasonable to people following this issue?  If it's
    > controversial I'll take it to python-dev.

    Historically, we've had a stronger requirement: if you print repr(x)
    and ship that string to a different machine, float() of that string
    returns the same value, assuming both systems use the same internal FP
    representation (e.g. IEEE). Earlier in this bug (or elsewhere) Tim
    Peters has pointed out that on some platforms, in particular Windows,
    input conversion doesn't always round correctly and only works
    correctly when the input has at least 17 digits (in which case you
    apparently don't need to round, and truncation works just as well --
    that's all the C standard requires, I believe).

    Now that pickle and marshal no longer use repr() for floats I think
    this is less important, but still at least worth considering. I think
    by having our own input function we solve this if both hosts run
    CPython, but it might break for other implementations.

    In order to make progress I recommend that we just note this and don't
    use it to hold up the implementation, but I think it's worth noting
    in the docs somewhere.

    @jaredgrubb
    Mannequin

    jaredgrubb mannequin commented Apr 7, 2009

    I think ANY attempt to rely on eval(repr(x))==x is asking for trouble,
    and it should probably be removed from the docs.

    Example: The following C code can vary *even* on an IEEE 754 platform,
    even in two places in the same source file (so same compile options),
    even in the same 5 lines of code recompiled after changing code that
    does not even touch/read 'x' or 'y':

    double x, y;
    x = 3.0/7.0;
    y = x;
    /* ... code that doesn't touch/read x or y ... */
    printf(" x==y: %s", (x==y) ? "true" : "false");

    So, how can we hope that eval(repr(x))==x is EVER stable? Equality and
    floating point should raise the hairs on the back of everyone's neck...

    (Code above based on
    http://docs.sun.com/source/806-3568/ncg_goldberg.html in section
    "Current IEEE 754 Implementations", along with a great explanation on
    why this is true. The code example is a little different, but the same
    concept applies.)

    @mdickinson
    Member

    > Historically, we've had a stronger requirement: if you print repr(x)
    > and ship that string to a different machine, float() of that string
    > returns the same value, assuming both systems use the same internal FP
    > representation (e.g. IEEE).

    Hmm. With the py3k-short-float-repr stuff, we should be okay moving
    things between different CPython boxes, since float() would use Gay's code
    and so would be correctly rounded for any length input, not just input
    with 17 significant digits.

    But for a CPython-generated short repr to give the correct value on some
    other implementation, that implementation would also have to have a
    correctly rounded string->float.

    For safety's sake, I'll make sure that marshal (version 1) and pickle
    (protocol 0) are still outputting the full 17 digits.

    @mdickinson
    Member

    > I think ANY attempt to rely on eval(repr(x))==x is asking for trouble,
    > and it should probably be removed from the docs.

    I disagree. I've read the paper you refer to; nevertheless, it's still
    perfectly possible to guarantee eval(repr(x)) == x in spite of the
    various floating-point details one has to deal with. In the worst
    case, assuming that C doubles are IEEE 754 binary64 format, for double ->
    string conversion one can simply cast the double to an array of 8
    bytes, extract the sign, exponent and mantissa from those bytes, and
    then compute the appropriate digit string using nothing but integer
    arithmetic; similarly for the reverse operation: take a digit string,
    do some integer arithmetic to figure out what the exponent, mantissa and
    sign should be, and pack those values back into the double. Hey presto:
    correctly rounded string -> double and double -> string conversions!
    Add a whole bag of tricks to speed things up in common cases and you end
    up with something like David Gay's code.

    It *is* true that the correctness of Gay's code depends on the FPU being
    set to use 53-bit precision, something that isn't true by default on x87
    FPUs. But this isn't hard to deal with, either: first, you make sure
    that you're not using the x87 FPU unless you really have to (all Intel
    machines that are newer than Pentium 3 have the SSE2 instruction set,
    which doesn't have the problems that x87 does); second, if you *really*
    have to use x87 then you figure out how to set its control word to get
    53-bit precision and round-half-to-even rounding; third, if you can't
    figure out how to set the control word then you use a fallback something
    like that described above.
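
    (A sketch of that integer-arithmetic route -- emphatically not Gay's code,
    and assuming IEEE 754 binary64 doubles: pull sign, exponent and mantissa
    out of the raw bits and produce the exact decimal expansion using nothing
    but integer operations. Correctly rounded input is the same game played in
    reverse, comparing the digit string against exact halfway values.)

    import struct

    def exact_decimal(x):
        # Exact decimal expansion of a finite double, via integer arithmetic only.
        bits, = struct.unpack('<Q', struct.pack('<d', x))
        sign = '-' if bits >> 63 else ''
        exp_field = (bits >> 52) & 0x7ff
        mantissa = bits & ((1 << 52) - 1)
        if exp_field == 0x7ff:
            raise ValueError('inf/nan not handled in this sketch')
        if exp_field == 0:                        # zero or subnormal
            significand, exp2 = mantissa, -1074
        else:                                     # normal: restore the implicit bit
            significand, exp2 = mantissa | (1 << 52), exp_field - 1075
        if exp2 >= 0:
            return sign + str(significand << exp2)
        # value == significand / 2**-exp2; multiply by 5**-exp2 to get /10**-exp2
        digits = str(significand * 5 ** -exp2).rjust(-exp2 + 1, '0')
        int_part, frac_part = digits[:exp2], digits[exp2:].rstrip('0')
        return sign + int_part + ('.' + frac_part if frac_part else '')

    exact_decimal(1.1)   # -> '1.100000000000000088817841970012523233890533447265625'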

    @jaredgrubb
    Mannequin

    jaredgrubb mannequin commented Apr 7, 2009

    The process that you describe in msg85741 is a way of ensuring
    "memcmp(&x, &y, sizeof(x))==0", and it's portable and safe and is the
    Right Thing that we all want and expect. But that's not "x==y", as that
    Sun paper explains. It's close, but not technically accurate, as the
    implication arrow only goes one way (just as "x=1/y" implies "xy=1" in
    algebra, but not the other way around)

    I'd be interested to see if you could say that the Python object
    model/bytecode interpretation enforces a certain quantum of operations
    that actually does imply "eval(repr(x))==x"; but I suspect it's
    unprovable, and it's fragile as Python grows to have more support in
    CLI/LLVM/JVM backends.

    My pedantic mind would strip any and all references to floating-point
    equality out of the docs, as it's dangerous and insidiously misleading,
    even in "obvious" cases. But, I'll stop now :) (FYI: I've enjoyed the
    ~100 messages here.. Great stuff!)

    @mdickinson
    Member

    The py3k-short-float-repr branch has been merged to py3k in two parts:

    r71663 is mostly concerned with the inclusion of David Gay's code into the
    core, and the necessary floating-point fixups to allow Gay's code to be
    used (SSE2 detection, x87 control word manipulation, etc.)

    r71665 contains Eric's *mammoth* rewrite and upgrade of all the float
    formatting code to use the new _Py_dg_dtoa and _Py_dg_strtod functions.

    Note: the new code doesn't give short float repr on *all* platforms,
    though it's close. The sticking point is that Gay's code needs 53-bit
    rounding precision, and the x87 FPU uses 64-bit rounding precision by
    default (though some operating systems---e.g., FreeBSD, Windows---change
    that default). For the record, here's the strategy that I used: feedback
    (esp. from experts) would be appreciated.

    • If the float format is not IEEE 754, don't use Gay's code.

    • Otherwise, if we're not on x86, we're probably fine.
      (Historically, there are other FPUs that have similar
      problems---e.g., the Motorola 68881/2 FPUs---but I think they're
      all too old to be worth worrying about by now.)

    • x86-64 in 64-bit mode is also fine: there, SSE2 is the default.
      x86-64 in 32-bit mode (e.g., 32-bit Linux on Core 2 Duo) has
      the same problems as plain x86. (OS X is fine: its gcc has
      SSE2 enabled by default even for 32-bit mode.)

    • Windows/x86 appears to set rounding precision to 53-bits by default,
      so we're okay there too.

    So:

    • On gcc/x86, detect the availability of SSE2 (by examining
      the result of the cpuid instruction) and add the appropriate
      flags (-msse2 -mfpmath=sse2) to BASECFLAGS if SSE2 is available.

    • On gcc/x86, if SSE2 is *not* available, so that we're using the
      x87 FPU, use inline assembler to set the rounding precision
      (and rounding mode) before calling Gay's code, and restore
      the FPU state directly afterwards. Use of inline assembler is pretty
      horrible, but it seems to be *more* portable than any of the
      alternatives. The official C99 way is to use fegetenv/fesetenv to get
      and set the floating-point environment, but the fenv_t type used to
      store the environment can (and will) vary from platform to platform.

    • There's an autoconf test for double-rounding. If there's no
      evidence of double rounding then it's likely to be safe to use
      Gay's code: double rounding is an almost unavoidable symptom of
      64-bit precision on x87.

    So on non-Windows x86 platforms that *aren't* using gcc and *do* exhibit
    double rounding (implying that they're not using SSE2, and that the OS
    doesn't change the FPU rounding precision to 53 bits), we're out of luck.
    In this case the old long float repr is used.

    The most prominent platform that I can think of that's affected by this
    would be something like Solaris/x86 with Sun's own compiler, or more
    generally any Unix/x86 combination where the compiler isn't gcc. Those
    platforms need to be dealt with on a case-by-case basis, by figuring out
    for each such platform how to detect and use SSE2, and how to get and set
    the x87 control word if SSE2 instructions aren't available.

    Note that if any of the above heuristics is wrong and we end up using
    Gay's code inappropriately, then there will be *loud* failure: we'll know
    about it.
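
    (For illustration only -- not the actual configure test -- a runtime probe
    for double rounding can be built from the 1e16 + 2.9999 example discussed
    later in this thread: with correct round-to-nearest the sum is
    10000000000000002.0, while 64-bit x87 intermediate precision rounds it
    twice and lands on 10000000000000004.0.)

    def has_double_rounding():
        # True if float addition shows the x87 double-rounding artifact.
        return (1e16 + 2.9999) == 1e16 + 4.0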

    @mdickinson
    Member

    Closing this. There are a few known problems remaining, but they've all
    got their own issue numbers: see bpo-5780, bpo-4482.

    @pitrou
    Member

    pitrou commented Apr 18, 2009

    Hello folks,

    IIUC, autoconf tries to enable SSE2 by default without asking. Isn't it
    a problem for people distributing Python binaries (e.g. Linux vendors)
    and expecting these binaries to work on legacy systems even though the
    system on which the binaries were compiled supports SSE2?

    Or am I misunderstanding what the changes are?

    @mdickinson
    Member

    Yes, I think you're right.

    Perhaps the SSE2 support should be turned into an --enable-sse2 configure
    option, that's disabled by default? One problem with this is that I don't
    know how to enable SSE2 instructions for compilers other than gcc, so the option would only apply to gcc-based systems.

    Disabling SSE2 would just mean that all x86/gcc systems would end up using
    the inline assembler to get and set the control word; so we'd still get
    short float repr everywhere we did before.

    @mdickinson
    Member

    Perhaps better to drop the SSE2 bits completely. Anybody who
    actually wants SSE2 instructions in their binary can do a

    CC="gcc -msse2 -mfpmath=sse" configure && ...

    Unless there are objections, I'll drop everything involving SSE2 from
    the configure script.

    It's a bit of a shame, though: it's definitely desirable to be using
    the SSE2 instructions for floating-point whenever possible. Fewer
    surprises due to double rounding or random 80-bit register spilling,
    better performance on underflow and NaNs, ...

    @mdickinson
    Member

    SSE2 detection and flags removed in r71723. We'll see how the buildbots
    fare...

    @rhettinger
    Contributor

    Is there a way to use SSE when available and x86 when it's not. IIRC,
    floating point used to work that way (using a C lib as a fallback on
    systems w/o coprocessor support).

    @pitrou
    Member

    pitrou commented Apr 18, 2009

    > Is there a way to use SSE when available and x86 when it's not.

    Probably, but I don't think there is any point doing so. The main
    benefit of SSE2 is to get higher performance on floating point intensive
    code, which no pure Python code could be regarded as (the CPU cycles
    elapsed between two FP instructions would dwarf the cost of executing
    the FP instructions themselves).

    The situation is different for specialized packages like numpy, but
    their decisions need not be the same as ours AFAIK.

    @rhettinger
    Contributor

    The advantage is accuracy. No double rounding. This will also help the
    math.fsum() function that is also susceptible to double rounding.

    @mdickinson
    Member

    [Raymond]

    Is there a way to use SSE when available and x86 when it's not.

    I guess it's possible in theory, but I don't know of any way to do this in
    practice. I suppose one could trap the SIGILL generated by the attempted
    use of an SSE2 instruction on a non-supported platform---is this how
    things used to work for 386s without the 387? That would make a sequence
    of floating-point instructions on non-SSE2 x86 horribly slow, though.

    Antoine: as Raymond said, the advantage of SSE2 for numeric work is
    accuracy, predictability, and consistency across platforms. The SSE2
    instructions finally put an end to all the problems arising from the
    mismatch between the precision of the x87 floating-point registers (64-
    bits) and the precision of a C double (53-bits). Those difficulties
    include (1) unpredictable rounding of intermediate values from 64-bit
    precision to 53-bit precision, due to spilling of temporaries from FPU
    registers to memory, and (2) double-rounding. The arithmetic of Python
    itself is largely immune to the former, but not the latter. (And of
    course the register spilling still causes headaches for various bits of
    CPython).

    Those difficulties can be *mostly* dealt with by setting the x87 rounding
    precision to double (instead of extended), though this doesn't fix the
    exponent range, so one still ends up with double-rounding on underflow.
    The catch is that one can't mess with the x87 state globally, as various
    library functions (especially libm functions) might depend on it being in whatever the OS considers to be the default state.

    There's a very nice paper by David Monniaux that covers all this:
    definitely recommended reading after you've finished Goldberg's "What
    Every Computer Scientist...". It can currently be found at:

    http://hal.archives-ouvertes.fr/hal-00128124/en/

    An example: in Python (any version), try this:

    >>> 1e16 + 2.9999
    10000000000000002.0

    On OS X, Windows and FreeBSD you'll get the answer above.
    (OS X gcc uses SSE2 by default; Windows and FreeBSD both
    make the default x87 rounding-precision 53 bits).

    On 32-bit Linux/x86 or Solaris/x86 you'll likely get the answer

    10000000000000004.0

    instead, because Linux doesn't (usually?) change the Intel default
    rounding precision of 64-bits. Using SSE2 instead of the x87 would have
    fixed this.

    </standard x87 rant>

    @olau
    Mannequin

    olau mannequin commented Sep 30, 2010

    Just came across this bug. I don't want to reopen this or anything, but regarding the SSE2 code: couldn't you just detect the presence of SSE2 when the interpreter starts up and then switch implementations based on that? I think that's what liboil does (http://liboil.freedesktop.org/wiki/).
