
classification
Title: struct returns incorrect 4 byte float
Type: behavior Stage:
Components: Extension Modules Versions: Python 2.5.3
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Robert.Withrow, TD22057, loewis, mark.dickinson, rhettinger, vstinner
Priority: normal Keywords:

Created on 2008-10-13 17:11 by TD22057, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (22)
msg74690 - (view) Author: (TD22057) Date: 2008-10-13 17:11
FYI Actual version is 2.5.2 running on Linux RHE4.

>>> import struct
>>> fmt ='>f'
>>> v=1.8183e-7
>>> v
1.8183000000000001e-07
>>> s=struct.pack(fmt,v)
>>> struct.unpack(fmt,s)
(1.818300034983622e-07,)

Looks to me like the float->double conversion is not being zeroed out
before the 4 bytes are written to it.  FYI this is a fairly serious
issue since it leads to incorrect results being read from files (at least
for me anyway).
msg74691 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-13 17:17
Why do you use float (32 bits) instead of double (64 bits)? Your 
example uses:
 double (Python) -> float (C) -> double (Python)

If you convert a 64-bit float to 32 bits, you will certainly lose some 
digits. It's not a bug in Python, but a problem in your code ;-)
msg74692 - (view) Author: (TD22057) Date: 2008-10-13 17:44
That's not my code - it's an example ;)

My code reads binary data from a hardware system that encodes 32-bit
floats.  The numbers I get back from struct.unpack have garbage appended
to the end of the floating point numbers, beyond the 32-bit range.

There is no 32-bit float type in Python that I can allocate.  If you
want a 32-bit value as an input, try this:

>>> v=123456789
>>> struct.unpack(fmt,struct.pack(fmt,v))
(123456792.0,)
msg74694 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-13 20:42
I don't understand your problem/question. It's not a bug in Python; 
it's a problem of conversion between the types float (32 bits) and 
double (64 bits).
msg74695 - (view) Author: (TD22057) Date: 2008-10-13 21:13
I'm receiving a 32 bit floating point number encoded in binary format. 
The correct value for that number is 1.8183e-7 which can be expressed in
single precision just fine.  Given that the number in the binary
encoding is 1.8183e-7, I expected to get that back, not
1.818300034983622e-07 (which is NOT the closest double precision number
to the actual single precision number).

After doing some experiments, I think the problem is a basic fact of
life in C: casting a single precision value to a double precision one
does not specify what happens in the extra digits.  So I'm getting
garbage in the extra digits when the struct module's C code casts the
single to a double.  The problem is that C doesn't assume that the
non-precise digits are zero.

Since this is a function of the underlying C language, I'll withdraw the
bug.  (Hmm - I don't seem to have permission to do that)
msg74696 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-13 21:21
The problem is not from Python but from your FPU, during the conversion 
from a 64-bit float to a 32-bit float (made by struct.pack). Example in C:

#include <stdio.h>
int main()
{
    float f;
    double d;
    d = 1.8183;
    printf("d=%.20f\n", d);
    f = (float)d;
    d = (double)f;
    printf("f=%.20f\n", f);
    printf("d=%.20f\n", d);
    return 0;
}

Result:
d=1.81830000000000002736   # ok
f=1.81830000877380371094   # 64->32: lose precision
d=1.81830000877380371094   # 32->64: no change
msg74698 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-10-13 21:45
I think the complaint is that presumably, when expanding the float to
double in unpacking, the result is not zero-padded. I cannot reproduce
the problem, though:

py> import struct
py> from binascii import hexlify
py> hexlify(struct.pack(">d", struct.unpack(">f", struct.pack(">f", 1.8183e-7))[0]))
'3e8867a1a0000000'

Seems to me that the zero-padding works just fine.

TD22057, why do you say that 1.818300034983622e-07 is not the closest
number?  AFAICT, this is not true: this *is* the closest number.
msg74700 - (view) Author: (TD22057) Date: 2008-10-13 21:59
Martin is correct.  I expected (naively) that struct would zero pad the
digits beyond the significant digits of a float.  As to whether it's
exact or not, see my first message:
>>> v=1.8183e-7
>>> v
1.8183000000000001e-07

Since 32 bit floats only have ~7 digits of precision, I expected to get
the same thing back.  Not 7 digits + garbage.

Like I said, you can mark this bug as invalid since Python is just
reflecting what C is doing.
msg74702 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-13 22:06
"Since 32 bit floats only have ~7 digits of precision, I expected to 
get the same thing back. Not 7 digits + garbage."

This is a well-known problem of conversion from base 2 (the IEEE 
float) to base 10 (the Python string representation). Search any 
programming FAQ, e.g.
http://www.python.org/doc/faq/general/#why-are-floating-point-calculations-so-inaccurate

"Python is just reflecting what C is doing": the problem is deeper in 
the silicon. If you want better precision, use an arbitrary-precision 
float type like decimal.Decimal() or the GMP library (Python: gmpy).
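
A minimal sketch of that Decimal suggestion (an illustration added here,
not part of the original message): Decimal stores the value in base 10,
so the digits you type are the digits you keep.

>>> from decimal import Decimal
>>> v = Decimal('1.8183e-7')   # exact decimal value, no binary rounding
>>> v
Decimal('1.8183E-7')
>>> v * 2
Decimal('3.6366E-7')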
msg74704 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-10-13 22:22
> "Python is just reflecting what C is doing": the problem is deeper in 
> the silicon. If you want better precision, use an arbitrary-precision 
> float type like decimal.Decimal() or the GMP library (Python: gmpy).

The problem is indeed deeper, however, I doubt GMP is an answer here:
we are talking about the struct module, which, *by design* gives access
to 32-bit (imprecise) floating point numbers - not because people
deliberately want to perform computations inaccurately, but because
there is often a need to interface with this specific representation
(which originally probably was created for its own reasons, such as
to save space, or because some hardware didn't support double
precision).
msg74705 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-10-13 22:27
@loewis: Yes, the initial problem is about unpack("f", bytes). It's 
not possible to recover the exact original 32-bit float value because Python 
forces a conversion to a 64-bit float. The behaviour should be 
documented. Don't hesitate to reopen the bug if you consider that 
something should be fixed in Python.
msg74708 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-10-13 22:36
> Don't hesitate to reopen the bug if you consider that
> something should be fixed in Python.

I agree that it should be closed; people should read general CS
introductory material to learn how floating point numbers work.

> @loewis: Yes, the initial problem is about unpack("f", bytes). It's 
> not possible to recover the exact original 32-bit float value

Interestingly enough, it is possible - using the OP's approach.
If you want to truncate a 64-bit floating point number to a
32-bit one, pack it as a float with the struct module, then unpack it.
Python will automatically pad the extra mantissa bits with zeros.

> because Python 
> forces a conversion to a 64-bit float. The behaviour should be 
> documented. 

I think it's documented somewhere that a Python float is represented
with a C double. That should suffice, IMO.
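
A minimal sketch of the truncation idiom described above (an illustration
added here, assuming an IEEE-754 platform; Robert spells out the same idea
as fptrunc() in msg131356 below):

>>> import struct
>>> def truncate_to_float32(value):
...     # round-trip through a 32-bit float; widening back to a Python
...     # float (C double) pads the extra mantissa bits with zeros
...     return struct.unpack('>f', struct.pack('>f', value))[0]
...
>>> truncate_to_float32(1.8183e-7)
1.818300034983622e-07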
msg131195 - (view) Author: Robert Withrow (Robert.Withrow) Date: 2011-03-16 23:25
I have to disagree.  It seems entirely reasonable to expect that unpack should return the same value passed to pack.  That it doesn't (as of 2.6.5 at least) is completely unexpected and undocumented.  And yes I understand the limitations of floating point numbers.

I suggest that struct should be fixed so that struct.unpack(fmt, struct.pack(fmt, v)) == v even when fmt is something like '!f'.

This can be done in C code (I do it) for IEEE 754 floats.

At the very least, this unexpected behavior should be documented in struct.
msg131203 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-03-17 00:37
Robert: Can you please suggest an algorithm that would have given the result you expect (preferably as a C program, using a string literal as its input, for some platform)? Ideally, we would stick to the example given by the OP.
msg131234 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-03-17 08:05
I don't think this needs to be documented beyond the limitations of floating-point that are already documented in the tutorial.  It's the obvious behaviour:  double to float (when packing) converts to the nearest float;  the float to double conversion is exact.
msg131235 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-03-17 08:15
[Robert]
> I have to disagree.  It seems entirely reasonable to expect that
> unpack should return the same value passed to pack.

Robert:  notice that a *Python* float (a *64-bit* C double internally) is here being stored as a *32-bit* float, losing precision.  So no, it's not at all reasonable to expect that unpack should return the same value passed to pack---it's mathematically impossible for it to do so.  There are (around) 2**64 distinct Python floats, and only 2**32 ways to pack them using '<f'.

When packing / unpacking using '<d', it *is* reasonable to expect the value to be recovered exactly, and as far as I know that's always what happens (barring peculiarities like NaN payloads not being reproduced exactly).
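
A short illustration of that point (added here, not from the original
message): the 'd' round trip recovers the Python float exactly, while
the 'f' round trip rounds it to the nearest 32-bit value.

>>> import struct
>>> v = 1.8183e-7
>>> struct.unpack('<d', struct.pack('<d', v))[0] == v
True
>>> struct.unpack('<f', struct.pack('<f', v))[0] == v
False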
msg131264 - (view) Author: Robert Withrow (Robert.Withrow) Date: 2011-03-17 16:13
Martin: in C I have the luxury of using 32 bit floats; not an option in Python.  Simple code doing the moral equivalent of NTOHL(HTONL()) works in this case for C but wouldn't help for Python.

Mark: I understand about the precision truncation issue and how Python does floating point arithmetic.  This C code clearly demonstrates what is going on:

#include <stdio.h>

int main(int argc, char *argv[])
{
  double d1 = 6.21;
  float f = 6.21;
  double d2 = f;
  
  printf("double: %.15f\n", d1);
  printf("float: %.15f\n", f);
  printf("double converted from float: %15.15f\n", d2);
}

The point here is about the contract of struct, NOT how Python does floating point arithmetic.  The contract is: what pack packs, unpack will unpack resulting in the original value.  At least, that is what the documentation leads you to believe.

For the 'f' format character, this contract is broken because of a basic implementation detail of Python and there is nothing in the documentation for struct that *directly* lets you know this will happen.  After all, the mentions in the documentation about 32 bit versus 64 bit talk about C not Python!

Even worse, there is no straightforward way (that I'm aware of) to write portable tests for code using the 'f' format character.  In my case I'm writing a tool that creates message codecs in multiple languages and the most basic unit test goes something like this:

m1 = example.message()
m1.f1 = 6.21
b = m1.encode() # uses struct pack
m2 = example.message(b) # uses struct unpack
if m1 != m2:  # rich comparison
  print('fail')

This test will fail when you use the 'f' format code.

I suggest two things could be done to improve the situation:

1) Add a note to the documentation for struct that tells you that unpack(pack) using the 'f' format code will not generally give you the results you probably expect because <insert pointer to discussion of Python's use of C double versus C float here>.

2) Create a way in Python to write portable code related to 32 bit floats.  For example, if I had a way in Python to cause the precision truncation programmatically:

m1 = example.message()
m1.f1 = 6.21.as_32_bit_float() # Does the precision truncation upfront
b = m1.encode() # uses struct pack
m2 = example.message(b) # uses struct unpack
if m1 != m2:  # rich comparison
  print('fail')

I'd expect this test to pass.

Hope this long-winded note helps.
msg131266 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-03-17 16:35
> Martin: in C I have the luxury of using 32 bit floats; not an option
> in Python.  Simple code doing the moral equivalent of NTOHL(HTONL())
> works in this case for C but wouldn't help for Python.

If you agree that Python actually behaves correctly, I fail to
understand what it is that you disagree with in msg131195.

If all you want is a documentation change, can you please propose
specific wording?

> Even worse, there is no straightforward way (that I'm aware of) to
> write portable tests for code using the 'f' format character.

If you use numbers that are exactly representable as floats,
the test should be portable to all platforms that use 32-bit
IEEE-754 floats. If you then also use numbers without a fractional
part, it should even port to non-IEEE platforms (as long as you
don't test for the intermediate bytes).

[...]
> This test will fail when you use the 'f' format code.

So use 6.25 instead.
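
For instance (an added illustration, not from the original message),
values whose fractional part is a small power of two survive the 'f'
round trip unchanged:

>>> import struct
>>> [struct.unpack('!f', struct.pack('!f', x))[0] == x for x in (6.25, 0.5, 3.0)]
[True, True, True]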
msg131273 - (view) Author: Robert Withrow (Robert.Withrow) Date: 2011-03-17 17:55
> If you agree that Python actually behaves correctly, I fail to
> understand what it is that you disagree with in msg131195.

I don't agree that Python is behaving correctly as far as the documented contract for struct is concerned.

I disagree with the statement in the preceding msg74708 which says:

> people should read general CS introductory material
> to learn how floating point numbers work.

Aside from being patronizing it doesn't solve the problem in any meaningful way.

> If you use numbers that are exactly representable as floats,
> the test should be portable to all platforms that use 32-bit
> IEEE-754 floats.

A reasonable suggestion, but it is a constrained definition of "portable".  Since most (or nearly all?) modern platforms use '754 it is probably not a bad constraint, given that struct explicitly uses '754.

> If you then also use numbers without a fractional
> part, it should even port to non-IEEE platforms

I confess, the "CS introductory material" I read 30 years ago (predating '754) doesn't give me enough information to know if this is correct.

Anyway:

> If all you want is a documentation change, can you please propose
> specific wording?

It isn't exactly "all I want", but it is a good start.  I note that msg74705 suggests adding documentation to struct about the 'f' format code.

First of all, as far as I know, struct is the only place where this issue of 32 bit versus 64 bit floating point numbers shows up in Python because the rest of Python uses only 64 bit numbers.  (Is there anywhere else in Python where a 32 bit float is converted to a 64 bit float?) So the documentation probably belongs in struct.

I would add to note 4 of 7.3.2.2 (in the 2.7.1 documentation) something like:

"Note that 32 bit representations do not generally convert exactly to 64 bit representations (which Python uses internally) so that the results of unpack(fmt,pack(fmt,number)) may not equal number when using the 'f' format character."

It would be friendly to add an example at the bottom demonstrating the issue and incorporating your comments about fractions and non-fractional values.

>>> x = unpack('!f', pack('!f', 6.24))[0]
>>> x == 6.24
False
>>> x = unpack('!f', pack('!f', 6.25))[0]
>>> x == 6.25
True
msg131276 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-17 18:21
The suggested examples are misleading because they use 6.24 which is not exactly representable in binary floating point.  Representation issues are orthogonal to the OP's issue which is really just a simple rounding example:

>>> x = float.fromhex('0x0.1234560000001')
>>> unpack('!f', pack('!f', x))[0].hex()
'0x1.2345600000000p-4'

Also, if something like the suggested note is adopted, it needs to be worded in a way that doesn't imply that the struct implementation is broken or misdesigned.  

A better note would focus on the basic (and obvious) fact that downgrading from double precision to single precision  
entails a loss of precision.
msg131278 - (view) Author: Robert Withrow (Robert.Withrow) Date: 2011-03-17 18:59
> it needs to be worded in a way that doesn't
> imply that the struct implementation is broken or misdesigned. 

Agree.

> A better note would focus on the basic (and obvious)
> fact that downgrading from double precision to single
> precision entails a loss of precision.

Sort of where I was going, but I'm sure my text could be vastly improved.

> The suggested examples are misleading because they 
> use 6.24 which is not exactly representable in binary
> floating point.

I'd quibble with this for two reasons:

1) to be precise, numbers which are not exactly representable in binary floating point would nonetheless pass the unpack(pack) test if you use the 'd' format character.  The key issue is, as you said, loss of precision.

2) I don't understand why the 6.24 example is "misleading" when it accurately demonstrates the issue.

One comment about portability I forgot to mention earlier:  I don't know how wed Python is to '754 or even binary floating point representations.  My personal belief is that it should be possible to write a test so that the unpack(fmt, pack(fmt, precision_truncate(number))) == precision_truncate(number) test works for any legal number on any platform.  I don't like the idea that one has to pick specific numbers based on knowledge of the platform's floating point format.

I acknowledge that this may not bother others as much as it bothers me though.  I'm a portability nut.
msg131356 - (view) Author: Robert Withrow (Robert.Withrow) Date: 2011-03-18 19:07
For completeness: msg131234 states that the issue of 64 bit -> 32 bit precision truncation is covered in the floating point tutorial.  I believe that is incorrect; at least I can't find it explicitly mentioned. Ref: http://docs.python.org/tutorial/floatingpoint.html.

If struct is the only place this (64->32 bit precision truncation) can happen in Python, the lack of discussion in the tutorial makes sense.  Otherwise, a sentence about it should be added to the tutorial.

As it is, there is no _explicit_ mention of this anywhere in Python documentation.  It is all well and good to state that it is "obvious", but it seems that explicit documentation is preferable to implicit documentation, given the rarity of the issue in Python and the meager cost of adding a sentence here or there.

Incidentally, it is simple to create the truncation routine I mention earlier:

>>> from struct import pack, unpack
>>> def fptrunc(value):
...   return unpack('!f', pack('!f', value))[0]
... 
>>> fptrunc(6.24)
6.2399997711181641
>>> fptrunc(6.25)
6.25

But this has the questionable smell of using pack/unpack in a test of pack/unpack.  It's sorta OK for _users_ of pack/unpack though.

A quick scan of the Python source code shows that only two things try to pack 4-byte floats: struct and ctypes, and both of these use the underlying Python float object routines.  So a better way of doing the truncation is to use ctypes:

>>> from ctypes import c_float
>>> def fptrunc(value):
...   return c_float(value).value
... 
>>> fptrunc(6.24)
6.2399997711181641
>>> fptrunc(6.25)
6.25

Doing this allows you to write tests that work for any number and don't require the use of magic numbers or knowledge of the underlying floating point implementation.

Even if nothing gets put into the documentation, people will probably find this discussion by Googling.

I can't imagine there is much more that can be said about this, so I'll leave you guys alone now...  ;-)