classification
Title: Issues with reading large float values in AIFC files
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.8, Python 3.7, Python 3.6, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: mark.dickinson, serhiy.storchaka, tim.peters
Priority: normal Keywords: patch

Created on 2018-03-01 16:50 by serhiy.storchaka, last changed 2018-03-02 17:45 by mark.dickinson.

Pull Requests
URL Status Linked Edit
PR 5952 open serhiy.storchaka, 2018-03-01 16:53
Messages (10)
msg313098 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-01 16:50
The frequency rate is saved as a 80-bit floating point value in AIFC files. There are issues with reading large values.

1. Values with maximal exponent are read as aifc._HUGE_VAL which is less then sys.float_info.max. Thus greater values can be read as lesser values.

>>> aifc._read_float(io.BytesIO(b'\x7f\xff\xff\xff\xff\xff\xff\xff\xf8\x00'))
1.79769313486231e+308
>>> aifc._read_float(io.BytesIO(b'\x43\xfe\xff\xff\xff\xff\xff\xff\xf8\x00'))
1.7976931348623157e+308
>>> aifc._read_float(io.BytesIO(b'\x43\xfe\xff\xff\xff\xff\xff\xff\xff\xff'))
inf

2. If exponent is not maximal, but large enough, this cause an OverflowError. It would be better consistently return the maximal value or inf.

>>> aifc._read_float(io.BytesIO(b'\x44\xfe\xff\xff\xff\xff\xff\xff\xf8\x00'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython3.7/Lib/aifc.py", line 198, in _read_float
    f = (himant * 0x100000000 + lomant) * pow(2.0, expon - 63)
OverflowError: (34, 'Numerical result out of range')

OverflowError when read a broken aifc file can be unexpected.

The proposed PR tries to make reading floats more consistent. I'm not sure it is correct.
msg313102 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-03-01 18:12
I'm finding it hard to imagine why you'd ever have a sample rate greater than 1e308 in an audio file. (In fact, it's hard to imagine needing a sample rate greater than 1e6.) Raising an OverflowError for a value that's too large to fit in an IEEE 754 binary64 float seems entirely reasonable.
msg313107 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-01 19:01
I'm fine with an OverflowError, but it looks inconsistent to me that for some huge values an OverflowError is not raised, but aifc._HUGE_VAL is returned instead. And I think that using math.ldexp() can be more preferable that pow(2.0, ...), especially for exponent around sys.float_info.max_exp and sys.float_info.min_exp.
msg313136 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-03-02 15:49
> And I think that using math.ldexp() can be more preferable that pow(2.0, ...)

Yes, absolutely agreed. I'll take a look at the PR.
msg313141 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-02 17:14
Thank you for your review Mark.

If left the OverflowError propagated I would catch it at the caller place (_read_float() is used only once) and reraise as aifc.Error. OverflowError is not expected exception. It never is raised for valid AIFC files, and the probability of raising it rather of aifc.Error for a random binary file is very small. I suppose that most users of this module don't catch it. See also issue32056.

But on other side, if there are files with an exponent of 0x7fff in wild, they are currently opened without errors. Raising an exception can be a regression.
msg313143 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-03-02 17:21
> If left the OverflowError propagated I would catch it at the caller place (_read_float() is used only once) and reraise as aifc.Error.

Ah, if there's a dedicated exception type already then yes, agreed that raising aifc.Error (with a suitable message) makes much more sense than raising an OverflowError or a ValueError.

> if there are files with an exponent of 0x7fff in wild

I hope there aren't, and I'd consider any such file to be broken/corrupted. That is, unless there's a common practice of using NaNs or infinities for the frame rate. I suppose it's conceivable that people us this as a placeholder meaning "I don't know the sample rate". No idea whether that's true in practice: I'm not a frequent user of this format.
msg313144 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-02 17:27
What if return 80-bit infinities and NaN as 64-bit infinities and NaN and raise an error in the case of finite numbers overflow?
msg313145 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-03-02 17:28
> What if return 80-bit infinities and NaN as 64-bit infinities and NaN and raise an error in the case of finite numbers overflow?

That's certainly reasonable from a pure floating-point-conversion perspective: it's exactly what I'd expect a general purpose extended-precision to double-precision converter to do.
msg313147 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-02 17:31
But I don't know the exact format for infinities and NaNs.
msg313149 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-03-02 17:45
> But I don't know the exact format for infinities and NaNs.

From the Intel software developer manuals:

For infinities:

Sign bit: 0 or 1 (positive or negative infinity)
Exponent field (15 bits): all ones
Significand (64 bits): first bit 1, all remaining bits 0.

NaN:

Sign bit: 0 or 1
Exponent field (15 bits): all ones
Significand (64 bits): first bit 1, at least one of the remaining bits 1.
History
Date User Action Args
2018-03-02 17:45:16mark.dickinsonsetmessages: + msg313149
2018-03-02 17:31:32serhiy.storchakasetmessages: + msg313147
2018-03-02 17:28:41mark.dickinsonsetmessages: + msg313145
2018-03-02 17:27:31serhiy.storchakasetmessages: + msg313144
2018-03-02 17:21:53mark.dickinsonsetmessages: + msg313143
2018-03-02 17:14:29serhiy.storchakasetmessages: + msg313141
2018-03-02 15:49:12mark.dickinsonsetmessages: + msg313136
2018-03-01 19:01:43serhiy.storchakasetmessages: + msg313107
2018-03-01 18:12:46mark.dickinsonsetmessages: + msg313102
2018-03-01 16:53:30serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request5718
2018-03-01 16:50:35serhiy.storchakacreate