classification
Title: [EASY] sndhdr.what() throws exceptions on unknown files
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Barro, ammar2, davidawad, lys.nikolaou, vstinner
Priority: normal Keywords: patch

Created on 2018-07-10 20:50 by Barro, last changed 2018-07-28 12:49 by steve.dower.

Pull Requests
URL Status Linked Edit
PR 8319 open davidawad, 2018-07-17 18:12
Messages (8)
msg321396 - (view) Author: Jussi Judin (Barro) Date: 2018-07-10 20:50
sndhdr.what() function throws several types of exceptions on unknown files instead of returning None (as documentation says).

Following code can replicate these crashes:

```
import sndhdr
import sys

sndhdr.what(sys.argv[1])
```

First crash is from wave or chunk module (input data is base64 encoded in the echo command):

```
$ echo UklGRjAwMDBXQVZFZm10IDAwMDABADAwMDAwMDAwMDAwMDAw | python3.7 -mbase64 -d > in.file
$ python3.7 sndhdr/test.py in.file
Traceback (most recent call last):
  File "sndhdr/test.py", line 4, in <module>
    sndhdr.what(sys.argv[1])
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 54, in what
    res = whathdr(filename)
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 63, in whathdr
    res = tf(h, f)
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 163, in test_wav
    w = wave.open(f, 'r')
  File "/tmp/python-3.7-bin/lib/python3.7/wave.py", line 510, in open
    return Wave_read(f)
  File "/tmp/python-3.7-bin/lib/python3.7/wave.py", line 164, in __init__
    self.initfp(f)
  File "/tmp/python-3.7-bin/lib/python3.7/wave.py", line 153, in initfp
    chunk.skip()
  File "/tmp/python-3.7-bin/lib/python3.7/chunk.py", line 160, in skip
    self.file.seek(n, 1)
  File "/tmp/python-3.7-bin/lib/python3.7/chunk.py", line 113, in seek
    raise RuntimeError
RuntimeError
```

Second crash comes from sndhdr module itself (again base64 encoded data is first decoded on command line):

```
$ echo AAA= | python3.7 -mbase64 -d > in.file
$ python3.7 sndhdr/test.py in.fileTraceback (most recent call last):
  File "sndhdr/test.py", line 4, in <module>
    sndhdr.what(sys.argv[1])
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 54, in what
    res = whathdr(filename)
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 63, in whathdr
    res = tf(h, f)
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 192, in test_sndr
    rate = get_short_le(h[2:4])
  File "/tmp/python-3.7-bin/lib/python3.7/sndhdr.py", line 213, in get_short_le
    return (b[1] << 8) | b[0]
IndexError: index out of range
```
msg321457 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-11 14:32
I agree that sndhdr.what() should return None and not raise an exception if the file format is not supported or the file is invalid.
msg321543 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * Date: 2018-07-12 12:09
Will you make a PR or shall I try to fix this?
msg321545 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-12 12:19
If someone writes a PR, I can review it ;-)
msg321573 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * Date: 2018-07-12 18:20
Upon checking to see where the RuntimeError is coming from, I noticed that there is some notorious behaviour in chunk.py. If one runs python with the exact same parameters as Jussi Judin did in their first case and after stepping through the call stack until Chunk/skip is reached, there is a call to Chunk/seek. On the time of the call size_read has the value 16, whereas within the seek method it has the value 28. Is this intended? If so, where is that implemented? I cannot figure out why this is happening. 

The following is my pdb output.

> cpython/Lib/chunk.py(160)skip()
-> self.file.seek(n, 1)
(Pdb) p self.size_read
16
(Pdb) s
--Call--
> cpython/Lib/chunk.py(98)seek()
-> def seek(self, pos, whence=0):
(Pdb) p self.size_read
28
msg321738 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-16 13:22
Hey Lysandros.

Take a look at https://github.com/python/cpython/blob/master/Lib/wave.py#L126-L139

Notice how there's a Chunk made from the `file` argument but then another Chunk created using the previous chunk as a file.

While it might seem like self.size_read is magically changing between the steps, the thing to realize there is that self.file is another chunk and when you step, you're stepping into the seek of another Chunk instance, which has its own self.size_read
msg321849 - (view) Author: david awad (davidawad) * Date: 2018-07-17 18:18
Hey, I saw the issue and spent some time piecing together a PR for it. I've never contributed to Python before but this seemed like a good place to start!  

I posted these questions on the PR but I'll echo them here as well.

- Should we only handle the specific exceptions that are created in these two examples? or is there a cleaner, more pythonic way to handle it?

- It doesn't appear that test_sndhdr.py does any testing of bad inputs. Can I add test cases to the test_sndhdr.py file?

Thanks! I'm here to learn so if there's anything I missed please feel free to let me know.
msg321850 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-07-17 18:49
Hey David, thanks for your contribution, I've added some feedback in context of the code on Github.
History
Date User Action Args
2018-07-28 12:49:02steve.dowersetkeywords: - easy
2018-07-17 18:49:51ammar2setmessages: + msg321850
2018-07-17 18:18:03davidawadsetnosy: + davidawad
messages: + msg321849
2018-07-17 18:12:42davidawadsetkeywords: + patch
stage: patch review
pull_requests: + pull_request7854
2018-07-16 13:22:27ammar2setnosy: + ammar2
messages: + msg321738
2018-07-12 18:20:48lys.nikolaousetmessages: + msg321573
2018-07-12 12:19:55vstinnersetmessages: + msg321545
2018-07-12 12:09:08lys.nikolaousetnosy: + lys.nikolaou
messages: + msg321543
2018-07-11 14:32:01vstinnersettitle: sndhdr.what() throws exceptions on unknown files -> [EASY] sndhdr.what() throws exceptions on unknown files
nosy: + vstinner

messages: + msg321457

keywords: + easy
2018-07-10 20:50:06Barrocreate