classification
Title: wave.py cannot write wave files into a shell pipeline
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: 18919 Superseder:
Assigned To: serhiy.storchaka Nosy List: drj, gpolo, python-dev, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2009-02-10 11:45 by drj, last changed 2014-03-08 17:54 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
wave-20090210.patch drj, 2009-02-10 11:51 wave.py patch
writeheader_without_tell.diff gpolo, 2009-02-10 16:56 review
wave_write_unseekable.patch serhiy.storchaka, 2013-09-06 21:40 review
wave_write_unseekable_2.patch serhiy.storchaka, 2013-11-11 18:16 review
Messages (20)
msg81539 - (view) Author: David Jones (drj) Date: 2009-02-10 11:44
When using the wave module to output wave files, the output file cannot 
be a Unix pipeline.

Example.  The following program outputs a (trivial) wave file on stdout:

#!/usr/bin/env python
import sys
import wave
w = wave.open(sys.stdout, 'w')
w.setnchannels(1)
w.setsampwidth(1)
w.setframerate(32000)
w.setnframes(0)
w.close()


It can create a wave file like this:

$ ./bugex > foo.wav

When used in a pipeline we get:

$ ./bugex | wc
Traceback (most recent call last):
  File "./bugex", line 9, in <module>
    w.close()
  File 
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/wave.py
", line 437, in close
    self._ensure_header_written(0)
  File 
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/wave.py
", line 458, in _ensure_header_written
    self._write_header(datasize)
  File 
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/wave.py
", line 465, in _write_header
    self._form_length_pos = self._file.tell()
IOError: [Errno 29] Illegal seek
Exception exceptions.IOError: (29, 'Illegal seek') in <bound method 
Wave_write.__del__ of <wave.Wave_write instance at 0x71418>> ignored
       0       1       8

The wave module has almost all it needs to work around this problem.  
The wave module will only seek the output if it needs to patch the 
header.  If you use setnframes to write the correct number of frames 
before writing them with writeframesraw then the header will not be 
patched upon calling close.  However...

The problem is that the "tell" method is invoked on the output stream 
(to record where the header is, in the event that we need to patch it); 
the "tell" method fails with an exception when the output is a pipeline 
(see example, above).

Exceptions from "tell" when writing the header initially (in 
_write_header) should be ignored.  If _patchheader is later invoked it 
will fail due to lack of pos.
msg81541 - (view) Author: David Jones (drj) Date: 2009-02-10 11:51
Attached is a patch which is a diff from this version of wave.py :

http://svn.python.org/view/*checkout*/python/trunk/Lib/wave.py?rev=54394
msg81542 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-02-10 12:28
Wouldn't it be better if you only ignored the 'illegal seek' error
instead of ignoring any ioerror (should it even be always discarded) ? I
get a 'bad file descriptor' under Windows 7, but, again, can it be
always discarded ? 

You can also reproduce the problem without using wave:

>>> import sys
>>> sys.stdout.tell()

I'm really unsure about the proposed patch.
msg81546 - (view) Author: David Jones (drj) Date: 2009-02-10 13:02
On 10 Feb 2009, at 12:28, Guilherme Polo wrote:

>
> Guilherme Polo <ggpolo@gmail.com> added the comment:
>
> Wouldn't it be better if you only ignored the 'illegal seek' error
> instead of ignoring any ioerror (should it even be always discarded) ?

No.

> I
> get a 'bad file descriptor' under Windows 7, but, again, can it be
> always discarded ?

Yes.

To expand: Observe that the exception is raised when we are writing  
the header for the first time.  The exception is not raised when we  
attempt to seek to patch the header, it is raised when we recording  
the file position so that we can seek to it later.

We record the file position even though we might not use it later  
(the file position is only needed if we need to patch the header).

So if we don't need to patch the header, we do not need the file  
position.  So we can clearly ignore any error in attempting to get  
the file position.

If we do need to patch the header, then we need the file position.   
If we do not have the file position (because the earlier attempt to  
get it failed), then patching the header will fail when it attempts a  
seek.  This seems reasonable to me.

>
> You can also reproduce the problem without using wave:
>
>>>> import sys
>>>> sys.stdout.tell()

That does not reproduce the problem.  The problem is not that tell  
raises an exception, the problem is that tell raises an exception and  
it is only being used to get some information that may be not needed  
later.  Therefore the exception should be ignored, and a problem  
should only be raised if it turns out that we did need for  
information that we couldn't get.

>
> I'm really unsure about the proposed patch.

Noted.

I also note that my patch can be improved by removing its last 11 lines.
msg81547 - (view) Author: David Jones (drj) Date: 2009-02-10 13:09
On 10 Feb 2009, at 12:28, Guilherme Polo wrote:

>
> Guilherme Polo <ggpolo@gmail.com> added the comment:
>
> I'm really unsure about the proposed patch.

Perhaps my example was too trivial.  The point is that if you call  
setnframes then you can get wave.py to avoid patching the header; so  
it does not need to seek on the output file.  However, that _still_  
doesn't let you pipe the output, because of the "tell" problem.   
That's what the patch is for.

Here is a (slightly) less trivial example:

#!/usr/bin/env python
import sys
import wave
w = wave.open(sys.stdout, 'w')
w.setnchannels(1)
w.setsampwidth(1)
w.setframerate(2000)
w.setnframes(100)
for _ in range(50): w.writeframesraw('\x00\xff')
w.close()

(The wave file that it outputs is 100ms of 1000 Hz sine wave by the way)

Note the call to setnframes _before_ the data is written.  That's  
what means the header does not need to be patched.  With my patch  
applied the output of this program can be fed to a pipe.

If you remove the call to setnframes then the header will need to be  
patched, and this still (correctly, usefully) raises an error with my  
patch applied.
msg81553 - (view) Author: David Jones (drj) Date: 2009-02-10 14:36
On 10 Feb 2009, at 13:02, David Jones wrote:
>
> I also note that my patch can be improved by removing its last 11  
> lines.

Er, no it can't.  What was I thinking?
msg81563 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-02-10 16:56
I see what you want to do, but I fell really uncomfortable by totally
ignoring IOError. I could get a bad file descriptor under Linux too, and
I wouldn't like to see it discarded for no reason.

Now, is there some problem if we remove the calls to the "tell" method
in _write_header ? See patch attached (tests are very welcome too).
msg81601 - (view) Author: David Jones (drj) Date: 2009-02-10 21:15
On 10 Feb 2009, at 16:57, Guilherme Polo wrote:

>
> Guilherme Polo <ggpolo@gmail.com> added the comment:
>
> Now, is there some problem if we remove the calls to the "tell" method
> in _write_header ? See patch attached (tests are very welcome too).

Yes
msg81665 - (view) Author: David Jones (drj) Date: 2009-02-11 20:34
On 10 Feb 2009, at 21:15, David Jones wrote:

>
> David Jones <drj@pobox.com> added the comment:
>
> On 10 Feb 2009, at 16:57, Guilherme Polo wrote:
>
>>
>> Guilherme Polo <ggpolo@gmail.com> added the comment:
>>
>> Now, is there some problem if we remove the calls to the "tell"  
>> method
>> in _write_header ? See patch attached (tests are very welcome too).
>
> Yes

Ahem.  Pardon me for answering you without reading your patch.  I  
have now read your patch and it does more than just remove the calls  
to "tell".  In fact it looks very fine.

It makes wave.py more like sunau.py in that it "just knows" what the  
offsets into the header are.  I think I like that (especially the way  
you use the struct format string to compute the second offset).  It  
also removes that nagging question at the back of my mind: "why does  
wave.py use tell when it could simply just know the offsets, which  
are constant anyway?".

And it works.  How cool is that?  I had changed my project to use  
sunau anyway, because that worked with pipes already.

Tests, you say...
msg81676 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-02-11 21:36
Nice.

I said tests in hope wave gets more tests, since right one there is a
single one. I will see if I can produce something.
msg81735 - (view) Author: David Jones (drj) Date: 2009-02-12 09:00
The following program does a very basic do-i-get-back-what-i-wrote test.  
sunau can't cope; I am investigating.

#!/usr/bin/env python
# $Id$
# Audio File Tests

import aifc
import sunau
import wave

import struct
import sys
from StringIO import StringIO

frames = struct.pack('256B', *range(256))
log = sys.stderr

# Basic test of reproducability.
# We test that a set of frames (an entirely artifical set, see `frames`, 
# above) can be written to an audio file and read back again to get the 
# same set of frames.
# We test mono/stereo, 8-bit/16-bit, and a few framerates.    
# As of 2009-02-12 sunau does not pass these tests, so I recommend that 
# you remove it.
for af in (aifc, sunau, wave):
  for nchannels in (1, 2):
    for sampwidth in (1, 2):
      for framerate in (11000, 44100, 96000):
        print >> log, "%s %d/%d/%d" % (af.__name__,
          nchannels, sampwidth, framerate)
        f = StringIO()
        w = af.open(f, 'w')
        w.setnchannels(nchannels)
        w.setsampwidth(sampwidth)
        w.setframerate(framerate)
        w.writeframesraw(frames)
        w.close()
        s = f.getvalue()
        f = StringIO(s)
        w = af.open(f)
        assert w.getnchannels() == nchannels
        assert w.getsampwidth() == sampwidth
        assert w.getframerate() == framerate
        assert w.readframes(len(frames)//nchannels//sampwidth) == frames
        assert w.readframes(1) == ''
msg81903 - (view) Author: David Jones (drj) Date: 2009-02-13 10:45
On 12 Feb 2009, at 09:00, David Jones wrote:

>
> David Jones <drj@pobox.com> added the comment:
>
> The following program does a very basic do-i-get-back-what-i-wrote  
> test.
> sunau can't cope; I am investigating.

I see.  sunau uses mu-law compression by default which makes it non- 
invertable.  How stupid.  Inserting:

         w.setcomptype('NONE', 'Pointless Argument')

just after setframerate fixes the tests so that all 3 modules pass.

Of course, this is still only the most very basic test that one might  
want to do.  And it doesn't cover the case mentioned in this bug  
report anyway.

(drat, just found this, should've sent it yesterday)
msg112649 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-03 18:39
Is this still a problem with 2.7-3.2?
GP, what state do you think either patch is in?
msg182113 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-14 17:58
> Now, is there some problem if we remove the calls to the "tell" method
> in _write_header ? See patch attached (tests are very welcome too).

Yes, there is a problem. User can pass already open file to wave.open() and file position can be not 0 at the start of the WAVE file. But you can do with only one tell(). Note a magic number 36 in many places of the code. This is struct.calcsize(wave_header_format).

Test is needed as well as a documentation change.

I think this is rather a new feature and should be added only in 3.4. Actually the current behavior is documented: "If *file* is a string, open the file by that name, otherwise treat it as a seekable file-like object."
msg196848 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-03 14:47
Here is corrected patch (it uses relative seek()) with a lot of tests.
msg197110 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-06 21:39
Oh, I forgot attach a patch. In any case it already slightly outdated.

After looking at other audio modules I think David's approach is better. It is also used in the chunk module. Here is updated patch with tests (tests are not final, issue18919 provides better tests).
msg197111 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-06 21:40
Oh, I forgot attach a patch. In any case it already slightly outdated.

After looking at other audio modules I think David's approach is better. It is also used in the chunk module. Here is updated patch with tests (tests are not final, issue18919 provides better tests).
msg202636 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-11 18:16
Here is simplified and updated to tip patch.
msg203025 - (view) Author: Roundup Robot (python-dev) Date: 2013-11-16 11:04
New changeset 6a599249e8b7 by Serhiy Storchaka in branch 'default':
Issue #5202: Added support for unseekable files in the wave module.
http://hg.python.org/cpython/rev/6a599249e8b7
msg212938 - (view) Author: Roundup Robot (python-dev) Date: 2014-03-08 17:54
New changeset b861c7717c79 by R David Murray in branch 'default':
whatsnew: Wave_write handles unseekable files. (#5202)
http://hg.python.org/cpython/rev/b861c7717c79
History
Date User Action Args
2014-03-08 17:54:16python-devsetmessages: + msg212938
2013-11-16 11:57:36serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2013-11-16 11:04:27python-devsetnosy: + python-dev
messages: + msg203025
2013-11-11 18:16:08serhiy.storchakasetfiles: + wave_write_unseekable_2.patch

messages: + msg202636
2013-09-06 21:40:45serhiy.storchakasetassignee: serhiy.storchaka
stage: needs patch -> patch review
2013-09-06 21:40:09serhiy.storchakasetfiles: + wave_write_unseekable.patch

messages: + msg197111
2013-09-06 21:39:23serhiy.storchakasetdependencies: + Unify audio modules tests
messages: + msg197110
2013-09-03 14:47:34serhiy.storchakasetmessages: + msg196848
2013-02-14 17:58:51serhiy.storchakasetversions: + Python 3.4, - Python 3.1, Python 2.7, Python 3.2
nosy: + serhiy.storchaka

messages: + msg182113

type: behavior -> enhancement
stage: patch review -> needs patch
2010-08-03 18:39:04terry.reedysetversions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6, Python 2.5
nosy: + terry.reedy

messages: + msg112649

stage: patch review
2009-02-13 10:45:52drjsetmessages: + msg81903
2009-02-12 09:00:10drjsetmessages: + msg81735
2009-02-11 21:36:52gpolosetmessages: + msg81676
2009-02-11 20:34:15drjsetmessages: + msg81665
2009-02-10 21:15:50drjsetmessages: + msg81601
2009-02-10 16:57:00gpolosetfiles: + writeheader_without_tell.diff
messages: + msg81563
2009-02-10 14:36:30drjsetmessages: + msg81553
2009-02-10 13:09:29drjsetmessages: + msg81547
2009-02-10 13:02:16drjsetmessages: + msg81546
2009-02-10 12:28:41gpolosetnosy: + gpolo
messages: + msg81542
2009-02-10 11:51:51drjsetfiles: + wave-20090210.patch
keywords: + patch
messages: + msg81541
2009-02-10 11:45:00drjcreate