classification
Title: wave.py: add writesamples() and readsamples()
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Joeboy, alex_python_org, gpolo, r.david.murray, serhiy.storchaka, terry.reedy
Priority: normal
Keywords: patch

Created on 2009-01-11 02:16 by alex_python_org, last changed 2014-05-13 11:14 by serhiy.storchaka.

Files
File name Uploaded Description
wave_futz.zip alex_python_org, 2009-01-12 14:37 wave.py read/write-samples logic and test program.
wave_futz.zip alex_python_org, 2009-02-08 21:22 Code that could be added to wave.py. And amended test_wave.py.
wave_futz.py alex_python_org, 2010-08-09 07:21 patches and such-like for Python's wave.py
test_wave.py alex_python_org, 2010-08-09 07:23 Modified test for wave.py.
Messages (27)
msg79586 - (view) Author: Alex Robinson (alex_python_org) Date: 2009-01-11 02:16
Corrected code in writeframesraw():

            self._datawritten = self._datawritten + len(data) * self._sampwidth
        else:
            self._file.write(data)
            self._datawritten = self._datawritten + len(data) * self._sampwidth


Note that the default (not byte swapped) assignment to _datawritten must
also be multiplied by _sampwidth. If not, audio programs will ignore the
second half of a 16-bit-sample file.

As a side note, the calls to _patchheader() do not need to be protected
by this "if" statement:

        if self._datalength != self._datawritten:

_patchheader does the same test to optimize its operation.
msg79616 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-01-11 19:14
Wave_read.initfp also needs fixing in how it counts the number of frames;
correct me if it's wrong.

Patch added.
msg79617 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-01-11 19:17
Oops, _framesize already takes sampwidth into account. So there is a
problem somewhere else, since reading the wave file is returning the
number of frames multiplied by the sampwidth.
msg79618 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-01-11 19:39
Given the name of the function related to the problem: "writeframesraw",
it seems to be more correct to remove the sampwidth multiplication from
the other case (not add it in the other one), since you must already
pass the data multiplied by it.

Does that make sense to you Alex ?
msg79619 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-01-11 19:47
Ah, yes :) But in the other case (the one where it is currently
multiplied) the multiplication happens because data is formatted to
either bytes, shorts or longs, so without the multiplication data length
would end up being divided by 1, 2 or 4.

So, besides the extra "if" statements, all is good.
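
To make the byte-versus-sample distinction above concrete, here is a minimal Python 3 sketch. It is not taken from wave.py or from any patch on this issue, and the helper name frames_written is made up for illustration. It passes raw bytes to writeframes() and reads back the frame count that the header reports.

```python
import io
import wave


def frames_written(nchannels, sampwidth, nframes):
    """Write nframes of zero bytes and return the frame count read back."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(nchannels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(8000)
        # writeframes() takes raw bytes, so the caller has already
        # multiplied by the sample width: one frame is
        # nchannels * sampwidth bytes.
        wf.writeframes(b"\x00" * (nframes * nchannels * sampwidth))
    buf.seek(0)
    with wave.open(buf, "rb") as rf:
        return rf.getnframes()
```

For 16-bit stereo, 100 frames means handing over 400 bytes, and the reader reports 100 frames, not 200 or 400.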
msg79643 - (view) Author: Alex Robinson (alex_python_org) Date: 2009-01-12 04:07
Oh golly. I was confused. For some reason I was thinking
"writesamples()" when using "writeframes()".

So the current code reads ok. Which makes this "bug" a request for
writesamples() and readsamples() to be added to wave.py. They would
shield sleep deprived saps from the .wav file data frame format. :)


Here are python2.4-ish versions written for outside wave.py. Combos of 8
and 16 bit samples, mono and stereo, are tested. I did not test the
32-bit sample logic.

Sample values are expected to be +-32767 or +-128 ints (or +-2.x gig if
32-bit).


def readsamples(wf, nframes) :
    """ Read an array of number-of-channels normalized int sample arrays. """

    wav = wf.readframes(nframes)

    if    wf.getsampwidth() == 4 :
        wav = struct.unpack("<%ul" % (len(wav) / 4), wav)
    elif  wf.getsampwidth() == 2 :
        wav = struct.unpack("<%uh" % (len(wav) / 2), wav)
    else :
        wav = struct.unpack("%uB"  %  len(wav),      wav)
        wav = [ s - 128 for s in wav ]

    nc  = wf.getnchannels()
    if  nc > 1  :
        wavs    = []
        for i in xrange(nc) :
            wavs.append([ wav[si] for si in xrange(i, len(wav), nc) ])
        pass
    else :
        wavs    = [ wav ]

    return(wavs)



def writesamples(wf, wavs) :
    """
        Write samples to the wave file.
        'wavs' looks like this:
               [ left_channel_samples,  right_channel_samples ]
            or [ left_channel_samples                         ]
            or   mono_samples
        This routine calls setnchannels() from information about 'wavs' length.
    """

    if  wavs :
        if  len(wavs) not in [ 1, 2, 4 ] :
            wavs    = [ wavs, wv ]

        wf.setnchannels(len(wavs))

        if  len(wavs)   > 1 :
            wav         = []
            for w in zip(*wavs):
                wav    += w
            pass
        else :
            wav         = wavs[0]

        if    wf.getsampwidth() == 4 :
            ws  = array.array('l', [ s       for s in wav ])
        elif  wf.getsampwidth() == 2 :
            ws  = array.array('h', [ s       for s in wav ])
        else :
            ws  = array.array('B', [ s + 128 for s in wav ])

        ws  = ws.tostring()

        wf.writeframes(ws)

    pass

# end of code to edit and insert in wave.py
msg79644 - (view) Author: Alex Robinson (alex_python_org) Date: 2009-01-12 04:36
Oh gob. I left a debug artifact in that code.

            wavs    = [ wavs, wv ]

needs to be without the 'wv'.
msg79660 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-01-12 10:36
Documentation, tests and patch against trunk are needed to get this into
Python, but to me the request is fine.
msg79668 - (view) Author: Alex Robinson (alex_python_org) Date: 2009-01-12 11:32
I might be able to do doc/test/patch in a month or two, but know
zero.zero about the process so would expect it to take far more than a
few hours when I do have time.
msg79669 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-01-12 12:02
I could do it, but I'm in disagreement with big part of your patch.

Can you add some kind of test you used for it ? Raw data, sample file,
or something like this.
msg79678 - (view) Author: Alex Robinson (alex_python_org) Date: 2009-01-12 14:37
Polo: "I could do it, but I'm in disagreement with big part of your patch."

Why surely you can't mean the bug. :) (The test program has it fixed.)

What is the disagreement?

Apparently this bug system allows file attachments, so I will upload a
test program and wave file.

The program is hard coded to read the wave file and write a bunch of
wave files, the names of which describe what they sound like.
msg79681 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-01-12 15:34
Aren't 8 bit samples stored as unsigned bytes ? If yes, they don't range
between -128 and 127 (first disagreement). So this line: wav = [ s - 128
for s in wav ] and the respective one (that adds +128 in writesamples)
should go.

Why is this check: "if len(wavs) not in [ 1, 2, 4 ]" needed ?

Calling setnchannels inside writesamples looks very wrong to me; weren't
you going to write samples? Then why does it modify the number of
channels? The caller should be responsible for calling setnchannels;
besides, what is the use of calling setnchannels here?

I see writesamples is expecting "wavs" to be a list of lists containing
integers, is that the best format to expect ? writeframes works with
strings (which are actually byte strings).

The code layout didn't help me to get in agreement with it either.

The above paragraphs are the things I disagree with in the patch; hopefully
you can help with those questions. Also, it would be better to hand write
the wave file for testing so we can be sure about its content without
needing much analysis.
msg79686 - (view) Author: Alex Robinson (alex_python_org) Date: 2009-01-12 17:18
"8 bit samples stored as unsigned bytes"?
8 bit samples are 0..255 in the file. But to work with them, you'll want
them -128..127. The code assumes DC==0 sample values for simplicity.
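
The offset under discussion is small enough to sketch. This is a Python 3 illustration with hypothetical helper names (u8_to_signed, signed_to_u8); it is not the code in the attached files.

```python
def u8_to_signed(raw):
    """Map unsigned 8-bit WAV bytes (0..255) to centred ints (-128..127)."""
    return [b - 128 for b in raw]


def signed_to_u8(samples):
    """Map centred ints (-128..127) back to the unsigned bytes stored on disk."""
    return bytes(s + 128 for s in samples)
```

With this mapping, silence in an 8-bit file (byte value 128) becomes 0, which is the "DC==0" convention the message refers to.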

"if len(wavs) not in [ 1, 2, 4 ]" ?
That way if you're working with mono, you can simply pass your samples
down to writesamples() without having to remember to "[ samples ]" them.
If you forget, no big deal. It's too bad that readsamples() can't know
that you want only mono samples. That would make mono work simpler.
Anyway, I don't argue very strongly for this spiff. In some ways it's
worse to be there. After all, the caller may be writing 1 frame at a
time, though I don't think that such logic would work. And would be
pretty slow, too, probably.

"Calling setnchannels"?
Since the number of channels *must* be the number of sample arrays
passed to writesamples(), either writesamples() must rely on the caller
already having gratuitously set the number of channels (correctly), or
writesamples() can simply force the issue. If the caller set the number
of channels wrongly, then the output file will be corrupt or
writesamples() would need to raise an exception. Both just make work for
the caller. get/set_nchannels() are not particularly useful if you are
using the read/write_sample API. If there were no ..frame() API,
getnchannels() might still be handy to use to find out how many channels
a wave file being read has before any samples have been read. But that's
about it.

"integers, is that the best format"?
Far better than the byte stream form which is useless, confusing, error
prone, and exposes the internal wave file format to the caller who
generally couldn't care less how a wave file stores the samples. But, you bring
up a very good point, I think. I forgot to int(s) the samples when they
are put in to the arrays. The reason for an int() call on all the
samples is so that the caller can deal with samples as floats. (Which is
how he will want to deal with them if he's doing anything interesting.)
So, this:
            ws  = array.array('l', [ int(s)       for s in wav ])
            ws  = array.array('h', [ int(s)       for s in wav ])
            ws  = array.array('B', [ int(s + 128) for s in wav ])
And, for testing, in normalize_samples():
        samps   = [ s * mxm for s in samps ]
So normalize_samples() always sends floats to writesamples().


"The code layout"?
:) Well. Whatever. I know that the "official" python thing is to push
colons left. I don't like that. I've experimented in the last few years
with doing a lot of vertical alignment. Over time, I've found that it is
a great way to do things. It's pretty amazing how much easier it is to
scan and read code that has, as a start, the "="'s vertically aligned.
And, over the last few years, I've put more and more blank lines in.
Vertically compressed code tends to look like assembler language. I edit
with 200+ character wide screen so, since I stopped forcing everything
to fit on TTY, text mode CGA, or punch cards, I've lost a taste for
narrow code. In fact, I personally have a real hard time reading
line-broken code. That said, multi-lining "if" statements in ways that
allow delete/insert of lines, 1 for each "operator ()" expression, can
be very nice. Calls to routines with a gob of parameters, each on a
separate line, can sure be a good way to deal with a bad thing (routines
that take a lot of params are bad, that is). Etc. Anyway, I assumed that
the code would be reformatted by whomever maintains wave.py. No biggy.

"hand write the wave file for testing"?
Good point! Allows a test to do things like odd numbers of frames, 1
frame, max'ed out sample values, long runs of silence, DC offsets, etc.
Do you know whether there are already test files for wave.py? They'd
have those sorts of things in them. Hmmm. It's odd that wave.py doesn't
run from the command line and dump the header or something, at least.
Maybe do some simple conversions (8/16, mono/stereo switches, reverses
... that sort of thing). Would be handy.


This code is not tested on a big-endian machine. I ran it under XP
(py2.4) and Ubuntu64 (py2.5) and all the output files CRC the same on
both PCs.
msg79691 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-01-12 18:28
1) wave.py doesn't make assumptions about what the user wants, so I don't
think it is the place to put the DC (0 hz) assumption.

3) writesamples would raise an exception in the case of the current
number of channels set being wrong.

4) Well, lets fix a format then. I said list of lists of integers, or it
could use generators, and you didn't disagree here so it seems to be
fine. The problem in the current code is that you are making mono
channels special by being the one where a list of lists of integers is
not returned, but instead a single list is returned. This makes it
troublesome for the caller to set the number of channels, and it is also a
different format than when something with 2 channels or other configuration
is used. With that in mind I have simplified some of your code as this:


def readsamples(self, nframes) :
    """Return a list of lists of integers.

    The number of these inner lists is given by the number of channels in
    the wave file. Each list contains the channel samples formatted as
    integers.
    """
    wav = self.readframes(nframes)

    sampwidth = self.getsampwidth()
    wav = struct.unpack(
            '<%d%s' % (len(wav) / sampwidth, wave._array_fmts[sampwidth]),
            wav)

    nc = self.getnchannels()
    if nc > 1:
        wavs = []
        for c in xrange(nc):
            wavs.append([wav[si] for si in xrange(c, len(wav), nc)])
    else:
        wavs = [[wav]]

    return wavs

def writesamples(self, *wavs) :
    """Write samples to the wave file.

    wavs must follow the structure returned by readsamples.
    """
    if self.getnchannels() != len(wavs):
        raise wave.Error("# of channels != # of samples")

    wav = []
    for w in zip(*wavs):
        wav.extend(w)

    ws = array.array(wave._array_fmts[self.getsampwidth()], wav)
    ws = ws.tostring()

    # we want all the samples in writeframes() format so that _convert
    # can be called on them
    self.writeframes(ws)


You can monkey patch wave then by doing:

wave.Wave_write.writesamples = writesamples
wave.Wave_read.readsamples = readsamples

And then change some other parts of your code.

5) There is a very small test for wave in Python's source,
Lib/test/test_wave.py
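
For readers on Python 3, the same idea can be sketched as standalone functions. This is only an illustration under stated assumptions, not the code from this issue: the names read_samples and write_samples are hypothetical, no 8-bit offset is applied, and it assumes wave exchanges frames in the host's byte order (wave.py byte-swaps on big-endian hosts), which is why the struct format uses "=" rather than "<".

```python
import io
import struct
import wave

# With "=" struct uses native byte order and standard sizes:
# B is 1 byte, h is 2 bytes, i is 4 bytes.
_STRUCT_CODES = {1: "B", 2: "h", 4: "i"}


def read_samples(wf, nframes):
    """Return one list of ints per channel (hypothetical helper)."""
    raw = wf.readframes(nframes)
    width = wf.getsampwidth()
    fmt = "=%d%s" % (len(raw) // width, _STRUCT_CODES[width])
    flat = struct.unpack(fmt, raw)
    nc = wf.getnchannels()
    # Frames are interleaved, so channel c is every nc-th sample from c.
    return [list(flat[c::nc]) for c in range(nc)]


def write_samples(wf, channels):
    """Write a list of per-channel int lists (hypothetical helper)."""
    if len(channels) != wf.getnchannels():
        raise wave.Error("number of channels != number of sample lists")
    # Interleave: one sample from each channel per frame.
    flat = [s for frame in zip(*channels) for s in frame]
    fmt = "=%d%s" % (len(flat), _STRUCT_CODES[wf.getsampwidth()])
    wf.writeframes(struct.pack(fmt, *flat))
```

Mono is not special-cased here: a mono file yields a list holding one list, so the return shape is the same for any channel count.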
msg79692 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-01-12 18:39
I was going to reply about your "code layout" answer but forgot. Well,
each one has their preferences so I'm not going to question yours.

The only problem is that there is no maintainer for wave.py, so, the
more you follow the rules for Python code (or at least code that gets
included in Python) the greater the chances of it getting included.
msg79714 - (view) Author: Alex Robinson (alex_python_org) Date: 2009-01-13 02:20
"DC (0 hz) assumption"?
wave.py makes the assumption that what the user wants is whatever
happens to be in the file, however arbitrary. (That 8 bit samples are
unsigned bytes is probably an artifact of early ADC logic. Typically you
got an absolute, n-bit value from an old ADC. Newer chips often return
signed values.) It's very unlikely that anything but a copy program
would try to work with unsigned char samples. Too many things to go
wrong. Too much confusion. Zero means zero in most of the world, in and
out of audio processing. :)  That said, not having to offset the 8 bit
samples sparsifies the read/write_sample code. But, I'm thinking that
that's at the expense of every program that uses it. When in doubt, I
figure, do what is more useful. Don't force the caller to write a
wrapper if he'll need to do it 99% of the time. But this is not a
religious thing with me. A wrapper can be written. And, in fact, I'd
sure think it would be nice to include wrappers like auto-scaling and
auto-zeroing in wave.py. But maybe not, as these ops probably belong in
some array.py type module. Anyway, a non-audio guy who just wants to
read a wave file, diddle with it, and write it out. Or who just wants to
generate some sound and write it out. Or who just wants to read a wave
file and graph it or something. All of these guys will be stunned when
they find out to their hours-of-work chagrin that wave files' 8 bit
samples are not signed chars. And, if I were one of them, I'd be plenty
peeved after having to spend all that time learning about some
historical artifact just to read a danged audio file, for gosh sakes.
But not putting the 8-bit offset in the read/write_samples logic does
eliminate 2 lines of code in each routine.

"writesamples would raise an exception"
Yep, taste. I'm inclined to find this irritating and I don't like being
irritated by packages I use. Makes for a poor out-of-box experience.
But, taste. :)

"4) Well, lets fix a format then. else: wavs = [[wav]]" ?
That's an extra [] I think. [[samples]] would be an array of array of an
array of samples. s = [1,2]; print [ [ s ] ];    [[[1, 2]]]
On reflection, I'd say I agree with you more than I do with me on the
ability of writesamples() to take a simple array of mono samples. Not a
good thing to do.

"wavs.extend(wav)" ?
I had to look up extend() and try it in the Python shell! :) To each his
own. But when I found out that one could do list+=added_list in Python I
never looked back. Intuitive. I special-cased mono for speed purposes.
No reason to do the +=/extend for mono samples. But, maybe the
interpreter handles all that. Don't know. Didn't measure it.

"monkey patch"?
Wonderful! This makes your rewrite of the code *so* much cleaner. Thanks
for the tip!

"code layout"?
Har, har. Yep, no one in software has ever spent any time "discussing"
code layout before. Let's do it for the first time in history.

"test_wave.py"?
Oooo. Bit minimal, that. Yeah, I think a couple of things could be
fleshed out there.

Gotta run now. But will try to update the code in wave_futz later. Other
things on plate, though.

Guilherme, I really appreciate your handling this and your guidance. Thanks!
msg79716 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2009-01-13 03:02
> "DC (0 hz) assumption"?
> wave.py makes the assumption that what the user wants is whatever
> happens to be in the file, however arbitrary. (That 8 bit samples are
> unsigned bytes is probably an artifact of early ADC logic. Typically you
> got an absolute, n-bit value from an old ADC. Newer chips often return
> signed values.) It's very unlikely that anything but a copy program
> would try to work with unsigned char samples. Too many things to go
> wrong. Too much confusion. Zero means zero in most of the world, in and
> out of audio processing. :)

Every document/text I've found so far talks about 8 bit samples being
unsigned, it is not like I'm trying to enforce it just because I want,
just following the specification.
But, SDL for example, accepts wav files with unsigned 8 bits, signed 8
bits, unsigned 16 bits, signed 16 bits, and with different byte
orders, so apparently different libraries write different wave files.
I wonder which of these would be good to get included in wave.py, but
for this current issue I would prefer to not even touch this other
problem and stick with what is documented for the wave format.

> .
> .
> But not putting the 8-bit offset in the read/write_samples logic does
> eliminate 2 lines of code in each routine.

It is good that it is just two lines, it means they could be added
back (and adapted) when we start supporting different output/input
formats for wave.

> "4) Well, lets fix a format then. else: wavs = [[wav]]" ?
> That's an extra [] I think.

Right.

> "wavs.extend(wav)" ?
> I had to look up extend() and try it in the Python shell! :) To each his
> own. But when I found out that one could do list+=added_list in Python I
> never looked back. Intuitive. I special-cased mono for speed purposes.
> No reason to do the +=/extend for mono samples. But, maybe the
> interpreter handles all that.

It won't just do this optimization, keeping the special case for mono is fine.

> "test_wave.py"?
> Oooo. Bit minimal, that. Yeah, I think a couple of things could be
> fleshed out there.

I'm waiting for the new hand written tests now :)
msg81419 - (view) Author: Alex Robinson (alex_python_org) Date: 2009-02-08 21:22
I'll upload the latest monkey-patch file, wave_futz.py, and
test_wave.py, which has a gob of tests added to it.

I found a 64-bit bug in the wave.py formats for 32-bit sample wave files.

The pcm files read in to CoolEdit ok, including the 32-bit sample files.
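
The message does not say what the 64-bit bug was. One plausible cause, offered here only as an assumption, is that the array typecode 'l' is 8 bytes wide on most 64-bit Unix builds, so it no longer matches a 4-byte sample. A Python 3 sketch that picks the typecode by item size instead of by name (typecode_for_width is a hypothetical helper, not part of wave.py):

```python
from array import array


def typecode_for_width(width):
    """Return a signed array typecode whose items are exactly width bytes."""
    # Item sizes of 'i' and 'l' depend on the platform's C types,
    # so check them at run time instead of hard-coding a letter.
    for code in "bhil":
        if array(code).itemsize == width:
            return code
    raise ValueError("no signed array typecode is %d bytes wide" % width)
```

On a platform where 'l' is 8 bytes this returns a different code for 4-byte samples than on one where 'l' is 4 bytes, which is the point of checking itemsize.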
msg113363 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-09 03:17
Please upload plain-text files with unique names for each file uploaded.
msg113393 - (view) Author: Alex Robinson (alex_python_org) Date: 2010-08-09 07:21
Here you go, Terry. Copies of the two files in the latest ZIP file.

Hmmm. Well. Maybe just one of 'em. Looks like the only way to upload files is to add a msg, so I'll upload the other file in another msg.
msg113395 - (view) Author: Alex Robinson (alex_python_org) Date: 2010-08-09 07:23
OK, here's the other.
msg217034 - (view) Author: Joe Button (Joeboy) Date: 2014-04-22 21:59
Forgive my unfamiliarity with python's development process, but, what is happening with this? Is there any chance of this enhancement making it into the python libs? What would need to happen?

Thanks.
msg217036 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-04-22 22:17
Someone has to find the time to do a commit review on the patch.  As Guilherme said, there's no specific maintainer for wave, so I'm afraid it just got forgotten about.  On the other hand, as a new feature it would now go in 3.5, and we're at the start of the approximately one year window for new features, so if you ping this issue (as you just did) periodically, someone will get to it ;)

What you could do to help move it along is to do your own review of the patch, including making sure it still applies to default...which it may not, since there have in fact been some changes in wave.py.  If that's the case you can also help by updating the patch.
msg217042 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-04-22 23:04
Serhiy, is this something you can review?
msg217045 - (view) Author: Joe Button (Joeboy) Date: 2014-04-22 23:45
On quickly looking at this, the immediate issue seems to me to be that there is no patch, as I understand the term. If it would be helpful I can look at turning the code in the attached files into a patch against default and ensure the tests pass (but not right now as it's ~1am here).
msg217053 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-04-23 05:20
A patch against default, including a test, would be helpful.
msg218434 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-05-13 11:14
I hope all mentioned bugs were already fixed in the wave module.

As for new writesamples() and readsamples() methods, perhaps it would be better to add utility functions in the audioop module for packing/unpacking integers. In any case a user can use array.array.
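
A minimal sketch of the array.array route mentioned above, in Python 3 and limited to 16-bit samples. The function names are hypothetical and this is not a proposed API. It assumes array('h') items are 2 bytes wide, which holds on common platforms, and that wave exchanges frames in the host's byte order.

```python
import io
import wave
from array import array


def write_int16(wf, channels):
    """Interleave per-channel int lists and write them as 16-bit frames."""
    nc = len(channels)
    # Zero-filled buffer with nc samples per frame (2 bytes per sample).
    flat = array("h", bytes(2 * nc * len(channels[0])))
    for c, samples in enumerate(channels):
        # Extended slice assignment drops channel c into every nc-th slot.
        flat[c::nc] = array("h", samples)
    wf.writeframes(flat.tobytes())


def read_int16(wf):
    """Read all frames and return one list of ints per channel."""
    flat = array("h")
    flat.frombytes(wf.readframes(wf.getnframes()))
    nc = wf.getnchannels()
    return [flat[c::nc].tolist() for c in range(nc)]
```

The caller still sets the channel count, sample width and frame rate on the writer, in line with the earlier objection to calling setnchannels() from inside a write helper.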
History
Date User Action Args
2014-05-13 11:14:33  serhiy.storchaka  set  messages: + msg218434
2014-04-23 05:20:28  terry.reedy  set  messages: + msg217053
2014-04-22 23:45:52  Joeboy  set  messages: + msg217045
2014-04-22 23:04:03  terry.reedy  set  nosy: + serhiy.storchaka; messages: + msg217042
2014-04-22 22:17:35  r.david.murray  set  versions: + Python 3.5, - Python 3.2; nosy: + r.david.murray; messages: + msg217036; stage: test needed -> patch review
2014-04-22 21:59:12  Joeboy  set  nosy: + Joeboy; messages: + msg217034
2010-08-09 07:23:08  alex_python_org  set  files: + test_wave.py; messages: + msg113395
2010-08-09 07:22:02  alex_python_org  set  files: + wave_futz.py; messages: + msg113393
2010-08-09 03:17:33  terry.reedy  set  versions: + Python 3.2, - Python 3.1, Python 2.7; nosy: + terry.reedy; messages: + msg113363; stage: test needed
2009-02-08 21:22:45  alex_python_org  set  files: + wave_futz.zip; messages: + msg81419
2009-01-13 03:02:52  gpolo  set  messages: + msg79716
2009-01-13 02:20:26  alex_python_org  set  messages: + msg79714
2009-01-12 18:39:22  gpolo  set  messages: + msg79692
2009-01-12 18:28:17  gpolo  set  messages: + msg79691
2009-01-12 17:18:47  alex_python_org  set  messages: + msg79686
2009-01-12 15:34:31  gpolo  set  messages: + msg79681
2009-01-12 14:37:34  alex_python_org  set  files: + wave_futz.zip; messages: + msg79678
2009-01-12 12:02:15  gpolo  set  messages: + msg79669
2009-01-12 11:32:55  alex_python_org  set  messages: + msg79668
2009-01-12 10:43:30  gpolo  set  versions: - Python 2.6, Python 3.0
2009-01-12 10:36:04  gpolo  set  title: wave.py writes 16 bit sample files of half the correct duration -> wave.py: add writesamples() and readsamples(); messages: + msg79660; versions: - Python 2.5, Python 2.4
2009-01-12 04:36:41  alex_python_org  set  messages: + msg79644
2009-01-12 04:07:19  alex_python_org  set  type: behavior -> enhancement; messages: + msg79643
2009-01-11 19:47:58  gpolo  set  messages: + msg79619
2009-01-11 19:39:29  gpolo  set  messages: + msg79618
2009-01-11 19:17:34  gpolo  set  files: - issue_4913.diff
2009-01-11 19:17:28  gpolo  set  messages: + msg79617
2009-01-11 19:15:00  gpolo  set  files: + issue_4913.diff; keywords: + patch; messages: + msg79616; nosy: + gpolo
2009-01-11 02:16:42  alex_python_org  create