Issue 4913: wave.py: add writesamples() and readsamples()


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/49163

classification

Title:	wave.py: add writesamples() and readsamples()
Type:	enhancement	Stage:	patch review
Components:	Library (Lib)	Versions:	Python 3.5

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Joeboy, alex_python_org, gpolo, r.david.murray, serhiy.storchaka, terry.reedy
Priority:	normal	Keywords:	patch

Created on 2009-01-11 02:16 by alex_python_org, last changed 2022-04-11 14:56 by admin.

Files
File name	Uploaded	Description
wave_futz.zip	alex_python_org, 2009-01-12 14:37	wave.py read/write-samples logic and test program.
wave_futz.zip	alex_python_org, 2009-02-08 21:22	Code that could be added to wave.py. And amended test_wave.py.
wave_futz.py	alex_python_org, 2010-08-09 07:21	patches and such-like for Python's wave.py
test_wave.py	alex_python_org, 2010-08-09 07:23	Modified test for wave.py.

Messages (27)
msg79586	Author: Alex Robinson (alex_python_org)	Date: 2009-01-11 02:16
Corrected code in writeframesraw():

            self._datawritten = self._datawritten + len(data) * self._sampwidth
        else:
            self._file.write(data)
            self._datawritten = self._datawritten + len(data) * self._sampwidth

Note that the default (not byte-swapped) assignment to _datawritten must also be multiplied by _sampwidth. If not, audio programs will ignore the second half of a 16-bit-sample file.

As a side note, the calls to _patchheader() do not need to be protected by this "if" statement:

        if self._datalength != self._datawritten:

_patchheader does the same test to optimize its operation.
msg79616	Author: Guilherme Polo (gpolo) *	Date: 2009-01-11 19:14
Wave_read.initfp also needs fixing on counting the frame number, correct me if it's wrong. Patch added.
msg79617	Author: Guilherme Polo (gpolo) *	Date: 2009-01-11 19:17
Oops, _framesize already takes sampwidth into account. So there is a problem somewhere else, since reading the wave file is returning the number of frames multiplied by the sampwidth.
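To see gpolo's point that the frame size already accounts for the sample width, one can write a known number of frames into an in-memory file with Python 3's standard wave module and read the count back. This is a sketch with arbitrary parameters (stereo, 16-bit, 8000 Hz), not code from the patch:

```python
import array
import io
import wave

NCHANNELS = 2   # stereo
SAMPWIDTH = 2   # bytes per sample (16-bit)
NFRAMES = 10    # one frame = one sample for each channel

buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(NCHANNELS)
    wf.setsampwidth(SAMPWIDTH)
    wf.setframerate(8000)
    # 'h' is a 2-byte signed integer on all common platforms; all samples are silence.
    wf.writeframes(array.array("h", [0] * (NFRAMES * NCHANNELS)).tobytes())

buf.seek(0)
with wave.open(buf, "rb") as rf:
    frame_size = rf.getnchannels() * rf.getsampwidth()   # bytes per frame
    nframes = rf.getnframes()                            # frames, not bytes

data_bytes = NFRAMES * frame_size
print(nframes, frame_size, data_bytes)   # 10 4 40
```

A frame holds one sample per channel, so getnframes() reports frames rather than samples or bytes.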
msg79618	Author: Guilherme Polo (gpolo) *	Date: 2009-01-11 19:39
Given the name of the function related to the problem, "writeframesraw", it seems more correct to remove the sampwidth multiplication from the other case (rather than adding it to the default case), since you must already pass the data multiplied by it. Does that make sense to you, Alex?
msg79619	Author: Guilherme Polo (gpolo) *	Date: 2009-01-11 19:47
Ah, yes :) But in the other case (the one where it is currently multiplied) the multiplication happens because the data has been converted to an array of bytes, shorts or longs, so without the multiplication the data length would end up divided by 1, 2 or 4. So, apart from the extra "if" statements, all is good.
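The difference between the two branches comes down to what len() measures. On the byte-swap path the data has become an array.array, whose length counts samples; on the plain path it is a byte string, whose length already counts bytes. A small Python 3 illustration, independent of wave.py's internals:

```python
import array

raw = bytes([0x01, 0x00, 0x02, 0x00, 0x03, 0x00])   # three 16-bit samples
samples = array.array("h")                           # 2-byte signed items
samples.frombytes(raw)

print(len(raw))                          # 6  (bytes)
print(len(samples))                      # 3  (items)
# Item count times item size recovers the byte count, which is what the
# _datawritten bookkeeping needs on the array path.
print(len(samples) * samples.itemsize)   # 6
```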
msg79643	Author: Alex Robinson (alex_python_org)	Date: 2009-01-12 04:07
Oh golly. I was confused. For some reason I was thinking "writesamples()" when using "writeframes()". So the current code reads OK. Which makes this "bug" a request for writesamples() and readsamples() to be added to wave.py. They would shield sleep-deprived saps from the .wav file data frame format. :)

Here are Python 2.4-ish versions written for outside wave.py. Combinations of 8- and 16-bit samples, mono and stereo, are tested. I did not test the 32-bit sample logic. Sample values are expected to be +-32767 or +-128 ints (or +-2.x gig if 32-bit).

    def readsamples(wf, nframes) :
        """ Read an array of number-of-channels normalized int sample arrays. """

        wav = wf.readframes(nframes)

        if wf.getsampwidth() == 4 :
            wav = struct.unpack("<%ul" % (len(wav) / 4), wav)
        elif wf.getsampwidth() == 2 :
            wav = struct.unpack("<%uh" % (len(wav) / 2), wav)
        else :
            wav = struct.unpack("%uB" % len(wav), wav)
            wav = [ s - 128 for s in wav ]

        nc = wf.getnchannels()
        if nc > 1 :
            wavs = []
            for i in xrange(nc) :
                wavs.append([ wav[si] for si in xrange(0, len(wav), nc) ])
                pass
        else :
            wavs = [ wav ]

        return(wavs)


    def writesamples(wf, wavs) :
        """
            Write samples to the wave file.
            'wavs' looks like this:
                [ left_channel_samples, right_channel_samples ]
              or
                [ left_channel_samples ]
              or
                mono_samples
            This routine calls setnchannels() from information about 'wavs' length.
        """

        if wavs :
            if len(wavs) not in [ 1, 2, 4 ] :
                wavs = [ wavs, wv ]

            wf.setnchannels(len(wavs))

            if len(wavs) > 1 :
                wav = []
                for w in zip(*wavs):
                    wav += w
                    pass
            else :
                wav = wavs[0]

            if wf.getsampwidth() == 4 :
                ws = array.array('l', [ s for s in wav ])
            elif wf.getsampwidth() == 2 :
                ws = array.array('h', [ s for s in wav ])
            else :
                ws = array.array('B', [ s + 128 for s in wav ])

            ws = ws.tostring()

            wf.writeframes(ws)
            pass

    # end of code to edit and insert in wave.py
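For readers on Python 3, the same readsamples() idea can be sketched as a standalone helper. Assumptions: the name read_samples is hypothetical, only 8-, 16- and 32-bit widths are handled, and the 8-bit offset follows the proposal rather than the later discussion. Each channel is sliced starting at its own offset c; the loop posted above starts every channel at offset 0, so every channel would receive the first channel's samples.

```python
import array
import io
import struct
import wave

# struct codes used with the '=' prefix (native byte order, standard sizes).
# CPython's wave module hands frames back in the host's byte order.
_CODES = {1: "B", 2: "h", 4: "i"}

def read_samples(wf, nframes):
    """Hypothetical helper: return one list of int samples per channel."""
    raw = wf.readframes(nframes)            # interleaved frames, as bytes
    width = wf.getsampwidth()
    flat = struct.unpack("=%d%s" % (len(raw) // width, _CODES[width]), raw)
    if width == 1:
        # 8-bit WAV samples are unsigned; shift them to -128..127 as proposed.
        flat = [s - 128 for s in flat]
    nc = wf.getnchannels()
    # Channel c occupies offsets c, c + nc, c + 2*nc, ... of the flat sequence.
    return [list(flat[c::nc]) for c in range(nc)]

# Demo: a stereo 16-bit file whose left channel is 1, 2, 3 and right is -1, -2, -3.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(array.array("h", [1, -1, 2, -2, 3, -3]).tobytes())
buf.seek(0)
with wave.open(buf, "rb") as wf:
    channels = read_samples(wf, wf.getnframes())
print(channels)   # [[1, 2, 3], [-1, -2, -3]]
```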
msg79644	Author: Alex Robinson (alex_python_org)	Date: 2009-01-12 04:36
Oh gob. I left a debug artifact in that code. wavs = [ wavs, wv ] needs to be without the 'wv'.
msg79660	Author: Guilherme Polo (gpolo) *	Date: 2009-01-12 10:36
Documentation, tests and a patch against trunk are needed to get this into Python, but to me the request is fine.
msg79668	Author: Alex Robinson (alex_python_org)	Date: 2009-01-12 11:32
I might be able to do doc/test/patch in a month or two, but know zero.zero about the process so would expect it to take far more than a few hours when I do have time.
msg79669	Author: Guilherme Polo (gpolo) *	Date: 2009-01-12 12:02
I could do it, but I'm in disagreement with big part of your patch. Can you attach whatever test you used for it? Raw data, a sample file, or something like that.
msg79678	Author: Alex Robinson (alex_python_org)	Date: 2009-01-12 14:37
Polo: "I could do it, but I'm in disagreement with big part of your patch." Why, surely you can't mean the bug. :) (The test program has it fixed.) What is the disagreement? Apparently this bug system allows file attachments, so I will upload a test program and a wave file. The program is hard-coded to read the wave file and write a bunch of wave files, the names of which describe what they sound like.
msg79681	Author: Guilherme Polo (gpolo) *	Date: 2009-01-12 15:34
Aren't 8-bit samples stored as unsigned bytes? If so, they don't range between -128 and 127 (first disagreement). So this line:

    wav = [ s - 128 for s in wav ]

and the corresponding one (that adds 128 in writesamples) should go.

Why is this check needed?

    if len(wavs) not in [ 1, 2, 4 ]

Calling setnchannels inside writesamples looks very wrong to me: weren't you going to write samples? Then why does it modify the number of channels? The caller should be responsible for calling setnchannels; besides, what is the use of calling setnchannels here?

I see writesamples is expecting "wavs" to be a list of lists containing integers; is that the best format to expect? writeframes works with strings (which are actually byte strings).

The code layout didn't help me to get in agreement with it either.

The above paragraphs are the things I disagree with in the patch; hopefully you can help with those questions. Also, it would be better to hand-write the wave file for testing so we can be sure about its content without needing much analysis.
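For reference, the two lines in question amount to the following mapping between the unsigned bytes stored for 8-bit WAV data (where 128 is silence) and the signed range the proposal exposes. A minimal Python 3 sketch; the function names are illustrative only:

```python
def u8_to_signed(data):
    """Map unsigned 8-bit WAV samples (0..255, silence at 128) to -128..127."""
    return [b - 128 for b in data]

def signed_to_u8(samples):
    """Inverse mapping; bytes() raises ValueError for samples outside -128..127."""
    return bytes(s + 128 for s in samples)

raw = bytes([0, 128, 255])
signed = u8_to_signed(raw)
print(signed)                        # [-128, 0, 127]
print(signed_to_u8(signed) == raw)   # True
```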
msg79686	Author: Alex Robinson (alex_python_org)	Date: 2009-01-12 17:18
"8 bit samples stored as unsigned bytes"?
8-bit samples are 0..255 in the file. But to work with them, you'll want them -128..127. The code assumes DC==0 sample values for simplicity.

"if len(wavs) not in [ 1, 2, 4 ]"?
That way, if you're working with mono, you can simply pass your samples down to writesamples() without having to remember to "[ samples ]" them. If you forget, no big deal. It's too bad that readsamples() can't know that you want only mono samples. That would make mono work simpler. Anyway, I don't argue very strongly for this spiff. In some ways it's worse for it to be there. After all, the caller may be writing 1 frame at a time, though I don't think that such logic would work. And it would probably be pretty slow, too.

"Calling setnchannels"?
Since the number of channels must be the number of sample arrays passed to writesamples(), either writesamples() must rely on the caller having already gratuitously set the number of channels (correctly), or writesamples() can simply force the issue. If the caller set the number of channels wrongly, then either the output file will be corrupt or writesamples() would need to raise an exception. Both just make work for the caller. get/setnchannels() are not particularly useful if you are using the read/write sample API. If there were no ..frames() API, getnchannels() might still be handy for finding out how many channels a wave file being read has before any samples have been read. But that's about it.

"integers, is that the best format"?
Far better than the byte stream form, which is useless, confusing, error prone, and exposes the internal wave file format to a caller who generally couldn't care less how a wave file stores the samples. But you bring up a very good point, I think. I forgot to int(s) the samples when they are put into the arrays. The reason for an int() call on all the samples is so that the caller can deal with samples as floats. (Which is how he will want to deal with them if he's doing anything interesting.) So, this:

    ws = array.array('l', [ int(s) for s in wav ])
    ws = array.array('h', [ int(s) for s in wav ])
    ws = array.array('B', [ int(s + 128) for s in wav ])

And, for testing, in normalize_samples():

    samps = [ s * mxm for s in samps ]

So normalize_samples() always sends floats to writesamples().

"The code layout"?
:) Well. Whatever. I know that the "official" Python thing is to push colons left. I don't like that. I've experimented in the last few years with doing a lot of vertical alignment. Over time, I've found that it is a great way to do things. It's pretty amazing how much easier it is to scan and read code that has, as a start, the "="s vertically aligned. And, over the last few years, I've put more and more blank lines in. Vertically compressed code tends to look like assembly language. I edit on a 200+ character-wide screen, so, since I stopped forcing everything to fit on a TTY, text-mode CGA, or punch cards, I've lost my taste for narrow code. In fact, I personally have a real hard time reading line-broken code. That said, multi-lining "if" statements in ways that allow delete/insert of lines, 1 for each "operator ()" expression, can be very nice. Calls to routines with a gob of parameters, each on a separate line, can sure be a good way to deal with a bad thing (routines that take a lot of params are bad, that is). Etc. Anyway, I assumed that the code would be reformatted by whoever maintains wave.py. No biggie.

"hand write the wave file for testing"?
Good point! That allows a test to do things like odd numbers of frames, 1 frame, maxed-out sample values, long runs of silence, DC offsets, etc. Do you know whether there are already test files for wave.py? They'd have those sorts of things in them.

Hmmm. It's odd that wave.py doesn't run from the command line and dump the header or something, at least. Maybe do some simple conversions (8/16-bit, mono/stereo switches, reverses ... that sort of thing). Would be handy.

This code is not tested on a big-endian machine. I ran it under XP (py2.4) and Ubuntu64 (py2.5) and all the output files CRC the same on both PCs.
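Converting float samples before they reach array.array, as proposed here, can be sketched as below. This variant rounds rather than truncating with int() and clamps to the 16-bit range; both choices are assumptions of this sketch, not something settled in the thread, and floats_to_int16 is a hypothetical name.

```python
import array

def floats_to_int16(samples):
    """Convert float samples (already scaled to the 16-bit range) to an
    array of 16-bit ints. Values are rounded, then clamped to -32768..32767,
    because array('h') raises OverflowError for out-of-range values."""
    out = array.array("h")
    for s in samples:
        out.append(max(-32768, min(32767, round(s))))
    return out

pcm = floats_to_int16([0.4, -1.6, 40000.0, -1e9])
print(pcm.tolist())   # [0, -2, 32767, -32768]
```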
msg79691	Author: Guilherme Polo (gpolo) *	Date: 2009-01-12 18:28
1) wave.py doesn't make assumptions about what the user wants, so I don't think it is the place to put the DC (0 hz) assumption.

3) writesamples would raise an exception if the number of channels currently set is wrong.

4) Well, let's fix a format then. I said list of lists of integers, or it could use generators, and you didn't disagree here, so it seems to be fine. The problem in the current code is that you are making mono special: it is the one case where a list of lists of integers is not returned, and a single list is returned instead. That makes it troublesome for the caller to set the number of channels, and it is also a different format than the one used with 2 channels or any other configuration. With that in mind, I have simplified some of your code like this:

    def readsamples(self, nframes) :
        """Return a list of lists of integers. The number of these inner
        lists is given by the number of channels in the wave file. Each
        list contains the channel samples formatted as integers.
        """
        wav = self.readframes(nframes)
        sampwidth = self.getsampwidth()
        wav = struct.unpack(
            '<%d%s' % (len(wav) / sampwidth, wave._array_fmts[sampwidth]),
            wav)
        nc = self.getnchannels()
        if nc > 1:
            wavs = []
            for c in xrange(nc):
                wavs.append([wav[si] for si in xrange(c, len(wav), nc)])
        else:
            wavs = [[wav]]
        return wavs

    def writesamples(self, wavs) :
        """Write samples to the wave file. wavs must follow the
        structure returned by readsamples.
        """
        if self.getnchannels() != len(wavs):
            raise wave.Error("# of channels != # of samples")
        wav = []
        for w in zip(wavs):
            wav.extend(w)
        ws = array.array(wave._array_fmts[self.getsampwidth()], wav)
        ws = ws.tostring()
        # we want all the samples in writeframes() format so that _convert
        # can be called on them
        self.writeframes(ws)

You can then monkey patch wave by doing:

    wave.Wave_write.writesamples = writesamples
    wave.Wave_read.readsamples = readsamples

And then change some other parts of your code.
5) There is a very small test for wave in Python's source, Lib/test/test_wave.py
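A Python 3 sketch of the monkey-patching approach, under stated assumptions: readsamples() and writesamples() are not part of the wave module, 8-bit data is left unsigned (gpolo's position), and the typecode table assumes 'i' is 4 bytes. It uses zip(*wavs), which yields one tuple per frame (zip(wavs) as posted would yield each whole channel list instead), and it returns a plain list of per-channel lists for mono too.

```python
import array
import io
import wave

# Hypothetical methods: wave has no readsamples()/writesamples(); they are
# attached below by monkey patching, as suggested in the thread.
# 8-bit data stays unsigned (0..255), i.e. no DC offset is applied.
_TYPECODES = {1: "B", 2: "h", 4: "i"}
assert array.array("i").itemsize == 4, "'i' is not 4 bytes on this platform"

def writesamples(self, wavs):
    """Write one list of ints per channel (the layout readsamples returns)."""
    if self.getnchannels() != len(wavs):
        raise wave.Error("# of channels != # of sample lists")
    interleaved = []
    for frame in zip(*wavs):          # one tuple per frame: (left, right, ...)
        interleaved.extend(frame)
    data = array.array(_TYPECODES[self.getsampwidth()], interleaved)
    # Native byte order; CPython's wave swaps to little-endian on big-endian hosts.
    self.writeframes(data.tobytes())

def readsamples(self, nframes):
    """Return one list of ints per channel."""
    data = array.array(_TYPECODES[self.getsampwidth()])
    data.frombytes(self.readframes(nframes))
    nc = self.getnchannels()
    return [data[c::nc].tolist() for c in range(nc)]

wave.Wave_write.writesamples = writesamples
wave.Wave_read.readsamples = readsamples

buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writesamples([[1, 2, 3], [-1, -2, -3]])
buf.seek(0)
with wave.open(buf, "rb") as wf:
    result = wf.readsamples(wf.getnframes())
print(result)   # [[1, 2, 3], [-1, -2, -3]]
```

Raising on a channel mismatch, rather than calling setnchannels() implicitly, is the behaviour gpolo argues for above.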
msg79692	Author: Guilherme Polo (gpolo) *	Date: 2009-01-12 18:39
I was going to reply to your "code layout" answer but forgot. Well, everyone has their preferences, so I'm not going to question yours. The only problem is that there is no maintainer for wave.py, so the more you follow the rules for Python code (or at least for code that gets included in Python), the greater the chances of it getting included.
msg79714	Author: Alex Robinson (alex_python_org)	Date: 2009-01-13 02:20
"DC (0 hz) assumption"?
wave.py makes the assumption that what the user wants is whatever happens to be in the file, however arbitrary. (That 8 bit samples are unsigned bytes is probably an artifact of early ADC logic. Typically you got an absolute, n-bit value from an old ADC. Newer chips often return signed values.) It's very unlikely that anything but a copy program would try to work with unsigned char samples. Too many things to go wrong. Too much confusion. Zero means zero in most of the world, in and out of audio processing. :)

That said, not having to offset the 8-bit samples sparsifies the read/write_sample code. But I'm thinking that that's at the expense of every program that uses it. When in doubt, I figure, do what is more useful. Don't force the caller to write a wrapper if he'll need to do it 99% of the time. But this is not a religious thing with me. A wrapper can be written. And, in fact, I'd sure think it would be nice to include wrappers like auto-scaling and auto-zeroing in wave.py. But maybe not, as these ops probably belong in some array.py-type module.

Anyway, take a non-audio guy who just wants to read a wave file, diddle with it, and write it out. Or who just wants to generate some sound and write it out. Or who just wants to read a wave file and graph it or something. All of these guys will be stunned when they find out, to their hours-of-work chagrin, that wave files' 8-bit samples are not signed chars. And if I were one of them, I'd be plenty peeved after having to spend all that time learning about some historical artifact just to read a danged audio file, for gosh sakes. But not putting the 8-bit offset in the read/write_samples logic does eliminate 2 lines of code in each routine.

"writesamples would raise an exception"
Yep, taste. I'm inclined to find this irritating, and I don't like being irritated by packages I use. Makes for a poor out-of-box experience. But, taste. :)

"4) Well, let's fix a format then. else: wavs = [[wav]]"?
That's an extra [] I think. [[samples]] would be an array of an array of an array of samples:

    s = [1,2]; print [ [ s ] ]
    [[[1, 2]]]

On reflection, I'd say I agree with you more than I do with me on the ability of writesamples() to take a simple array of mono samples. Not a good thing to do.

"wavs.extend(wav)"?
I had to look up extend() and try it in the Python shell! :) To each his own. But when I found out that one could do list+=added_list in Python I never looked back. Intuitive. I special-cased mono for speed purposes. No reason to do the +=/extend for mono samples. But, maybe the interpreter handles all that. Don't know. Didn't measure it.

"monkey patch"?
Wonderful! This makes your rewrite of the code so much cleaner. Thanks for the tip!

"code layout"?
Har, har. Yep, no one in software has ever spent any time "discussing" code layout before. Let's do it for the first time in history.

"test_wave.py"?
Oooo. Bit minimal, that. Yeah, I think a couple of things could be fleshed out there.

Gotta run now. But I will try to update the code in wave_futz later. Other things on my plate, though. Guilherme, I really appreciate your handling this and your guidance. Thanks!
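On the += versus extend() question: for lists, += with any iterable extends the list in place, just like extend(). The interleaving step can then be sketched as below, with mono special-cased as Alex describes; itertools.chain.from_iterable stands in for the explicit loop, which is a substitution rather than the thread's code, and interleave is a hypothetical name.

```python
from itertools import chain

def interleave(channels):
    """Flatten per-channel lists into frame order: L0, R0, L1, R1, ...

    Mono is special-cased since there is nothing to interleave. Other
    layouts go through zip(*channels), one tuple per frame; note that zip
    stops at the shortest channel.
    """
    if len(channels) == 1:
        return list(channels[0])
    return list(chain.from_iterable(zip(*channels)))

# list += iterable behaves like list.extend(iterable), tuples included:
a = [1, 2]
a += (3, 4)
b = [1, 2]
b.extend((3, 4))
print(a == b)                                   # True

print(interleave([[1, 2, 3], [-1, -2, -3]]))    # [1, -1, 2, -2, 3, -3]
print(interleave([[5, 6]]))                     # [5, 6]
```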
msg79716	Author: Guilherme Polo (gpolo) *	Date: 2009-01-13 03:02
> "DC (0 hz) assumption"?
> wave.py makes the assumption that what the user wants is whatever
> happens to be in the file, however arbitrary. (That 8 bit samples are
> unsigned bytes is probably an artifact of early ADC logic. Typically you
> got an absolute, n-bit value from an old ADC. Newer chips often return
> signed values.) It's very unlikely that anything but a copy program
> would try to work with unsigned char samples. Too many things to go
> wrong. Too much confusion. Zero means zero in most of the world, in and
> out of audio processing. :)

Every document/text I've found so far talks about 8-bit samples being unsigned; it is not like I'm trying to enforce it just because I want to, I'm just following the specification. But SDL, for example, accepts wav files with unsigned 8 bits, signed 8 bits, unsigned 16 bits, signed 16 bits, and with different byte orders, so apparently different libraries write different wave files. I wonder which of these would be good to get included in wave.py, but for this issue I would prefer not to touch that other problem and to stick with what is documented for the wave format.

> ...
> But not putting the 8-bit offset in the read/write_samples logic does
> eliminate 2 lines of code in each routine.

It is good that it is just two lines; it means they could be added back (and adapted) when we start supporting different output/input formats for wave.

> "4) Well, let's fix a format then. else: wavs = [[wav]]" ?
> That's an extra [] I think.

Right.

> "wavs.extend(wav)" ?
> I had to look up extend() and try it in the Python shell! :) To each his
> own. But when I found out that one could do list+=added_list in Python I
> never looked back. Intuitive. I special-cased mono for speed purposes.
> No reason to do the +=/extend for mono samples. But, maybe the
> interpreter handles all that.

The interpreter won't do this optimization for you; keeping the special case for mono is fine.

> "test_wave.py"?
> Oooo. Bit minimal, that. Yeah, I think a couple of things could be
> fleshed out there.

I'm waiting for the new hand-written tests now :)
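A hand-written test file of the kind requested here can be built with struct, so that a test knows every byte it feeds to the wave module. This is a minimal sketch assuming a plain 16-bit PCM layout (RIFF header, 16-byte 'fmt ' chunk, 'data' chunk); handmade_wav is a hypothetical helper, not part of Lib/test/test_wave.py.

```python
import io
import struct
import wave

def handmade_wav(samples, nchannels=1, framerate=8000):
    """Build a 16-bit PCM WAV file byte by byte.

    *samples* are interleaved signed 16-bit ints. Layout: 'RIFF' header,
    16-byte 'fmt ' chunk (format tag 1 = PCM), then the 'data' chunk.
    """
    sampwidth = 2
    data = struct.pack("<%dh" % len(samples), *samples)   # WAV is little-endian
    fmt = struct.pack("<HHIIHH",
                      1,                                  # PCM
                      nchannels,
                      framerate,
                      framerate * nchannels * sampwidth,  # byte rate
                      nchannels * sampwidth,              # block align
                      8 * sampwidth)                      # bits per sample
    body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"data" + struct.pack("<I", len(data)) + data)
    return b"RIFF" + struct.pack("<I", len(body)) + body

# One stereo frame at full scale: left = 32767, right = -32768.
blob = handmade_wav([32767, -32768], nchannels=2)
with wave.open(io.BytesIO(blob), "rb") as wf:
    print(wf.getnchannels(), wf.getsampwidth(), wf.getnframes())   # 2 2 1
    # readframes() returns the host's byte order, hence '=' here.
    print(wf.readframes(1) == struct.pack("=2h", 32767, -32768))   # True
```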
msg81419	Author: Alex Robinson (alex_python_org)	Date: 2009-02-08 21:22
I'll upload the latest monkey-patch file, wave_futz.py, and test_wave.py, which has a gob of tests added to it. I found a 64-bit bug in the wave.py formats for 32-bit sample wave files. The PCM files read into CoolEdit OK, including the 32-bit sample files.
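The message does not spell out the 64-bit bug. One plausible cause (an assumption here) is the 'l' array typecode used for 32-bit samples: its size follows the C long, which is 4 bytes on Windows and 32-bit builds but 8 bytes on most 64-bit Unix builds. A sketch that picks a typecode by its actual item size instead; typecode_for_width is a hypothetical name:

```python
import array

def typecode_for_width(nbytes, signed=True):
    """Return an array typecode whose items are exactly *nbytes* wide.

    Typecode sizes are platform-dependent, so this checks itemsize instead
    of hard-coding 'l' for 32-bit samples.
    """
    candidates = "bhilq" if signed else "BHILQ"
    for code in candidates:
        if array.array(code).itemsize == nbytes:
            return code
    raise ValueError("no array typecode is %d bytes wide" % nbytes)

print(array.array("l").itemsize)   # 4 or 8, depending on the platform
print(typecode_for_width(4))       # usually 'i'
```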
msg113363	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-08-09 03:17
Please upload plain-text files with unique names for each file uploaded.
msg113393	Author: Alex Robinson (alex_python_org)	Date: 2010-08-09 07:21
Here you go, Terry. Copies of the two files in the latest ZIP file. Hmmm. Well. Maybe just one of 'em. It looks like the only way to upload files is to add a msg, so I'll upload the other file in another msg.
msg113395	Author: Alex Robinson (alex_python_org)	Date: 2010-08-09 07:23
OK, here's the other.
msg217034	Author: Joe Button (Joeboy)	Date: 2014-04-22 21:59
Forgive my unfamiliarity with Python's development process, but what is happening with this? Is there any chance of this enhancement making it into the Python libs? What would need to happen? Thanks.
msg217036	Author: R. David Murray (r.david.murray) *	Date: 2014-04-22 22:17
Someone has to find the time to do a commit review on the patch. As Guilherme said, there's no specific maintainer for wave, so I'm afraid it just got forgotten about. On the other hand, as a new feature it would now go in 3.5, and we're at the start of the approximately one year window for new features, so if you ping this issue (as you just did) periodically, someone will get to it ;) What you could do to help move it along is to do your own review of the patch, including making sure it still applies to default...which it may not, since there have in fact been some changes in wave.py. If that's the case you can also help by updating the patch.
msg217042	Author: Terry J. Reedy (terry.reedy) *	Date: 2014-04-22 23:04
Serhiy, is this something you can review?
msg217045	Author: Joe Button (Joeboy)	Date: 2014-04-22 23:45
On quickly looking at this, the immediate issue seems to me to be that there is no patch, as I understand the term. If it would be helpful I can look at turning the code in the attached files into a patch against default and ensure the tests pass (but not right now as it's ~1am here).
msg217053	Author: Terry J. Reedy (terry.reedy) *	Date: 2014-04-23 05:20
A patch against default, including a test, would be helpful.
msg218434	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-05-13 11:14
I hope all the bugs mentioned have already been fixed in the wave module. As for the new writesamples() and readsamples() methods, perhaps it would be better to add utility functions to the audioop module for packing/unpacking integers. In any case, a user can use array.array.
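Following Serhiy's suggestion that array.array is enough, reading samples needs no new wave API. A minimal Python 3 sketch for mono 16-bit data, which also shows a zero-copy memoryview.cast() alternative. (audioop, mentioned above, was later deprecated in Python 3.11 and removed in 3.13.)

```python
import array
import io
import wave

# Write a short mono 16-bit file to read back.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(array.array("h", [0, 1000, -1000, 32767]).tobytes())

buf.seek(0)
with wave.open(buf, "rb") as wf:
    raw = wf.readframes(wf.getnframes())   # bytes in the host's byte order

# Option 1: copy into an array of signed 16-bit ints.
samples = array.array("h")
samples.frombytes(raw)
print(samples.tolist())   # [0, 1000, -1000, 32767]

# Option 2: a zero-copy view over the same bytes.
view = memoryview(raw).cast("h")
print(view.tolist())      # [0, 1000, -1000, 32767]
```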

History
Date	User	Action	Args
2022-04-11 14:56:44	admin	set	github: 49163
2014-05-13 11:14:33	serhiy.storchaka	set	messages: + msg218434
2014-04-23 05:20:28	terry.reedy	set	messages: + msg217053
2014-04-22 23:45:52	Joeboy	set	messages: + msg217045
2014-04-22 23:04:03	terry.reedy	set	nosy: + serhiy.storchaka messages: + msg217042
2014-04-22 22:17:35	r.david.murray	set	versions: + Python 3.5, - Python 3.2 nosy: + r.david.murray messages: + msg217036 stage: test needed -> patch review
2014-04-22 21:59:12	Joeboy	set	nosy: + Joeboy messages: + msg217034
2010-08-09 07:23:08	alex_python_org	set	files: + test_wave.py messages: + msg113395
2010-08-09 07:22:02	alex_python_org	set	files: + wave_futz.py messages: + msg113393
2010-08-09 03:17:33	terry.reedy	set	versions: + Python 3.2, - Python 3.1, Python 2.7 nosy: + terry.reedy messages: + msg113363 stage: test needed
2009-02-08 21:22:45	alex_python_org	set	files: + wave_futz.zip messages: + msg81419
2009-01-13 03:02:52	gpolo	set	messages: + msg79716
2009-01-13 02:20:26	alex_python_org	set	messages: + msg79714
2009-01-12 18:39:22	gpolo	set	messages: + msg79692
2009-01-12 18:28:17	gpolo	set	messages: + msg79691
2009-01-12 17:18:47	alex_python_org	set	messages: + msg79686
2009-01-12 15:34:31	gpolo	set	messages: + msg79681
2009-01-12 14:37:34	alex_python_org	set	files: + wave_futz.zip messages: + msg79678
2009-01-12 12:02:15	gpolo	set	messages: + msg79669
2009-01-12 11:32:55	alex_python_org	set	messages: + msg79668
2009-01-12 10:43:30	gpolo	set	versions: - Python 2.6, Python 3.0
2009-01-12 10:36:04	gpolo	set	title: wave.py writes 16 bit sample files of half the correct duration -> wave.py: add writesamples() and readsamples() messages: + msg79660 versions: - Python 2.5, Python 2.4
2009-01-12 04:36:41	alex_python_org	set	messages: + msg79644
2009-01-12 04:07:19	alex_python_org	set	type: behavior -> enhancement messages: + msg79643
2009-01-11 19:47:58	gpolo	set	messages: + msg79619
2009-01-11 19:39:29	gpolo	set	messages: + msg79618
2009-01-11 19:17:34	gpolo	set	files: - issue_4913.diff
2009-01-11 19:17:28	gpolo	set	messages: + msg79617
2009-01-11 19:15:00	gpolo	set	files: + issue_4913.diff keywords: + patch messages: + msg79616 nosy: + gpolo
2009-01-11 02:16:42	alex_python_org	create