Title: wave module: wrong integer format
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4, Python 3.2, Python 3.3, Python 2.7
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ckern, jcea, python-dev, terry.reedy
Priority: normal
Keywords:

Created on 2012-11-12 15:04 by ckern, last changed 2012-11-17 02:45 by jcea. This issue is now closed.

Files
Uploaded by ckern, 2012-11-12 15:04. Description: Simple script for reproducing the error. Call with argument "203".
Messages (6)
msg175450 - Author: Christian Kern (ckern) Date: 2012-11-12 15:04
Writing .wav files is limited to a file size of 2 GiB, while
the WAV file format itself supports up to 4 GiB.
Trying to write a file beyond 2 GiB (e.g. 203 minutes at
CD quality, i.e. 44.1 kHz, 2 channels, 16 bit) will crash
at the moment when self._datawritten exceeds 2^31-1 bytes.
This is because, in method "_patchheader",
the variable "self._datawritten" is written with
"struct.pack('<l')" (signed long integer)
instead of
"struct.pack('<L')" (unsigned long integer, which would
conform to the WAV file format specification).
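
As a quick check of the 203-minute figure, the arithmetic can be sketched as follows. The helper name pcm_bytes is hypothetical, and the sketch counts only the audio data, not the header.

```python
# CD quality: 44100 frames/s, 2 channels, 2 bytes (16 bit) per sample.
BYTES_PER_SECOND = 44100 * 2 * 2
SIGNED_32_MAX = 2**31 - 1


def pcm_bytes(minutes):
    """Return the number of audio data bytes for the given duration (hypothetical helper)."""
    return minutes * 60 * BYTES_PER_SECOND


for minutes in (202, 203):
    size = pcm_bytes(minutes)
    print(minutes, size, "fits in signed 32 bit:", size <= SIGNED_32_MAX)
```

202 minutes still fits below 2^31-1 bytes; 203 minutes is the first whole-minute duration that does not.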

patch to
<         self._file.write(struct.pack('<l', self._datalength))
>         self._file.write(struct.pack('<L', self._datalength))
<         self._file.write(struct.pack('<l', 36 + self._datawritten))
>         self._file.write(struct.pack('<L', 36 + self._datawritten))
<         self._file.write(struct.pack('<l', self._datawritten))
>         self._file.write(struct.pack('<L', self._datawritten))

This patch also patches the "_write_header" method, which
has the same problem (but will lead to a crash only
in very rare cases).
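
The failure itself can be reproduced without writing any audio. This is a minimal sketch using only the struct module; the helper name fits is hypothetical and is not part of the wave module.

```python
import struct


def fits(fmt, value):
    """Return True if value can be packed with the given struct format (hypothetical helper)."""
    try:
        struct.pack(fmt, value)
    except struct.error:
        return False
    return True


first_too_big = 2**31  # one past the largest signed 32-bit value
print("'<l':", fits('<l', first_too_big))  # signed: rejected
print("'<L':", fits('<L', first_too_big))  # unsigned: accepted
```

With '<L' the limit moves to 2^32-1, which matches the 4 GiB bound of the format.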

By the way: "_patchheader" should be renamed to "_patch_header"
in order to be in harmony with the other function/method names
of this module.

Attached you'll find a very simple Python 2 script which will
reproduce the problem. Usage: $duration_in_minutes

The problem may also occur in Python 3; I don't know.
msg175459 - Author: Jesús Cea Avión (jcea) (Python committer) Date: 2012-11-12 15:56
Python 3.x affected too.

Python 2.6 is open only for security fixes.

4GB support confirmed:

But trying to find a "canonical" description of the format, I see tons of inconsistencies. For instance defines "chunkSize" as "long". That is, signed.
msg175462 - Author: Christian Kern (ckern) Date: 2012-11-12 16:28
Addendum: 4 GiB file size can only be achieved with "unsigned
long". Moreover, for numbers < 2^31, "signed long" and
"unsigned long" seem to produce the same bytes, so no problem
should arise. (Tested on x86_64 Linux)
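
The claim that both formats agree below 2^31 can be checked directly. A small sketch, assuming the little-endian standard-size formats '<l' and '<L' used in the patch; same_encoding is a hypothetical helper name.

```python
import struct


def same_encoding(n):
    """Return True if n packs to identical bytes as signed and unsigned 32 bit (hypothetical helper)."""
    return struct.pack('<l', n) == struct.pack('<L', n)


# Every non-negative value up to 2**31 - 1 has the same byte pattern in both formats.
for n in (0, 1, 44100, 2**31 - 1):
    print(n, same_encoding(n))
```

So files below 2 GiB come out byte-for-byte identical after the change.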

BTW: Writing .wav files could gain performance if there were
an option for updating the header only at the end of writing.
msg175712 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2012-11-16 20:16
The Wikipedia sentence "The WAV format is limited to files that are less than 4 GB, because of its use of a 32-bit unsigned integer to record the file size header" is unambiguous and appears correct (see below). The rest of the Wikipedia sentence "(some programs limit the file size to 2 GB)" must be because some programs mistakenly read into signed instead of unsigned ints and fail to later adjust (by, for instance, later casting to unsigned).

The statement reflects the original specification given in reference 3, a .pdf. On page 11, it has
The basic building block of a RIFF file is called a chunk. Using C syntax, a chunk can be defined
as follows:
typedef unsigned long DWORD;
typedef DWORD CKSIZE; // 32-bit unsigned size value
typedef struct { // Chunk structure
  CKID ckID; // Chunk type identifier
  CKSIZE ckSize; // Chunk size field (size of ckData)
and on page 19
"<WORD> 16-bit unsigned quantity in Intel format    unsigned int"
INT and LONG are defined as 16 and 32 bit signed versions.

The WAVE specification, starting on p. 56, uses WORD and DWORD, not INT and LONG for chunk header fields. Certainly, the 2 bytes for samples/sec should be unsigned to allow the standard 44100 CD rate.
(reference 4) summarized the .wav chunk formats. I think it just takes it for granted that sizes and counts are unsigned.
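
To illustrate why the signedness of ckSize matters when reading, here is a minimal sketch that parses an 8-byte chunk header (4-byte ckID followed by a 32-bit little-endian ckSize). The function name read_chunk_header is hypothetical and this is not the wave module's own parsing code.

```python
import struct


def read_chunk_header(buf, offset=0):
    """Return (ckID, ckSize) with ckSize read as an unsigned 32-bit value (hypothetical helper)."""
    ck_id, ck_size = struct.unpack_from('<4sL', buf, offset)
    return ck_id, ck_size


# A 'data' chunk header announcing 3,000,000,000 bytes of payload.
header = b'data' + struct.pack('<L', 3000000000)

print(read_chunk_header(header))
# Reading the same bytes as a signed value yields a negative size.
print(struct.unpack_from('<4sl', header))
```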

The patch given did not touch the write format in line 469(3.3.0):
The first 'l' is filled with 36 + self._datalength and I believe the whole thing should be '<L4s4sLHHLLHH4s'. The two struct.unpack formats on lines 264 and 266 should also be changed.
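
A sketch of what the all-unsigned format produces. The field order below is the usual PCM WAVE layout and is an assumption on my part, as is the function name pack_header_after_riff; it is not copied from the wave module.

```python
import struct

HEADER_FMT = '<L4s4sLHHLLHH4s'  # everything between the 'RIFF' tag and the data size


def pack_header_after_riff(nchannels, sampwidth, framerate, datalength):
    """Pack the header fields that follow 'RIFF' (hypothetical helper, assumed PCM layout)."""
    return struct.pack(
        HEADER_FMT,
        36 + datalength,                   # RIFF chunk size
        b'WAVE',
        b'fmt ',
        16,                                # size of the fmt chunk body
        1,                                 # format tag: PCM
        nchannels,
        framerate,
        nchannels * framerate * sampwidth,  # bytes per second
        nchannels * sampwidth,             # block alignment
        sampwidth * 8,                     # bits per sample
        b'data',
    )


header = pack_header_after_riff(2, 2, 44100, 3000000000)
print(len(header), struct.unpack_from('<L', header)[0])
```

With the original signed format the same call would raise struct.error, because 36 + 3000000000 does not fit in a signed 32-bit field.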

A workaround is to write large numbers as signed negatives.
>>> struct.unpack('L', struct.pack('l', 3000000000 -2**32))
>>> struct.unpack('H', struct.pack('h', 44100 - 2**16))[0]
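
Note that the bare 'l' and 'L' codes use the platform's native long, which is 8 bytes on typical 64-bit Linux and macOS builds, so the first line above only round-trips where long is 32 bits. A portable sketch of the same idea with the standard-size '<' formats (helper names are hypothetical):

```python
import struct


def as_unsigned32(signed_value):
    """Reinterpret a signed 32-bit value as unsigned (hypothetical helper)."""
    return struct.unpack('<L', struct.pack('<l', signed_value))[0]


def as_unsigned16(signed_value):
    """Reinterpret a signed 16-bit value as unsigned (hypothetical helper)."""
    return struct.unpack('<H', struct.pack('<h', signed_value))[0]


# Pass the large value minus 2**32 (or 2**16) so it fits the signed field.
print(as_unsigned32(3000000000 - 2**32))
print(as_unsigned16(44100 - 2**16))
```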

It is possible that someone is using this to write CD-quality files. It is also possible that anyone who has tried just gave up.
msg175720 - Author: Roundup Robot (python-dev) Date: 2012-11-17 02:43
New changeset b8ece33ce27d by Jesus Cea in branch '2.7':
Closes #16461: Wave library should be able to deal with 4GB wav files, and sample rate of 44100 Hz.

New changeset 542bf1c1f2e3 by Jesus Cea in branch '3.2':
Closes #16461: Wave library should be able to deal with 4GB wav files, and sample rate of 44100 Hz.

New changeset f380d749f6bd by Jesus Cea in branch '3.3':
MERGE: Closes #16461: Wave library should be able to deal with 4GB wav files, and sample rate of 44100 Hz.

New changeset 269498958b97 by Jesus Cea in branch 'default':
MERGE: Closes #16461: Wave library should be able to deal with 4GB wav files, and sample rate of 44100 Hz.
msg175722 - Author: Jesús Cea Avión (jcea) (Python committer) Date: 2012-11-17 02:45
Thanks to Christian for the report and to Terry for digging into the spec.

Patched. Thanks.
Date User Action Args
2012-11-17 02:45:07 jcea set messages: + msg175722
2012-11-17 02:43:31 python-dev set status: open -> closed; nosy: + python-dev; messages: + msg175720; resolution: fixed; stage: needs patch -> resolved
2012-11-16 20:16:53 terry.reedy set nosy: + terry.reedy; messages: + msg175712
2012-11-12 16:28:16 ckern set messages: + msg175462
2012-11-12 15:56:05 jcea set nosy: + jcea; messages: + msg175459
2012-11-12 15:52:07 serhiy.storchaka set stage: needs patch; components: + Library (Lib), - Extension Modules; versions: + Python 3.2, Python 3.3, Python 3.4, - Python 2.6
2012-11-12 15:04:11 ckern create