Message 175712 - Python tracker

Author	terry.reedy
Recipients	ckern, jcea, terry.reedy
Date	2012-11-16.20:16:52
Message-id	<1353097013.24.0.811804311496.issue16461@psf.upfronthosting.co.za>

Content

The Wikipedia sentence "The WAV format is limited to files that are less than 4 GB, because of its use of a 32-bit unsigned integer to record the file size header" is unambiguous and appears correct (see below). The rest of the Wikipedia sentence, "(some programs limit the file size to 2 GB)", is presumably because some programs mistakenly read the size into a signed rather than an unsigned int and fail to adjust afterwards (for instance, by casting to unsigned).
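To illustrate the 2 GB versus 4 GB point, here is a minimal sketch (the variable names are mine, not from any spec or from wave.py) that reads the same four size bytes once as unsigned and once as signed, then repairs the signed reading by masking:

```python
import struct

# A 4-byte little-endian size field holding 3,000,000,000 (between 2 GB and 4 GB).
size_bytes = struct.pack('<L', 3000000000)

# Correct reading: 32-bit unsigned.
as_unsigned = struct.unpack('<L', size_bytes)[0]

# Mistaken reading: 32-bit signed; the high bit is taken as a sign bit.
as_signed = struct.unpack('<l', size_bytes)[0]

# The "later adjust" step: masking recovers the unsigned value.
recovered = as_signed & 0xFFFFFFFF

print(as_unsigned, as_signed, recovered)
```

A program that skips the masking step sees a negative size for anything over 2 GB, which would explain the lower limit.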

The statement reflects the original specification given in reference 3, a .pdf. On page 11, it has
'''
The basic building block of a RIFF file is called a chunk. Using C syntax, a chunk can be defined
as follows:
typedef unsigned long DWORD;
typedef DWORD CKSIZE; // 32-bit unsigned size value
typedef struct { // Chunk structure
  CKID ckID; // Chunk type identifier
  CKSIZE ckSize; // Chunk size field (size of ckData)
'''
and on page 19
"<WORD> 16-bit unsigned quantity in Intel format    unsigned int"
INT and LONG are defined as 16 and 32 bit signed versions.

The WAVE specification, starting on p. 56, uses WORD and DWORD, not INT and LONG, for chunk header fields. Certainly, the 2 bytes for samples/sec should be unsigned to allow the standard 44100 CD rate.
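As a rough sketch of how those type names could map onto struct format codes: the dictionary and the helper read_chunk_header below are my own illustration, not part of the spec or of wave.py. The '<' prefix selects little-endian ("Intel format") with standard sizes.

```python
import struct

# Assumed mapping from the RIFF type names to struct codes.
RIFF_TO_STRUCT = {
    'WORD': 'H',   # 16-bit unsigned
    'DWORD': 'L',  # 32-bit unsigned
    'INT': 'h',    # 16-bit signed
    'LONG': 'l',   # 32-bit signed
}

def read_chunk_header(data, offset=0):
    """Return (ckID, ckSize) from the 8-byte chunk header at offset.

    ckSize is read as a DWORD, i.e. unsigned.
    """
    ck_id, ck_size = struct.unpack_from('<4sL', data, offset)
    return ck_id, ck_size

# A hand-built 'data' chunk header announcing 3,000,000,000 bytes.
header = b'data' + struct.pack('<L', 3000000000)
print(read_chunk_header(header))
```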

http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html
(reference 4) summarizes the .wav chunk formats. I think it just takes it for granted that sizes and counts are unsigned.

The patch given did not touch the write format on line 469 (3.3.0):
        self._file.write(struct.pack('<l4s4slhhllhh4s',
The first 'l' is filled with 36 + self._datalength and I believe the whole thing should be '<L4s4sLHHLLHH4s'. The two struct.unpack formats on lines 264 and 266 should also be changed.
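A sketch of the difference between the two format strings. The helper pack_header and its argument list are hypothetical; the field order is my assumption of a standard PCM header layout, not a copy of wave.py.

```python
import struct

SIGNED_FMT = '<l4s4slhhllhh4s'    # format currently used for writing
UNSIGNED_FMT = '<L4s4sLHHLLHH4s'  # proposed replacement

def pack_header(fmt, datalength, nchannels=2, sampwidth=2, framerate=44100):
    """Pack the bytes that follow b'RIFF' (assumed PCM layout)."""
    return struct.pack(
        fmt,
        36 + datalength,                   # RIFF size field
        b'WAVE', b'fmt ', 16,              # form type, chunk id, fmt chunk size
        1,                                 # format tag, 1 = PCM
        nchannels,
        framerate,                         # 4-byte slot in this format string
        nchannels * framerate * sampwidth, # average bytes per second
        nchannels * sampwidth,             # block align
        sampwidth * 8,                     # bits per sample
        b'data',
    )

big = 3000000000  # more than 2 GB of sample data

try:
    pack_header(SIGNED_FMT, big)
    signed_ok = True
except struct.error:
    signed_ok = False  # 'l' cannot hold values above 2**31 - 1

header = pack_header(UNSIGNED_FMT, big)
print(signed_ok, len(header))
```

With the signed format the pack call fails for any data length above 2**31 - 37, while the unsigned format accepts sizes up to just under 4 GB.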

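A workaround is to write large numbers as signed negatives. (The '<' prefix forces standard sizes, so 'l' and 'L' are 4 bytes on every platform; without it, the first example only works where a native long is 4 bytes.)
>>> struct.unpack('<L', struct.pack('<l', 3000000000 - 2**32))
(3000000000,)
>>> struct.unpack('<H', struct.pack('<h', 44100 - 2**16))[0]
44100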
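The workaround can be wrapped in a small helper. This is only a sketch; as_signed is a hypothetical name and nothing like it exists in the wave module.

```python
import struct

def as_signed(value, bits):
    """Return the signed int that has the same bit pattern as the
    unsigned value, for a field that is bits wide."""
    if not 0 <= value < 2**bits:
        raise ValueError('value does not fit in %d unsigned bits' % bits)
    if value >= 2**(bits - 1):
        return value - 2**bits
    return value

# Pack through a signed format, read back as unsigned.
packed = struct.pack('<l', as_signed(3000000000, 32))
roundtrip = struct.unpack('<L', packed)[0]
print(roundtrip)
```

Values below half the range pass through unchanged, so the helper could be applied to every size field, not just the large ones.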

It is possible that someone is using this workaround to write CD-quality files. It is also possible that anyone who has tried just gave up.

History
Date	User	Action	Args
2012-11-16 20:16:53	terry.reedy	set	recipients: + terry.reedy, jcea, ckern
2012-11-16 20:16:53	terry.reedy	set	messageid: <1353097013.24.0.811804311496.issue16461@psf.upfronthosting.co.za>
2012-11-16 20:16:53	terry.reedy	link	issue16461 messages
2012-11-16 20:16:52	terry.reedy	create