classification
Title: assumption about unsigned long byte size in struct module usage
Type: behavior Stage:
Components: Documentation, Library (Lib) Versions: Python 3.0, Python 2.6, Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: Joseph Turian, gregory.p.smith, mark.dickinson, spe-anatol
Priority: normal Keywords: 64bit, easy, patch

Created on 2008-01-11 11:30 by spe-anatol, last changed 2010-06-29 20:12 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
struct-long-nuke-01.diff gregory.p.smith, 2008-01-23 21:49 fix Lib struct usage (and the documentation)
issue1789_doc.patch mark.dickinson, 2010-06-25 19:41
Messages (13)
msg59704 - (view) Author: Christopher Yeleighton (spe-anatol) Date: 2008-01-11 11:30
Text of section 11.3 "Working with Binary Data Record Layouts" at
<http://docs.python.org/tut/node13.html#SECTION0013300000000000000000>
incorrectly assumes that unsigned long is 4 bytes long (with pack codes
"H" and "L" representing two and four byte unsigned numbers
respectively).  It is may be a valid description of the author’s
environment, but it is invalid as a general statement and it creates the
wrong impression that the sample programme is portable and ready-to-go.
msg61569 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-23 08:48
the docs I see in the URL you give are correct as it states 'two and
four byte unsigned integer respectively' for H and L.  However the
example is missing the < for little-endian; I'll add that.

Also, I notice that the struct module documentation itself incorrectly
uses the C terms 'int' and 'long' to assume 32bit and long long when it
means 64bit.  I'll fix that.
msg61571 - (view) Author: Christopher Yeleighton (spe-anatol) Date: 2008-01-23 09:03
Python 2.5.1 (r251:54863, Oct  5 2007, 13:50:07) 
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import struct
>>> g_f = open('/dev/null')
>>> struct.unpack('L', g_f.read(04))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/struct.py", line 87, in unpack
    return o.unpack(s)
struct.error: unpack requires a string argument of length 8
msg61593 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-23 17:54
eew, so the struct module does map as documented to the C types listed
in the docs.  yuck.  regardless, changing the Ls to Is will fix the
tutorial document code.  I'll do that.

I believe there are also other instances of wrong uses of L in struct in
the code such as the zipfile module that will fail on LP64 platforms. 
I'll look for those (and why they weren't caught by unit tests).
msg61602 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-23 20:38
The documentation for the struct module says:

 "short is 2 bytes; int and long are 4 bytes; long long (__int64 on
Windows) is 8 bytes"

and lists 'l' and 'L' as the pack code for a C long.

As it is implemented today, the documentation is incorrect.  On an LP64
host (pretty much any 64-bit linux, bsd or unixish OS; not windows) a
long is 8 bytes.
msg61611 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-23 21:49
A significant portion of the python standard library is broken due to
incorrect use of the struct module on LP64 platforms.  I'm attaching a
patch that should clean it up.

I need Mac OS X people to confirm that the Mac changes are sane.

Its possible that the Lib/posixfile.py change could be a candidate for a
2.4 security backport if it turns out the _posixfile_.lock()
implementation is indeed broken when a 64bit python is used on some
platforms.  I'll leave that up to the 2.4 releaser to decide.
msg61622 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-24 09:40
Anders J. Munch on python-dev correctly says:

You overlooked the words "Standard size and alignment are as follows"
that start the quoted paragraph.  It's a little confusing because
standard size is not the default.  The default is platform-specific
sizes.  Only if you start the format string with >, <, ! or = do you
get standard sizes.

The reference documentation is correct as it stands, and, I suspect,
so is the LP64 implementation.  Doesn't struct.pack('>l',42) produce a
4-byte string on LP64?

The tutorial at
http://docs.python.org/tut/node13.html#SECTION0013300000000000000000%3E
has a bug though: the format string should use '<'.

I believe zipfile.py correctly uses '<' throughout.
msg61623 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-24 09:45
The giant patch was unnecessary, I misread what the struct module was
actually doing.  Auditing all uses of struct in the standard library the
only ones that look suspicious to me are:

Lib/posixfile.py
and all of the uses in Lib/plat-mac/

posixfile is probably fine, but it wouldn't surprise me if there are
bugs on some platforms in there due to using 'l' to decode off_t and 'h'
to decode pid_t within a fcntl structure.

The plat-mac stuff might break assuming those APIs are even available to
64bit programs.  I'll leave that up to the mac porting folks, as it is
its difficult if not impossible to even build a 64-bit Python under OS X.

Fixed in trunk r60234.
msg108618 - (view) Author: Joseph Turian (Joseph Turian) Date: 2010-06-25 19:29
I just got bit by this bug. This documentation (http://docs.python.org/library/struct.html) claims that an unsigned long is 4 bytes. On my machine, it's 8.
msg108621 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-25 19:34
> This documentation (http://docs.python.org/library/struct.html)
> claims that an unsigned long is 4 bytes.

I don't think it does, if you read it closely.  Of course, patches to improve the docs are always welcome. :)

If you're looking at the table of format codes, note that it refers to the *standard* size, which is the size used when standard size and alignment are in effect (i.e., when the struct format string starts with '<', '>', etc.).  That standard size is indeed 4 bytes for the 'l' and 'L' codes.
msg108622 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-25 19:41
Here's a proposed doc clarification.
msg108625 - (view) Author: Joseph Turian (Joseph Turian) Date: 2010-06-25 20:15
That patch gives good clarification.
msg108946 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-29 20:12
Doc patch committed in revisions r82379 through r82382.
History
Date User Action Args
2010-06-29 20:12:35mark.dickinsonsetstatus: open -> closed

messages: + msg108946
2010-06-25 20:25:05mark.dickinsonsetstatus: closed -> open
assignee: gregory.p.smith -> mark.dickinson
2010-06-25 20:15:36Joseph Turiansetmessages: + msg108625
2010-06-25 19:41:35mark.dickinsonsetfiles: + issue1789_doc.patch
keywords: + patch
messages: + msg108622
2010-06-25 19:34:32mark.dickinsonsetnosy: + mark.dickinson
messages: + msg108621
2010-06-25 19:29:00Joseph Turiansetnosy: + Joseph Turian
messages: + msg108618
2008-01-24 09:45:12gregory.p.smithsetstatus: open -> closed
resolution: fixed
messages: + msg61623
2008-01-24 09:40:17gregory.p.smithsetpriority: critical -> normal
keywords: + easy, - patch
messages: + msg61622
severity: major -> normal
versions: - Python 2.4
2008-01-23 22:27:17georg.brandlsetpriority: high -> critical
2008-01-23 21:49:48gregory.p.smithsetfiles: + struct-long-nuke-01.diff
priority: normal -> high
severity: normal -> major
components: + Library (Lib)
title: incorrect assumption about unsigned long byte size -> assumption about unsigned long byte size in struct module usage
keywords: + patch, 64bit, - easy
versions: + Python 2.6, Python 2.5, Python 2.4
messages: + msg61611
2008-01-23 20:38:00gregory.p.smithsetmessages: + msg61602
2008-01-23 17:54:52gregory.p.smithsetmessages: + msg61593
2008-01-23 09:03:49spe-anatolsetmessages: + msg61571
2008-01-23 08:48:50gregory.p.smithsetpriority: normal
assignee: gregory.p.smith
messages: + msg61569
keywords: + easy
nosy: + gregory.p.smith
2008-01-11 11:30:57spe-anatolcreate