classification
Title: confusing action of struct.pack and struct.unpack with fmt 'p'
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: liyu, mark.dickinson, mhammond, nzjrs
Priority: normal Keywords:

Created on 2008-05-27 15:56 by liyu, last changed 2010-06-14 15:04 by mark.dickinson. This issue is now closed.

Messages (4)
msg67411 - (view) Author: Yu LI (liyu) Date: 2008-05-27 15:56
As documented, the built-in module struct has two formats for string objects,
's' and 'p'. They behave as follows:

>>> struct.pack('5s', 'hello')
'hello'
>>> struct.pack('6p', 'hello')
'\x05hello'

However, the second behavior really confuses people. The documentation says:
the "p" format character encodes a "Pascal string", meaning a short
variable-length string stored in a fixed number of bytes. So people
naturally assume the following behavior:

>>> struct.pack('p', 'hello')
'\x05hello'

which makes more sense. Otherwise, why should people use format 'p'?
Whether you struct.pack or struct.unpack, you have to know the size
of the string first, so why not use format 's'? Also, the larger
number (6) before 'p' is confusing. Why should it be string size
+ 1? If we know there is a padding character and the string size, why
not struct.unpack('x5s', abuf) instead?
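
The practical difference between the two formats shows up when the string is shorter than the field. A minimal sketch in Python 3 (where both formats take bytes objects); the variable names are chosen only for illustration:

```python
import struct

# 'p': the first byte stores the actual length, so padding is dropped on unpack.
packed_p = struct.pack('6p', b'hi')
(value_p,) = struct.unpack('6p', packed_p)

# 'x5s': one pad byte, then a fixed 5-byte field; padding comes back as data.
packed_s = struct.pack('x5s', b'hi')
(value_s,) = struct.unpack('x5s', packed_s)

print(packed_p, value_p)
print(packed_s, value_s)
```

With 'x5s' the caller has to strip the trailing NUL bytes, which is ambiguous if the data can itself end in NUL bytes.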

So the suggestion is to modify the behavior of format string 'p' to
match people's intuition. In detail, the behavior should be

>>> s = struct.pack('p', 'hello')
>>> s
'\x05hello'
>>> struct.unpack('p', s)
('hello',)

This behavior should also be consistent with struct.pack_into and
struct.unpack_from.
msg67466 - (view) Author: John Stowers (nzjrs) Date: 2008-05-29 01:25
I agree. AIUI the benefit of Pascal strings when transmitted is that the
decoder does not need to know the string's length ahead of time. The
requirement that the unpack format string for a Pascal string be given
the length of the string makes the implementation here useless.
msg67821 - (view) Author: Mark Hammond (mhammond) * (Python committer) Date: 2008-06-08 04:37
What should struct.calcsize() do with a 'p' format string?
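
For context, a small sketch of what struct.calcsize() reports for 'p' in current Python 3. The size is derived from the format string alone, which a data-dependent 'p' would no longer allow. The variable names are illustrative only:

```python
import struct

size_bare = struct.calcsize('p')   # count defaults to 1: just the length byte
size_six = struct.calcsize('6p')   # 1 length byte + up to 5 data bytes

# With no count there is no room for data, so only a zero length byte is written.
packed_bare = struct.pack('p', b'hello')

print(size_bare, size_six, packed_bare)
```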
msg107789 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-06-14 15:04
I think you're misunderstanding how the 'p' format works.

> Otherwise, why should people use format 'p'?
> Whether you struct.pack or struct.unpack, you have to know the size
> of the string first, so why not use format 's'?

No; you don't need to know the size of the string beforehand; you just need to know the *maximum* size of the string: the number of bytes allocated to store that string. For example (Python 2.6):

>>> import struct
>>> s = struct.Struct('20p')  # variable-length string stored in 20 bytes
>>> s.pack('abc')
'\x03abc\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> s.unpack(_)
('abc',)
>>> s.pack('abcdef')
'\x06abcdef\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> s.unpack(_)
('abcdef',)

Note that the packed sizes are the same (20 bytes each time), but you can pack and unpack any (byte)string of length up to 19 bytes, without needing to know its length beforehand.
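
One consequence worth spelling out is what happens when the input is longer than the field allows. A short sketch in Python 3 (bytes rather than str), assuming a deliberately small 5-byte field for illustration:

```python
import struct

field = struct.Struct('5p')          # 1 length byte + at most 4 data bytes
packed = field.pack(b'hello world')  # too long: only the first 4 bytes are kept
(value,) = field.unpack(packed)

print(field.size, packed, value)
```

The truncation is silent, so callers who need to detect over-long input have to check the length themselves before packing.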

Handling true variable-length fields is really outside the scope of the struct module.
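
For readers who do want the length-prefixed behavior requested in this issue, it can be layered on top of struct in a few lines. The following is only a sketch: pack_pstring and unpack_pstring are hypothetical helper names, not part of the struct module, and the one-byte prefix limits strings to 255 bytes.

```python
import struct

def pack_pstring(data: bytes) -> bytes:
    """Hypothetical helper: one length byte followed by exactly len(data) bytes."""
    if len(data) > 255:
        raise ValueError("a one-byte length prefix holds at most 255 bytes")
    return struct.pack('B', len(data)) + data

def unpack_pstring(buf: bytes, offset: int = 0):
    """Hypothetical helper: return (data, offset just past the string)."""
    (length,) = struct.unpack_from('B', buf, offset)
    start = offset + 1
    end = start + length
    if end > len(buf):
        raise ValueError("buffer is shorter than the length prefix claims")
    return buf[start:end], end

# Two strings back to back; the decoder needs no lengths ahead of time.
encoded = pack_pstring(b'hello') + pack_pstring(b'abc')
first, pos = unpack_pstring(encoded)
second, pos = unpack_pstring(encoded, pos)
print(encoded, first, second, pos)
```

Returning the next offset lets the caller walk a buffer of consecutive fields, which is the part a fixed-size format string cannot express.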
History
Date User Action Args
2010-06-14 15:04:58 mark.dickinson set status: open -> closed
resolution: rejected
messages: + msg107789
2010-06-14 14:35:08 mark.dickinson set nosy: + mark.dickinson
2009-05-13 22:37:53 ajaksu2 set priority: normal
type: behavior -> enhancement
versions: + Python 2.7, Python 3.2, - Python 2.5
2008-06-08 04:37:35 mhammond set nosy: + mhammond
messages: + msg67821
2008-05-29 01:25:33 nzjrs set nosy: + nzjrs
messages: + msg67466
2008-05-27 15:56:09 liyu create