Author mgiuca
Recipients georg.brandl, mgiuca
Date 2008-07-31.14:26:27
Message-id <1217514391.83.0.200271369945.issue3478@psf.upfronthosting.co.za>
Content
The documentation for the "struct" module still uses the term "string"
even though the struct module itself deals entirely in bytes objects in
Python 3.0.

I propose updating the documentation to reflect the 3.0 terminology.

I've attached a patch for the Doc/library/struct.rst file. It mostly
renames "string" to "bytes". It also notes that pack for 'c', 's' and
'p' accepts either str or bytes, but unpack always returns bytes.
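
As a minimal sketch of the behaviour described above (the variable
names are illustrative and not taken from the patch), the following
packs a value with each of the three format codes and unpacks it again.
It passes bytes explicitly rather than str, on the assumption that the
implicit str conversion may not be available in the interpreter you run.

```python
import struct

# Pack one value with each of the 'c', 's' and 'p' format codes.
packed_char = struct.pack("c", b"x")        # a single byte
packed_str = struct.pack("3s", b"abc")      # fixed-width byte string
packed_pascal = struct.pack("4p", b"abc")   # length byte followed by data

# unpack returns a tuple; for these codes each member is a bytes object.
(char_value,) = struct.unpack("c", packed_char)
(str_value,) = struct.unpack("3s", packed_str)
(pascal_value,) = struct.unpack("4p", packed_pascal)

print(type(char_value), type(str_value), type(pascal_value))
```

For 'p' the first packed byte holds the data length, so the count of 4
leaves room for exactly three data bytes.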

One important point: If you pass a str to 'c', 's' or 'p', it will get
encoded with UTF-8 before being packed. I've described this behaviour in
the documentation. I'm not sure if this should be described as the
"official" behaviour, or just informatively.

I've traced this behaviour to Modules/_struct.c lines 607, 1650 and 1676
(for 'c', 's' and 'p' respectively), each of which calls
_PyUnicode_AsDefaultEncodedString. That function is defined at
Objects/unicodeobject.c:1410 and directly calls PyUnicode_EncodeUTF8.

Hence the UTF-8 encoding is not system or locale specific - it will
always happen. However, perhaps we should loosen the documentation to
say "which are encoded using a default encoding scheme".
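
To illustrate what a fixed UTF-8 step means for the packed data, here
is a small sketch that performs the encoding explicitly before packing.
The sample text and variable names are assumptions made for the
example; the explicit encode call stands in for the implicit conversion
described above and does not depend on the system locale.

```python
import struct

text = "caf\u00e9"               # four characters, one of them non-ASCII
payload = text.encode("utf-8")   # five bytes: the last character takes two

# Size the 's' field from the encoded length, not the character count.
fmt = "%ds" % len(payload)
packed = struct.pack(fmt, payload)

# unpack hands back bytes; decoding is the caller's job.
(raw,) = struct.unpack(fmt, packed)
round_tripped = raw.decode("utf-8")
print(len(text), len(payload), round_tripped == text)
```

The point worth noting is that the field width has to be chosen from
the encoded byte length, which can exceed the number of characters.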

It would be good if the authors of the struct module read over these
changes first, to make sure I am describing it correctly.

I have also updated Modules/_struct.c's doc strings and exception
messages to reflect this new terminology. (I've changed nothing besides
the contents of these strings; the tests still pass, which I checked
just to be safe.)

Patch is for /python/branches/py3k/, revision 65324.

Commit Log:

Doc/library/struct.rst: Updated documentation to Python 3.0 terminology
(bytes instead of strings). Added note that packing 'c', 's' or 'p'
accepts either str or bytes.

Modules/_struct.c: Updated doc strings and exception messages to the same.
History
Date                 User    Action  Args
2008-07-31 14:26:31  mgiuca  set     recipients: + mgiuca, georg.brandl
2008-07-31 14:26:31  mgiuca  set     messageid: <1217514391.83.0.200271369945.issue3478@psf.upfronthosting.co.za>
2008-07-31 14:26:30  mgiuca  link    issue3478 messages
2008-07-31 14:26:29  mgiuca  create