classification
Title: Python 3 crc32 documentation clarifications
Type: Stage: resolved
Components: Documentation Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: martin.panter Nosy List: docs@python, gregory.p.smith, martin.panter, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-09-05 11:55 by martin.panter, last changed 2015-12-11 06:07 by martin.panter. This issue is now closed.

Files
File name Uploaded Description Edit
crc-sign.patch martin.panter, 2014-12-19 05:14 review
crc-sign.v2.patch martin.panter, 2015-03-09 02:28 review
crc-sign.v3.patch martin.panter, 2015-03-30 12:19 review
crc-sign.v4.patch martin.panter, 2015-03-30 12:40 review
crc-sign.v5.patch martin.panter, 2015-12-09 04:47 review
Messages (15)
msg226419 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-09-05 11:55
This is regarding the Python 3 documentation for binascii.crc32(), <https://docs.python.org/dev/library/binascii.html#binascii.crc32>. It repeatedly recommends correcting the sign by doing crc32() & 0xFFFFFFFF, but it is not immediately clear why. Only after reading the Python 2 documentation does one realise that the value is always unsigned for Python 3, so you only really need the workaround if you want to support earlier Pythons.

I also suggest documenting the initial CRC input value, which is zero. Suggested wording:

binascii.crc32(data[, crc])
Compute CRC-32, the 32-bit checksum of “data”, starting with the given “crc”. The default initial value is zero. The algorithm is consistent with the ZIP file checksum. Since the algorithm is designed for use as a checksum algorithm, it is not suitable for use as a general hash algorithm. Use as follows:

print(binascii.crc32(b"hello world"))
# Or, in two pieces:
crc = binascii.crc32(b"hello", 0)
crc = binascii.crc32(b" world", crc)
print('crc32 = {:#010x}'.format(crc))

I would simply drop the notice box with the workaround, because I gather that the Python 3 documentation generally omits Python 2 details. (There are no “new in version 2.4 tags” for instance.) Otherwise, clarify if “packed binary format” is a reference to the “struct” module, or something else.

Similar fixes are probably appropriate for zlib.crc32() and zlib.alder32().

Also, what is the relationship between binascii.crc32() and zlib.crc32()? I vaguely remember reading that “zlib” is not always available, so I tend to use “binascii” instead. Is there any advantage in using the “zlib” version? The “hashlib” documentation points to “zlib” without mentioning “binascii” at all.
msg232927 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-12-19 05:14
Here is a patch that fixes the binascii, zlib.crc32() and adler32() documentation as I suggested.

I’m still interested why there are two ways to do a CRC-32, each equally non-obvious as the other.
msg236881 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-28 12:06
crc & 0xffffffff is still used in gzip, zipfile and tarfile. And some comments say about signess of 32-bit checksums.
msg237589 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-09 02:28
Posting a new patch that also removes the masking from the gzip, zipfile, tarfile, and test_zlib modules. I removed the comment about signedness in tarfile; let me know if you saw any others.
msg238715 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-20 19:35
These notes was added by Gregory in r68535. Ask him if they still are needed.
msg238716 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-20 19:36
See also issue4903.
msg238723 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-03-20 21:07
I do not object to the removal of the & 0xfffffff from the stdlib library code if these functions have actually been fixed to always return unsigned now.  (double check the behavior, and if good, just do it!)

But I think the docs should still mention that the & 0xffffffff is a good practice if code needs to be compatible with Python versions prior to X.Y (list the last release before the behavior was corrected).  Possibly within a .. versionchanged: section.

People often use the latest docs when writing code in any version of Python as we continually improve docs and are pretty good about noting old behaviors and when behaviors changed.
msg238735 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-20 22:52
Hi Gregory, I think the three functions have been fixed since Python 3.0. It looks like you changed them in revisions 6a7fa8421902 (zlib module) and 132cf3a79126 (crc32 functions). Looking at the 3.0 branch, all three functions use PyLong_FromUnsignedLong(), but in the 2.7 branch, they use PyInt_FromLong(). This is tested in test_zlib.ChecksumTestCase.test_crc32_adler32_unsigned(). See also my changes to test_penguins() in the patch here for further testing. There is no explicit testing of binascii.crc32() unsignedness, but it is implicitly tested by test_same_as_binascii_crc32(), because the expected CRC is beyond the positive 32 bit signed limit.

I am happy to put back the suggestion of masking for backwards compatibility if you really think it is necessary. But since it only involves Python < 3.0 compatibility, I thought it would be okay to remove it; see my original post. The documentation is often pretty good about noting when Python 3 behaviours changed, but you usually have to look elsewhere to see the differences from Python 2.
msg239599 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-30 12:19
Patch v3:

* Reverted to original crc32(b"hello") example call with the implicit initial CRC
* Added “Changed in version 3.0” notices, restoring a brief version of the suggestion to use the bit mask, along with an explanation.

Python 2 compatibility information is generally unprecedented in the Python 3 documentation though, but hopefully this version should make more sense to people not already familiar with the Python 2 odd behaviour.
msg239602 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-30 12:40
V4 fixes a merge conflict with recent gzip changes.
msg256107 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-08 09:45
The patch LGTM.

But while we are here, wouldn't be worth to unify the name of second parameter of binascii and zlib CRC calculation functions? And unify the description of crc_hqx() with other functions.
msg256138 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-09 04:47
In crc-sign.v5.patch I have changed the binascii.crc_hqx() and crc32() documentation to use “value”, matching the zlib documentation. Is that what you had in mind?
msg256140 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-09 06:56
Yes, thank you Martin.
msg256198 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-11 06:04
New changeset 03c4ca1a34e2 by Martin Panter in branch '3.5':
Issue #22341: Drop Python 2 workaround and document CRC initial value
https://hg.python.org/cpython/rev/03c4ca1a34e2

New changeset 360c1326d8d1 by Martin Panter in branch 'default':
Issue #22341: Merge CRC doc from 3.5
https://hg.python.org/cpython/rev/360c1326d8d1
msg256200 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-12-11 06:07
Thanks for the reviews
History
Date User Action Args
2015-12-11 06:07:02martin.pantersetstatus: open -> closed
resolution: fixed
messages: + msg256200

stage: commit review -> resolved
2015-12-11 06:04:45python-devsetnosy: + python-dev
messages: + msg256198
2015-12-09 06:56:31serhiy.storchakasetassignee: docs@python -> martin.panter
messages: + msg256140
stage: patch review -> commit review
2015-12-09 04:47:14martin.pantersetfiles: + crc-sign.v5.patch

stage: patch review
messages: + msg256138
versions: + Python 3.6
2015-12-08 09:45:33serhiy.storchakasetmessages: + msg256107
2015-03-30 12:40:55martin.pantersetfiles: + crc-sign.v4.patch

messages: + msg239602
2015-03-30 12:19:38martin.pantersetfiles: + crc-sign.v3.patch

messages: + msg239599
2015-03-20 22:52:10martin.pantersetmessages: + msg238735
2015-03-20 21:07:32gregory.p.smithsetmessages: + msg238723
2015-03-20 19:36:55serhiy.storchakasetmessages: + msg238716
2015-03-20 19:35:11serhiy.storchakasetnosy: + gregory.p.smith
messages: + msg238715
2015-03-09 02:28:51martin.pantersetfiles: + crc-sign.v2.patch

messages: + msg237589
versions: + Python 3.5, - Python 3.4
2015-02-28 12:06:47serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg236881
2014-12-19 05:14:30martin.pantersetfiles: + crc-sign.patch
keywords: + patch
messages: + msg232927
2014-09-05 11:55:29martin.pantercreate