Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlib.crc32() and adler32() return value #45543

Closed
arigo mannequin opened this issue Sep 25, 2007 · 23 comments
Closed

zlib.crc32() and adler32() return value #45543

arigo mannequin opened this issue Sep 25, 2007 · 23 comments
Assignees
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@arigo
Copy link
Mannequin

arigo mannequin commented Sep 25, 2007

BPO 1202
Nosy @gvanrossum, @arigo, @facundobatista, @gpshead, @jcea
Files
  • Document.odt
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2009-06-26.08:19:28.401>
    created_at = <Date 2007-09-25.11:31:11.430>
    labels = ['extension-modules', 'type-bug']
    title = 'zlib.crc32() and adler32() return value'
    updated_at = <Date 2009-06-26.08:20:10.922>
    user = 'https://github.com/arigo'

    bugs.python.org fields:

    activity = <Date 2009-06-26.08:20:10.922>
    actor = 'gregory.p.smith'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2009-06-26.08:19:28.401>
    closer = 'gregory.p.smith'
    components = ['Extension Modules']
    creation = <Date 2007-09-25.11:31:11.430>
    creator = 'arigo'
    dependencies = []
    files = ['13630']
    hgrepos = []
    issue_num = 1202
    keywords = ['64bit']
    message_count = 23.0
    messages = ['56130', '56146', '58336', '58344', '58362', '58375', '58376', '63664', '63666', '63668', '63676', '63756', '64416', '64420', '74380', '74384', '79408', '79527', '79528', '85557', '85626', '86192', '89723']
    nosy_count = 9.0
    nosy_names = ['gvanrossum', 'arigo', 'facundobatista', 'gregory.p.smith', 'jcea', 'tlesher', 'mbecker', 'jdavid.ibp@gmail.com', 'gdementen']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1202'
    versions = ['Python 2.6', 'Python 3.0', 'Python 3.1']

    @arigo
    Copy link
    Mannequin Author

    arigo mannequin commented Sep 25, 2007

    The functions zlib.crc32() and zlib.adler32() return a signed value
    in the range(-2147483648, 2147483648) on 32-bit platforms, but an
    unsigned value in the range(0, 4294967296) on 64-bit platforms. This
    means that half the possible answers are numerically different on these
    two kinds of platforms.

    Ideally, this should be fixed by having them always return unsigned
    numbers (their C return type is unsigned too). It's unclear if we can
    do this without breaking existing code, though :-(

    @arigo arigo mannequin added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Sep 25, 2007
    @gvanrossum
    Copy link
    Member

    Since it's basically a magic cookie, not a meaningful numeric value, I'd
    propose sticking with backwards compatibility and fixing the 64-bit
    version to return a signed version.

    return x - ((x & 0x80000000) <<1)

    anyone?

    @tlesher
    Copy link
    Mannequin

    tlesher mannequin commented Dec 10, 2007

    Both CRC-32 and ADLER32 are standards (described in ISO 3309 and RFC
    1950 respectively); whatever fix implemented should make sure that the
    output complies.

    ISO 3309 isn't available online as far as I can see, but CRC-32
    reference code is published in RFC 1952; RFC 1950 contains reference
    code for ADLER32.

    @arigo
    Copy link
    Mannequin Author

    arigo mannequin commented Dec 10, 2007

    The C reference code in rfc1950 for Adler-32 and in rfc1952 for CRC-32
    compute with and return "unsigned long" values. From this point of
    view, returning negative values on 32-bit machines from CPython's zlib
    module can be considered a bug. That only leaves open the question of
    backward compatibility.

    @gvanrossum
    Copy link
    Member

    How about, in Python 2.6 we make the 64-bit version return a signed
    value for better compatibility with the 32-bit version, and in Python
    3.0 we make both versions return the signed value for better compliance
    with the standard?

    @arigo
    Copy link
    Mannequin Author

    arigo mannequin commented Dec 10, 2007

    Obscure but reasonable. (I suspect you meant to say that py3k should
    return the *unsigned* value for better compliance with the standard.)

    @gvanrossum
    Copy link
    Member

    Obscure but reasonable. (I suspect you meant to say that py3k should
    return the *unsigned* value for better compliance with the standard.)

    Yes. :)

    @gpshead gpshead added the easy label Jan 23, 2008
    @gpshead gpshead self-assigned this Jan 23, 2008
    @gpshead
    Copy link
    Member

    gpshead commented Mar 17, 2008

    working on this now. foo = 'abcdefghijklmnop'

    2.x 32-bit long: zlib.crc32(foo) returns -1808088941
    2.x 64-bit long: zlib.crc32(foo) returns 2486878355

    This is because PyInt_FromLong() happily fits the value into a signed
    long internally to the integer object on 64-bit platforms. They are
    both the same number if considered with & 0xffffffff.

    I'm doing as guido suggests and leaving this slightly odd behavior for
    2.x so that crc32 and adler32 always return an integer object. in 3.0
    they'll always return an unsigned value.

    @gpshead
    Copy link
    Member

    gpshead commented Mar 17, 2008

    question: should I also make 64-bit 2.x return a signed value as well to
    be consistent with 32bit python 2.x?

    Consistency in case someone ever pickles the value and sends it to
    another python instance of a different word length would be good...

    @gvanrossum
    Copy link
    Member

    Sure. (Though sending pickles to 3.0 would still cause problems.
    Consumers of pickled checksums would do wise to *always* take the CRC
    mod 2**32 before doing comparisons.)

    @jcea
    Copy link
    Member

    jcea commented Mar 17, 2008

    I store CRC in reed-solomon schema of mine. I compare with equality, so,
    I think we should enforce "CRC(python 32 bits) == CRC(python 64 bits)".

    I will need to touch my code in python 3.0, but that will be inevitable
    anyway...

    @gpshead
    Copy link
    Member

    gpshead commented Mar 17, 2008

    fixed.

    3.0 always returns unsigned.
    2.6 always returns signed, 2**31...2**31-1 come back as negative integers.

    trunk r61449
    branches/py3k r61459

    @gpshead gpshead closed this as completed Mar 17, 2008
    @mbecker
    Copy link
    Mannequin

    mbecker mannequin commented Mar 24, 2008

    In case it isn't obvious the work around for pre 3.0 to get the right
    sum is something like:
    x=zlib.adler32(str)
    if x < 0:
    x=(long(x) + 4294967296L) # 2^32, long may or may not be needed here

    @gpshead
    Copy link
    Member

    gpshead commented Mar 24, 2008

    The workaround I prefer to use is:

    x = zlib.adler32(mystr) & 0xffffffffL

    @facundobatista
    Copy link
    Member

    Let me reopen this, I think we have an issue with this fix.

    The conclusion of this discussion so far is that in 3.0 the crc32 will
    behave like the standard, which is a good thing (tm), but in 2.6 it will
    not: it should return a signed integer. I agree with this outcome!

    The documentation for 2.6, the commit message for the fix and what it's
    said here states that: "2.6 always returns signed, 2**31...2**31-1 come
    back as negative integers."

    This is *not* actually happening:

    >>> s = "*"*100000
    >>> print zlib.crc32(s)  # 2.6, 32 bits
    -872059092
    >>> print zlib.crc32(s)  # 2.6, 64 bits
    3422908204

    The problem in the code is, IMHO, that the "32b rounding" is being
    forced by assigning the result to an int (Modules/zlibmodule.c, line
    929), but this "rounding" does not actually work for 64b (because the
    int has 64 bit, and even as it's signed, the number stays big and positive).

    Thank you!

    @facundobatista facundobatista reopened this Oct 6, 2008
    @gpshead
    Copy link
    Member

    gpshead commented Oct 6, 2008

    An int is 32-bits on all popular platforms. Anyways i'll double check
    things. What platforms did you run your test on?

    @jdavidibpgmailcom
    Copy link
    Mannequin

    jdavidibpgmailcom mannequin commented Jan 8, 2009

    I believe I have hit this bug. With Python 2.6.1 in a Gentoo Linux
    64 bits.

    This code:

      from zipfile import ZipFile
      inzip = ZipFile('Document.odt')
      outzip = ZipFile('out.zip', 'w')
      for f in inzip.infolist():
          if f.filename != 'content.xml':
              print f.filename, '(CRC: %s)' % f.CRC
              outzip.writestr(f, inzip.read(f.filename))
      outzip.close()

    Produces this output:

    ...
    styles.xml (CRC: 3930581528)
    test_odt.py:10: DeprecationWarning: 'l' format requires -2147483648 <=
    number <= 2147483647
    outzip.writestr(f, inzip.read(f.filename))
    ...

    The CRC is not a 32bits signed, and then the module struct complains,
    here:

    zipfile.py:1098
    self.fp.write(struct.pack("<lLL", zinfo.CRC, zinfo.compress_size,

    Apparently 'struct.pack' expects 'zinfo.CRC' to be a 32 signed it,
    which is not.

    I can attach the 'Document.odt' file if you want.

    @gpshead
    Copy link
    Member

    gpshead commented Jan 10, 2009

    seems there are bugs with it not staying signed as it should on some
    64bit platforms. i'll be looking into this shortly. its a good
    candidate bug for 2.6.x and 3.0.x releases.

    @gpshead gpshead removed the easy label Jan 10, 2009
    @gpshead
    Copy link
    Member

    gpshead commented Jan 10, 2009

    err not 3.0.x, 3.0 is always unsigned like anyone sane would want. :)

    @gpshead
    Copy link
    Member

    gpshead commented Apr 5, 2009

    J. David Ibáñez - do you still happen to have that Document.odt laying
    around?

    @jdavidibpgmailcom
    Copy link
    Mannequin

    jdavidibpgmailcom mannequin commented Apr 6, 2009

    There it is.

    @gdementen
    Copy link
    Mannequin

    gdementen mannequin commented Apr 20, 2009

    Regarding the issue J. David Ibáñez has, I have a few comments to add:

    I'm not sure whether or not a separate bug report should be opened for
    this issue.

    @gpshead
    Copy link
    Member

    gpshead commented Jun 26, 2009

    fix for J. David's issue submitted to trunk r73565 and py3k r73566 just in
    time for the 3.1 release.

    release30-maint r73567
    release26-maint r73568

    @gpshead gpshead closed this as completed Jun 26, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants