Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch to make zlib-objects better support threads #48988

Closed
ebfe mannequin opened this issue Dec 24, 2008 · 15 comments
Closed

Patch to make zlib-objects better support threads #48988

ebfe mannequin opened this issue Dec 24, 2008 · 15 comments
Labels
performance Performance or resource usage

Comments

@ebfe
Copy link
Mannequin

ebfe mannequin commented Dec 24, 2008

BPO 4738
Nosy @pitrou, @vstinner
Files
  • zlib_threads-3.diff
  • zlibtest2.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-01-02.17:35:18.144>
    created_at = <Date 2008-12-24.16:29:30.644>
    labels = ['performance']
    title = 'Patch to make zlib-objects better support threads'
    updated_at = <Date 2009-01-02.17:49:52.769>
    user = 'https://bugs.python.org/ebfe'

    bugs.python.org fields:

    activity = <Date 2009-01-02.17:49:52.769>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-01-02.17:35:18.144>
    closer = 'pitrou'
    components = ['None']
    creation = <Date 2008-12-24.16:29:30.644>
    creator = 'ebfe'
    dependencies = []
    files = ['12531', '12532']
    hgrepos = []
    issue_num = 4738
    keywords = ['patch']
    message_count = 15.0
    messages = ['78266', '78282', '78287', '78289', '78318', '78326', '78329', '78331', '78350', '78359', '78669', '78771', '78772', '78849', '78850']
    nosy_count = 3.0
    nosy_names = ['pitrou', 'vstinner', 'ebfe']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue4738'
    versions = ['Python 3.1', 'Python 2.7']

    @ebfe
    Copy link
    Mannequin Author

    ebfe mannequin commented Dec 24, 2008

    My application needs to pack and unpack workunits all day long and does
    this using multiple threading.Threads. I've noticed that the zlib module
    seems to use only one thread at a time when using [de]compressobj(). As
    the comment in the sourcefile zlibmodule.c already says the module uses
    a global lock to protect different threads from accessing the object.
    While the c-functions release the GIL while waiting for the global lock,
    only one thread at a time can use zlib.
    My app ends up using only one CPU to compress/decompress it's workunits...

    The patch (svn diff to ) attached here fixes this problem by extending
    the compressobj-structure by an additional member to create
    object-specific locks and removes the global lock. The lock protects
    each compressobj individually and allows multiple python threads to use
    zlib in parallel, utilizing all available CPUs.

    @ebfe ebfe mannequin added the performance Performance or resource usage label Dec 24, 2008
    @vstinner
    Copy link
    Member

    A call to PyThread_free_lock(this->lock) in Comp_dealloc() and
    Decomp_dealloc(). Comp_dealloc() and Decomp_dealloc() code may also be
    factorized (write a common function to free unused_data,
    unconsumed_tail and self).

    @ebfe
    Copy link
    Mannequin Author

    ebfe mannequin commented Dec 26, 2008

    new svn diff

    @vstinner
    Copy link
    Member

    Ok, the patch looks fine and I like finer locks ;-)

    @vstinner
    Copy link
    Member

    New comments about the last patch:

    • GIL is not released for adler() or crc32() whereas these functions
      may be slow for long strings: just add Py_BEGIN_ALLOW_THREADS /
      Py_END_ALLOW_THREADS before / after adler(...) and crc32(...)
    • (As ENTER_HASHLIB, issue bpo-4751) I think that
      Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS are useless in
      ENTER_ZLIB
    • You might add explicit self to ENTER/LEAVE_ZLIB because the macros
      are now dependent of self (and not the whole module) =>
      ENTER_ZLIB(self) and LEAVE_ZLIB(self)

    Are deflateCopy() and inflateCopy() slow enough to release the GIL?

    @ebfe
    Copy link
    Mannequin Author

    ebfe mannequin commented Dec 27, 2008

    new svn diff attached

    • GIL is now released for adler32 and crc32 if the buffer is larger than
      5kb (we don't want to risk burning cpu cycles by GIL-stuff)
    • adler32 got it's param by s# but now does s* - why s# anyway?
    • ENTER_ZLIB no longer gives away the GIL. It's dangerous and useless as
      there is no pressure on the object's lock.
    • deflateCopy() and inflateCopy() are not worth the trouble.u

    @vstinner
    Copy link
    Member

    Comments on zlib_threads-2.diff:

    • the indentation is strange: don't mix spaces and tabs!
    • I prefer ";" after a call to a macro: "ENTER_ZLIB(self);" instead
      of "ENTER_ZLIB(self)". It makes vim happy (auto indent code correctly)
      and it works for ENTER_ZLIB and LEAVER_ZLIB.
    • ENTER_ZLIB and LEAVER_ZLIB prototype is wrong if WITH_THREAD is not
      defined
    • oh yeah, s* is needed to protect the buffer with a lock
    • why 5kb? is it a random value? I prefer power of two, like 4096
      bytes :-)

    @ebfe
    Copy link
    Mannequin Author

    ebfe mannequin commented Dec 27, 2008

    new svn diff attached

    the indentation in this file is not my fault, it has tabs all over it...

    The 5kb limits protects from the overhead of releasing the GIL. With
    very small buffers the overall runtime in my benchmark tends to double.
    I set it based on my testing and it remains being arbitrary to a certain
    degree. Set the limit to 1 and try 1.000.000 times b'abc'...

    May I also suggest to change the zlib module not to accept s* but y*:

    • Internally zlib operates on bytes, characters don't mean a thing in
      zlib-land.
    • We rely on s* performing the encoding into default for us. This
      behaviour is hidden from the programmer and somewhat violates the rule
      of least surprise.
    • type(zlib.decompress(zlib.compress('abc'))) == bytes
    • Changing from s* to y* forces the programmer to use .encode() on his
      strings (e.g. zlib.compress('abc'.encode()) which very clearly shows
      what's happening.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 27, 2008

    May I also suggest to change the zlib module not to accept s* but y*

    You are probably right, but this would also break the API and can't be
    done lightheartedly. You can open a new bug entry about this though.

    @vstinner
    Copy link
    Member

    I opened a separate issue for the unicode problem: bpo-4757.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 1, 2009

    Same comment about potential deadlocks as in bpo-4751.

    @ebfe
    Copy link
    Mannequin Author

    ebfe mannequin commented Jan 2, 2009

    Here is a small test-script with concurrent access to a single
    compressosbj. The original patch will immediately deadlock.

    The patch attached releases the GIL before trying to get the zlib-lock.
    This allows the other thread to release the zlib-lock but comes at the
    cost of one additional GIL lock/unlock.

    @ebfe
    Copy link
    Mannequin Author

    ebfe mannequin commented Jan 2, 2009

    test-script

    @pitrou
    Copy link
    Member

    pitrou commented Jan 2, 2009

    Checked in r68165. Thanks!

    @pitrou pitrou closed this as completed Jan 2, 2009
    @vstinner
    Copy link
    Member

    vstinner commented Jan 2, 2009

    @pitrou: You added "Also, the GIL is now released when computing the
    CRC of a large buffer." in the NEWS. The GIL released for crc32 but
    also for adler32.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants