This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: A race condition with GIL releasing exists in stringlib_bytes_join
Type: Stage: resolved
Components: Versions: Python 3.9
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: bmerry, malin, methane, tzickel
Priority: normal Keywords:

Created on 2020-03-16 06:02 by tzickel, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (11)
msg364288 - (view) Author: (tzickel) * Date: 2020-03-16 06:01
bpo 36051 added optimization to release GIL on certain conditions of bytes joining, but it has missed a critical path.

If the number of items joining is less or equal to NB_STATIC_BUFFERS (10) than static_buffers will be used to hold the buffers.

https://github.com/python/cpython/blob/5b66ec166b81c8a77286da2c0d17be3579c3069a/Objects/stringlib/join.h#L54

But then the decision to release the GIL or not (drop_gil) does not take this into consideration, and the GIL might be released and then another thread is free to do the same code path, and hijack the static_buffers for it's own usage, causing a race condition.

A decision should be made if it's worth for the optimization to not use the static buffers in this case (although it's an early part of the code...) or not drop the GIL anyhow if it's static buffers (a thing which might make this optimization not worth it, since based on length of data to join, and not number of items to join).
msg364290 - (view) Author: Bruce Merry (bmerry) * Date: 2020-03-16 06:30
Good catch! I'll take a look this week to see what makes sense for the use case for which I originally proposed this optimisation.
msg364292 - (view) Author: (tzickel) * Date: 2020-03-16 07:17
Also, in line:

https://github.com/python/cpython/blob/d07d9f4c43bc85a77021bcc7d77643f8ebb605cf/Objects/stringlib/join.h#L85

perhaps add an if to check if the backing object is really mutable ? (Py_buffer.readonly)
msg364294 - (view) Author: (tzickel) * Date: 2020-03-16 07:38
Also, semi related, (dunno where to discuss it), would a good .join() optimization be to add an optional length parameter, like .join(iterable, length=10), and when running in that code-path, it would skip all the calls to (PySequence_Fast which converts no list to list), and all the pre calculation of length to calculate allocation size, and instead directly start copying from input until length is done, and if the iterable didn't have enough length to fill up, only then throw an exception ?

There are places where you know how much information you expect to be .joining (or you want to join just a part of it) ?
msg364299 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-03-16 09:38
static_buffers is not a static variable. It is auto local variable.
So I think other thread don't hijack it.
msg364302 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-03-16 09:43
>
>
>
> perhaps add an if to check if the backing object is really mutable ?
> (Py_buffer.readonly)
>
>

Py_buffer.readonly doesn't mean immutable.  You can create read only buffer
from bytearray too.

Current logic uses PyBytes_CheckExact.  It is safe and enough for most
cases.
msg364304 - (view) Author: Bruce Merry (bmerry) * Date: 2020-03-16 09:50
> static_buffers is not a static variable. It is auto local variable.
> So I think other thread don't hijack it.

Oh yes, quite right. I should have looked closer at the code first before commenting. I think this can be closed as not-a-bug, unless +tzickel has example code that gives the wrong output?

> perhaps add an if to check if the backing object is really mutable ? (Py_buffer.readonly)

It's not just the buffer data being mutable that's an issue, it's the owning object. It's possible for an object to expose a read-only buffer, but also allow the buffer (including its size or address) to be mutated through its own API.

> Also, semi related, (dunno where to discuss it), would a good .join() optimization be to add an optional length parameter, like .join(iterable, length=10)

You could always open a separate bug for it, but I can't see it catching on given that one needs to modify one's code for it.
msg364314 - (view) Author: Ma Lin (malin) * Date: 2020-03-16 13:06
I also planned to review this commit at some moment, I feel a bit unsteady about it.

If an optimization needs to be fine-tuned, and may introduces some pitfalls for future code maintenance, IMHO it is best to avoid doing this kind of optimization.
msg364338 - (view) Author: (tzickel) * Date: 2020-03-16 16:51
My mistake...
msg364500 - (view) Author: (tzickel) * Date: 2020-03-18 06:42
Regarding getting the buffer and releasing the GIL, if it's wrong, why not fix other places in the code that do it, like:

https://github.com/python/cpython/blob/611836a69a7a98bb106b4d315ed76a1e17266f4f/Modules/posixmodule.c#L9619

The GIL is released, the syscall might be blocking, and iov contains pointers to buffers ?
msg364594 - (view) Author: Bruce Merry (bmerry) * Date: 2020-03-19 06:04
+tzickel I'd suggest reading the discussion in issue 36051, and maybe raising a new issue about it if you still have concerns. In short, dropping the GIL in more bytes.join cases wouldn't necessarily be wrong, but it might break code that made the assumption that bytes.join is atomic even though that's never been claimed.
History
Date User Action Args
2022-04-11 14:59:28adminsetgithub: 84155
2020-03-19 06:04:30bmerrysetmessages: + msg364594
2020-03-18 06:42:05tzickelsetmessages: + msg364500
2020-03-17 02:19:35methanesetstatus: open -> closed
stage: resolved
2020-03-16 16:51:27tzickelsetresolution: not a bug
messages: + msg364338
2020-03-16 13:06:36malinsetnosy: + malin
messages: + msg364314
2020-03-16 09:50:19bmerrysetmessages: + msg364304
2020-03-16 09:43:11methanesetmessages: + msg364302
2020-03-16 09:38:37methanesetmessages: + msg364299
2020-03-16 07:38:21tzickelsetmessages: + msg364294
2020-03-16 07:17:33tzickelsetmessages: + msg364292
2020-03-16 06:30:03bmerrysetmessages: + msg364290
2020-03-16 06:06:31tzickelsetnosy: + methane, bmerry
2020-03-16 06:02:00tzickelcreate