Issue31622
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2017-09-29 00:39 by pdox, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (14) | |||
---|---|---|---|
msg303291 - (view) | Author: (pdox) * | Date: 2017-09-29 00:39 | |
Currently, Python exposes "thread identifiers" as unsigned long values across multiple modules: threading.get_ident() -> Returns an integer thread id sys._current_frames() -> Dictionary keys are integer thread ids signal.pthread_kill() -> Accepts an arbitrary integer thread id In reality, OS level thread identifiers are opaque types. On a system with pthreads, the thread_id that Python provides is merely pthread_t casted to unsigned long. This works today, but is in violation of the standard, and could break on systems with exotic pthread_t. There is also the ability to introduce undefined behavior, such as sending a signal to an invalid thread id: >>> signal.pthread_kill(42, signal.SIGHUP) Segmentation fault (core dumped) Changing the thread identifiers to an opaque type (which also tracks the validity of the thread id, for blocking use of expired system thread ids) will solve these two issues. |
|||
msg303293 - (view) | Author: R. David Murray (r.david.murray) * ![]() |
Date: 2017-09-29 01:03 | |
Are you aware of PEP 539? (See issue 25658 for discussion.) I haven't followed that closely, but I imagine this issue is at least related. |
|||
msg303470 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2017-10-01 17:34 | |
> On a system with pthreads, the thread_id that Python provides is merely pthread_t casted to unsigned long. This works today, but is in violation of the standard, and could break on systems with exotic pthread_t. I don't think that follows. Even if pthread_t is not an integer, it does have a binary representation that can be trivially converted to a arbitrary-sized Python int in Python-facing APIs such as threading.get_ident(). (and similarly, of course, for converting in the other direction e.g. for pthread_kill()) > There is also the ability to introduce undefined behavior, such as sending a signal to an invalid thread id: I wouldn't worry much about pthread_kill(), a little-used feature. |
|||
msg303480 - (view) | Author: Benjamin Peterson (benjamin.peterson) * ![]() |
Date: 2017-10-01 19:03 | |
You don't think having a function that takes an integer and dereferences it as memory is a bad idea? |
|||
msg303481 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2017-10-01 19:05 | |
> You don't think having a function that takes an integer and dereferences it as memory is a bad idea? I'm not sure what you're talking about. What function is that? |
|||
msg303482 - (view) | Author: Benjamin Peterson (benjamin.peterson) * ![]() |
Date: 2017-10-01 19:07 | |
On Sun, Oct 1, 2017, at 12:05, Antoine Pitrou wrote: > > Antoine Pitrou <pitrou@free.fr> added the comment: > > > You don't think having a function that takes an integer and dereferences it as memory is a bad idea? > > I'm not sure what you're talking about. What function is that? signal.pthread_kill and pdox's proposed time.pthread_getcpuclockid on https://github.com/python/cpython/pull/3756. |
|||
msg303483 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2017-10-01 19:15 | |
Le 01/10/2017 à 21:07, Benjamin Peterson a écrit : > > signal.pthread_kill and pdox's proposed time.pthread_getcpuclockid on > https://github.com/python/cpython/pull/3756 Given how specialized (and of little actual use) those functions are, I don't think it's much of a problem if they misbehave if you deliberately pass an invalid thread id. At least it sounds less of a problem than breaking all code that expects a thread id to be an integer. (btw, pthread_getcpuclockid returns ESRCH on Linux if called with an invalid thread id) |
|||
msg303485 - (view) | Author: Benjamin Peterson (benjamin.peterson) * ![]() |
Date: 2017-10-01 19:20 | |
On Sun, Oct 1, 2017, at 12:15, Antoine Pitrou wrote: > > Antoine Pitrou <pitrou@free.fr> added the comment: > > Le 01/10/2017 à 21:07, Benjamin Peterson a écrit : > > > > signal.pthread_kill and pdox's proposed time.pthread_getcpuclockid on > > https://github.com/python/cpython/pull/3756 > > Given how specialized (and of little actual use) those functions are, I > don't think it's much of a problem if they misbehave if you deliberately > pass an invalid thread id. Okay, so you think this PR is reasonable? > > At least it sounds less of a problem than breaking all code that expects > a thread id to be an integer. I suppose, though you can't usefully use it as an integer. I imagine a comparable and hashable opaque object would be compatible with most code. > > (btw, pthread_getcpuclockid returns ESRCH on Linux if called with an > invalid thread id) More precisely, glibc has a heuristic check for thread id validity, which dereferences whatever you pass in. It won't save you from crashing. |
|||
msg303486 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2017-10-01 19:25 | |
Le 01/10/2017 à 21:20, Benjamin Peterson a écrit : > >>> signal.pthread_kill and pdox's proposed time.pthread_getcpuclockid on >>> https://github.com/python/cpython/pull/3756 >> >> Given how specialized (and of little actual use) those functions are, I >> don't think it's much of a problem if they misbehave if you deliberately >> pass an invalid thread id. > > Okay, so you think this PR is reasonable? On the principle yes. I am not interested enough in this feature to look at the implementation. Another possibility would be to have a new separate threading API returning a "low-level opaque thread handle" that you could pass to pthread_kill() and pthread_getcpuclockid(). threading.get_ident() could still be used for logging purposes and others. A related issue: what if you want to call pthread_kill() on a non-Python thread? > I suppose, though you can't usefully use it as an integer. I imagine a > comparable and hashable opaque object would be compatible with most > code. What happens for code that uses e.g. "%x" to format thread ids? |
|||
msg303487 - (view) | Author: Benjamin Peterson (benjamin.peterson) * ![]() |
Date: 2017-10-01 19:34 | |
On Sun, Oct 1, 2017, at 12:25, Antoine Pitrou wrote: > Another possibility would be to have a new separate threading API > returning a "low-level opaque thread handle" that you could pass to > pthread_kill() and pthread_getcpuclockid(). threading.get_ident() could > still be used for logging purposes and others. > > A related issue: what if you want to call pthread_kill() on a non-Python > thread? C code that provides pthread_t to Python would have to wrap it. > > > I suppose, though you can't usefully use it as an integer. I imagine a > > comparable and hashable opaque object would be compatible with most > > code. > > What happens for code that uses e.g. "%x" to format thread ids? It's okay for the opaque object to have a int() conversion. The important part is that threading apis don't accept ints. |
|||
msg303493 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2017-10-01 21:13 | |
> Currently, Python exposes "thread identifiers" as unsigned long values across multiple modules: It happened not so long ago. In 3.6 and earlier "thread identifiers" are signed integers. See issue6532, it was massive change. This issue was deferred for long time due to the afraid of breaking the word. And it still was not tested with third-party code. >> What happens for code that uses e.g. "%x" to format thread ids? >It's okay for the opaque object to have a int() conversion. What happens for C code that uses "%lu" to format thread ids? |
|||
msg303511 - (view) | Author: Nick Coghlan (ncoghlan) * ![]() |
Date: 2017-10-02 07:24 | |
Note that it would be entirely possible to leave the Python level thread IDs alone, but change the way they were used internally to keep the thread ID distinct from the operating system level thread handle. That would limit the adjustment to signal.pthread_kill() and other APIs that wanted to resolve the integer ID back to an OS level handle. |
|||
msg303575 - (view) | Author: (pdox) * | Date: 2017-10-03 03:42 | |
If we don't want to change the type of get_ident(), there is another option. We could have PyThread keep track of which thread ids (e.g. pthread_t) values are valid (meaning, the thread is still running). This could be tracked in a global data structure (protected by a mutex) internal to PyThread. This can be done automatically for threads created by PyThread, and threads where Python is running (in between PyGILState_Ensure/PyGILState_Release, or other entry functions). If PyThread had this ability, then pthread_kill, etc. could raise an exception when the thread_id is invalid. The tricky part would be non-Python-created threads. There two obvious options there... one would be to provide explicit PyThread_Register/PyThread_Unregister functions that a thread could call to make itself known to PyThread. The other, would be to allow the use of unsafe thread_id's, as long as an override flag is provided. (e.g. pthread_kill(thread_id, sig, allow_unsafe=True). To take the abstraction one step further, we could completely hide the OS-level thread identifier inside PyThread, and instead generate our own ids, guaranteeing that they will always be integers. (and perhaps also guaranteeing a higher level of uniqueness) (This is not so unusual, as pthread_t is itself an abstraction, unrelated to kernel-level thread ids. On linux, the kernel identifies threads using globally unique TIDs, which can be fetched with SYS_gettid. This value does not match pthread_self(). With GLIBC, the value of pthread_t is (usually) a pointer to the bottom of the stack for the thread, which holds a thread descriptor (placed there by pthread_create). For this reason, pthread doesn't mesh well with threads that it didn't create (e.g., calling pthread_self in a thread which was created with clone(), will not do the right thing). pthread_kill is essentially a wrapper around tkill(), which first resolves the pthread_t into a TID.) |
|||
msg304369 - (view) | Author: (pdox) * | Date: 2017-10-13 23:25 | |
I don't see much enthusiasm or agreement here, so I'm closing for now. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:52 | admin | set | github: 75803 |
2017-10-13 23:25:16 | pdox | set | status: open -> closed resolution: postponed messages: + msg304369 stage: resolved |
2017-10-03 03:43:00 | pdox | set | messages: + msg303575 |
2017-10-02 07:24:52 | ncoghlan | set | messages: + msg303511 |
2017-10-01 21:13:45 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg303493 |
2017-10-01 19:34:10 | benjamin.peterson | set | messages: + msg303487 |
2017-10-01 19:25:55 | pitrou | set | messages: + msg303486 |
2017-10-01 19:20:54 | benjamin.peterson | set | messages: + msg303485 |
2017-10-01 19:15:08 | pitrou | set | messages: + msg303483 |
2017-10-01 19:07:10 | benjamin.peterson | set | messages: + msg303482 |
2017-10-01 19:05:58 | pitrou | set | messages: + msg303481 |
2017-10-01 19:03:26 | benjamin.peterson | set | nosy:
+ benjamin.peterson messages: + msg303480 |
2017-10-01 17:34:52 | pitrou | set | nosy:
+ tim.peters messages: + msg303470 |
2017-10-01 16:56:08 | rhettinger | set | nosy:
+ ncoghlan, pitrou |
2017-09-29 01:03:27 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg303293 |
2017-09-29 00:39:32 | pdox | create |