Rework uuid module: lazy initialization and add a new C extension #55272
Comments
When the uuid.py module is simply imported, it has the side effect of forking a subprocess (/sbin/ldconfig) and doing a lot of work to find a uuid implementation via ctypes. This is undesirable in many contexts. It would be better to perform those tasks on demand, when the first UUID is actually requested. In general, imports should avoid unnecessary system call side effects. This also makes testing easier. |
I think launching external tools like ifconfig and ipconfig can be avoided pretty easily. There are many recipes around the net showing how to use native APIs. |
With the attached patch the "heavy work" will be done on request, when calling uuid1() or uuid4(), not on import. I am working off the py3k svn branch. Is it necessary to submit a separate patch for the py2 branch? |
Kenny, I don't see a problem when uuid is *imported*; it just creates a couple of STANDARD UUID class objects for use later. And this seems to just set the number and validate it. I don't see any subprocess calls performed. Perhaps you were referring to scenarios of using the uuid1/uuid5 methods on Mac and suggesting improvements to them with your patch? |
If you run 'python -c "import uuid"' under strace, _posixsubprocess is definitely loaded, and a pipe2 call is made. Take a look at the code starting at (py3k trunk) line 418 (try:). That's where the weird stuff happens, which is what the patch is addressing. Ken: thanks for working on this. |
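For those without strace at hand, a rough way to see part of this from Python is to import the module in a fresh interpreter and list which watched modules ended up loaded. This is only a sketch: the helper name `modules_loaded_by_import` is made up, and it detects loaded modules, not fork or pipe2 calls.

```python
import json
import subprocess
import sys


def modules_loaded_by_import(module, watched):
    """Import `module` in a fresh interpreter and return the names from
    `watched` that are present in sys.modules afterwards."""
    code = (
        "import sys\n"
        f"import {module}\n"
        f"found = [m for m in {list(watched)!r} if m in sys.modules]\n"
        "import json\n"  # imported after the check so it cannot skew it
        "print(json.dumps(found))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

For example, `modules_loaded_by_import("uuid", ["_posixsubprocess", "ctypes"])` returns whichever of the two the import pulled in on the interpreter being tested.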
Thanks for posting a patch! I have two comments:
|
|
Because ctypes (or, actually, the libffi it relies on) needs specific low-level code for each platform it runs on, and not all platforms have such code. Another reason is that ctypes is dangerous and some administrators might prefer to disable it (especially on shared hosting ala Google App Engine). |
Thanks for pointing that out! I guess that is the reason you did the import |
Maybe I understood: the ctypes ImportError simply must be handled, falling back to something else. But there are only 3 ways of getting the MAC address:
And ctypes seems to be the best choice: it's portable across Python VMs (better than 3) and faster (better than 1). |
Indeed.
Perhaps, but doing without ctypes should still be possible, otherwise |
It's also possible using existing wrapped OS system calls. One example is here: http://code.google.com/p/pycopia/source/browse/trunk/aid/pycopia/ifconfig.py That one doesn't currently support MAC addresses, but it could. The socket module also now supports the netlink socket on Linux, so it should be possible to use that for getting the MAC address on Linux. |
|
I'm thinking Python could use a general purpose ifconfig/mac-layer module that uuid.py could then just use. |
Perhaps, but that's really out of scope for this issue. Feel free to |
The patch hasn’t incorporated Antoine’s comments AFAICT. Also I don’t see this fit for back porting to bug fix releases. Correct me if I’m wrong. |
Hynek, you are right. |
The implementation does not actually end up in an infinite loop; it is just that repeatedly loading the CDLL is slow. I added caching for that and fixed the ctypes imports too. |
Jyrki, roundup doesn’t seem to recognize your patch so we can’t review it in Rietveld. Could you re-try, maybe using hg? |
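A sketch of what caching the library load together with a tolerant ctypes import could look like. This is not the code from the patch; `load_library` is a hypothetical helper, and using `functools.lru_cache` is just one way to make the lookup run once per name.

```python
import functools

try:
    import ctypes
    import ctypes.util
except ImportError:  # ctypes may be missing or disabled on some platforms
    ctypes = None


@functools.lru_cache(maxsize=None)
def load_library(name):
    """Return a CDLL for `name`, or None; the lookup runs once per name."""
    if ctypes is None:
        return None
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None
```

A second call such as `load_library("uuid")` is answered from the cache, including the negative result when the library is absent.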
Here's a second take on the patch |
One thing I am wondering about is why we have to use find_library() at all. Wouldn’t it be more robust, and more efficient, to call CDLL() directly? We just have to know the exact library version we are expecting. On Linux, the full soname is libuuid.so.1. It seems on OS X it is called libc.dylib (but it would be good for someone else to confirm).
import ctypes
import sys

# The uuid_generate_* routines are provided by libuuid on at least
# Linux and FreeBSD, and provided by libc on Mac OS X.
if sys.platform == "darwin":
    libname = "libc.dylib"
else:
    libname = "libuuid.so.1"
_ctypes_lib = ctypes.CDLL(libname) |
I cannot comment on uuid directly, but for me, this is yet another example of how assumptions can break things. imho - if you know the exact version of a shared library that you want, calling cdll directly should be fine. Maybe find_library is historic. However, an advantage to calling find_library is that it should lead to an OSError if it cannot be loaded. But trapping OSError on a call to ctypes.cdll or testing for None (NULL) from find_library() is the option of a developer using the ctypes interface. As far as libFOO.so.N - do you always want to assume it is going to be version N, or are you expecting to be able to work with version N+X? |
The versioning problem with libFOO.so.N already occurs with compiled programs. A C program compiled against libuuid.so.1 will fail to load if you only have libuuid.so.2. On the other hand, a Python program using find_library() will find either version. My point about robustness is that if a version 2 is invented, it might have different semantics or function signatures, and Python would then be assuming the wrong semantics. |
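To make the trade-off concrete, here is a small sketch of loading a pinned soname directly and treating a load failure as "not available". The library names are the ones proposed earlier in this thread and whether they are right for a given system is an assumption; `open_pinned_libuuid` is a hypothetical helper name.

```python
import ctypes
import sys


def open_pinned_libuuid():
    """Try to load the exact library version the caller was written against."""
    # Names taken from the proposal in this thread; unverified assumption.
    libname = "libc.dylib" if sys.platform == "darwin" else "libuuid.so.1"
    try:
        return ctypes.CDLL(libname)
    except OSError:
        # The library, or this exact version of it, is not installed.
        return None
```

With this shape, a future libuuid.so.2 would simply not be picked up, rather than being called with version 1 signatures.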
It sounds like ctypes is causing you some headache. How about we get rid of ctypes for uuid and replace it with a simple implementation in C instead? Autoconf (configure) can take care of the library detection easily. |
There is already bpo-20519 for that, although it looks like the proposed patch keeps ctypes as a fall-back. (My interest here is only theoretical.) |
The way I have seen that resolved - in many locations, is to have the So, I agree wholeheartedly, if a versioned number is requested, that, or It has been over two months, and I may have read it wrong - but that On 08-May-16 05:24, Martin Panter wrote:
|
How often is uuid1 used? I never use it, and it looks like uuid1 makes uuid.py complicated. I hope PEP-562 is accepted. Without PEP-562, the easy way is to make a proxy function.
# uuid/__init__.py
def uuid1(node=None, clock_seq=None):
    """Generate a UUID from a host ID, sequence number, and the current time.
    If 'node' is not given, getnode() is used to obtain the hardware
    address. If 'clock_seq' is given, it is used as the sequence number;
    otherwise a random 14-bit sequence number is chosen."""
    from . import _uuid1
    # Forward the arguments so the proxy behaves like the real function.
    return _uuid1.uuid1(node, clock_seq) |
I abandoned my PR and started to review Antoine's PR 3796, which basically combines all previous patches proposed over the last 6 years :-) |
I ran two benchmarks on my Fedora 26.
git co master
git co 8d59aca
Import: haypo@selma$ ./python -m perf compare_to import_ref.json import_new.json --table
uuid.uuid1(): haypo@selma$ ./python -m perf compare_to uuid1_ref.json uuid1_new.json --table
Everything is faster. The import time is 9.4x faster, nice! In practice, the import time is probably even better: my benchmark uses repeated imports, so it doesn't measure the "first time" import, which was more expensive because of the "import ctypes". |
Crude import benchmark (Ubuntu):
$ time ./python -c "import uuid" real 0m0.074s
$ time ./python -c "import uuid" real 0m0.030s
$ time ./python -c pass real 0m0.027s |
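The same crude measurement can be scripted so that interpreter startup is subtracted out. A rough sketch, with hypothetical helper names; like the shell version it is noisy and measures wall-clock time only.

```python
import subprocess
import sys
import time


def startup_time(code, repeat=5):
    """Best-of-`repeat` wall-clock seconds to run `python -c code`."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def import_overhead(module, repeat=5):
    """Rough cost of importing `module` on top of bare interpreter startup."""
    return startup_time(f"import {module}", repeat) - startup_time("pass", repeat)
```

Calling `import_overhead("uuid")` on builds before and after the change gives numbers comparable to the three timings above.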
uuid fails to build for me on master since that change landed: cpython/Modules/_uuidmodule.c:13:11: error: implicit declaration of function 'uuid_generate_time_safe' is invalid in C99 This is on macOS Sierra. |
It's expected if uuid_generate_time_safe() isn't available on your platform. But test_uuid still passes? |
I would prefer to avoid compilation errors on popular platforms like macOS. Would it be possible to check if uuid_generate_time_safe() is available, maybe in configure? Or we can "simply" skip _uuid compilation on macOS? |
That's probably possible. |
Though I don't know how to reuse the find_file() logic in configure... |
Maybe we could use pkg-config instead? haypo@selma$ pkg-config uuid --cflags |
pkg-config is a Linux-ism. But Linux already works fine...
$ uname
Darwin
$ pkg-config
-bash: pkg-config: command not found |
I proposed PR 3855 to add macOS support to _uuid (and fix the compilation error). |
I agree with Barry's comment on PR 3855: "I'd rather see a configure check for the existence of uuid_generate_time_safe() rather than hard coding it to platforms !APPLE for two reasons. 1) If macOS ever adds this API in some future release, this ifndef will be incorrect, and 2) if some other platform comes along that doesn't have this API, it will still use the incorrect function." It's exactly for situations like this that autoconf tests exist; we should not be hardwiring assumptions about the lack of particular platform APIs. |
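The same "probe for the function, don't hard-code the platform" principle can be illustrated at runtime on the ctypes side. This is only an analogy to the configure check being discussed, not a replacement for it; `pick_time_generator` is a hypothetical helper.

```python
def pick_time_generator(lib):
    """Return the name of the best time-based generator that `lib` exports.

    `lib` is expected to be a ctypes.CDLL; attribute access on a CDLL raises
    AttributeError for a missing symbol, so hasattr() works as a probe.
    """
    for name in ("uuid_generate_time_safe", "uuid_generate_time"):
        if hasattr(lib, name):
            return name
    return None
```

Because the probe asks the library itself, it keeps working if a platform gains or lacks the API later, which is the point of Barry's two objections.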
I think the configure check should be this (sets HAVE_LIBUUID in pyconfig.h):
diff --git a/configure.ac b/configure.ac
index 41bd9effbf..90d53c1b7d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2657,6 +2657,7 @@ AC_MSG_RESULT($SHLIBS)
 AC_CHECK_LIB(sendfile, sendfile)
 AC_CHECK_LIB(dl, dlopen) # Dynamic linking for SunOS/Solaris and SYSV
 AC_CHECK_LIB(dld, shl_load) # Dynamic linking for HP-UX
+AC_CHECK_LIB(uuid, uuid_generate_time_safe)
 # only check for sem_init if thread support is requested
 if test "$with_threads" = "yes" -o -z "$with_threads"; then |
I've followed Stefan's suggestion and opened PR 4287 (tested on 10.10.5) |
Berker's latest patch looks good to me. Unrelated to the patch (same before and after), this looks odd to me:
>>> import uuid
>>> uuid._has_uuid_generate_time_safe is None
True
>>>
>>> import _uuid
>>> _uuid.has_uuid_generate_time_safe
1
[Also, I thought we weren't supposed to use ctypes in the stdlib.] |
"""
Unrelated to the patch (same before and after), this looks odd to me:
>>> import uuid
>>> uuid._has_uuid_generate_time_safe is None
True
>>>
>>> import _uuid
>>> _uuid.has_uuid_generate_time_safe
1
"""
None means "not initialized yet". It's initialized on demand, at the first call of uuid1() or getnode():
$ python3
Python 3.7.0a2+ (heads/master:a5293b4ff2, Nov 6 2017, 12:22:04)
>>> import uuid
>>> uuid._has_uuid_generate_time_safe # == None
>>> uuid.uuid1()
UUID('3e5a7628-c2e5-11e7-adc1-3ca9f4650c0c')
>>> uuid._has_uuid_generate_time_safe
1
Antoine's commit a106aec avoids ctypes when libuuid is available. For the other systems without libuuid, well, it was probably simpler to use ctypes. ctypes was more popular a few years ago. The code "just works" and I guess that nobody wants to touch it :-) |
Does this check work? I tried similar checks with other functions and they passed falsely, because calling an undeclared function is not an error in C. The reliable check of the existence of the uuid_generate_time_safe function is:
|
On Wed, Nov 08, 2017 at 08:28:10PM +0000, Serhiy Storchaka wrote:
Not here. If I replace uuid_generate_time_safe with a non-existing function: checking for uuid_generate_time_unsafe... yes The originally suggested AC_CHECK_LIB macro however works here in both cases. :) |
It worked for me on OS X (returns no) and Linux (returns yes after I installed uuid-dev) but I didn't test it on both systems at the same time. Travis CI also returned 'no': https://travis-ci.org/python/cpython/jobs/299285001 In any case, Serhiy's suggestion is better than mine so I've opened PR 4343. And yes, I'm beginning to regret my decision on not using AC_CHECK_LIB :) |
The header file check in setup.py incorrectly reported "not found" if |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.