New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
performance regression in socket getsockaddrarg() #66325
Comments
I noticed that socket.sendto() got noticably slower since 3.4 (at least), compared to 2.7: 2.7: 3.4: A profile reveals this: Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 3.4: Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) As can be seen, it's the IDNA encoding which takes a long time, and doesn't appear in the 2.7 profile.
The parsing code (including idna codec) is present in both versions though:
"""
static int
getsockaddrarg(PySocketSockObject *s, PyObject *args,
struct sockaddr *addr_ret, int *len_ret)
[...]
if (!PyArg_ParseTuple(args, "eti:getsockaddrarg",
"idna", &host, &port))
return 0;
""" I'm currently bisecting the commit, but I'm not familiar with encoding stuff. |
IDNA encoding is quite slow (see 6e1071ed4c66). I'm surprised we accept general hosnames in sendto(), though (rather than plain IP addresses). 25 µs per call is a lot for such a function. |
For Python, the encoder is only used when you pass a Unicode string. |
Hm... |
OK, I think I see what you mean: $ ./python -m timeit -s "import socket; s =
socket.socket(socket.AF_INET, socket.SOCK_DGRAM)" "s.sendto(b'hello',
('127.0.0.1', 4242))"10000 loops, best of 3: 44.7 usec per loop
$ ./python -m timeit -s "import socket; s =
socket.socket(socket.AF_INET, socket.SOCK_DGRAM)" "s.sendto(b'hello',
(b'127.0.0.1', 4242))"
10000 loops, best of 3: 23.7 usec per loop That's really surprising, especially since gethostbyname() and $ ./python -c "import socket; print(type(socket.gethostbyname('127.0.0.1')))"
<class 'str'> |
Note that even the bytes version is still quite slow. UDP is used for light-weight protocols where you may send thousands or more messages per second. I'd be curious what the sendto() performance is in raw C. |
"Abc" is a bytes string in Python 2 and an Unicode string in Python 3. |
Ah, I wouldn't rely on the absolyte values, my computer is *slow*. On a more recent machine, I get this: Whereas a C loop gives a 4usec per loop.
Sure, but why do getaddrinfo() and gethostbyname() return strings then? This means that someone using: addr = getaddrinfo(...)
sendto(DATA, addr) Will pay the idna encoding upon every call to sendto(). |
Perhaps it is time to add support of ipaddress objects in socket functions. Then we could avoid address parsing in tight loop not only for Unicode strings, but for bytes strings too. s = socket.socket(...)
addr = ipaddress.ip_address(ipaddress.getaddrinfo(...))
for ...:
s.sendto(DATA, (addr, port)) |
What I was thinking too :-) One common pattern when doing UDP networking is the following: def datagram_received(self, remote_addr, data):
# process data
...
self.send_to(remote_addr, response_data) If you want to pass an ipaddress object to send_to, you have to make it so that datagram_received() gives you an ipaddress object too. Perhaps we need a more low-level solution, e.g. a parsing cache integrated in the C socket module. |
Parsing a bytes object i.e. b'127.0.0.1' is done by inet_pton(), so If we had getaddrinfo() and gethostbyname() return bytes instead of |
Charles-François: you get the idna overhead in 2.7, too, by specifying u'127.0.0.1' as the address. The idna overhead could be bypassed fairly easily in C by:
Before adding caching, I'd check whether a cache lookup is actually faster than calling inet_pton. |
The attached patch makes the difference between Unicode and bytes strings for host names negligible, plus it slightly speeds up the bytes case as well. |
I don't see it in a profile output, and the timing doesn't change |
Please understand that Victor and I were asking you to pass a *unicode* object, with a *u* prefix. For me, the time more-than-doubles, on OSX, with the system python. mvl:~ loewis$ /usr/bin/python -m timeit -s "import socket; s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)" "s.sendto(b'hello', ('127.0.0.1', 4242))" |
Sorry, I misread 'b'. |
This will change the type of exception. If this is acceptable and modulo Antoine's and my nitpicks on Rietveld, the patch LGTM. But it is too complicated. Here is alternative. It has many flaws (less extensible, incompatible with Argument Clinic, can produce inaccurate error message, etc), but it is much simpler. And preserve the type of exception. |
I have updated my patch per the review. |
Serhiy: your patch still changes the type of exception, for s.sendto(b'hello',(u'thisisaverylongstringthisisaverylongstringthisisaverylongstringthisisaverylongstring', 4242)) You get a UnicodeError now, but a socket.gaierror then. This is because the name encodes fine as ascii, but still violates the IDNA requirement on label length. My patch does the same. I don't see where your and my patch differ in behavior. But I agree that your patch is certainly much simpler, while mine might be slightly faster (for not creating copies of the host name). I'm fine with either being applied. Antoine? |
Oh, really.
May be apply your Argument Clinic friendly patch to 3.5 and simple patch to |
Martin's approach looks better to me; also, it could be exported for other modules (for example, the ssl module also requests idna encoding at one place). I don't know if this should be fixed in 3.4. It's a performance improvement, not really a bug fix. |
New changeset bc991d4f9ce7 by Martin v. Löwis in branch 'default': New changeset 0b477934e0a1 by Martin v. Löwis in branch 'default': |
New changeset 49085b746029 by Martin v. Löwis in branch 'default': |
I agree that this doesn't need to be back ported to 3.4, in particular as there is a minor semantic change (for invalid labels, it might perform a DNS lookup, instead of rejecting them right away). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: