msg172682 - (view) |
Author: Michele Orrù (maker) * |
Date: 2012-10-11 19:31 |
Buggy due to the use of scanf at Modueles/socketmodule.c:868
Broken from python2.7 to tip on my machine (GNU/Linux)
>>> import socket
[64481 refs]
>>> socket.gethostbyname('4294967306.4294967296.4294967296.1')
'10.0.0.1'
[67764 refs]
>>> socket.gethostbyname('192.168.1.1 ')
'192.168.1.1'
[67764 refs]
>>> socket.gethostbyname(' 192.168.1.1')
'192.168.1.1'
[67764 refs]
>>> socket.gethostbyname(' 192.168.1.1 ')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno -2] Name or service not known
[67805 refs]
|
msg172700 - (view) |
Author: Santoso Wijaya (santoso.wijaya) * |
Date: 2012-10-11 22:07 |
Attaching patch to trim leading and trailing whitespaces prior to processing.
Incidentally, this also means:
>>> socket.gethostbyname('')
'0.0.0.0'
>>> socket.gethostbyname(' ')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno -5] No address associated with hostname
But I'm not sure if we're okay with changing that semantics, so I left it be in this patch.
|
msg172723 - (view) |
Author: Michele Orrù (maker) * |
Date: 2012-10-12 07:34 |
> Attaching patch to trim leading and trailing whitespaces prior to
> processing.
Note that tests are incorrect: the parsing is of the form %d.%d.%d.%d%c, so the parser should accept trailing spaces. That's the same for ping iirc:
$ ping "192.168.1.1 "
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
--- 192.168.1.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1007ms
$ ping " 192.168.1.1"
ping: unknown host 192.168.1.1
I am trying to get out with a simple parser inside the function, probabl y strtol() is the way.
Since Python's C API does not provide any sscanf wrapper, I thought about adding a new one. But given that, AFAIS it appears just two times over the entire sourcecode, there is no need IMO for exporting a new one right now. What is your opinion?
|
msg172787 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-10-12 21:59 |
> Buggy due to the use of scanf at Modueles/socketmodule.c:868
I don't think so. The following test fails because sscanf() returns 5 instead of 4:
if (sscanf(name, "%d.%d.%d.%d%c", &d1, &d2, &d3, &d4, &ch) == 4 && ...)
So '192.168.1.1 ' is passed to getaddrinfo().
If you consider that '192.168.1.1 ' is an invalid name, you should report the issue to the vendor of the C library of your OS.
--
On Linux, the "issue" is also present in inet_ntoa():
$ python
Python 2.7.3 (default, Jul 24 2012, 11:41:40)
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux2
>>> import ctypes
>>> buffer=ctypes.create_string_buffer(128)
>>> libc=ctypes.cdll.LoadLibrary("libc.so.6")
>>> libc.inet_aton(b"127.0.0.1", ctypes.byref(buffer))
1
>>> buffer.raw[:4]
'\x7f\x00\x00\x01'
>>> libc.inet_aton(b"127.0.0.2 ", ctypes.byref(buffer))
1
>>> buffer.raw[:4]
'\x7f\x00\x00\x02'
The source code of the inet_aton() function of the GNU libc:
http://sourceware.org/git/?p=glibc.git;a=blob;f=resolv/inet_addr.c;h=144b87a74c1aef62779862e78e5f87e18e66ee9e;hb=HEAD#l108
Website of the GNU libc:
http://www.gnu.org/software/libc/
|
msg172801 - (view) |
Author: Michele Orrù (maker) * |
Date: 2012-10-13 08:59 |
>
>> Buggy due to the use of scanf at Modueles/socketmodule.c:868
>
> I don't think so. The following test fails because sscanf() returns 5
> instead of 4:
You are right, sorry for didn't notice.
> If you consider that '192.168.1.1 ' is an invalid name, you should
> report the issue to the vendor of the C library of your OS.
Yep, I'm going to ask on #glibc if there is any specific reason for accepting trailing spaces. But this concerns the specific underlying implementation if getaddrinfo(), so IMO we should just update the int overflow issue for now, maintaining the coherence with other utilities having the same "problem". For this, I am attaching a new patch which which just do not test for trailing whitespaces.
--
ù
|
msg172802 - (view) |
Author: Michele Orrù (maker) * |
Date: 2012-10-13 09:44 |
Reviewed with Ezio.
|
msg172825 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2012-10-13 19:55 |
I don't think uint32_t exists everywhere, you should use "unsigned int" instead.
|
msg172840 - (view) |
Author: Michele Orrù (maker) * |
Date: 2012-10-13 23:27 |
Updated && tested on GNU/Linux (gcc).
|
msg196304 - (view) |
Author: Michele Orrù (maker) * |
Date: 2013-08-27 18:15 |
Ping.
|
msg196452 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-08-29 10:07 |
Stupid question: why use scanf()/strtol() to parse an IP address, and not simply inet_pton()/inet_aton(), which are made precisely for that purpose?
inet_pton() is POSIX (it was introduced at the same time as getaddrinfo(), which is used unconditionally a couple lines below). If it's not available, we could use inet_aton() as fallback (which is amusingly not POSIX, but I doubt we'll find a platform with neither inet_pton() nor inet_aton()).
Using inet_pton() would have the added advantage of working with IPv6 addresses: right now, a numeric IPv6 address is always resolved by getaddrinfo(), which results in a performance overhead:
$ python -m timeit -s "from socket import *; s = socket(AF_INET, SOCK_DGRAM); DATA = 'hello'" "s.sendto(DATA, ('127.0.0.1', 42))"
100000 loops, best of 3: 6.86 usec per loop
$ python -m timeit -s "from socket import *; s = socket(AF_INET6, SOCK_DGRAM); DATA = 'hello'" "s.sendto(DATA, ('::1', 42))"
10000 loops, best of 3: 24.2 usec per loop
|
msg196454 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-08-29 10:12 |
Note that for IPv6 addresses, we should just extract manually the
scope ID ('%' prefix) if present.
|
msg196460 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-08-29 14:13 |
Here's a patch using inet_pton() if available, otherwise inet_aton()
if available, otherwise fallback to getaddrinfo().
This should work on every platform, but if a platform has neither
inet_pton() nor inet_aton(), calling getaddrinfo() will incur an
overhead.
All Unices have either inet_aton() or inet_pton().
For Windows >= Vista, inet_pton() is available.
I'm not sure whether XP has one of them: if not, then we might keep
the hand-parsing as a fallback.
This adds support for numeric IPv6 addresses (if inet_pton() is
available), which notably speeds up sendto() on IPv6 sockets, and
gives a tiny speedup for IPv4 sockets:
before:
$ ./python -m timeit -s "import socket; s =
socket.socket(socket.AF_INET, socket.SOCK_DGRAM); DATA = b'x'"
"s.sendto(DATA, ('127.0.0.1', 4242))"
100000 loops, best of 3: 13.2 usec per loop
$ ./python -m timeit -s "import socket; s =
socket.socket(socket.AF_INET6, socket.SOCK_DGRAM); DATA = b'x'"
"s.sendto(DATA, ('::1', 4242))"
10000 loops, best of 3: 32.1 usec per loop
after:
$ ./python -m timeit -s "import socket; s =
socket.socket(socket.AF_INET, socket.SOCK_DGRAM); DATA = b'x'"
"s.sendto(DATA, ('127.0.0.1', 4242))"
100000 loops, best of 3: 11.5 usec per loop
$ ./python -m timeit -s "import socket; s =
socket.socket(socket.AF_INET6, socket.SOCK_DGRAM); DATA = b'x'"
"s.sendto(DATA, ('::1', 4242))"
100000 loops, best of 3: 12.8 usec per loop
Note that if the IPv6 address contains a scope ID ('%' suffix), then
we fallback to getaddrinfo() which handles it properly (the scope ID
can be an interface index but also an interface name).
|
msg196462 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-08-29 15:30 |
> For Windows >= Vista, inet_pton() is available.
> I'm not sure whether XP has one of them: if not, then we might keep
> the hand-parsing as a fallback.
Apparently, XP doesn't have inet_aton()...
So I changed the code to use inet_pton() if available, otherwise use inet_addr(), which is available everywhere (the only gotcha is that it doesn't handle "255.255.255.255", so we special-case it like "<broadcast>").
We thus get rid of the hand-parsing completely, and get a more efficient handling of numeric IPv4 addresses, and a much more efficient handling for numeric IPv6 addresses (see above benchmarks).
|
msg196465 - (view) |
Author: Michele Orrù (maker) * |
Date: 2013-08-29 17:13 |
Cool, thanks for pointing the correct library function.
Tested on my machine - Debian 3.9 x86_64 GNU/Linux.
|
msg196863 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-09-03 18:54 |
Here's an updated patch with new tests.
It passes the regression test, and yields noticable performance improvements for IPv6:
before:
$ ./python -m timeit -s "import socket; s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); DATA = b'hello'" "s.sendto(DATA, ('127.0.0.1', 4242)) "
10000 loops, best of 3: 28 usec per loop
$ ./python -m timeit -s "import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM); DATA = b'hello'" "s.sendto(DATA, ('::1', 4242)) "
10000 loops, best of 3: 59.1 usec per loop
after:
$ ./python -m timeit -s "import socket; s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); DATA = b'hello'" "s.sendto(DATA, ('127.0.0.1', 4242)) "
10000 loops, best of 3: 24.8 usec per loop
$ ./python -m timeit -s "import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM); DATA = b'hello'" "s.sendto(DATA, ('::1', 4242)) "
10000 loops, best of 3: 26.7 usec per loop
Note that the tests aren't as good as I'd like them to, because apparently some systems (e.g. Solaris) have broken gethostbyaddr()...
But it's cleaner, more robust and more efficient.
|
msg196897 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-09-04 08:58 |
Perhaps you could add tests for the broadcast special cases?
|
msg197622 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-09-13 17:54 |
New changeset 540a9c69c2ea by Charles-François Natali in branch 'default':
Issue #16201: socket: Use inet_pton()/inet_addr() instead of ad-hoc parsing for
http://hg.python.org/cpython/rev/540a9c69c2ea
|
msg197693 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-09-14 07:22 |
Alright, I committed it to default.
I don't think it should go into 2.7 or 3.3, since it's more an
enhancement than a bug fix.
|
msg197705 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-09-14 12:06 |
Closing, thanks for the report!
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:37 | admin | set | github: 60405 |
2013-09-14 12:06:53 | neologix | set | status: open -> closed versions:
- Python 2.7, Python 3.3 messages:
+ msg197705
resolution: fixed stage: patch review -> resolved |
2013-09-14 07:22:59 | neologix | set | messages:
+ msg197693 |
2013-09-13 17:54:21 | python-dev | set | nosy:
+ python-dev messages:
+ msg197622
|
2013-09-04 08:58:14 | pitrou | set | versions:
- Python 3.1, Python 3.2, Python 3.5 |
2013-09-04 08:58:04 | pitrou | set | messages:
+ msg196897 |
2013-09-03 18:54:17 | neologix | set | files:
+ parse_inet-1.diff
messages:
+ msg196863 |
2013-08-29 17:13:31 | maker | set | messages:
+ msg196465 |
2013-08-29 15:30:45 | neologix | set | files:
- parse_inet.diff |
2013-08-29 15:30:38 | neologix | set | keywords:
+ needs review files:
+ parse_inet.diff messages:
+ msg196462
stage: patch review |
2013-08-29 14:13:59 | neologix | set | files:
+ parse_inet.diff
messages:
+ msg196460 |
2013-08-29 10:12:56 | neologix | set | messages:
+ msg196454 |
2013-08-29 10:07:44 | neologix | set | nosy:
+ neologix messages:
+ msg196452
|
2013-08-29 09:34:06 | neologix | link | issue18806 superseder |
2013-08-27 18:15:12 | maker | set | messages:
+ msg196304 |
2012-10-13 23:27:46 | maker | set | files:
+ issue16201.3.patch
messages:
+ msg172840 |
2012-10-13 19:55:13 | pitrou | set | nosy:
+ pitrou messages:
+ msg172825
|
2012-10-13 09:44:59 | maker | set | files:
+ issue16201.2.patch
messages:
+ msg172802 |
2012-10-13 08:59:24 | maker | set | files:
+ issue16201.1.patch
messages:
+ msg172801 |
2012-10-12 21:59:41 | vstinner | set | nosy:
+ vstinner messages:
+ msg172787
|
2012-10-12 08:13:52 | maker | set | files:
+ issue16201.patch |
2012-10-12 07:34:16 | maker | set | nosy:
+ ezio.melotti messages:
+ msg172723
|
2012-10-11 22:07:21 | santoso.wijaya | set | files:
+ gethostbyname_whitespace.patch keywords:
+ patch messages:
+ msg172700
|
2012-10-11 20:55:48 | santoso.wijaya | set | nosy:
+ santoso.wijaya
type: behavior components:
+ Extension Modules versions:
+ Python 2.7 |
2012-10-11 19:31:02 | maker | create | |