classification
Title: socket.gethostbyname resolving octal IP addresses incorrectly
Type: behavior Stage:
Components: Library (Lib), macOS Versions: Python 3.3, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, koobs, mattrobenolt, ned.deily, r.david.murray, ronaldoussoren, vstinner, xiang.zhang
Priority: normal Keywords:

Created on 2016-07-25 06:27 by mattrobenolt, last changed 2016-07-26 16:06 by r.david.murray.

Files
File name Uploaded Description Edit
socket-test-freebsd-9-10-11-python-27-33-34-35.txt koobs, 2016-07-26 07:15
Messages (19)
msg271237 - (view) Author: Matt Robenolt (mattrobenolt) Date: 2016-07-25 06:27
This also affects socket.getaddrinfo on macOS only, but is fine on Linux. I've not tested on Windows to see behavior there.

Given the IP address `0177.0000.0000.0001`, which is a valid octal format representing `127.0.0.1`, we can see varying results. Confirmed in both python 2.7 and 3.5.

First, socket.gethostbyname is always wrong, and always returns `177.0.0.1`:

```
>>> socket.gethostbyname('0177.0000.0000.0001')
'177.0.0.1'
```

This can be seen on both Linux and macOS.

With `socket.getaddrinfo`, resolution is correct on Linux, but macOS returns the bad `177.0.0.1`.

Linux:
```
>>> socket.getaddrinfo('0177.0000.0000.0001', None)[0]
(2, 1, 6, '', ('127.0.0.1', 0))
```

macOS:
```
>>> socket.getaddrinfo('0177.0000.0000.0001', None)[0]
(2, 2, 17, '', ('177.0.0.1', 0))
```

This behavior exists in at least 2.7.12 and 3.5.2. I haven't tested many other versions, but I assume it's pretty universal.
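
To compare the functions side by side on a given machine, something like the following could be used. It is only a sketch for Python 3: `probe` is a hypothetical helper name, and `socket.inet_aton` is included as an extra data point that the report above does not cover. No expected output is shown because, as described above, the results differ by platform and Python version.

```python
import socket


def probe(host):
    """Return {function name: result or error text} for one host string."""

    def attempt(func):
        try:
            return func()
        except OSError as exc:  # socket.herror and socket.gaierror subclass OSError
            return "error: %s" % exc

    return {
        "gethostbyname": attempt(lambda: socket.gethostbyname(host)),
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
        "gethostbyname_ex": attempt(lambda: socket.gethostbyname_ex(host)[2]),
        # Each getaddrinfo entry ends with a sockaddr whose first item is the address
        "getaddrinfo": attempt(
            lambda: sorted({info[4][0] for info in socket.getaddrinfo(host, None)})
        ),
        "inet_aton": attempt(lambda: socket.inet_ntoa(socket.inet_aton(host))),
    }


if __name__ == "__main__":
    for name, result in probe("0177.0000.0000.0001").items():
        print("%-18s %r" % (name, result))
```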
msg271265 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-25 13:14
This would appear to be a platform OS issue.  Is it "broken" also for FreeBSD?  (I put broken in quotes because interpreting octal isn't part of the POSIX spec for gethostbyname.  It could even be an accident that it works on Linux.)

I'm not going to close this yet, since it might be worth a doc issue, or at least documenting here what the status of this is on FreeBSD.
msg271266 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-25 13:16
To clarify: by platform OS issue, I mean that the octal-conversion-or-not is none of Python's doing, it is done by the C library call that gethostbyname is a thin wrapper around.
msg271271 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-07-25 13:46
On Linux, it seems it's not an accident. inet_addr(3) explicitly says it can handle octal or hexadecimal forms.
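
For illustration, here is a minimal pure-Python sketch of the numbers-and-dots rules that inet_addr(3) describes: a leading `0x` means hexadecimal, a leading `0` means octal, and the last component fills whatever bytes remain. Both function names are hypothetical. `parse_decimal_only` is just one guess at how a `177.0.0.1` result could come about (reading every component as decimal); nothing in this issue confirms that this is what the C library or Python actually does.

```python
def parse_c_style_ipv4(text):
    """Parse numbers-and-dots notation with octal and hex components."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected 1 to 4 components: %r" % text)

    values = []
    for part in parts:
        if part[:2].lower() == "0x":
            digits, base = part[2:], 16
        elif len(part) > 1 and part[0] == "0":
            digits, base = part[1:], 8
        else:
            digits, base = part, 10
        allowed = "0123456789abcdef"[:base]
        # Validate by hand: int() alone would also accept signs, spaces and "_"
        if not digits or any(ch not in allowed for ch in digits.lower()):
            raise ValueError("bad component: %r" % part)
        values.append(int(digits, base))

    *leading, last = values
    remaining_bytes = 4 - len(leading)
    if any(value > 0xFF for value in leading) or last >= 256 ** remaining_bytes:
        raise ValueError("component out of range: %r" % text)

    number = last
    for index, value in enumerate(leading):
        number |= value << (8 * (3 - index))
    return ".".join(str((number >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def parse_decimal_only(text):
    """Read every component as decimal, so leading zeros are simply dropped."""
    return ".".join(str(int(part, 10)) for part in text.split("."))


print(parse_c_style_ipv4("0177.0000.0000.0001"))  # 127.0.0.1
print(parse_decimal_only("0177.0000.0000.0001"))  # 177.0.0.1
```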
msg271283 - (view) Author: Matt Robenolt (mattrobenolt) Date: 2016-07-25 15:22
Sorry, to add a data point, in C, `gethostbyname` also does the correct thing on macOS.

See:

```
#include <stdio.h>
#include <errno.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char *argv[]) {
    int i;
    struct hostent *lh = gethostbyname("0177.0000.0000.0001");
    struct in_addr **addr_list;

    if (lh) {
        addr_list = (struct in_addr **)lh->h_addr_list;
        for (i=0; addr_list[i] != NULL; i++) {
            printf("%s", inet_ntoa(*addr_list[i]));
        }
        printf("\n");
    } else {
        herror("gethostbyname");
    }

    return 0;
}
```

So I'm not sure this is platform specific.

Either way, `socket.gethostbyname` is wrong on both linux and macOS. I'm a bit lost with what's going on here though, admittedly. :)
msg271284 - (view) Author: Matt Robenolt (mattrobenolt) Date: 2016-07-25 15:25
And lastly, it seems that `socket.gethostbyname_ex` _does_ work correctly on both platforms.

```
>>> socket.gethostbyname_ex('0177.0000.0000.0001')
('0177.0000.0000.0001', [], ['127.0.0.1'])
```
msg271289 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-25 15:44
Hmm.  Since gethostbyname is a deprecated interface, perhaps there is nothing to do here.

However, if someone wants to investigate further and finds a fix, we will evaluate it.
msg271290 - (view) Author: Matt Robenolt (mattrobenolt) Date: 2016-07-25 15:48
Is it worth investigating the different behavior of `getaddrinfo` between platforms, then? As far as I know, that's the only method that both works with IPv6 and will tell you "here are all the IP addresses this resolves to".
msg271293 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-07-25 16:05
A similar bug report can be seen at https://github.com/dotnet/corefx/issues/8362. There, someone concludes that getaddrinfo (which Python seems to use to implement gethostbyname) doesn't work correctly with the octal form. In the end they decided to ignore the inconsistent behaviour.
msg271294 - (view) Author: Matt Robenolt (mattrobenolt) Date: 2016-07-25 16:18
Ah, I just confirmed broken behavior in macOS as well using `getaddrinfo()` in C.

I guess I'd be ok with python ignoring this as well. Maybe worth a change to documentation to note this?
msg271304 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-07-25 20:22
socket.gethostbyname calls the internal function setipaddr, which tries to avoid a name resolution by first calling either inet_pton or inet_addr. Otherwise it calls getaddrinfo.

Windows
-------

setipaddr calls inet_addr, which supports octal [1]. ctypes example:

    ws2_32 = ctypes.WinDLL('ws2_32')
    in_addr = ctypes.c_ubyte * 4
    ws2_32.inet_addr.restype = in_addr

    >>> ws2_32.inet_addr(b'0177.0000.0000.0001')[:]
    [127, 0, 0, 1]

3.5+ could call inet_pton since it was added in Vista. However, it does not support octal:

    >>> addr = in_addr()
    >>> ws2_32.inet_pton(socket.AF_INET, b'0177.0000.0000.0001', addr)
    0
    >>> ws2_32.inet_pton(socket.AF_INET, b'127.0.0.1', addr)
    1
    >>> addr[:]
    [127, 0, 0, 1]

socket.inet_pton instead calls WSAStringToAddressA, which does support octal:

    >>> list(socket.inet_pton(socket.AF_INET, '0177.0000.0000.0001'))
    [127, 0, 0, 1]

socket.gethostbyname_ex calls gethostbyname since gethostbyname_r isn't defined. This does not support octal and errors out:

    >>> socket.gethostbyname_ex('0177.0000.0000.0001')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    socket.herror: [Errno 11001] host not found

getaddrinfo also does not support octal and errors out:

    >>> socket.getaddrinfo('0177.0000.0000.0001', None)[0]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Program Files\Python35\lib\socket.py", line 732, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno 11001] getaddrinfo failed
    
    >>> ctypes.FormatError(11001)
    'No such host is known.'

[1]: https://msdn.microsoft.com/en-us/library/ms738563#internet_addresses
msg271343 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2016-07-26 07:15
@David 

The symptoms from FreeBSD look a little different:

Only gethostbyname is affected, and only on Python 2.7 and 3.3, on all FreeBSD versions tested (9, 10, 11).

Python 3.2 was not tested (the FreeBSD port was deleted), but it is likely affected as well.

It looks like a gethostbyname fix, or some other change affecting gethostbyname, landed in 3.4 and was never merged to 3.3, (likely) 3.2, or 2.7.

Full test matrix attached.
msg271346 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2016-07-26 08:16
For what it is worth: the relevant standard says that octal and hexadecimal addresses should be accepted (POSIX getaddrinfo refers to inet_addr for numeric IP addresses and that says that octal and hexadecimal numbers are valid in IP addresses), see:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_addr.html#

Adding an implementation note to the documentation might be useful, but it should IMHO only mention that the platform getaddrinfo is used in the implementation for the Python functions and should not mention specific platforms because we don't have the processes to keep such specific notes up-to-date.
msg271363 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-07-26 13:20
I don't understand the point of the issue. Is it a documentation issue?

Python doesn't parse anything: it's a thin wrapper on top of the standard C library. If you want to complain, report the issue to the maintainers of your C library ;-)
msg271364 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-07-26 13:26
> However, if someone wants to investigate further and finds a fix, we will evaluate it.

IMHO the best fix is to document that the exact behaviour depends on the platform, and that only IPv4 decimal and IPv6 hexadecimal are portable. Corner cases like IPv4 octal addresses are not portable; if you need them, you should write your own parser.

Note: I checked ipaddress, it doesn't seem to support the funny octal address format.

Why do you need octal addresses? What is your use case? :-p
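
As a rough sketch of the "write your own parser" route, an application could accept only plain dotted-decimal IPv4 and flag anything else that still looks numeric. The names `is_strict_ipv4` and `is_ambiguous_numeric` are made up for this example, and the policy (no leading zeros, exactly four components) is an assumption about what "portable" should mean, not something the socket module enforces.

```python
import re

# One decimal octet without leading zeros: "0", "7", "42", "255", ...
_OCTET = r"(?:0|[1-9][0-9]{0,2})"
_STRICT_IPV4 = re.compile(r"(?:%s\.){3}%s" % (_OCTET, _OCTET))

# Anything built only from decimal/octal/hex numbers and dots
_NUMBER = r"(?:0[xX][0-9a-fA-F]+|[0-9]+)"
_NUMERIC_LOOKING = re.compile(r"%s(?:\.%s){0,3}\.?" % (_NUMBER, _NUMBER))


def is_strict_ipv4(text):
    """True only for four decimal components, each 0-255, no leading zeros."""
    if not _STRICT_IPV4.fullmatch(text):
        return False
    return all(int(part) <= 255 for part in text.split("."))


def is_ambiguous_numeric(text):
    """True for numeric-looking hosts that platforms may parse differently."""
    return bool(_NUMERIC_LOOKING.fullmatch(text)) and not is_strict_ipv4(text)
```

With this, `is_ambiguous_numeric('0177.0000.0000.0001')` is true, so a caller could reject such input outright before any resolver function gets to interpret it.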
msg271365 - (view) Author: Matt Robenolt (mattrobenolt) Date: 2016-07-26 13:30
> Why do you need octal addresses? What is your use case? :-p

I didn't, but an attacker leveraged this to bypass security. We had checks against `127.0.0.1`, but this resolved to `177.0.0.1` incorrectly, bypassing the check. We were using `socket.gethostbyname` which yielded this.

See https://github.com/getsentry/sentry/pull/3787 for a little bit more context.
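
A hedged sketch of a more defensive variant of that kind of check (Python 3, and not the code from the linked pull request): resolve once with `getaddrinfo`, test every returned address with the `ipaddress` module instead of comparing against the string `127.0.0.1`, and refuse if any of them is internal. The function names and the choice of which ranges count as "internal" are assumptions made for illustration.

```python
import ipaddress
import socket


def resolved_addresses(host):
    """Return the distinct addresses getaddrinfo reports for host."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # sockaddr[0] is the address; drop any IPv6 "%scope" suffix
    return sorted({info[4][0].split("%", 1)[0] for info in infos})


def is_internal(address_text):
    """True for loopback, private and link-local addresses."""
    address = ipaddress.ip_address(address_text)
    return address.is_loopback or address.is_private or address.is_link_local


def safe_targets(host):
    """Return the resolved addresses, or raise if any of them is internal."""
    addresses = resolved_addresses(host)
    if not addresses or any(is_internal(addr) for addr in addresses):
        raise ValueError("refusing to connect to %r" % host)
    return addresses
```

On its own this does not remove the mismatch described in this issue. The caller would also need to connect to one of the returned addresses directly rather than handing the original host string to another function that may parse it differently.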
msg271369 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-07-26 14:01
> I didn't, but an attacker leveraged this to bypass security.

Ah, that's a real use case. Can you please rephrase the issue title to make it more explicit?

Because in this issue, it's not obvious to me if octal addresses must be accepted on all platforms, or rejected on all platforms.
msg271383 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-26 15:53
There's also the fact that Eryk pointed out that there are different ways to implement this on Windows, so there might be something we want to "fix" there.  It seems like we're not consistent in how we handle addresses in the various socket module functions.
msg271390 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-26 16:06
koobs' results are also interesting, since they indicate that *something* changed on the python side that affected this for freebsd.
History
Date User Action Args
2016-07-26 16:06:22  r.david.murray  set  messages: + msg271390
2016-07-26 15:53:28  r.david.murray  set  messages: + msg271383
2016-07-26 14:01:48  vstinner  set  messages: + msg271369
2016-07-26 13:30:08  mattrobenolt  set  messages: + msg271365
2016-07-26 13:26:31  vstinner  set  messages: + msg271364
2016-07-26 13:20:30  vstinner  set  nosy: + vstinner; messages: + msg271363
2016-07-26 08:16:26  ronaldoussoren  set  messages: + msg271346
2016-07-26 07:15:25  koobs  set  files: + socket-test-freebsd-9-10-11-python-27-33-34-35.txt; messages: + msg271343; versions: + Python 3.3
2016-07-25 20:22:47  eryksun  set  nosy: + eryksun; messages: + msg271304
2016-07-25 16:18:31  mattrobenolt  set  messages: + msg271294
2016-07-25 16:05:07  xiang.zhang  set  messages: + msg271293
2016-07-25 15:48:59  mattrobenolt  set  messages: + msg271290
2016-07-25 15:44:54  r.david.murray  set  messages: + msg271289
2016-07-25 15:25:08  mattrobenolt  set  messages: + msg271284
2016-07-25 15:22:37  mattrobenolt  set  messages: + msg271283
2016-07-25 13:46:56  xiang.zhang  set  nosy: + xiang.zhang; messages: + msg271271
2016-07-25 13:16:38  r.david.murray  set  messages: + msg271266
2016-07-25 13:14:49  r.david.murray  set  nosy: + r.david.murray, koobs; messages: + msg271265
2016-07-25 08:08:47  SilentGhost  set  nosy: + ned.deily, ronaldoussoren; components: + macOS
2016-07-25 06:27:54  mattrobenolt  create