This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: socket.inet_aton parsing issue on some libc versions
Type: security Stage:
Components: Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Dominik Czarnota, Jeffrey.Kintscher, aldwinaldwin, christian.heimes, vstinner
Priority: normal Keywords:

Created on 2019-07-03 21:54 by Dominik Czarnota, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description Edit
inet_aton_pton.py vstinner, 2019-07-05 13:06
Messages (11)
msg347243 - (view) Author: disconnect3d (Dominik Czarnota) * Date: 2019-07-03 21:54
The socket.inet_aton Python function uses libc's inet_aton function. This, on some implementations, for example the latest GNU C Library (glibc 2.29 as of today), accepts whitespace with trailing characters after a valid IP.

An example can be seen below:
```
In [1]: import socket

In [2]: socket.inet_aton('1.1.1.1 this shall not pass')
Out[2]: b'\x01\x01\x01\x01'

In [3]: socket.inet_aton('1.1.1.1this shall not pass')
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
----> 1 socket.inet_aton('1.1.1.1this shall not pass')

OSError: illegal IP address string passed to inet_aton
```

The problem is, some users might use this function to also validate whether a given host or ip string is valid, especially taking into account Python's "ask for forgiveness and not permission" motto (or a coding style?).

This will introduce security vulnerabilities, like command injections, if it is used with insecure os.system or subprocess.* functions. For example, an ip of "1.1.1.1 ; whoami" will result in returning an output of both ping and whoami commands.
```
def do_ping(request):
    """This is an insecure example, DO NOT REUSE THIS CODE"""
    ip = request['ip']

    # Try to parse as ipv4 and save to db as just 4 bytes
    try:
       ip_bytes = socket.inet_aton(ip)
    except OSError:
       return 'IP is not a valid IPv4.'

    save_ping_request_to_db(ip_bytes)

    return subprocess.check_output('ping -c1 %s' % ip, shell=True)  # an ip of "1.1.1.1 ; whomai" triggers a command injection vulnerability here
```

An initial report sent with this issue contains an example and potential security issue in ssl builtin module which has been already fixed (https://bugs.python.org/issue37463).
The socket.inet_aton function is also used in requests library utility functions, which I am also going to report just after sending this issue.


It is also worth mentioning that the inet_aton documentation - https://docs.python.org/3/library/socket.html#socket.inet_aton - states that the validation depends on the underlying implementation:
> inet_aton() also accepts strings with less than three dots; see the Unix manual page inet(3) for details.
> If the IPv4 address string passed to this function is invalid, OSError will be raised. Note that exactly what is valid depends on the underlying C implementation of inet_aton().

Which is nice but:
1. The inet(3) doesn't say that inet_aton shouldn't be used to validate whether the input is and only is an IP address or that it accepts trailing whitespace with garbage.
It only states that "inet_aton() returns nonzero if the address is valid, zero if not". I added the relevant "DESCRIPTION" part from my `man 3 inet` on an Ubuntu 18.04.1 LTS host at the end.

2. The help(socket.inet_aton) doesn't forewarn about it. For example IPython users might miss the docs and just read the help from the shell:
> Help on built-in function inet_aton in module _socket:
> 
> inet_aton(...)
>     inet_aton(string) -> bytes giving packed 32-bit IP representation
> 
>     Convert an IP address in string format (123.45.67.89) to the 32-bit packed
>     binary format used in low-level network functions.

3. "If the IPv4 address string passed to this function is invalid, OSError will be raised." vs "Note that exactly what is valid depends on (...)" is weird since "1.1.1.1 blabla" string *is just not a valid IPv4 address string*.


Ideally, the problem should be fixed on the libc side, but I don't know if it is going to be. The first comment from a similar issue on glibc bugtracker - https://sourceware.org/bugzilla/show_bug.cgi?id=20018 - states:
> For historic reasons, inet_addr and inet_aton accept trailing garbage.  Some parsers rely on this (for example, libresolv when it parses “nameserver” directives in /etc/resolv.conf).
> 
> (...)
> 
> We should add a check for trailing garbage and relegate the old behavior to a compatibility symbol.

However I am not sure if it won't be fixed or if it was simply forgotten during fixing the linked issue as it probably wasn't reported as a separate one.

Because of that I think Python should add an additional validation of inet_aton input: raise a validation error when the input contains a whitespace.

In case this solution would be rejected because of a need/want to be consistent with C's inet_aton, we should add a security warning that using this function to validate IP addresses is dangerous on some libc version, along with a description/example what exactly is wrong with it.


--- man 3 inet, the relevant description part below ---
DESCRIPTION
       inet_aton() converts the Internet host address cp from the IPv4 numbers-and-dots notation into binary form (in network byte order) and stores it in the struc‐
       ture that inp points to.  inet_aton() returns nonzero if the address is valid, zero if not.  The address supplied in cp can have one of the following forms:

       a.b.c.d   Each of the four numeric parts specifies a byte of the address; the bytes are assigned in left-to-right order to produce the binary address.

       a.b.c     Parts a and b specify the first two bytes of the binary address.  Part c is interpreted as a 16-bit value that defines the rightmost  two  bytes  of
                 the binary address.  This notation is suitable for specifying (outmoded) Class B network addresses.

       a.b       Part a specifies the first byte of the binary address.  Part b is interpreted as a 24-bit value that defines the rightmost three bytes of the binary
                 address.  This notation is suitable for specifying (outmoded) Class A network addresses.

       a         The value a is interpreted as a 32-bit value that is stored directly into the binary address without any byte rearrangement.

       In all of the above forms, components of the dotted address can be specified in decimal, octal (with a  leading  0),  or  hexadecimal,  with  a  leading  0X).
       Addresses  in  any  of  these forms are collectively termed IPV4 numbers-and-dots notation.  The form that uses exactly four decimal numbers is referred to as
       IPv4 dotted-decimal notation (or sometimes: IPv4 dotted-quad notation).

       inet_aton() returns 1 if the supplied string was successfully interpreted, or 0 if the string is invalid (errno is not set on error).
---
msg347255 - (view) Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-07-04 04:30
From : https://docs.python.org/3.9/library/subprocess.html#security-considerations

Security Considerations

Unlike some other popen functions, this implementation will never implicitly call a system shell. This means that all characters, including shell metacharacters, can safely be passed to child processes. If the shell is invoked explicitly, via shell=True, it is the application’s responsibility to ensure that all whitespace and metacharacters are quoted appropriately to avoid shell injection vulnerabilities.

When using shell=True, the shlex.quote() function can be used to properly escape whitespace and shell metacharacters in strings that are going to be used to construct shell commands.
msg347256 - (view) Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-07-04 04:41
indeed, can confirm that a string starting with valid ip address and a space followed with anything, will not trigger the exception.

do_ping('1.1.1.1 1.256.300.1 ; whoami') => no exception
msg347321 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-05 09:44
January 2019, Florian Weimer created the issue "Deprecate inet_addr, inet_aton" in glibc:
https://sourceware.org/bugzilla/show_bug.cgi?id=24111

inet_aton() ignores extra string "for historic reasons". More info at:
https://sourceware.org/bugzilla/show_bug.cgi?id=20018#c0

"For historic reasons, inet_addr and inet_aton accept trailing garbage.  Some parsers rely on this (for example, libresolv when it parses “nameserver” directives in /etc/resolv.conf)."

--

To clarify the complex situation: glibc getaddrinfo() has been fixed in glibc to ignore extra string, but not inet_aton(). getaddrinfo() fix:

* https://sourceware.org/bugzilla/show_bug.cgi?id=20018
* https://bugzilla.redhat.com/show_bug.cgi?id=1347549
msg347328 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-05 10:18
I take the freedom of assigning CVE-2016-10739 to this Python issue, even if CVE-2016-10739 was reported to the glibc (not to Python).

"In the GNU C Library (aka glibc or libc6) through 2.28, the getaddrinfo function would successfully parse a string that contained an IPv4 address followed by whitespace and arbitrary characters, which could lead applications to incorrectly assume that it had parsed a valid string, without the possibility of embedded HTTP headers or other potentially dangerous substrings."
https://access.redhat.com/security/cve/cve-2016-10739
msg347329 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2019-07-05 10:25
CVE-2016-10739 was filed against glibc. We cannot re-use a CVE number from another product in CPython. You can only reference that a CVE causes a security bug.
msg347337 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-05 12:32
One solution would to be reimplement socket.inet_aton() with inet_pton() internally.

inet_pton() is well specified and standard (POSIX). inet_aton() is not ("inet_aton() is not specified in POSIX.1, but is available on most systems." says its Linux manual page).

>>> socket.inet_pton(socket.AF_INET, "1.2.3.4")
b'\x01\x02\x03\x04'
>>> socket.inet_pton(socket.AF_INET, "1.2.3.4 extra string")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: illegal IP address string passed to inet_pton

Problems:

* inet_pton() requires an address family. Should we iterate on all supported address families until one works?
* inet_pton() doesn't accept IPv4 "short format" like "127"

>>> socket.inet_aton("127")
b'\x00\x00\x00\x7f'
>>> socket.inet_pton(socket.AF_INET, "127")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: illegal IP address string passed to inet_pton
msg347339 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2019-07-05 12:52
> inet_pton() requires an address family. Should we iterate on all supported address families until one works?

No, that is not necessary. inet_aton() only accepts IPv4 addresses. Therefore an emulation with inet_pton() would look like this:

>>> import functools, socket
>>> inet_aton = functools.partial(socket.inet_pton, socket.AF_INET)
>>> inet_aton('127.0.0.1')
b'\x7f\x00\x00\x01'

> inet_pton() doesn't accept IPv4 "short format" like "127"

Yes, that might be a problem.
msg347341 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-05 13:06
Attached inet_aton_pton.py is a proof-of-concept which reimplements inet_aton() using inet_pton(). It supports a.b.c.d, a.b.c, a.b and a formats. I'm not sure which exception should be raised in case of parsing error.

socket.inet_aton() raises OSError"illegal IP address string passed to inet_aton") using C code:

    PyErr_SetString(PyExc_OSError,
                    "illegal IP address string passed to inet_aton");
msg372740 - (view) Author: disconnect3d (Dominik Czarnota) * Date: 2020-07-01 10:14
Its a while since this has been reported. I think inet_aton_pton.py is fine, though, a commit adding it should explain why we do it this way.

@vstinner can you push this patch further or do you want me to do it?
msg372742 - (view) Author: disconnect3d (Dominik Czarnota) * Date: 2020-07-01 10:15
Regarding the exception case: seems like OSError, since that's what originally was done and we don't want to break users of this code.
History
Date User Action Args
2022-04-11 14:59:17adminsetgithub: 81676
2020-08-07 06:47:51Jeffrey.Kintschersetnosy: + Jeffrey.Kintscher
2020-07-01 10:15:39Dominik Czarnotasetmessages: + msg372742
2020-07-01 10:14:17Dominik Czarnotasetmessages: + msg372740
2019-07-05 13:06:01vstinnersetfiles: + inet_aton_pton.py

messages: + msg347341
2019-07-05 12:52:09christian.heimessetmessages: + msg347339
2019-07-05 12:32:32vstinnersetmessages: + msg347337
2019-07-05 10:25:49christian.heimessetmessages: + msg347329
title: [CVE-2016-10739] socket.inet_aton parsing issue on some libc versions -> socket.inet_aton parsing issue on some libc versions
2019-07-05 10:19:00vstinnersetmessages: + msg347328
title: socket.inet_aton parsing issue on some libc versions -> [CVE-2016-10739] socket.inet_aton parsing issue on some libc versions
2019-07-05 09:44:32vstinnersetmessages: + msg347321
2019-07-05 09:33:39vstinnersetnosy: + vstinner
2019-07-04 07:57:17xtreaksetnosy: + christian.heimes
2019-07-04 04:41:35aldwinaldwinsetmessages: + msg347256
2019-07-04 04:30:23aldwinaldwinsetnosy: + aldwinaldwin
messages: + msg347255
2019-07-03 21:54:59Dominik Czarnotasettype: security
2019-07-03 21:54:34Dominik Czarnotacreate