This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Dominik Czarnota
Recipients Dominik Czarnota
Date 2019-07-03.21:54:33
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1562190874.32.0.352639947279.issue37495@roundup.psfhosted.org>
In-reply-to
Content
The socket.inet_aton Python function uses libc's inet_aton function. This, on some implementations, for example the latest GNU C Library (glibc 2.29 as of today), accepts whitespace with trailing characters after a valid IP.

An example can be seen below:
```
In [1]: import socket

In [2]: socket.inet_aton('1.1.1.1 this shall not pass')
Out[2]: b'\x01\x01\x01\x01'

In [3]: socket.inet_aton('1.1.1.1this shall not pass')
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
----> 1 socket.inet_aton('1.1.1.1this shall not pass')

OSError: illegal IP address string passed to inet_aton
```

The problem is, some users might use this function to also validate whether a given host or ip string is valid, especially taking into account Python's "ask for forgiveness and not permission" motto (or a coding style?).

This will introduce security vulnerabilities, like command injections, if it is used with insecure os.system or subprocess.* functions. For example, an ip of "1.1.1.1 ; whoami" will result in returning an output of both ping and whoami commands.
```
def do_ping(request):
    """This is an insecure example, DO NOT REUSE THIS CODE"""
    ip = request['ip']

    # Try to parse as ipv4 and save to db as just 4 bytes
    try:
       ip_bytes = socket.inet_aton(ip)
    except OSError:
       return 'IP is not a valid IPv4.'

    save_ping_request_to_db(ip_bytes)

    return subprocess.check_output('ping -c1 %s' % ip, shell=True)  # an ip of "1.1.1.1 ; whomai" triggers a command injection vulnerability here
```

An initial report sent with this issue contains an example and potential security issue in ssl builtin module which has been already fixed (https://bugs.python.org/issue37463).
The socket.inet_aton function is also used in requests library utility functions, which I am also going to report just after sending this issue.


It is also worth mentioning that the inet_aton documentation - https://docs.python.org/3/library/socket.html#socket.inet_aton - states that the validation depends on the underlying implementation:
> inet_aton() also accepts strings with less than three dots; see the Unix manual page inet(3) for details.
> If the IPv4 address string passed to this function is invalid, OSError will be raised. Note that exactly what is valid depends on the underlying C implementation of inet_aton().

Which is nice but:
1. The inet(3) doesn't say that inet_aton shouldn't be used to validate whether the input is and only is an IP address or that it accepts trailing whitespace with garbage.
It only states that "inet_aton() returns nonzero if the address is valid, zero if not". I added the relevant "DESCRIPTION" part from my `man 3 inet` on an Ubuntu 18.04.1 LTS host at the end.

2. The help(socket.inet_aton) doesn't forewarn about it. For example IPython users might miss the docs and just read the help from the shell:
> Help on built-in function inet_aton in module _socket:
> 
> inet_aton(...)
>     inet_aton(string) -> bytes giving packed 32-bit IP representation
> 
>     Convert an IP address in string format (123.45.67.89) to the 32-bit packed
>     binary format used in low-level network functions.

3. "If the IPv4 address string passed to this function is invalid, OSError will be raised." vs "Note that exactly what is valid depends on (...)" is weird since "1.1.1.1 blabla" string *is just not a valid IPv4 address string*.


Ideally, the problem should be fixed on the libc side, but I don't know if it is going to be. The first comment from a similar issue on glibc bugtracker - https://sourceware.org/bugzilla/show_bug.cgi?id=20018 - states:
> For historic reasons, inet_addr and inet_aton accept trailing garbage.  Some parsers rely on this (for example, libresolv when it parses “nameserver” directives in /etc/resolv.conf).
> 
> (...)
> 
> We should add a check for trailing garbage and relegate the old behavior to a compatibility symbol.

However I am not sure if it won't be fixed or if it was simply forgotten during fixing the linked issue as it probably wasn't reported as a separate one.

Because of that I think Python should add an additional validation of inet_aton input: raise a validation error when the input contains a whitespace.

In case this solution would be rejected because of a need/want to be consistent with C's inet_aton, we should add a security warning that using this function to validate IP addresses is dangerous on some libc version, along with a description/example what exactly is wrong with it.


--- man 3 inet, the relevant description part below ---
DESCRIPTION
       inet_aton() converts the Internet host address cp from the IPv4 numbers-and-dots notation into binary form (in network byte order) and stores it in the struc‐
       ture that inp points to.  inet_aton() returns nonzero if the address is valid, zero if not.  The address supplied in cp can have one of the following forms:

       a.b.c.d   Each of the four numeric parts specifies a byte of the address; the bytes are assigned in left-to-right order to produce the binary address.

       a.b.c     Parts a and b specify the first two bytes of the binary address.  Part c is interpreted as a 16-bit value that defines the rightmost  two  bytes  of
                 the binary address.  This notation is suitable for specifying (outmoded) Class B network addresses.

       a.b       Part a specifies the first byte of the binary address.  Part b is interpreted as a 24-bit value that defines the rightmost three bytes of the binary
                 address.  This notation is suitable for specifying (outmoded) Class A network addresses.

       a         The value a is interpreted as a 32-bit value that is stored directly into the binary address without any byte rearrangement.

       In all of the above forms, components of the dotted address can be specified in decimal, octal (with a  leading  0),  or  hexadecimal,  with  a  leading  0X).
       Addresses  in  any  of  these forms are collectively termed IPV4 numbers-and-dots notation.  The form that uses exactly four decimal numbers is referred to as
       IPv4 dotted-decimal notation (or sometimes: IPv4 dotted-quad notation).

       inet_aton() returns 1 if the supplied string was successfully interpreted, or 0 if the string is invalid (errno is not set on error).
---
History
Date User Action Args
2019-07-03 21:54:34Dominik Czarnotasetrecipients: + Dominik Czarnota
2019-07-03 21:54:34Dominik Czarnotasetmessageid: <1562190874.32.0.352639947279.issue37495@roundup.psfhosted.org>
2019-07-03 21:54:34Dominik Czarnotalinkissue37495 messages
2019-07-03 21:54:33Dominik Czarnotacreate