Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

socket.inet_aton parsing issue on some libc versions #81676

Open
disconnect3d mannequin opened this issue Jul 3, 2019 · 14 comments
Open

socket.inet_aton parsing issue on some libc versions #81676

disconnect3d mannequin opened this issue Jul 3, 2019 · 14 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-security A security issue

Comments

@disconnect3d
Copy link
Mannequin

disconnect3d mannequin commented Jul 3, 2019

BPO 37495
Nosy @vstinner, @tiran, @websurfer5, @aldwinaldwin, @disconnect3d
Files
  • inet_aton_pton.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-07-03.21:54:34.295>
    labels = ['type-security', '3.8', '3.7', 'library', '3.9']
    title = 'socket.inet_aton parsing issue on some libc versions'
    updated_at = <Date 2020-08-07.06:47:51.662>
    user = 'https://github.com/disconnect3d'

    bugs.python.org fields:

    activity = <Date 2020-08-07.06:47:51.662>
    actor = 'Jeffrey.Kintscher'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2019-07-03.21:54:34.295>
    creator = 'Dominik Czarnota'
    dependencies = []
    files = ['48457']
    hgrepos = []
    issue_num = 37495
    keywords = []
    message_count = 11.0
    messages = ['347243', '347255', '347256', '347321', '347328', '347329', '347337', '347339', '347341', '372740', '372742']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'christian.heimes', 'Jeffrey.Kintscher', 'aldwinaldwin', 'Dominik Czarnota']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue37495'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9']

    @disconnect3d
    Copy link
    Mannequin Author

    disconnect3d mannequin commented Jul 3, 2019

    The socket.inet_aton Python function uses libc's inet_aton function. This, on some implementations, for example the latest GNU C Library (glibc 2.29 as of today), accepts whitespace with trailing characters after a valid IP.

    An example can be seen below:

    In [1]: import socket
    
    In [2]: socket.inet_aton('1.1.1.1 this shall not pass')
    Out[2]: b'\x01\x01\x01\x01'
    
    In [3]: socket.inet_aton('1.1.1.1this shall not pass')
    ---------------------------------------------------------------------------
    OSError                                   Traceback (most recent call last)
    ----> 1 socket.inet_aton('1.1.1.1this shall not pass')
    
    OSError: illegal IP address string passed to inet_aton
    

    The problem is, some users might use this function to also validate whether a given host or ip string is valid, especially taking into account Python's "ask for forgiveness and not permission" motto (or a coding style?).

    This will introduce security vulnerabilities, like command injections, if it is used with insecure os.system or subprocess.* functions. For example, an ip of "1.1.1.1 ; whoami" will result in returning an output of both ping and whoami commands.

    def do_ping(request):
        """This is an insecure example, DO NOT REUSE THIS CODE"""
        ip = request['ip']
    
        # Try to parse as ipv4 and save to db as just 4 bytes
        try:
           ip_bytes = socket.inet_aton(ip)
        except OSError:
           return 'IP is not a valid IPv4.'
    
        save_ping_request_to_db(ip_bytes)
    
        return subprocess.check_output('ping -c1 %s' % ip, shell=True)  # an ip of "1.1.1.1 ; whomai" triggers a command injection vulnerability here
    

    An initial report sent with this issue contains an example and potential security issue in ssl builtin module which has been already fixed (https://bugs.python.org/issue37463).
    The socket.inet_aton function is also used in requests library utility functions, which I am also going to report just after sending this issue.

    It is also worth mentioning that the inet_aton documentation - https://docs.python.org/3/library/socket.html#socket.inet_aton - states that the validation depends on the underlying implementation:

    inet_aton() also accepts strings with less than three dots; see the Unix manual page inet(3) for details.
    If the IPv4 address string passed to this function is invalid, OSError will be raised. Note that exactly what is valid depends on the underlying C implementation of inet_aton().

    Which is nice but:

    1. The inet(3) doesn't say that inet_aton shouldn't be used to validate whether the input is and only is an IP address or that it accepts trailing whitespace with garbage.
      It only states that "inet_aton() returns nonzero if the address is valid, zero if not". I added the relevant "DESCRIPTION" part from my man 3 inet on an Ubuntu 18.04.1 LTS host at the end.

    2. The help(socket.inet_aton) doesn't forewarn about it. For example IPython users might miss the docs and just read the help from the shell:

    Help on built-in function inet_aton in module _socket:

    inet_aton(...)
    inet_aton(string) -> bytes giving packed 32-bit IP representation

    Convert an IP address in string format (123.45.67.89) to the 32-bit packed
    binary format used in low-level network functions.
    
    1. "If the IPv4 address string passed to this function is invalid, OSError will be raised." vs "Note that exactly what is valid depends on (...)" is weird since "1.1.1.1 blabla" string *is just not a valid IPv4 address string*.

    Ideally, the problem should be fixed on the libc side, but I don't know if it is going to be. The first comment from a similar issue on glibc bugtracker - https://sourceware.org/bugzilla/show_bug.cgi?id=20018 - states:

    For historic reasons, inet_addr and inet_aton accept trailing garbage. Some parsers rely on this (for example, libresolv when it parses “nameserver” directives in /etc/resolv.conf).

    (...)

    We should add a check for trailing garbage and relegate the old behavior to a compatibility symbol.

    However I am not sure if it won't be fixed or if it was simply forgotten during fixing the linked issue as it probably wasn't reported as a separate one.

    Because of that I think Python should add an additional validation of inet_aton input: raise a validation error when the input contains a whitespace.

    In case this solution would be rejected because of a need/want to be consistent with C's inet_aton, we should add a security warning that using this function to validate IP addresses is dangerous on some libc version, along with a description/example what exactly is wrong with it.

    --- man 3 inet, the relevant description part below ---
    DESCRIPTION
    inet_aton() converts the Internet host address cp from the IPv4 numbers-and-dots notation into binary form (in network byte order) and stores it in the struc‐
    ture that inp points to. inet_aton() returns nonzero if the address is valid, zero if not. The address supplied in cp can have one of the following forms:

       a.b.c.d   Each of the four numeric parts specifies a byte of the address; the bytes are assigned in left-to-right order to produce the binary address.
    
       a.b.c     Parts a and b specify the first two bytes of the binary address.  Part c is interpreted as a 16-bit value that defines the rightmost  two  bytes  of
                 the binary address.  This notation is suitable for specifying (outmoded) Class B network addresses.
    
       a.b       Part a specifies the first byte of the binary address.  Part b is interpreted as a 24-bit value that defines the rightmost three bytes of the binary
                 address.  This notation is suitable for specifying (outmoded) Class A network addresses.
    
       a         The value a is interpreted as a 32-bit value that is stored directly into the binary address without any byte rearrangement.
    
       In all of the above forms, components of the dotted address can be specified in decimal, octal (with a  leading  0),  or  hexadecimal,  with  a  leading  0X).
       Addresses  in  any  of  these forms are collectively termed IPV4 numbers-and-dots notation.  The form that uses exactly four decimal numbers is referred to as
       IPv4 dotted-decimal notation (or sometimes: IPv4 dotted-quad notation).
    
       inet_aton() returns 1 if the supplied string was successfully interpreted, or 0 if the string is invalid (errno is not set on error).
    

    ---

    @disconnect3d disconnect3d mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-security A security issue labels Jul 3, 2019
    @aldwinaldwin
    Copy link
    Mannequin

    aldwinaldwin mannequin commented Jul 4, 2019

    From : https://docs.python.org/3.9/library/subprocess.html#security-considerations

    Security Considerations

    Unlike some other popen functions, this implementation will never implicitly call a system shell. This means that all characters, including shell metacharacters, can safely be passed to child processes. If the shell is invoked explicitly, via shell=True, it is the application’s responsibility to ensure that all whitespace and metacharacters are quoted appropriately to avoid shell injection vulnerabilities.

    When using shell=True, the shlex.quote() function can be used to properly escape whitespace and shell metacharacters in strings that are going to be used to construct shell commands.

    @aldwinaldwin
    Copy link
    Mannequin

    aldwinaldwin mannequin commented Jul 4, 2019

    indeed, can confirm that a string starting with valid ip address and a space followed with anything, will not trigger the exception.

    do_ping('1.1.1.1 1.256.300.1 ; whoami') => no exception

    @vstinner
    Copy link
    Member

    vstinner commented Jul 5, 2019

    January 2019, Florian Weimer created the issue "Deprecate inet_addr, inet_aton" in glibc:
    https://sourceware.org/bugzilla/show_bug.cgi?id=24111

    inet_aton() ignores extra string "for historic reasons". More info at:
    https://sourceware.org/bugzilla/show_bug.cgi?id=20018#c0

    "For historic reasons, inet_addr and inet_aton accept trailing garbage. Some parsers rely on this (for example, libresolv when it parses “nameserver” directives in /etc/resolv.conf)."

    --

    To clarify the complex situation: glibc getaddrinfo() has been fixed in glibc to ignore extra string, but not inet_aton(). getaddrinfo() fix:

    @vstinner
    Copy link
    Member

    vstinner commented Jul 5, 2019

    I take the freedom of assigning CVE-2016-10739 to this Python issue, even if CVE-2016-10739 was reported to the glibc (not to Python).

    "In the GNU C Library (aka glibc or libc6) through 2.28, the getaddrinfo function would successfully parse a string that contained an IPv4 address followed by whitespace and arbitrary characters, which could lead applications to incorrectly assume that it had parsed a valid string, without the possibility of embedded HTTP headers or other potentially dangerous substrings."
    https://access.redhat.com/security/cve/cve-2016-10739

    @vstinner vstinner changed the title socket.inet_aton parsing issue on some libc versions [CVE-2016-10739] socket.inet_aton parsing issue on some libc versions Jul 5, 2019
    @tiran
    Copy link
    Member

    tiran commented Jul 5, 2019

    CVE-2016-10739 was filed against glibc. We cannot re-use a CVE number from another product in CPython. You can only reference that a CVE causes a security bug.

    @tiran tiran changed the title [CVE-2016-10739] socket.inet_aton parsing issue on some libc versions socket.inet_aton parsing issue on some libc versions Jul 5, 2019
    @vstinner
    Copy link
    Member

    vstinner commented Jul 5, 2019

    One solution would to be reimplement socket.inet_aton() with inet_pton() internally.

    inet_pton() is well specified and standard (POSIX). inet_aton() is not ("inet_aton() is not specified in POSIX.1, but is available on most systems." says its Linux manual page).

    >>> socket.inet_pton(socket.AF_INET, "1.2.3.4")
    b'\x01\x02\x03\x04'
    >>> socket.inet_pton(socket.AF_INET, "1.2.3.4 extra string")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: illegal IP address string passed to inet_pton

    Problems:

    • inet_pton() requires an address family. Should we iterate on all supported address families until one works?
    • inet_pton() doesn't accept IPv4 "short format" like "127"
    >>> socket.inet_aton("127")
    b'\x00\x00\x00\x7f'
    >>> socket.inet_pton(socket.AF_INET, "127")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: illegal IP address string passed to inet_pton

    @tiran
    Copy link
    Member

    tiran commented Jul 5, 2019

    inet_pton() requires an address family. Should we iterate on all supported address families until one works?

    No, that is not necessary. inet_aton() only accepts IPv4 addresses. Therefore an emulation with inet_pton() would look like this:

    >>> import functools, socket
    >>> inet_aton = functools.partial(socket.inet_pton, socket.AF_INET)
    >>> inet_aton('127.0.0.1')
    b'\x7f\x00\x00\x01'

    inet_pton() doesn't accept IPv4 "short format" like "127"

    Yes, that might be a problem.

    @vstinner
    Copy link
    Member

    vstinner commented Jul 5, 2019

    Attached inet_aton_pton.py is a proof-of-concept which reimplements inet_aton() using inet_pton(). It supports a.b.c.d, a.b.c, a.b and a formats. I'm not sure which exception should be raised in case of parsing error.

    socket.inet_aton() raises OSError"illegal IP address string passed to inet_aton") using C code:

    PyErr_SetString(PyExc_OSError,
                    "illegal IP address string passed to inet_aton");
    

    @disconnect3d
    Copy link
    Mannequin Author

    disconnect3d mannequin commented Jul 1, 2020

    Its a while since this has been reported. I think inet_aton_pton.py is fine, though, a commit adding it should explain why we do it this way.

    @vstinner can you push this patch further or do you want me to do it?

    @disconnect3d
    Copy link
    Mannequin Author

    disconnect3d mannequin commented Jul 1, 2020

    Regarding the exception case: seems like OSError, since that's what originally was done and we don't want to break users of this code.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @disconnect3d
    Copy link
    Contributor

    disconnect3d commented Nov 29, 2023

    Ping @vstinner @tiran @websurfer5 @aldwinaldwin @ambv can we finally get some love here and prioritize this so it is fixed?

    This issue can cause security vulnerabilities if this function is used in a security relevant context and is assumed to raise an error on invalid IP strings; it should probably get a CVE too...

    @vstinner
    Copy link
    Member

    can we finally get some love here and prioritize this so it is fixed?

    So far, nobody proposed a fix, and it seems like nobody is available to work on a fix.

    Do you want to work on a fix?

    @vstinner
    Copy link
    Member

    I suppose that the bare minimum that we can do is documentation: document this surprising behavior, and maybe give advices on how to avoid it.

    In the glibc, only getaddrinfo() was fixed. inet_aton() was not changed, likely for backward compatibility reasons.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants