socket.getfqdn() doesn't cope properly with purely DNS-based setups #49254

Open
dfranke mannequin opened this issue Jan 19, 2009 · 24 comments
Labels
  • 3.9: only security fixes
  • 3.10: only security fixes
  • 3.11: only security fixes
  • stdlib: Python modules in the Lib dir
  • type-bug: An unexpected behavior, bug, or error

Comments

@dfranke
Mannequin

dfranke mannequin commented Jan 19, 2009

BPO 5004
Nosy @loewis, @tiran, @mcepl, @bitdancer, @ThomasWaldmann, @jan-hudec, @shoop
Files
  • python2.7-socket-getfqdn.patch
  • python5004-test.c: test C program showing various gethost function output on my system
  • python2.7-socket-getfqdn.patch: Updated patch, originally by Stijn.Hoop
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2009-01-19.22:29:02.219>
    labels = ['type-bug', 'library', '3.9', '3.10', '3.11']
    title = "socket.getfqdn() doesn't cope properly with purely DNS-based setups"
    updated_at = <Date 2021-12-16.17:35:04.319>
    user = 'https://bugs.python.org/dfranke'

    bugs.python.org fields:

    activity = <Date 2021-12-16.17:35:04.319>
    actor = 'mcepl'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2009-01-19.22:29:02.219>
    creator = 'dfranke'
    dependencies = []
    files = ['29919', '29921', '50390']
    hgrepos = []
    issue_num = 5004
    keywords = ['patch']
    message_count = 24.0
    messages = ['80216', '109635', '109647', '162508', '173987', '187065', '187098', '187234', '187237', '187238', '187253', '187256', '187350', '187355', '297435', '300593', '308979', '372583', '404775', '404780', '404796', '404806', '404826', '405364']
    nosy_count = 14.0
    nosy_names = ['loewis', 'christian.heimes', 'mcepl', 'r.david.murray', 'dfranke', 'devurandom', 'mcjeff', 'ankitoshniwal', 'Thomas.Waldmann', 'Stijn.Hoop', 'bulb', 'James Shewey', 'richard.security.consultant', 'shoop']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue5004'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @dfranke
    Mannequin Author

    dfranke mannequin commented Jan 19, 2009

    On Linux and presumably on other POSIX-like systems, socket.getfqdn()
    doesn't work if a system resolves its own FQDN using DNS rather than
    /etc/hosts.

    My system's FQDN is 'fugue.tank.wellohorld.com'. My /etc/hosts is empty
    except for loopback entries, and /etc/resolv.conf contains the line
    'domain tank.wellohorld.com'. This is sufficient for 'hostname -f' to
    do the Right Thing, but socket.getfqdn() simply returns 'fugue':

    dfranke@fugue:~/Python-2.6.1$ hostname
    fugue
    dfranke@fugue:~/Python-2.6.1$ hostname -f
    fugue.tank.wellohorld.com
    dfranke@fugue:~/Python-2.6.1$ ./python
    Python 2.6.1 (r261:67515, Jan 19 2009, 13:56:59)
    [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket
    >>> socket.getfqdn()
    'fugue'
    >>>
    dfranke@fugue:~/Python-2.6.1$ echo -e '$a\n172.17.0.120 fugue.tank.wellohorld.com fugue\n.\nwq' | sudo ed /etc/hosts
    305
    350
    dfranke@fugue:~/Python-2.6.1$ ./python
    Python 2.6.1 (r261:67515, Jan 19 2009, 13:56:59)
    [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket
    >>> socket.getfqdn()
    'fugue.tank.wellohorld.com'
    >>>
    dfranke@fugue:~/Python-2.6.1$

    @dfranke dfranke mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error extension-modules C modules in the Modules dir and removed stdlib Python modules in the Lib dir labels Jan 19, 2009
    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Jul 8, 2010

    Would someone with appropriate knowledge please take a look to see if this is still an issue?

    @BreamoreBoy BreamoreBoy mannequin added stdlib Python modules in the Lib dir and removed extension-modules C modules in the Modules dir labels Jul 8, 2010
    @loewis
    Mannequin

    loewis mannequin commented Jul 9, 2010

    I think anybody willing to invest the time could acquire the appropriate knowledge, at least to determine whether it's still an issue (i.e. trying to reproduce it). To fix it, one would then need to read the source code of hostname, and find out what they do differently; strace might be sufficient already.

    @ankitoshniwal
    Mannequin

    ankitoshniwal mannequin commented Jun 7, 2012

    I cannot reproduce this issue. I just tested this on my mac.

    atoshniw@prusev-mn:~/Documents/code/python-dev/bin #hostname -f
    prusev-mn.helloworld.com
    atoshniw@prusev-mn:~/Documents/code/python-dev/bin #python
    Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket
    >>> socket.getfqdn()
    'prusev-mn.helloworld.com'

    @mcjeff
    Mannequin

    mcjeff mannequin commented Oct 28, 2012

    Gave this a go myself...

    $ ./python
    Python 3.4.0a0 (default:57a33af85407, Oct 27 2012, 21:26:30) 
    [GCC 4.4.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket
    >>> socket.getfqdn()
    'host.domain.com'
    >>> 
    $ hostname -f
    host.domain.com
    
    $ cat /etc/hosts
    127.0.0.1       localhost.localdomain   localhost

    # The following lines are desirable for IPv6 capable hosts
    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters

    Linux host 3.5.2-x86_64 #1 SMP Wed Aug 15 14:31:07 EDT 2012 x86_64 GNU/Linux

    According to strace, both rely on DNS:

    recvfrom(3, "Wj\201\200\0\1\0\1\0\5\0\0\00219\003134\003230\003173\7in-a"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("1.2.3.4")}, [16]) = 176

    Same behavior on both 2.6 & hg tip. I think this is a non-issue.

    @StijnHoop
    Mannequin

    StijnHoop mannequin commented Apr 16, 2013

    Still seeing this on Fedora 18 / Python 2.7.3.

    I only have loopback in /etc/hosts

    [TUE\shoop@pclin281] <~> cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    I search in .campus.tue.nl and .win.tue.nl:

    [TUE\shoop@pclin281] <~> grep search /etc/resolv.conf
    search campus.tue.nl. win.tue.nl.

    Hostname -f reliably returns .campus.tue.nl as it should

    [TUE\shoop@pclin281] <~> hostname -f
    pclin281.campus.tue.nl
    [TUE\shoop@pclin281] <~> hostname -f
    pclin281.campus.tue.nl

    But socket.getfqdn disagrees, even with itself when run multiple times:

    [TUE\shoop@pclin281] <~> python
    Python 2.7.3 (default, Aug  9 2012, 17:23:57) 
    [GCC 4.7.1 20120720 (Red Hat 4.7.1-5)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket
    >>> socket.getfqdn()
    'pclin281'
    >>> socket.getfqdn()
    'pclin281.win.tue.nl'
    >>> socket.getfqdn()
    'pclin281'
    >>> socket.getfqdn()
    'pclin281.win.tue.nl'
    >>> 

    Note that pclin281.win.tue.nl is in fact also a valid DNS entry, but not one that I would expect the function to ever return given the search order.

    @bitdancer
    Member

    Note that socket.getfqdn is a wrapper around a couple of socket calls that are just wrappers of OS level socket calls. If you take a look at socket.py you'll see the definition. As Martin said earlier, if you (or anyone else) can figure out what hostname does differently and suggest how to patch our getfqdn method to behave similarly, I'm sure the patch will be welcome.

    Unfortunately there won't be any good way to write a test for this.
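For readers without socket.py at hand, the documented steps can be approximated as below. This is a minimal sketch, not the standard library's code: the name `getfqdn_sketch` and the injectable `gethostname`/`gethostbyaddr` parameters are assumptions added here so the logic can be exercised without touching DNS.

```python
import socket


def getfqdn_sketch(name="", gethostname=socket.gethostname,
                   gethostbyaddr=socket.gethostbyaddr):
    """Approximate the documented socket.getfqdn() steps (illustrative only)."""
    name = name.strip()
    if not name or name == "0.0.0.0":
        # No usable name given: start from the kernel hostname.
        name = gethostname()
    try:
        hostname, aliases, _ipaddrs = gethostbyaddr(name)
    except OSError:
        # Lookup failed: return the name unchanged.
        return name
    # Check the primary name first, then the aliases; the first one
    # containing a dot wins.
    for candidate in [hostname, *aliases]:
        if "." in candidate:
            return candidate
    return hostname
```

Passing fake resolver functions, as in the parameters above, is also one possible way to write a test despite the dependency on the host's network setup.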

    @StijnHoop
    Mannequin

    StijnHoop mannequin commented Apr 18, 2013

    OK, fair enough.

    From reading sources, it appears that hostname is using getaddrinfo(3) on kernelhostname with hints->ai_flags & AI_CANONNAME, while Lib/socket.py simply uses gethostbyaddr(kernelhostname), and falls back on kernelhostname in case of errors.

    Unfortunately I am not entirely sure who is "correct" here, as I don't know the intent of socket.getfqdn().

    In my case, kernelhostname is set to 'pclin281', i.e. without the dots. I believe this to be correct, but I know that this is already "controversial": there exists software that expects an FQDN there, and internet folklore is split about 50/50 about this necessity.

    Then, apparently, there is confusion about AI_CANONNAME and what it actually should do. glibc upstream does address lookups but Fedora patches this out. See this recent glibc bug for more pointers:

    http://sourceware.org/bugzilla/show_bug.cgi?id=15218

    As mentioned in that bug, a lot of software runs on Fedora and works using that definition of AI_CANONNAME.

    However, switching Lib/socket.py / getfqdn from gethostbyaddr to getaddrinfo might have more implications than just fixing this case. I can try to write a patch, but is this the right direction?
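The getaddrinfo(3) approach described above could look roughly like this from Python. It is a sketch under assumptions: `canonical_name` is a hypothetical helper, the `getaddrinfo` parameter exists only to allow substitution in tests, and what the resolver actually returns as the canonical name depends on the platform's libc, as the glibc bug linked above discusses.

```python
import socket


def canonical_name(host, getaddrinfo=socket.getaddrinfo):
    """Return the canonical name reported for host, or None (illustrative)."""
    try:
        results = getaddrinfo(host, None, flags=socket.AI_CANONNAME)
    except OSError:
        return None
    # Each result is (family, type, proto, canonname, sockaddr). Normally only
    # the first entry carries a canonical name, so take the first non-empty one.
    for _family, _type, _proto, canonname, _sockaddr in results:
        if canonname:
            return canonname
    return None
```

Calling `canonical_name(socket.gethostname())` would be the closest equivalent to the lookup attributed to hostname -f here.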

    @StijnHoop
    Mannequin

    StijnHoop mannequin commented Apr 18, 2013

    Attached is a very lightly tested patch that matches hostname -f behaviour on my system. I suspect this should be OK but it definitely needs more testing than just my system...

    @bitdancer
    Member

    The problem with your patch is that it changes the (effective) meaning of the 'name' parameter. Before the patch, name can be an IP address. After the patch, that will fail on Fedora. (It also fails on my Gentoo system).

    It is interesting to note, as well, that the documentation for gethostbyaddr says that it is obsolete and getaddrinfo should be used instead.

    Could we use the getaddrinfo call if we don't get an FQDN back from gethostbyaddr? It doesn't look like that would completely solve your problem, though, given your example output. Have you figured out why that is happening?

    Alternatively, perhaps we could fall back to gethostbyaddr if we don't get an fqdn from the getaddrinfo call.

    However, given that the documentation actually specifies the algorithm used by getfqdn, I'm not sure if we can make either change in a bugfix version.
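The first fallback order suggested above (gethostbyaddr first, getaddrinfo only when no dotted name comes back) might be sketched as follows. The function name and the injectable resolver parameters are assumptions for illustration; this is not a proposed patch.

```python
import socket


def fqdn_with_fallback(name, gethostbyaddr=socket.gethostbyaddr,
                       getaddrinfo=socket.getaddrinfo):
    """Try gethostbyaddr first; consult AI_CANONNAME only if no dotted name."""
    try:
        hostname, aliases, _addrs = gethostbyaddr(name)
        for candidate in [hostname, *aliases]:
            if "." in candidate:
                return candidate
    except OSError:
        pass
    try:
        infos = getaddrinfo(name, None, flags=socket.AI_CANONNAME)
    except OSError:
        return name
    # The canonical name, if any, sits in the fourth field of the first entry.
    canonname = infos[0][3] if infos else ""
    return canonname if "." in canonname else name
```

As the comment above anticipates, this ordering would not fully help in the reported case: whenever gethostbyaddr already returns a dotted name such as pclin281.win.tue.nl, the fallback is never consulted.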

    @StijnHoop
    Mannequin

    StijnHoop mannequin commented Apr 18, 2013

    OK, dumping my current findings here, as I'm still not sure what the expected results should be.

    First of all, Lib/socket.py calls gethostbyaddr with a name. As in, gethostby _ADDR_ with a name.

    This works because Modules/socketmodule.c internally uses setipaddr() to resolve the name to an address. setipaddr() does this using a call to getaddrinfo() with hints.ai_family == AF_UNSPEC and no further flags.

    On my system (confirmed using the test program attached) this results in SIX entries, and this is the part that confused me.

    Due to virtualization I have a virtual bridge virbr0 configured with an internal IP address 192.168.122.1, as well as my LAN-connected bridge br0 with IP address 131.155.71.8. Both of these addresses are returned in the call to getaddrinfo() (each one 3 times), but NOT ALWAYS IN THE SAME ORDER.

    And this is the clue as to why python's socket.getfqdn() does not behave consistently. For 192.168.122.1 does not resolve to anything, hence it will return "pclin281". And 131.155.71.8 will backwards resolve to pclin281.win.tue.nl as the PTR record points to that entry.
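The order dependence can be modelled without any network access. The sketch below is a deliberately simplified model, not the code path in Modules/socketmodule.c: the helper name and the dictionary standing in for PTR records are assumptions, with addresses taken from the report above.

```python
def name_for_first_address(addresses, reverse_lookup, short_name):
    """Return the reverse-resolved name of the first address, else short_name."""
    first = addresses[0]
    return reverse_lookup.get(first, short_name)


# Stand-in PTR data modelled on the report: only the LAN address resolves.
ptr = {"131.155.71.8": "pclin281.win.tue.nl"}
lan_first = ["131.155.71.8", "192.168.122.1"]
bridge_first = ["192.168.122.1", "131.155.71.8"]

print(name_for_first_address(lan_first, ptr, "pclin281"))     # pclin281.win.tue.nl
print(name_for_first_address(bridge_first, ptr, "pclin281"))  # pclin281
```

The same inputs in a different order give a different answer, which matches the alternating results seen in the interpreter session earlier in the thread.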

    Now, again, I'm not entirely sure what to do here. I agree that this is not a simple bugfix. I also think that, apart from the weirdness of getaddrinfo() return order, socket.getfqdn() is doing its documented job of returning /an/ FQDN for a given host.

    But in case of the guaranteed LOCAL canonical hostname, another function is warranted, imho.

    Does this make sense?

    For the record, output of a given run on my system:

    [TUE\shoop@pclin281] <~/tmp> ./test
    gai canon result 0: pclin281.campus.tue.nl 192.168.122.1
    gai canon result 1: (null) 131.155.71.8
    gai result 0: (null) 131.155.71.8
    gai result 1: (null) 131.155.71.8
    gai result 2: (null) 131.155.71.8
    gai result 3: (null) 192.168.122.1
    gai result 4: (null) 192.168.122.1
    gai result 5: (null) 192.168.122.1
    ghbn result 0 h_name: pclin281.campus.tue.nl
    ghbn result 0 h_alias: __NONE__
    ghbn result 1 h_name: pclin281.campus.tue.nl
    ghbn result 1 h_alias: __NONE__
    ghbn result 2 h_name: pclin281.campus.tue.nl
    ghbn result 2 h_alias: __NONE__
    ghbn result 3 h_name: pclin281.campus.tue.nl
    ghbn result 3 h_alias: __NONE__
    ghbn result 4 h_name: pclin281.campus.tue.nl
    ghbn result 4 h_alias: __NONE__
    ghbn result 5 h_name: pclin281.campus.tue.nl
    ghbn result 5 h_alias: __NONE__
    ghbn result 6 h_name: pclin281.campus.tue.nl
    ghbn result 6 h_alias: __NONE__
    ghbn result 7 h_name: pclin281.campus.tue.nl
    ghbn result 7 h_alias: __NONE__
    ghbn result 8 h_name: pclin281.campus.tue.nl
    ghbn result 8 h_alias: __NONE__
    ghbn result 9 h_name: pclin281.campus.tue.nl
    ghbn result 9 h_alias: __NONE__

    @bitdancer
    Member

    Yeah, a new function was a thought that had crossed my mind as well. getfqdnbyname, maybe? Or gethostnamefqdn? Then deprecate calling getfqdn without an argument.

    I agree that gethostbyaddr accepting a non-IP is weird. I have no idea why it was implemented that way, much less why it is *used* that way. It's been that way for a long time, though.

    @StijnHoop
    Mannequin

    StijnHoop mannequin commented Apr 19, 2013

    So after a good night's sleep: does it not make sense to use the canonical hostname iff the name argument is not present / empty? Otherwise, fall back to the documented steps? That way extra API is avoided, and I can't think of a case where you would rather have my weird results vs "the output of hostname -f".
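One possible reading of this proposal is sketched below. It is not the attached patch: `getfqdn_proposed` and its injectable resolver parameters are hypothetical, and falling back to the plain kernel hostname when the canonical lookup fails or comes back empty is an assumption made here.

```python
import socket


def getfqdn_proposed(name="", gethostname=socket.gethostname,
                     getaddrinfo=socket.getaddrinfo,
                     gethostbyaddr=socket.gethostbyaddr):
    """Canonical name for the local host; documented steps otherwise."""
    name = name.strip()
    if not name or name == "0.0.0.0":
        # Local host: ask the resolver for the canonical name.
        name = gethostname()
        try:
            infos = getaddrinfo(name, None, type=socket.SOCK_DGRAM,
                                flags=socket.AI_CANONNAME)
        except OSError:
            return name
        for info in infos:
            if info[3]:  # canonname field
                return info[3]
        return name
    # Explicit name: keep the existing gethostbyaddr-based behaviour.
    try:
        hostname, aliases, _addrs = gethostbyaddr(name)
    except OSError:
        return name
    for candidate in [hostname, *aliases]:
        if "." in candidate:
            return candidate
    return hostname
```

Keeping the explicit-name branch unchanged means IP addresses passed as `name` still go through gethostbyaddr, which addresses the objection raised against the first patch.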

    @bitdancer
    Member

    That is an interesting proposal, yes. I suppose someone that needs the getaddrinfo semantics for something other than the local host can just call it directly.

    Now, do we add the fact that we are doing this to the current algorithmic documentation? :)

    @JamesShewey
    Mannequin

    JamesShewey mannequin commented Jun 30, 2017

    According to the man page for gethostbyaddr "The gethostbyname*() and gethostbyaddr*() functions are obsolete. Applications should use getaddrinfo(3) and getnameinfo(3) instead." - so perhaps using the correct API call might be a good start to resolving this issue, but I found that in my case, I needed to chase the problem upstream instead of downstream. On my Red Hat box, the kernel.hostname value with sysctl was incorrect. I had to re-set it with a sysctl kernel.hostname=hostname.example.com. This overrides /etc/hosts, so I suspect this is probably not an issue on other distros that do not use sysctl.

    The moral of the story being garbage in, garbage out.

    @devurandom
    Mannequin

    devurandom mannequin commented Aug 20, 2017

    In my case, /etc/hostname, /proc/sys/kernel/hostname, uname -n, hostname -f all show the same FQDN, but python -c 'import socket ; print(socket.getfqdn())' still prints the short hostname. /etc/hosts is empty except for localhost. /etc/nsswitch.conf contains:
    hosts: files mymachines resolve [!UNAVAIL=return] dns myhostname

    @thomaswaldmann
    Mannequin

    thomaswaldmann mannequin commented Dec 24, 2017

    Embarrassing as always to stumble over some stuff and then find a 9-year-old ticket here, where it is discussed and even (almost) solved.

    Our ticket: borgbackup/borg#3471

    Fixed getfqdn we use now instead of socket.getfqdn():

    borgbackup/borg@9b0d0f3#diff-4b53f84e19a3bb376bf2202371ed269aR188

    Note: no "else: name = hostname" at the end, that is a bug in the patch attached to this ticket (hostname is undefined after applying the patch).

    @jan-hudec
    Mannequin

    jan-hudec mannequin commented Jun 29, 2020

    Confirming the fixed version linked in previous comment by Thomas Waldmann is correct and matches what hostname -f does.

    @jan-hudec jan-hudec mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes labels Jun 29, 2020
    @richardsecurityconsultant
    Mannequin

    I just ran into this 12 year old issue. Can this be merged please?

    @tiran
    Member

    tiran commented Oct 22, 2021

    Could you or somebody else please create a PR with patch and a test case?

    @tiran tiran added 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Oct 22, 2021
    @richardsecurityconsultant
    Mannequin

    Here is the updated patch. Is python5004-test.c enough as a test case?

    @tiran
    Member

    tiran commented Oct 22, 2021

    We no longer accept patches. Contributors have to create a PR on GitHub, so we can record contributions and verify the contributor license agreement.

    @richardsecurityconsultant
    Mannequin

    In that case Stijn Hoop should create the PR since he wrote the patch. Anyone else could get in trouble for using his code without proper permission.

    @shoop
    Mannequin

    shoop mannequin commented Oct 30, 2021

    I hereby put my patch in the public domain and/or under any desired
    copyright license as required by the Python project to accept it.

    Regards,

    Stijn Hoop

    On Fri, 22 Oct 2021 21:03:26 +0000
    Richard van den Berg <report@bugs.python.org> wrote:

    Richard van den Berg <richard.security.consultant@gmail.com> added
    the comment:

    In that case Stijn Hope should create the PR since he wrote the
    patch. Anyone else could get in trouble for using his code without
    proper permission.



    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    potiuk added a commit to potiuk/airflow that referenced this issue Jul 12, 2022
    We keep on having repeated issue reports about non-matching
    hostname of workers. This seems to be traceable to getfqdn method
    of socket in Kubernetes that in some circumstances (race condition
    with networking setup when starting) can return different hostname
    at different times.
    
    There seems to be a related issue in Python that has not been
    resolved in more than 13 years (!)
    
    python/cpython#49254
    
    The error seems to be related to the way canonicalname is
    derived by getfqdn: it uses gethostbyaddr, which sometimes
    provides a different name than the canonical name (it returns the
    first DNS name resolved that contains ".").
    
    We are fixing it in two ways:
    
    * instead of using gethostbyaddr we are using getaddrinfo with
      AI_CANONNAME flag which (according to the docs):
    
      https://man7.org/linux/man-pages/man3/getaddrinfo.3.html
    
        If hints.ai_flags includes the AI_CANONNAME flag, then the
        ai_canonname field of the first of the addrinfo structures in the
        returned list is set to point to the official name of the host.
    
    * we are caching the name returned by first time retrieval per
      interpreter. This way at least inside the same interpreter, the
      name of the host should not change.
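The two measures in this commit message (AI_CANONNAME lookup plus per-interpreter caching) can be illustrated with a short sketch. This is not the Airflow implementation: the function name is hypothetical, and returning the short hostname when the lookup fails is an assumption made here.

```python
import functools
import socket


@functools.lru_cache(maxsize=None)
def cached_canonical_hostname():
    """Resolve the local canonical name once per interpreter (illustrative)."""
    short = socket.gethostname()
    try:
        infos = socket.getaddrinfo(short, None, flags=socket.AI_CANONNAME)
    except OSError:
        # Resolver unavailable: settle for the kernel hostname.
        return short
    # Canonical name is the fourth field of the first entry, if present.
    return (infos[0][3] if infos else "") or short
```

Because the result is memoized, a value obtained while networking is still being set up would also be kept for the life of the process; `cached_canonical_hostname.cache_clear()` is the standard `lru_cache` way to force a fresh lookup.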
    potiuk added a commit to apache/airflow that referenced this issue Jul 12, 2022
    leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Jan 30, 2023
    GitOrigin-RevId: a3f2df0f45973ddb97990361fdc5caa256c175ca