
Title: socket.gethostbyaddr raises error if invalid unicode in hosts
Type: behavior
Stage: resolved
Components: Library (Lib), Unicode, Windows
Versions: Python 3.10, Python 3.9, Python 3.8
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Windows: socket.gethostbyaddr(name) fails for non-ASCII hostname (issue 26227)
Assigned To:
Nosy List: Peter92, christian.heimes, ezio.melotti, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords:

Created on 2020-11-29 02:08 by Peter92, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (4)
msg382029 - (view) Author: Peter Hunt (Peter92) Date: 2020-11-29 02:08
If the hosts file contains invalid Unicode, the socket module raises UnicodeDecodeError when it looks up a host name. This renders packages such as Flask and Django unusable.

    I had a mapping to localghost, and Docker incorrectly rewrote the hosts file during installation, turning it from " xn--9q8h" to " xn-9q8h".
    The socket module was not able to handle that, and was failing with a UnicodeDecodeError in Python 3.6+ as it attempted to list the addresses for "".

How to reproduce:
    # Add " xn-9q8h" to C:/Windows/System32/drivers/etc/hosts
    >>> socket.gethostbyaddr('')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 2: invalid start byte

    In Python 2.7 - 3.5, it returns 'xn\u20139q8h'.
    I feel returning "xn-9q8h" as a string would be best, but ignoring it could be an option.
    An alternative would be to raise socket.error.

Full traceback from Flask as an example:

    Traceback (most recent call last):
    File "", line 285, in <module>
    File "C:\Users\Peter\AppData\Roaming\Python\Python37\site-packages\flask\", line 990, in run
        run_simple(host, port, self, **options)
    File "C:\Users\Peter\AppData\Roaming\Python\Python37\site-packages\werkzeug\", line 1052, in run_simple
    File "C:\Users\Peter\AppData\Roaming\Python\Python37\site-packages\werkzeug\", line 1005, in inner
    File "C:\Users\Peter\AppData\Roaming\Python\Python37\site-packages\werkzeug\", line 848, in make_server
        host, port, app, request_handler, passthrough_errors, ssl_context, fd=fd
    File "C:\Users\Peter\AppData\Roaming\Python\Python37\site-packages\werkzeug\", line 740, in __init__
        HTTPServer.__init__(self, server_address, handler)
    File "C:\Program Files\Python37\lib\", line 452, in __init__
    File "C:\Program Files\Python37\lib\http\", line 139, in server_bind
        self.server_name = socket.getfqdn(host)
    File "C:\Program Files\Python37\lib\", line 676, in getfqdn
        hostname, aliases, ipaddrs = gethostbyaddr(name)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 2: invalid start byte
msg382093 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-11-29 20:28
I cannot reproduce the issue on Linux:

# echo " xn-9q8h" >> /etc/hosts
# python3.8
>>> import socket
>>> socket.gethostbyaddr("")
('xn-9q8h', [], [''])
msg382154 - (view) Author: Peter Hunt (Peter92) Date: 2020-11-30 14:40
Ah, I just realised it may have been a different dash to the one that can be typed with the keyboard.

Going by the wiki article, using either the "en" or the "em" dash will cause the issue for me on Windows.
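
A short sketch of why those dashes matter, assuming the host name reaches Python as Windows-1252 bytes. The thread does not confirm the encoding; it is inferred from the 0x96 byte in the reported error and from the 'xn\u20139q8h' result on older versions.

```python
# Bytes as they might appear if the hosts entry was saved in Windows-1252
# with an en dash (0x96) in place of the ASCII hyphen. The encoding is an
# assumption, not something stated in the report.
raw = b"xn\x969q8h"

# Windows-1252 maps 0x96 to U+2013 (en dash) and 0x97 to U+2014 (em dash).
legacy = raw.decode("cp1252")
em_dash = b"\x97".decode("cp1252")

# UTF-8 rejects the same bytes: 0x96 can only be a continuation byte,
# so it is not valid at the start of a character.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

print(ascii(legacy))
print(utf8_error)
```

Under that assumption, the ASCII hyphen that Christian Heimes tested on Linux decodes cleanly in either codec, which would explain why the problem did not reproduce there.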
msg382185 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-11-30 19:52
This is covered by issue26227
Date                 User              Action  Args
2022-04-11 14:59:38  admin             set     github: 86661
2020-11-30 19:52:03  steve.dower       set     status: open -> closed
                                               superseder: Windows: socket.gethostbyaddr(name) fails for non-ASCII hostname
                                               messages: + msg382185
                                               resolution: duplicate
                                               stage: resolved
2020-11-30 14:40:11  Peter92           set     messages: + msg382154
2020-11-29 20:28:22  christian.heimes  set     nosy: + christian.heimes
                                               messages: + msg382093
                                               versions: - Python 3.6, Python 3.7
2020-11-29 02:08:11  Peter92           create