classification
Title: socket.gethostbyaddr raises error if invalid unicode in hosts
Type: behavior
Stage: resolved
Components: Library (Lib), Unicode, Windows
Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Windows: socket.gethostbyaddr(name) fails for non-ASCII hostname (issue 26227)
Assigned To:
Nosy List: Peter92, christian.heimes, ezio.melotti, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords:

Created on 2020-11-29 02:08 by Peter92, last changed 2020-11-30 19:52 by steve.dower. This issue is now closed.

Messages (4)
msg382029 - Author: Peter Hunt (Peter92) Date: 2020-11-29 02:08
If the hosts file contains a hostname that is not valid UTF-8, socket.gethostbyaddr() raises UnicodeDecodeError when it looks up the hosts for the mapped address. This renders modules such as Flask and Django unusable.

Background:
    I had a mapping to localghost (https://twitter.com/rfreebern/status/1214560971185778693), and Docker incorrectly rewrote the hosts file during installation, turning it from "127.0.0.1 xn--9q8h" to "127.0.0.1 xn-9q8h".
    The socket module was not able to handle that, and was failing with a UnicodeDecodeError in Python 3.6+ as it attempted to list the addresses for "127.0.0.1".

How to reproduce:
    # Add "127.0.0.1 xn-9q8h" to C:/Windows/System32/drivers/etc/hosts
    
    >>> import socket
    >>> socket.gethostbyaddr('127.0.0.1')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 2: invalid start byte

Expected:
    In Python 2.7 - 3.5, it returns 'xn\u20139q8h'.
    I feel returning "xn-9q8h" as a string would be best, but ignoring it could be an option.
    An alternative would be to raise socket.error.
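
The error message can be reproduced without touching the hosts file. The sketch below is only an illustration, not what the socket module does internally: it assumes the hostname arrives as the bytes "xn", 0x96, "9q8h" (which matches the reported byte and position) and that the system ANSI code page is cp1252, where 0x96 is the en dash U+2013. The helper name decode_hostname is hypothetical.

```python
# Bytes matching the reported error: "xn", then 0x96, then "9q8h".
raw = b"xn\x969q8h"


def decode_hostname(data):
    """Hypothetical helper: try UTF-8 first, then fall back to cp1252.

    cp1252 is an assumption about the reporter's ANSI code page.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # 0x96 is a UTF-8 continuation byte, so it cannot start a character.
        print("utf-8 failed:", exc)
        return data.decode("cp1252")


name = decode_hostname(raw)
print(ascii(name))  # 'xn\u20139q8h', the value reported for Python 2.7 - 3.5
```

Decoding as cp1252 yields the 'xn\u20139q8h' string mentioned above, which suggests the older behaviour was an ANSI-code-page decode rather than a UTF-8 one.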

Full traceback from Flask as an example:

    Traceback (most recent call last):
    File "__init__.py", line 285, in <module>
        app.run()
    File "C:\Users\Peter\AppData\Roaming\Python\Python37\site-packages\flask\app.py", line 990, in run
        run_simple(host, port, self, **options)
    File "C:\Users\Peter\AppData\Roaming\Python\Python37\site-packages\werkzeug\serving.py", line 1052, in run_simple
        inner()
    File "C:\Users\Peter\AppData\Roaming\Python\Python37\site-packages\werkzeug\serving.py", line 1005, in inner
        fd=fd,
    File "C:\Users\Peter\AppData\Roaming\Python\Python37\site-packages\werkzeug\serving.py", line 848, in make_server
        host, port, app, request_handler, passthrough_errors, ssl_context, fd=fd
    File "C:\Users\Peter\AppData\Roaming\Python\Python37\site-packages\werkzeug\serving.py", line 740, in __init__
        HTTPServer.__init__(self, server_address, handler)
    File "C:\Program Files\Python37\lib\socketserver.py", line 452, in __init__
        self.server_bind()
    File "C:\Program Files\Python37\lib\http\server.py", line 139, in server_bind
        self.server_name = socket.getfqdn(host)
    File "C:\Program Files\Python37\lib\socket.py", line 676, in getfqdn
        hostname, aliases, ipaddrs = gethostbyaddr(name)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 2: invalid start byte
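
The traceback shows the failure surfacing through socket.getfqdn(), which calls gethostbyaddr(). Until the underlying bug is fixed, application code that calls getfqdn() directly could guard against it. The following is a minimal sketch of such a guard; safe_getfqdn and its resolver parameter are hypothetical names introduced here, and this does not help when the call happens inside http.server as in the traceback above.

```python
import socket


def safe_getfqdn(name, resolver=socket.getfqdn):
    """Hypothetical wrapper: return the FQDN, or the input name if the
    resolver fails to decode a hostname from the hosts file."""
    try:
        return resolver(name)
    except UnicodeDecodeError:
        # Fall back to the name we were given rather than crashing.
        return name
```

The resolver argument exists only so the fallback path can be exercised without editing the hosts file.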
msg382093 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2020-11-29 20:28
I cannot reproduce the issue on Linux:

# echo "127.0.0.2 xn-9q8h" >> /etc/hosts
# python3.8
>>> import socket
>>> socket.gethostbyaddr("127.0.0.2")
('xn-9q8h', [], ['127.0.0.2'])
msg382154 - Author: Peter Hunt (Peter92) Date: 2020-11-30 14:40
Ah, I just realised the dash may have been a different character from the hyphen that can be typed with the keyboard.

Going by the Wikipedia article on dashes (https://en.wikipedia.org/wiki/Dash), using either the "en" or the "em" dash causes the issue for me on Windows.
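
Since an en or em dash is hard to tell from a hyphen by eye, scanning the hosts file for non-ASCII bytes is one way to find the offending entry. This is a rough sketch, not part of the original report: find_non_ascii_entries is a hypothetical helper, and the sample bytes assume cp1252 encoding (0x96 for the en dash, 0x97 for the em dash).

```python
def find_non_ascii_entries(hosts_bytes):
    """Return (line_number, raw_line) pairs whose non-comment part
    contains at least one non-ASCII byte."""
    hits = []
    for lineno, raw in enumerate(hosts_bytes.splitlines(), start=1):
        entry = raw.split(b"#", 1)[0].strip()  # ignore comments
        if entry and any(byte > 0x7F for byte in entry):
            hits.append((lineno, raw))
    return hits


# Sample content; on Windows the real file could be read with
# open(r"C:\Windows\System32\drivers\etc\hosts", "rb").read()
sample = b"127.0.0.1 localhost\n127.0.0.1 xn\x969q8h\n# note \x97 comment only\n"
print(find_non_ascii_entries(sample))
```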
msg382185 - Author: Steve Dower (steve.dower) (Python committer) Date: 2020-11-30 19:52
This is covered by issue26227
History
Date                 User              Action  Args
2020-11-30 19:52:03  steve.dower       set     status: open -> closed
                                               superseder: Windows: socket.gethostbyaddr(name) fails for non-ASCII hostname
                                               messages: + msg382185
                                               resolution: duplicate
                                               stage: resolved
2020-11-30 14:40:11  Peter92           set     messages: + msg382154
2020-11-29 20:28:22  christian.heimes  set     nosy: + christian.heimes
                                               messages: + msg382093
                                               versions: - Python 3.6, Python 3.7
2020-11-29 02:08:11  Peter92           create