classification
Title: os.uname() crashes if hostname contains non-ascii characters
Type: behavior Stage: resolved
Components: Library (Lib), Unicode Versions: Python 3.4, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Dominik.Richter, dmi.baranov, ezio.melotti, haypo, neologix, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2013-05-31 19:58 by Dominik.Richter, last changed 2013-06-04 05:11 by Dominik.Richter. This issue is now closed.

Files
File name Uploaded
issue18109.patch dmi.baranov, 2013-06-03 12:21
Messages (19)
msg190413 - (view) Author: Dominik Richter (Dominik.Richter) Date: 2013-05-31 19:58
To reproduce (tested on Arch Linux, python 3.3.2):

  sudo hostname hât
  python -c "import os; os.uname()"

produces:

  Traceback (most recent call last):
    File "<string>", line 1, in <module>
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
msg190415 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-05-31 21:45
I'm unable to set a non-ASCII hostname on Fedora 18 (Linux kernel 3.9.2):

$ sudo hostname hât
hostname: the specified hostname is invalid
msg190416 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-05-31 21:48
See also issue #9377 (similar issue with the socket module).
msg190417 - (view) Author: Dmi Baranov (dmi.baranov) * Date: 2013-05-31 21:54
I just checked RFC952 [1] and RFC1123 [2]: that hostname is invalid.
I think you need to report this to the Arch Linux bug tracker.

$ sudo hostname hât
hostname: the specified hostname is invalid
$ uname -a
Linux d9frog9n-desktop 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26

[1] http://tools.ietf.org/html/rfc952
[2] http://tools.ietf.org/html/rfc1123
msg190418 - (view) Author: Dmi Baranov (dmi.baranov) * Date: 2013-05-31 21:57
/offtopic Damn, sorry for the duplication here, Victor. We don't have websockets here, so my page had not refreshed.
msg190422 - (view) Author: Dominik Richter (Dominik.Richter) Date: 2013-05-31 22:57
@dmi.baranov: You're right, according to:
http://tools.ietf.org/html/rfc952
  <hname> ::= <name>*["."<name>]
  <name>  ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]
http://tools.ietf.org/html/rfc1178
  Don't use non-alphanumeric characters in a name.
If you use the POSIX definition of alphanumeric, that fits. What a shame ;)
msg190425 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-05-31 23:02
"""
http://tools.ietf.org/html/rfc1178
  Don't use non-alphanumeric characters in a name.
"""

This is a recommendation, it does not mean that it is forbidden. But on Fedora, I'm unable to set a non-ASCII hostname. Arch Linux may be different.

Python should not fail with a non-ASCII hostname, but I don't know which encoding should be used: ascii/surrogateescape (PEP 383)? The locale encoding + the surrogateescape error handler?
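
A minimal sketch of the two options named in this message, using the UTF-8 bytes of "hât" from the original report. It does not touch the real hostname. UTF-8 stands in here for the locale encoding, which is an assumption that matches the LANG=en_US.UTF-8 setup reported later in the thread.

```python
# UTF-8 bytes of the hostname "hât", as the kernel would return them.
raw = b"h\xc3\xa2t"

# Strict ASCII decoding is what produced the reported traceback.
try:
    raw.decode("ascii")
    strict_ascii_ok = True
except UnicodeDecodeError as exc:
    strict_ascii_ok = False
    print("strict ascii failed:", exc)

# Option 1: ascii + surrogateescape (PEP 383). Never fails, but each
# non-ASCII byte becomes a lone surrogate instead of a readable character.
ascii_escaped = raw.decode("ascii", "surrogateescape")

# Option 2: locale encoding + surrogateescape (UTF-8 assumed here).
# Valid sequences decode normally; only undecodable bytes are escaped.
utf8_escaped = raw.decode("utf-8", "surrogateescape")

print(ascii(ascii_escaped))
print(ascii(utf8_escaped))

# Both options round-trip back to the original bytes.
assert ascii_escaped.encode("ascii", "surrogateescape") == raw
assert utf8_escaped.encode("utf-8", "surrogateescape") == raw
```

Both variants preserve the raw bytes; the difference is only in how readable the resulting str is when the bytes are valid in the locale encoding.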
msg190428 - (view) Author: Dominik Richter (Dominik.Richter) Date: 2013-05-31 23:33
@haypo: You're right, RFC1178 is only a recommendation. RFC952, however, is mandatory, although it doesn't seem to define <let-or-digit> explicitly (or at least I wasn't able to find a definition, hence my reference to POSIX).

Regarding Arch Linux's hostname: It is part of the package inetutils v1.9.1-5 [1], which is GNU inetutils packaged [2]. Unfortunately I don't know which encoding it uses. I agree with you: it would be great if Python supported it.

FYI: I opened a bug on Arch Linux [3] (thank you, Dmi, for the suggestion).

[1] https://www.archlinux.org/packages/core/i686/inetutils/
[2] http://www.gnu.org/software/inetutils/
[3] https://bugs.archlinux.org/task/35580
msg190431 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-06-01 01:30
Issue 10097 may also have some relevant discussion, even though that issue originates from Windows.
msg190441 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-06-01 08:06
To reproduce the issue, try this:
# echo <hostname> > /proc/sys/kernel/hostname

> the locale encoding + surrogateescape error handler

Sounds reasonable.
msg190442 - (view) Author: Dominik Richter (Dominik.Richter) Date: 2013-06-01 08:43
@neologix: (with current hostname showing at the left of my prompt)

    none:~ #> echo hât > /proc/sys/kernel/hostname
    hât:~ #> hostname
    hât
msg190443 - (view) Author: Dominik Richter (Dominik.Richter) Date: 2013-06-01 08:47
/off: never mind, that wasn't directed at me
msg190517 - (view) Author: Dmi Baranov (dmi.baranov) * Date: 2013-06-03 05:47
Thanks Charles, I reproduced Dominik's issue on the default branch:

$ python -c 'import os, sys;print(sys.version);print(os.uname())' 
3.4.0a0 (default:adfec512fb32, Jun  3 2013, 08:09:43) 
[GCC 4.6.3]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Only the latest branches are affected, so this is a bug.

$ python -c 'import os, sys;print(sys.version);print(os.uname())'
2.7.5+ (2.7:e9d0fb934b46, Jun  3 2013, 08:05:55) 
[GCC 4.6.3]
('Linux', 'h\xc3\xa2t', '3.2.0-32-generic', '#51-Ubuntu SMP Wed Sep 26 21:32:50 UTC 2012', 'i686')

$ python -c 'import os, sys;print(sys.version);print(os.uname())'
3.2.5 (3.2:b9b521efeba3, Jun  3 2013, 08:24:06) 
[GCC 4.6.3]
('Linux', 'hât', '3.2.0-32-generic', '#51-Ubuntu SMP Wed Sep 26 21:32:50 UTC 2012', 'i686')

Env:
$ hostname
hât
$ locale
LANG=en_US.UTF-8
...

BTW, this issue prevents compiling from source on hosts with such names. I've created a separate issue, #18124 (possibly a duplicate, but with different behavior).
msg190540 - (view) Author: Dmi Baranov (dmi.baranov) * Date: 2013-06-03 12:21
Here is a patch. The test is not LGTM-worthy, because it has a side effect on the hostname and requires root permissions to change the hostname [*]. Does anyone have an idea how I can "mock" the system `uname` call?

[*] But this approach is OK for Lib/test/test_sockets.py. Am I overthinking this? (-:
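
One possible way to avoid touching the real hostname is to patch `os.uname` with `unittest.mock` (in the standard library since Python 3.3). This is only a sketch: `FakeUname` and `current_nodename` are hypothetical names invented for illustration, and patching at the Python level exercises only the callers of `os.uname()`, not the C-level decoding that this issue is about.

```python
import os
from collections import namedtuple
from unittest import mock

# Hypothetical stand-in for the structure returned by os.uname().
FakeUname = namedtuple("FakeUname", "sysname nodename release version machine")


def current_nodename():
    """Hypothetical caller that depends on os.uname()."""
    return os.uname().nodename


fake = FakeUname("Linux", "h\xe2t", "3.2.0-32-generic", "#51-Ubuntu SMP", "i686")

# create=True lets the patch succeed on platforms without os.uname.
with mock.patch("os.uname", return_value=fake, create=True):
    patched_name = current_nodename()

print(ascii(patched_name))
```

The patch is undone when the `with` block exits, so no root permissions are needed and the system hostname is never changed.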
msg190559 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-03 20:15
New changeset ffdee6b36305 by Victor Stinner in branch '3.3':
Close #18109: os.uname() now decodes fields from the locale encoding, and
http://hg.python.org/cpython/rev/ffdee6b36305

New changeset 2472603af83e by Victor Stinner in branch 'default':
(Merge 3.3) Close #18109: os.uname() now decodes fields from the locale
http://hg.python.org/cpython/rev/2472603af83e
msg190560 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-03 20:25
issue18109.patch is not correct: it uses the locale encoding in strict mode, the surrogateescape error handler should be used instead. I rewrote the patch. I removed the unit test because changing a hostname is really unexpected and may break (crash?) running desktop applications. The hostname is something too critical IMO. I ran the test manually on my Linux box at least ;-)
msg190561 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-03 20:36
test_logging is failing with a non-ASCII hostname because of the following error:

error: uncaptured python exception, closing channel <test.test_logging.TestSMTPServer listening localhost:0 at 0x7f09a0ef89b0> (<class 'UnicodeEncodeError'>:'ascii' codec can't encode character '\xe9' in position 6: ordinal not in range(128) [/home/haypo/prog/python/default/Lib/asyncore.py|read|83] [/home/haypo/prog/python/default/Lib/asyncore.py|handle_read_event|436] [/home/haypo/prog/python/default/Lib/asyncore.py|handle_accept|513] [/home/haypo/prog/python/default/Lib/test/test_logging.py|handle_accepted|746] [/home/haypo/prog/python/default/Lib/test/test_logging.py|__init__|692] [/home/haypo/prog/python/default/Lib/smtpd.py|push|276])

SMTPChannel.push() explicitly uses the ASCII encoding, whereas test_logging passes the FQDN to push().

I'm not interested in working on this issue. Please open a new issue if you consider it important enough.

--

More tests are also failing with *undecodable* hostnames (ex: "aé€\udcff" with UTF-8 locale encoding): test_socket test_urllib test_urllib2 test_logging test_pydoc test_smtplib.

I fixed and closed the issue, even though I still think that you should only use ASCII for your hostname ;-)
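
A small sketch of why an *undecodable* hostname such as the "aé€\udcff" example above can still break other code after the fix. It assumes a UTF-8 locale encoding and works on a byte string only, without changing the real hostname.

```python
# Bytes for "aé€" in UTF-8 followed by 0xff, which is never valid UTF-8.
raw = b"a\xc3\xa9\xe2\x82\xac\xff"

# With surrogateescape, decoding succeeds and the bad byte becomes U+DCFF.
hostname = raw.decode("utf-8", "surrogateescape")
print(ascii(hostname))

# Code that later encodes the string in strict mode fails on the
# lone surrogate.
try:
    hostname.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError as exc:
    strict_encode_ok = False
    print("strict encode failed:", exc)

# Encoding with the same error handler restores the original bytes.
round_trip = hostname.encode("utf-8", "surrogateescape")
```

Any module that encodes the hostname without the surrogateescape handler would raise in the same way, which is consistent with the test failures listed above.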
msg190562 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-03 20:38
Oh, by the way, I also changed socket.gethostname().
msg190588 - (view) Author: Dominik Richter (Dominik.Richter) Date: 2013-06-04 05:11
Thank you all for your help, works great!
@Victor: fully agree on the ascii hostname ;)
History
Date User Action Args
2013-06-04 05:11:53 Dominik.Richter set messages: + msg190588
2013-06-03 20:38:40 haypo set messages: + msg190562
2013-06-03 20:36:20 haypo set messages: + msg190561
2013-06-03 20:25:17 haypo set messages: + msg190560
2013-06-03 20:15:06 python-dev set status: open -> closed
    nosy: + python-dev
    messages: + msg190559
    resolution: not a bug -> fixed
    stage: resolved
2013-06-03 12:21:29 dmi.baranov set files: + issue18109.patch
    keywords: + patch
    messages: + msg190540
2013-06-03 05:47:37 dmi.baranov set type: crash -> behavior
    messages: + msg190517
    components: + Library (Lib)
    versions: + Python 3.4
2013-06-01 08:47:33 Dominik.Richter set messages: + msg190443
2013-06-01 08:43:58 Dominik.Richter set messages: + msg190442
2013-06-01 08:06:26 neologix set nosy: + neologix
    messages: + msg190441
2013-06-01 01:30:04 r.david.murray set nosy: + r.david.murray
    messages: + msg190431
2013-05-31 23:33:54 Dominik.Richter set messages: + msg190428
2013-05-31 23:02:21 haypo set messages: + msg190425
2013-05-31 22:57:44 Dominik.Richter set type: behavior -> crash
    resolution: not a bug
    messages: + msg190422
2013-05-31 21:57:01 dmi.baranov set messages: + msg190418
2013-05-31 21:54:32 dmi.baranov set nosy: + dmi.baranov
    messages: + msg190417
2013-05-31 21:48:55 haypo set messages: + msg190416
2013-05-31 21:45:03 haypo set nosy: + haypo
    messages: + msg190415
2013-05-31 21:03:36 neologix set type: crash -> behavior
2013-05-31 19:58:40 Dominik.Richter create