Title: PEP 383: socket module doesn't handle undecodable protocol or service names
Type: behavior
Stage:
Components: Extension Modules
Versions: Python 3.2, Python 3.3, Python 3.4
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: baikie, christian.heimes, loewis, vstinner
Priority: normal
Keywords: patch

Created on 2010-08-22 18:26 by baikie, last changed 2013-07-06 00:30 by christian.heimes.

Files:
proto-service-pep383-3.2.diff (uploaded by baikie, 2010-08-22 18:25): Handle socket protocol and service names according to PEP 383
Messages (2)
msg114687 - Author: David Watson (baikie) Date: 2010-08-22 18:25
The protocol and service/port number databases are typically
implemented as text files on Unix and can contain non-ASCII names
in any encoding (presumably for local services), but the socket
module tries to decode them as strict UTF-8.  In particular,
getservbyport() and getnameinfo() will raise UnicodeError when
this fails.

Attached is a patch for 3.2 to use the file system encoding and
surrogateescape handler instead, in line with PEP 383.  This is
what Python already does for the passwd and group databases, and
it will allow protocol and service names to be given correctly as
command line arguments.
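
For illustration, here is a minimal sketch of the difference between
strict decoding and the surrogateescape handler.  The byte string is a
made-up service name, not taken from any real database, and UTF-8 is
hardcoded as an assumption; the patch itself uses the file system
encoding rather than a fixed codec.

```python
# Hypothetical service name as it might appear in a Latin-1 encoded
# services file: "caf" followed by byte 0xE9, which is not valid UTF-8.
raw_name = b"caf\xe9"

# Strict UTF-8 decoding rejects the name outright.
try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc
else:
    strict_error = None

# With surrogateescape (PEP 383), each undecodable byte is mapped to a
# lone surrogate in the range U+DC80..U+DCFF instead of raising.
decoded = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = decoded.encode("utf-8", "surrogateescape")

print(repr(strict_error))
print(ascii(decoded))
print(round_tripped == raw_name)
```

The round trip is what would let a name obtained from one call be passed
back to another call, or given on the command line, without losing the
original bytes.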
msg114747 - Author: David Watson (baikie) Date: 2010-08-23 22:13
Come to think of it, I'm not sure if the patch is correct for
Windows, as PyUnicode_DecodeFSDefault() appears to do strict MBCS
decoding by default (similarly with PyUnicode_FSConverter() for
encoding).  Can Windows return service names that won't decode
with MBCS?  Or does it use a different encoding?  I don't have a
system to experiment with.
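
One way to check from Python what a given platform does is to ask the
interpreter for its file system encoding and error handler.  This is
only a sketch: sys.getfilesystemencodeerrors() was added in Python 3.6,
well after the versions discussed here, so the code falls back to None
where it is missing, and it says nothing about which encoding the
Windows service database itself uses.

```python
import sys

# Encoding used by PyUnicode_DecodeFSDefault() and os.fsdecode().
fs_encoding = sys.getfilesystemencoding()

# Error handler paired with that encoding.  The function only exists on
# Python 3.6 and later, so fall back to None on older interpreters.
get_errors = getattr(sys, "getfilesystemencodeerrors", None)
fs_errors = get_errors() if get_errors is not None else None

print(fs_encoding, fs_errors)
```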
Date User Action Args
2013-07-06 00:30:51 christian.heimes set nosy: + christian.heimes; versions: + Python 3.3, Python 3.4, - Python 3.1
2011-01-04 01:20:56 pitrou set nosy: + loewis, vstinner
2010-08-23 22:13:26 baikie set messages: + msg114747
2010-08-22 18:26:00 baikie create