classification
Title: socket: AF_UNIX socket paths not handled according to PEP 383
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: baikie, haypo, loewis, pitrou, python-dev
Priority: normal Keywords: patch

Created on 2010-04-11 18:45 by baikie, last changed 2011-12-16 13:47 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
af_unix-pep383.diff baikie, 2010-04-11 18:45 Handle AF_UNIX addresses according to PEP 383
af_unix-pep383-no-8372-fix.diff baikie, 2010-04-11 18:47 Version which applies onto existing code, without fixing issue #8372
test-existing.diff baikie, 2010-04-11 18:48 Tests for existing behaviour that patch preserves
test-new.diff baikie, 2010-04-11 18:49 Tests for new behaviour (surrogateescape handling)
af_unix-pep383-doc.diff baikie, 2010-04-11 18:50 Doc updates
af_unix-pep383-docs-tests-3.2.diff baikie, 2010-09-11 18:08 Documentation and tests for new version of patch
af_unix-pep383-3.2-with-linux-unterminated.diff baikie, 2010-09-11 18:10 Patch for 3.2 (if linux-pass-unterminated patch from issue #8372 is applied)
af_unix-pep383-3.2-without-linux-unterminated.diff baikie, 2010-09-11 18:10 Patch for 3.2 (if linux-pass-unterminated patch is not applied)
af_unix-pep383-docs-tests-3.2-2.diff baikie, 2010-09-11 19:13 Removed use of sys.setfilesystemencoding()
Messages (6)
msg102865 - (view) Author: David Watson (baikie) Date: 2010-04-11 18:45
In 3.x, the socket module assumes that AF_UNIX addresses use
UTF-8 encoding - this means, for example, that accept() will
raise UnicodeDecodeError if the peer socket path is not valid
UTF-8, which could crash an unwary server.

Python 3.1.2 (r312:79147, Mar 23 2010, 19:02:21) 
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more
information.
>>> from socket import *
>>> s = socket(AF_UNIX, SOCK_STREAM)
>>> s.bind(b"\xff")
>>> s.getsockname()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

I'm attaching a patch to handle socket paths according to PEP
383.  Normally this would use PyUnicode_FSConverter, but there
are a couple of ways in which the address handling currently
differs from normal filename handling.

One is that embedded null bytes are passed through to the system
instead of being rejected, which is needed for the Linux abstract
namespace.  These abstract addresses are returned as bytes
objects, but they can currently be specified as strings with
embedded null characters as well.  The patch preserves this
behaviour.

The current code also accepts read-only buffer objects (it uses
the "s#" format), so in order to accept these as well as
bytearray filenames (which the posix module accepts), the patch
simply accepts any single-segment buffer, read-only or not.

This patch applies on top of the patches I submitted for issue
#8372 (rather than knowingly running past the end of sun_path).
msg102866 - (view) Author: David Watson (baikie) Date: 2010-04-11 18:47
This patch does the same thing without fixing issue #8372 (not
that I'd recommend that, but it may be easier to review).
msg116111 - (view) Author: David Watson (baikie) Date: 2010-09-11 18:08
Updated the patches for Python 3.2 - these are now simpler as
they do not support bytearray arguments, as these are no longer
used for filenames (the existing code does not support bytearrays
either).

I've put the docs and tests in one patch, and made separate
patches for the code, one for if the linux-pass-unterminated
patch from issue #8372 is applied, and one for if it isn't.

One point I neglected to comment on before is the ability to
specify an address in the Linux abstract namespace as a
filesystem-encoded string prefixed with a null character.  This
may seem strange, but as well as simplifying the code, it does
support an actual use case, as on Linux systems the abstract
namespace is sometimes used to hold names based on real
filesystem paths such as "\x00/var/run/hald/dbus-XAbemUfDyQ", or
imaginary ones, such as "\x00/com/ubuntu/upstart".  In fact,
running "netstat" on my own system did not reveal any non-textual
abstract names in use (although they are of course allowed).
msg116116 - (view) Author: David Watson (baikie) Date: 2010-09-11 19:13
One of the tests got broken by the removal of sys.setfilesystemencoding().  Replaced it.
msg149621 - (view) Author: Roundup Robot (python-dev) Date: 2011-12-16 13:47
New changeset 1f23bb74f4bc by Antoine Pitrou in branch 'default':
Issue #8373: The filesystem path of AF_UNIX sockets now uses the filesystem
http://hg.python.org/cpython/rev/1f23bb74f4bc
msg149622 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-16 13:47
Patch committed in 3.3 (default branch), thank you!
History
Date User Action Args
2011-12-16 13:47:41pitrousetstatus: open -> closed

versions: + Python 3.3, - Python 3.1, Python 3.2
nosy: + pitrou

messages: + msg149622
resolution: fixed
stage: patch review -> resolved
2011-12-16 13:47:07python-devsetnosy: + python-dev
messages: + msg149621
2010-09-11 19:13:00baikiesetfiles: + af_unix-pep383-docs-tests-3.2-2.diff

messages: + msg116116
2010-09-11 18:10:35baikiesetfiles: + af_unix-pep383-3.2-without-linux-unterminated.diff
2010-09-11 18:10:04baikiesetfiles: + af_unix-pep383-3.2-with-linux-unterminated.diff
2010-09-11 18:08:43baikiesetfiles: + af_unix-pep383-docs-tests-3.2.diff

messages: + msg116111
2010-04-11 19:04:55pitrousetpriority: normal
nosy: + loewis, haypo
stage: patch review

versions: - Python 3.3
2010-04-11 18:50:11baikiesetfiles: + af_unix-pep383-doc.diff
2010-04-11 18:49:38baikiesetfiles: + test-new.diff
2010-04-11 18:48:57baikiesetfiles: + test-existing.diff
2010-04-11 18:47:33baikiesetfiles: + af_unix-pep383-no-8372-fix.diff

messages: + msg102866
2010-04-11 18:45:26baikiecreate