Update documentation for address formats to describe AF_UNIX surrogateescape handling and Linux abstract namespace.

diff --git a/Doc/library/socket.rst b/Doc/library/socket.rst
--- a/Doc/library/socket.rst
+++ b/Doc/library/socket.rst
@@ -34,8 +34,15 @@ in the C interface: as with :meth:`read`
 files, buffer allocation on receive operations is automatic, and buffer length
 is implicit on send operations.
 
-Socket addresses are represented as follows: A single string is used for the
-:const:`AF_UNIX` address family. A pair ``(host, port)`` is used for the
+The address format required by a particular socket object is
+automatically selected based on the address family specified when the
+socket object was created. Socket addresses are represented as follows:
+
+``None`` does not represent an address in any family, but is returned
+when the operating system does not return an address structure, as is
+often the case when receiving data on a connected socket.
+
+A pair ``(host, port)`` is used for the
 :const:`AF_INET` address family, where *host* is a string representing either a
 hostname in Internet domain notation like ``'daring.cwi.nl'`` or an IPv4 address
 like ``'100.50.200.5'``, and *port* is an integral port number. For
@@ -44,10 +51,7 @@ scopeid)`` is used, where *flowinfo* and
 and ``sin6_scope_id`` member in :const:`struct sockaddr_in6` in C. For
 :mod:`socket` module methods, *flowinfo* and *scopeid* can be omitted just for
 backward compatibility. Note, however, omission of *scopeid* can cause problems
-in manipulating scoped IPv6 addresses. Other address families are currently not
-supported. The address format required by a particular socket object is
-automatically selected based on the address family specified when the socket
-object was created.
+in manipulating scoped IPv6 addresses.
 
 For IPv4 addresses, two special forms are accepted instead of a host address: the
 empty string represents :const:`INADDR_ANY`, and the string
@@ -62,6 +66,24 @@ differently into an actual IPv4/v6 addre
 resolution and/or the host configuration. For deterministic behavior use a
 numeric address in *host* portion.
 
+The address of an :const:`AF_UNIX` socket bound to a file system node
+is represented as a string, using the file system encoding and the
+``'surrogateescape'`` error handler (see :pep:`383`). An address in
+Linux's abstract namespace is returned as a :class:`bytes` object with
+an initial null byte; note that sockets in this namespace can
+communicate with normal file system sockets, so programs intended to
+run on Linux may need to handle both types of address. A string,
+:class:`bytes` or buffer-compatible object can be used for either type
+of address when passing it as an argument. The address of an unbound
+:const:`AF_UNIX` socket is usually returned as an empty string or
+``None``; some platforms can return the latter even in contexts where
+a real address would normally be expected.
+
+.. versionchanged:: XXX
+   Previously, :const:`AF_UNIX` socket paths were assumed to use UTF-8
+   encoding, and objects supporting the read-write buffer interface
+   were not accepted.
+
 AF_NETLINK sockets are represented as pairs ``pid, groups``.