Message 102865 - Python tracker


Author	baikie
Recipients	baikie
Date	2010-04-11.18:45:24
Message-id	<1271011528.87.0.0855011916856.issue8373@psf.upfronthosting.co.za>

Content

In 3.x, the socket module assumes that AF_UNIX addresses use
UTF-8 encoding.  This means, for example, that accept() will
raise UnicodeDecodeError if the peer socket path is not valid
UTF-8, which could crash an unwary server.

Python 3.1.2 (r312:79147, Mar 23 2010, 19:02:21) 
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>> s = socket(AF_UNIX, SOCK_STREAM)
>>> s.bind(b"\xff")
>>> s.getsockname()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

I'm attaching a patch to handle socket paths according to PEP
383.  Normally this would use PyUnicode_FSConverter, but there
are a couple of ways in which the address handling currently
differs from normal filename handling.
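
For reference, here is a minimal sketch of the PEP 383 round trip at the
Python level, using the "surrogateescape" error handler.  The helper
names decode_path and encode_path are made up for illustration, and the
codec is fixed to UTF-8 here rather than taken from the file system
encoding; the attached patch does this work in C, so this only shows
the intended behaviour, not the implementation.

```python
def decode_path(raw, encoding="utf-8"):
    """Decode a socket path, mapping undecodable bytes to lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


def encode_path(text, encoding="utf-8"):
    """Reverse decode_path(): lone surrogates become the original bytes."""
    return text.encode(encoding, "surrogateescape")


raw = b"\xff"

# Strict decoding fails, which is the error shown in the session above.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With surrogateescape the byte 0xff becomes U+DCFF and survives a round trip.
text = decode_path(raw)
roundtrip = encode_path(text)
```

With this handling, getsockname() and accept() can return a str for any
byte sequence, and passing that str back in yields the same bytes.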

One is that embedded null bytes are passed through to the system
instead of being rejected, which is needed for the Linux abstract
namespace.  These abstract addresses are returned as bytes
objects, but they can currently be specified as strings with
embedded null characters as well.  The patch preserves this
behaviour.
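
As an illustration of the abstract namespace case, the sketch below
binds to an address starting with a null byte and returns whatever
getsockname() reports.  It assumes Linux and a reasonably recent
Python 3; the function name and the address are made up for the
example, and it returns None where the bind is not possible.

```python
import os
import socket
import sys


def abstract_roundtrip(name=None):
    """Bind to a Linux abstract address and return getsockname()'s result.

    Returns None if the abstract namespace is unavailable.
    """
    if not sys.platform.startswith("linux"):
        return None
    if name is None:
        # Hypothetical name; the PID keeps concurrent runs from colliding.
        name = b"\0issue8373-demo-" + str(os.getpid()).encode("ascii")
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        try:
            # The leading null byte selects the abstract namespace, so it
            # has to reach the system rather than being rejected.
            s.bind(name)
        except OSError:
            return None
        return s.getsockname()
```

No file is created for such an address, and it goes away when the
socket is closed.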

The current code also accepts read-only buffer objects (it uses
the "s#" format), so in order to accept these as well as
bytearray filenames (which the posix module accepts), the patch
simply accepts any single-segment buffer, read-only or not.
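
In Python terms, the accepted argument types could be sketched roughly
as follows.  This is only an approximation of the behaviour described
above, not the C code in the patch: normalize_unix_address is a
made-up name, the codec is fixed to UTF-8 for the example, and
"single-segment" is approximated by the memoryview contiguous flag.

```python
def normalize_unix_address(addr, encoding="utf-8"):
    """Return the bytes that would be copied into sun_path for addr."""
    if isinstance(addr, str):
        # PEP 383: lone surrogates map back to the original bytes.
        # Embedded null characters are deliberately not rejected.
        return addr.encode(encoding, "surrogateescape")
    # Any object supporting the buffer protocol, read-only or writable.
    # memoryview() raises TypeError for objects that are not buffers.
    view = memoryview(addr)
    if not view.contiguous:
        raise TypeError("AF_UNIX address must be a contiguous buffer")
    return view.tobytes()
```

For example, normalize_unix_address(bytearray(b"\0name")) and
normalize_unix_address("\0name") both give b"\0name".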

This patch applies on top of the patches I submitted for issue
#8372, so that it does not knowingly run past the end of
sun_path.
