classification
Title: socket.getsockname() inconsistent return type with AF_UNIX
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: giampaolo.rodola, neologix, pitrou, python-dev, vstinner
Priority: normal Keywords: needs review, patch

Created on 2013-04-10 01:02 by giampaolo.rodola, last changed 2013-06-01 17:26 by neologix.

Files
File name Uploaded Description Edit
issue17683.patch giampaolo.rodola, 2013-04-10 01:08 review
af_unix_str.diff neologix, 2013-05-20 17:59 review
Messages (15)
msg186472 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-04-10 01:02
(bytes vs. string)

>>> import socket 
>>> s = socket.socket(socket.AF_UNIX)
>>> s.getsockname()
b''
>>> s.bind(('/tmp/temp'))
>>> s.getsockname()
'/tmp/temp'
>>>
msg186473 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-04-10 01:08
Here's a patch.
msg186477 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-04-10 05:44
> Here's a patch.

Your patch returns an empty string in case of Linux abstract namespace: this is wrong, because even though the first char in sun_path is NIL, the path is non-null, there can be other non-NIL char following.

Example (the output returns a str because it's a Python 2.6 version):

>>> import socket
>>> s = socket.socket(socket.AF_UNIX)
>>> s.bind(b'\x00hello')
>>> s.getsockname()
'\x00hello'

That's why it returns bytes.
msg186478 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-10 06:33
Well, couldn't we simply return a string in that case? We just have to be more careful than in the patch :-)
msg186480 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-04-10 07:01
> Well, couldn't we simply return a string in that case? We just have to be more careful than in the patch :-)

Probably, provided that:

- s.bind(<addr>).getsockname() == <addr>
- it doesn't break existing code :-)
msg186487 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-10 12:35
Apparently the current code is a result of Guido's changeset 980308fbda29 which changed normal Unix sockets to use unicode objects, but Linux abstract namespace sockets to use bytes objects.

I don't know if a lot of thought was given to the issue in that changeset; I would favour using PyUnicode_DecodeFSDefaultAndSize for Linux abstract namespace sockets too.
msg186497 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-04-10 14:37
Sorry, I don't think I have anything to add.
msg186499 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-04-10 14:50
Note that if we pass bytes to bind(), getsockname() will return str anyway:

>>> s = socket.socket(socket.AF_UNIX)
>>> s.bind(b'hello')
>>> s.getsockname()
'hello'

That said, it seems more consistent to me to return str also in case of abstract namespace. 
As per:
http://blog.eduardofleury.com/archives/2007/09/13
...one is supposed to set only the first byte to null, and the rest of the path is supposed to be a 'plain' string.
msg189682 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-20 17:59
Here's a patch.
Note that I had to adapt test_socket (which was passing bytes).
msg189684 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-20 19:05
Looks fine to me :-)
msg189726 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-05-21 07:49
New changeset c0f2b038fc12 by Charles-François Natali in branch 'default':
Issue #17683: socket module: return AF_UNIX addresses in Linux abstract
http://hg.python.org/cpython/rev/c0f2b038fc12
msg189727 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-21 08:24
Antoine, I need your help :-)

http://buildbot.python.org/all/builders/x86 Gentoo Non-Debug
3.x/builds/4311/steps/test/logs/stdio
"""
======================================================================
ERROR: testLinuxAbstractNamespace (test.test_socket.TestLinuxAbstractNamespace)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/test_socket.py",
line 4456, in testLinuxAbstractNamespace
    s1.bind(address)
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in
position 19: ordinal not in range(128)
"""

I don't know jack s*** about encoding, but from what I understand, the
problem is that the test is passing an invalid ordinal, now that test
is passing a string. Should I revert this part of the test (i.e. make
the passed address a bytes again)?
msg189730 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-21 08:37
Hum, IIUC, basically what happens is that the user could - and still
can - pass arbitrary bytes as address (which is legtit), but depending
on the encoding, getsockaddr() and friends might blow up when decoding
it.
If that's correct, that's bad, and that would be a good reason to keep bytes.
msg189741 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-21 10:41
> Hum, IIUC, basically what happens is that the user could - and still
> can - pass arbitrary bytes as address (which is legtit), but depending
> on the encoding, getsockaddr() and friends might blow up when decoding
> it.

Shouldn't the surrogateescape error handler (PEP 383) prevent this?
msg189869 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-23 17:05
> Shouldn't the surrogateescape error handler (PEP 383) prevent this?

Yes, it shoud (I just read PEP 383, I told you I didn't know anything
about encoding :-).

So basically, for the test failure, the issue is simply that the
platform's default encoding can't encode character '\xff'.

Should I simply remove the offending character from this test address?

Also, let's say I wanted to test that it can be passed and returned
properly, so I add '\0xff' to the adress passed to testBytesName:

s.bind(b"\x00python\x00test\xff")

How should I check the string returned by getsockname()?

self.assertEquals(s.getsockname(), ???)
History
Date User Action Args
2013-06-01 17:26:38neologixsetnosy: + vstinner
2013-05-23 17:05:11neologixsetmessages: + msg189869
2013-05-21 10:41:28pitrousetmessages: + msg189741
2013-05-21 08:37:40neologixsetmessages: + msg189730
2013-05-21 08:24:38neologixsetmessages: + msg189727
2013-05-21 07:49:45python-devsetnosy: + python-dev
messages: + msg189726
2013-05-20 19:05:16pitrousetmessages: + msg189684
2013-05-20 18:53:22gvanrossumsetnosy: - gvanrossum
2013-05-20 17:59:14neologixsetfiles: + af_unix_str.diff
messages: + msg189682

components: + Library (Lib)
keywords: + needs review
type: behavior
stage: patch review
2013-04-10 14:50:34giampaolo.rodolasetmessages: + msg186499
2013-04-10 14:37:00gvanrossumsetmessages: + msg186497
2013-04-10 12:35:03pitrousetnosy: + gvanrossum
messages: + msg186487
2013-04-10 07:01:18neologixsetmessages: + msg186480
2013-04-10 06:33:13pitrousetmessages: + msg186478
2013-04-10 05:44:13neologixsetnosy: + neologix
messages: + msg186477
2013-04-10 01:08:30giampaolo.rodolasetfiles: + issue17683.patch
keywords: + patch
messages: + msg186473
2013-04-10 01:02:50giampaolo.rodolacreate