classification
Title: unicode DNS names in socket, urllib, urlopen
Type: enhancement Stage: patch review
Components: Library (Lib), Unicode Versions: Python 3.2
process
Status: closed Resolution: accepted
Dependencies: Superseder: unicode DNS names in urllib, urlopen
View: 9679
Assigned To: Nosy List: baikie, flox, gdamjan, loewis, orsenthil, vstinner
Priority: normal Keywords: buildbot, patch

Created on 2004-09-13 12:38 by gdamjan, last changed 2010-08-25 07:44 by loewis. This issue is now closed.

Files
File name Uploaded Description
idna.diff baikie, 2010-03-22 21:55 Make gethostbyname(), gethostbyname_ex(), getnameinfo() use IDNA encoding (2.x/3.x)
socket-idna.diff baikie, 2010-08-22 18:29 Added gethostbyaddr()
getnameinfo-numerichost.diff baikie, 2010-08-23 22:47
Messages (8)
msg60563 - Author: Damjan Georgievski (gdamjan) Date: 2004-09-13 12:38
http://docs.python.org/whatsnew/node18.html says that
unicode host names are allowed in the socket module
(automatically converting them by the IDNA spec), but
it seems the support is not fully implemented.

Only the connect method of a socket instance will do
the auto conversion to an 'idna' string.
socket.getaddr* functions will not!
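
As a workaround for the gap described above, a caller can apply the IDNA conversion explicitly before resolving. The following is a minimal sketch, not the eventual fix: the helper name `getaddrinfo_idna` is hypothetical, and it assumes Python 3 with the built-in "idna" codec.

```python
import socket


def getaddrinfo_idna(host, port, *args, **kwargs):
    """Hypothetical helper: IDNA-encode a Unicode host name, then resolve it."""
    # The built-in "idna" codec converts each non-ASCII label to its
    # ASCII-compatible "xn--" form and leaves ASCII labels unchanged.
    ascii_host = host.encode("idna").decode("ascii")
    return socket.getaddrinfo(ascii_host, port, *args, **kwargs)
```

A call such as `getaddrinfo_idna("\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435.python.org", 80)` would then hand the resolver an ASCII-only name; whether it resolves depends on the network and DNS setup.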

Also, other modules should support unicode hostnames
(httplib already does), but urllib and urllib2 don't.
msg60564 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-09-14 07:32
Would you be interested in developing patches? For the
socket module, the bug is clear, and probably
straightforward to fix. For urllib[2], issues are more
difficult, as Python should eventually support IRIs. It
appears that draft-duerst-iri-09 is going to become the RFC,
so changes for urllib either need to take this into account,
or be postponed until after the RFC is published.
msg101537 - Author: David Watson (baikie) Date: 2010-03-22 21:55
I was about to report this for the socket module - the gethostbyname(), gethostbyname_ex() and getnameinfo() functions are the only things currently affected in that module as far as I can see.  3.x is affected too - the functions will pass non-ASCII Unicode to the system as UTF-8 there.  The attached patch fixes them in 2.x and 3.x.
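
To illustrate the difference between the two encodings mentioned above, here is a small sketch comparing the bytes a UTF-8 conversion would produce with the IDNA form. It assumes Python 3 and uses the host name from the test suite; the helper `is_ascii` is hypothetical and only there for the comparison.

```python
# Host name with a non-ASCII (Cyrillic) first label.
host = "\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435.python.org"

utf8_bytes = host.encode("utf-8")  # raw UTF-8, as described for the unpatched 3.x functions
idna_bytes = host.encode("idna")   # ASCII-compatible encoding ("xn--..." labels)


def is_ascii(data):
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 128 for b in data)


print(is_ascii(utf8_bytes))  # False
print(is_ascii(idna_bytes))  # True
print(idna_bytes)
```

Only the IDNA form is pure ASCII, which is what DNS resolvers generally expect for host names.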
msg114690 - Author: David Watson (baikie) Date: 2010-08-22 18:29
Updated the socket module patch to include gethostbyaddr() - it
happens to accept hostnames and is used this way in the standard
library.
msg114696 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-08-22 19:35
Thanks for the patch. Committed as r84261.

I'm not sure what the point is of supporting IDNA in getnameinfo, so I have removed that from the patch. If you think it's needed, please elaborate.
msg114727 - Author: Florent Xicluna (flox) * (Python committer) Date: 2010-08-23 07:51
Two builders are sad:
 * x86 gentoo
 * sparc solaris10 gcc

======================================================================
ERROR: test_idna (test.test_socket.GeneralModuleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_socket.py", line 644, in test_idna
    socket.getaddrinfo('\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435.python.org',0)
socket.gaierror: [Errno 9] service name not available for the specified socket type

http://www.python.org/dev/buildbot/builders/x86%20gentoo%203.x/builds/2898
http://www.python.org/dev/buildbot/builders/sparc%20solaris10%20gcc%203.x/builds/1472
msg114753 - Author: David Watson (baikie) Date: 2010-08-23 22:47
> Thanks for the patch. Committed as r84261.
> 
> I'm not sure what the point is of supporting IDNA in getnameinfo, so I have removed that from the patch. If you think it's needed, please elaborate.

I don't see the point of it either, but if it's not supposed to
accept hostnames, it should use AI_NUMERICHOST in the call it
makes to getaddrinfo().  As it is, it does both forward and
reverse lookups when called with a hostname.

Attaching a patch to use AI_NUMERICHOST.
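
The effect of AI_NUMERICHOST described above can be seen from Python as well. This is only a sketch of the flag's behaviour, not the attached patch: the helper name `is_numeric_host` is hypothetical, and the port is passed as None to avoid service-name lookups.

```python
import socket


def is_numeric_host(host):
    """Hypothetical helper: True if host is a numeric address, not a name."""
    try:
        # AI_NUMERICHOST makes getaddrinfo() reject anything that is not a
        # numeric address, so no forward DNS lookup is attempted.
        socket.getaddrinfo(host, None, flags=socket.AI_NUMERICHOST)
    except socket.gaierror:
        return False
    return True
```

With the flag set, `is_numeric_host("127.0.0.1")` succeeds, while a host name is rejected with socket.gaierror instead of being looked up.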

Also, this issue # isn't really resolved yet as Python does not
support IRIs (AFAIK).
msg114885 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-08-25 07:44
I have now committed file 18615 as r84313: thanks for the patch.

I have split this issue into two: this one is only about the socket module, and #9679 carries any remaining features (it would be good if we have only one bug per bug report).

Since the buildbots are also happy now (after r84277), I'm closing this as fixed again.
History
Date User Action Args
2010-08-25 07:44:42 loewis set status: open -> closed
superseder: unicode DNS names in urllib, urlopen
messages: + msg114885
2010-08-23 22:47:42 baikie set files: + getnameinfo-numerichost.diff
messages: + msg114753
2010-08-23 07:51:28 flox set status: closed -> open
nosy: + flox
messages: + msg114727
keywords: + buildbot
2010-08-22 19:38:22 loewis set status: open -> closed
resolution: accepted
2010-08-22 19:35:59 loewis set messages: + msg114696
2010-08-22 18:29:52 baikie set files: + socket-idna.diff
messages: + msg114690
2010-08-21 22:56:29 pitrou set nosy: + vstinner
2010-08-19 16:28:07 BreamoreBoy set stage: test needed -> patch review
versions: - Python 2.7, Python 3.3
2010-03-22 21:55:36 baikie set files: + idna.diff
versions: + Python 3.2, Python 3.3
nosy: + baikie
messages: + msg101537
keywords: + patch
2009-03-24 23:02:13 vstinner set nosy: - vstinner
2009-02-12 18:26:46 ajaksu2 set nosy: + orsenthil, vstinner
components: + Unicode
stage: test needed
2009-02-09 01:20:46 ajaksu2 set type: enhancement
components: + Library (Lib), - None
versions: + Python 2.7
2004-09-13 12:38:17 gdamjan create