
Author baikie
Recipients Almad, amaury.forgeotdarc, baikie, ezio.melotti, jesterKing, lemburg, loewis, r.david.murray, spaun2002, steve.dower, vstinner
Date 2015-06-25.20:28:14
Message-id <20150625202807.GA19433@localhost.dbwats.plus.com>
In-reply-to <1431777658.14.0.622939986647.issue9377@psf.upfronthosting.co.za>
Content
I've updated the ASCII/surrogateescape patches in line with
various changes to Python since I posted them.

return-ascii-surrogateescape-2015-06-25.diff incorporates the
ascii-surrogateescape and uname-surrogateescape patches, and
accept-ascii-surrogateescape-2015-06-25.diff corresponds to the
try-surrogateescape-first patch.  Neither patch touches
gethostname() on Windows.

Python's existing code now has a fast path that passes ASCII-only
strings through unchanged (Unicode -> ASCII), so, in order not to
slow down processing of valid IDNs, the latter patch now
effectively tries encodings in the order

   ASCII/strict (existing code, fast path)
   IDNA/strict (existing code)
   ASCII/surrogateescape (added by patch)

rather than the previous

   ASCII/surrogateescape
   IDNA/strict
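For illustration only, the new order can be sketched in Python.  The
real patch changes the C code in socketmodule.c; `encode_hostname`
is a hypothetical helper name, not part of either patch or of the
socket module.

```python
def encode_hostname(name: str) -> bytes:
    """Hypothetical helper: encode a hostname by trying, in order,
    ASCII/strict, IDNA/strict and ASCII/surrogateescape."""
    # 1. ASCII/strict: the fast path; ASCII-only names pass through unchanged.
    try:
        return name.encode("ascii")
    except UnicodeEncodeError:
        pass
    # 2. IDNA/strict: valid internationalized names become "xn--" labels.
    try:
        return name.encode("idna")
    except UnicodeError:
        pass
    # 3. ASCII/surrogateescape: restores the original bytes of a name that
    #    was decoded with surrogateescape; raises UnicodeEncodeError if the
    #    name still contains characters that no step can encode.
    return name.encode("ascii", "surrogateescape")


print(encode_hostname("example.com"))          # b'example.com'
print(encode_hostname("b\u00fccher.example"))  # b'xn--bcher-kva.example'
print(encode_hostname("caf\udce9"))            # b'caf\xe9'
```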

This doesn't change the behaviour of the patch, since IDNA always
rejects strings containing surrogate code points, and either
rejects ASCII-only strings (e.g. when a label is longer than 63
characters) or passes them through unchanged.  Any string that the
old order accepted via ASCII/surrogateescape therefore still
produces the same bytes.
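The argument can be checked against Python's built-in `idna` codec.
The sketch below is illustrative: `OLD_ORDER`, `NEW_ORDER` and
`encode_in_order` are hypothetical names, and the sample hostnames
are arbitrary.

```python
# Each step is an (encoding, errors) pair, tried in sequence.
OLD_ORDER = [("ascii", "surrogateescape"), ("idna", "strict")]
NEW_ORDER = [("ascii", "strict"), ("idna", "strict"),
             ("ascii", "surrogateescape")]


def encode_in_order(name, order):
    """Return the first successful encoding, or None if every step fails."""
    for encoding, errors in order:
        try:
            return name.encode(encoding, errors)
        except UnicodeError:  # UnicodeEncodeError is a subclass
            continue
    return None


# IDNA rejects surrogates and over-long ASCII labels ...
assert encode_in_order("caf\udce9", [("idna", "strict")]) is None
assert encode_in_order("a" * 64, [("idna", "strict")]) is None
# ... and passes other ASCII-only names through unchanged.
assert encode_in_order("host.example", [("idna", "strict")]) == b"host.example"

# Both orders give the same result (bytes, or None for "rejected").
samples = ["host.example", "a" * 64, "b\u00fccher.example",
           "caf\udce9.example", "b\u00fccher\udce9"]
for name in samples:
    assert encode_in_order(name, OLD_ORDER) == encode_in_order(name, NEW_ORDER)
```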

These patches would at least allow getfqdn() to work in Almad's
example.  In that case, however, the host also appears to be
addressable by the IDNA equivalent ("xn--didejo-noas-1ic") of its
Unicode hostname.  (I haven't checked, as I'm not a Windows user,
but I presume the UnicodeDecodeError came from gethost_common() in
socketmodule.c, and hence that the name lookup itself succeeded.)
So it would certainly be more helpful to return Unicode for
non-ASCII gethostbyaddr() results there, if such results were
guaranteed to map to real IDNA hostnames in Windows environments.

(That isn't guaranteed in Unix environments of course, which is
why I'm still suggesting ASCII/surrogateescape for the general
case.)
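On the return side, the general-case suggestion amounts to decoding
raw hostname bytes as ASCII with surrogateescape.  A minimal sketch,
with `decode_hostname` as a hypothetical name: ASCII names
(including IDNA "xn--" forms) come back unchanged as str, and any
non-ASCII bytes survive as lone surrogates that re-encode to the
original bytes.

```python
def decode_hostname(raw: bytes) -> str:
    """Hypothetical helper: decode a hostname returned by the resolver
    without assuming it is valid ASCII or a real IDNA name."""
    # Bytes 0x80-0xFF become lone surrogates U+DC80-U+DCFF instead of
    # raising UnicodeDecodeError.
    return raw.decode("ascii", "surrogateescape")


name = decode_hostname(b"caf\xe9.example")
print(ascii(name))                                # 'caf\udce9.example'
print(name.encode("ascii", "surrogateescape"))    # b'caf\xe9.example'
print(decode_hostname(b"xn--bcher-kva.example"))  # xn--bcher-kva.example
```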
Files
File name Uploaded
accept-ascii-surrogateescape-2015-06-25.diff baikie, 2015-06-25.20:28:12
return-ascii-surrogateescape-2015-06-25.diff baikie, 2015-06-25.20:28:10
History
Date                 User    Action  Args
2015-06-25 20:28:16  baikie  set     recipients: + baikie, lemburg, loewis, amaury.forgeotdarc, vstinner, ezio.melotti, r.david.murray, jesterKing, spaun2002, steve.dower, Almad
2015-06-25 20:28:14  baikie  link    issue9377 messages
2015-06-25 20:28:14  baikie  create