Message245826
I've updated the ASCII/surrogateescape patches in line with
various changes to Python since I posted them.
return-ascii-surrogateescape-2015-06-25.diff incorporates the
ascii-surrogateescape and uname-surrogateescape patches, and
accept-ascii-surrogateescape-2015-06-25.diff corresponds to the
try-surrogateescape-first patch. Neither patch touches
gethostname() on Windows.
Python's existing code now has a fast path for ASCII-only strings
which passes them through unchanged (Unicode -> ASCII), so in
order not to slow down processing of valid IDNs, the latter patch
now effectively tries encodings in the order:
    ASCII/strict (existing code, fast path)
    IDNA/strict (existing code)
    ASCII/surrogateescape (added by patch)
rather than the previous:
    ASCII/surrogateescape
    IDNA/strict
This doesn't change the behaviour of the patch, since IDNA always
rejects strings containing surrogate codes, and either rejects
ASCII-only strings (e.g. when a label is longer than 63
characters) or passes them through unchanged.
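As a rough illustration of that order, here is a Python sketch. It is not the C code from the patch, and encode_hostname() is a hypothetical name used only for this example; it assumes Python's built-in "idna" codec behaves like the IDNA step in socketmodule.c.

```python
def encode_hostname(name):
    """Hypothetical helper: try the encodings in the order described above."""
    # 1. ASCII/strict (fast path): ASCII-only names pass through unchanged.
    try:
        return name.encode("ascii")
    except UnicodeEncodeError:
        pass
    # 2. IDNA/strict: valid internationalized names become "xn--" labels.
    #    Strings containing surrogate codes are rejected here.
    try:
        return name.encode("idna")
    except UnicodeError:
        pass
    # 3. ASCII/surrogateescape: turn escaped surrogates back into raw bytes.
    return name.encode("ascii", "surrogateescape")


print(encode_hostname("example.org"))
print(encode_hostname("b\u00fccher.example"))
print(encode_hostname("host\udcff.example"))
```

A name only reaches the third step if it is non-ASCII and IDNA has rejected it, so valid IDNs never pay for the extra attempt.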
These patches would at least allow getfqdn() to work in Almad's
example. In that case, though, the host also appears to be
addressable by the IDNA equivalent ("xn--didejo-noas-1ic") of its
Unicode hostname. (I haven't checked, as I'm not a Windows user,
but I presume the UnicodeDecodeError came from gethost_common() in
socketmodule.c, and hence that the name lookup was successful.)
So it would certainly be more helpful to return Unicode for
non-ASCII gethostbyaddr() results there, if they were guaranteed
to map to real IDNA hostnames in Windows environments.
(That isn't guaranteed in Unix environments of course, which is
why I'm still suggesting ASCII/surrogateescape for the general
case.)
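For the return direction, a short Python sketch of why ASCII/surrogateescape is a safe general fallback: arbitrary bytes survive the round trip unchanged. The byte string and the decode_hostname() name are made up for illustration and are not part of either patch.

```python
def decode_hostname(raw):
    """Hypothetical helper: decode resolver bytes as ASCII/surrogateescape."""
    # Each non-ASCII byte 0xNN becomes the lone surrogate U+DCNN.
    return raw.decode("ascii", "surrogateescape")


raw = b"host\xe9.example"  # made-up non-ASCII result from a resolver
name = decode_hostname(raw)

# The original bytes can be recovered exactly from the decoded string.
roundtrip = name.encode("ascii", "surrogateescape")
print(ascii(name), roundtrip == raw)
```

Since the decoded string contains a surrogate code, IDNA would reject it, which is why the accepting side needs the ASCII/surrogateescape step to take such names back.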
Date | User | Action | Args
2015-06-25 20:28:16 | baikie | set | recipients: + baikie, lemburg, loewis, amaury.forgeotdarc, vstinner, ezio.melotti, r.david.murray, jesterKing, spaun2002, steve.dower, Almad
2015-06-25 20:28:14 | baikie | link | issue9377 messages
2015-06-25 20:28:14 | baikie | create |