Message119076
> I would have thought that someone who intended a Unicode hostname
> to be looked up in its IDNA form would have encoded it using
> IDNA, rather than an 8-bit encoding - how many C programs would
> transcode the name that way, rather than just passing the char *
> from one interface to another?
Well, Python is not C. In Python, you would pass a str and
expect it to work, which means it gets encoded with IDNA
automatically.
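A minimal sketch of that automatic step, using only the standard-library "idna" codec (IDNA 2003). CPython's socket module encodes str hostnames with this codec before calling the resolver; the helper names and the hostname below are hypothetical, and no network access is involved.

```python
# Sketch: map a Unicode hostname to its ASCII-compatible IDNA form and back.
# Helper names and the example hostname are hypothetical.


def to_idna(hostname: str) -> bytes:
    """Return the ASCII-compatible (IDNA) encoding of a Unicode hostname."""
    return hostname.encode("idna")


def from_idna(encoded: bytes) -> str:
    """Decode an IDNA-encoded hostname back to Unicode."""
    return encoded.decode("idna")


if __name__ == "__main__":
    ace = to_idna("bücher.example")
    print(ace)             # b'xn--bcher-kva.example'
    print(from_idna(ace))  # bücher.example
```

Pure-ASCII names pass through unchanged, so only labels with non-ASCII characters get the "xn--" prefix.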
> In fact, I would think that non-ASCII bytes in a hostname most
> probably indicated that a name resolution mechanism other than
> the DNS was in use, and that the byte string should be passed
> unaltered just as a typical C program would.
I'm not talking about byte strings, but character strings.
> I don't object to that, but it does force a choice between
> decoding an 8-bit name for display (e.g. by using the locale
> encoding), and decoding it to round-trip automatically (e.g. by
> using ASCII/surrogateescape, with support on the encoding side).
In the face of ambiguity, refuse the temptation to guess.
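The two decoding strategies in the quoted paragraph can be contrasted in a short sketch. It assumes latin-1 as a stand-in for "the locale encoding" and uses the standard "surrogateescape" error handler (PEP 383) for the round-trip variant; the raw hostname bytes and helper names are hypothetical.

```python
# Sketch: display-oriented vs. round-trip-oriented decoding of raw hostname bytes.
# RAW is a hypothetical hostname containing one non-ASCII byte (0xE9).
RAW = b"caf\xe9-server"


def decode_for_display(raw: bytes, encoding: str = "latin-1") -> str:
    # Display-oriented: guess an 8-bit encoding (latin-1 stands in for the locale encoding).
    return raw.decode(encoding)


def decode_for_roundtrip(raw: bytes) -> str:
    # Round-trip-oriented: ASCII bytes decode normally; each other byte becomes a lone surrogate.
    return raw.decode("ascii", "surrogateescape")


def encode_roundtrip(name: str) -> bytes:
    # Inverse of decode_for_roundtrip: lone surrogates are mapped back to the original bytes.
    return name.encode("ascii", "surrogateescape")


if __name__ == "__main__":
    print(ascii(decode_for_display(RAW)))    # 'caf\xe9-server'
    escaped = decode_for_roundtrip(RAW)
    print(ascii(escaped))                    # 'caf\udce9-server'
    print(encode_roundtrip(escaped) == RAW)  # True
```

The display variant produces a readable name but depends on guessing the right encoding; the surrogateescape variant produces a str that is awkward to display but always recovers the original bytes.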
> So overall, I do think it is better to decode names for automatic
> round-tripping rather than for display, but my main concern is
> simply that it should be possible to recover the original bytes
> so that round-tripping is at least possible.
Marc-Andre wants gethostname to use the wide-character (Unicode)
API on Windows, which, in theory, allows for hostnames that cannot
be round-tripped to bytes at all.
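A minimal sketch of why a name obtained as Unicode may have no byte form: if it contains a character outside the 8-bit code page, encoding simply fails, and no choice of error handler can recover "original" bytes that never existed. It assumes cp1252 as the Windows ANSI code page; the hostnames and the helper name are hypothetical.

```python
from typing import Optional

# Assumption: the ANSI code page is cp1252 (Western European).
# The helper name and hostnames are hypothetical.


def to_ansi_bytes(name: str, codepage: str = "cp1252") -> Optional[bytes]:
    """Encode a Unicode hostname in an 8-bit code page, or return None if impossible."""
    try:
        return name.encode(codepage)
    except UnicodeEncodeError:
        # At least one character has no representation in this code page,
        # so there are no bytes to round-trip to.
        return None


if __name__ == "__main__":
    print(to_ansi_bytes("café-pc"))  # b'caf\xe9-pc'
    print(to_ansi_bytes("αβγ-pc"))   # None
```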

Date | User | Action | Args
2010-10-18 20:37:08 | loewis | set | recipients: + loewis, lemburg, vstinner, baikie, ezio.melotti, jesterKing
2010-10-18 20:37:06 | loewis | link | issue9377 messages
2010-10-18 20:37:06 | loewis | create |