Message 119076 - Python tracker

Author	loewis
Recipients	baikie, ezio.melotti, jesterKing, lemburg, loewis, vstinner
Date	2010-10-18.20:37:06
Message-id	<4CBCAFF1.9050105@v.loewis.de>
In-reply-to	<20101018181136.GA3631@dbwatson.ukfsn.org>

Content
> I would have thought that someone who intended a Unicode hostname
> to be looked up in its IDNA form would have encoded it using
> IDNA, rather than an 8-bit encoding - how many C programs would
> transcode the name that way, rather than just passing the char *
> from one interface to another?

Well, Python is not C. In Python, you would pass a str, and
expect it to work, which means it will get automatically encoded
with IDNA.
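
As a rough illustration of what that automatic encoding amounts to, here is a minimal sketch using Python's built-in "idna" codec. The hostname is a made-up example, and the sketch only shows the codec itself; it does not claim to reproduce exactly what the socket module does internally.

```python
# Hypothetical hostname containing a non-ASCII character.
name = "bücher.example"

# The built-in "idna" codec converts each label to its ASCII-compatible form.
encoded = name.encode("idna")
print(encoded)

# Decoding with the same codec recovers the character string.
decoded = encoded.decode("idna")
print(decoded)
```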

> In fact, I would think that non-ASCII bytes in a hostname most
> probably indicated that a name resolution mechanism other than
> the DNS was in use, and that the byte string should be passed
> unaltered just as a typical C program would.

I'm not talking about byte strings, but character strings.

> I don't object to that, but it does force a choice between
> decoding an 8-bit name for display (e.g. by using the locale
> encoding), and decoding it to round-trip automatically (e.g. by
> using ASCII/surrogateescape, with support on the encoding side).

In the face of ambiguity, refuse the temptation to guess.
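
For reference, the two decoding choices described in the quoted paragraph can be sketched as follows. The byte string and the choice of Latin-1 as the "display" encoding are assumptions made only for this example; a real locale encoding may differ.

```python
# Hypothetical 8-bit hostname as returned by a C-level API.
raw = b"h\xf6st"

# Option 1: decode for round-tripping. Non-ASCII bytes become lone
# surrogates, which encode back to the original bytes.
round_trip = raw.decode("ascii", "surrogateescape")
restored = round_trip.encode("ascii", "surrogateescape")

# Option 2: decode for display, guessing an encoding (Latin-1 here).
# The result is readable only if the guess happens to be right.
display = raw.decode("latin-1")

print(repr(round_trip), restored == raw, display)
```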

> So overall, I do think it is better to decode names for automatic
> round-tripping rather than for display, but my main concern is
> simply that it should be possible to recover the original bytes
> so that round-tripping is at least possible.

Marc-Andre wants gethostname to use the Wide API on Windows, which,
in theory, allows for cases where round-tripping to bytes is
impossible.
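
A small sketch of why round-tripping can fail in that case: a name obtained as a character string may contain characters that a given 8-bit encoding cannot represent. The helper name, the sample names, and the use of cp1252 as the byte encoding are all assumptions for illustration, not something gethostname does.

```python
def can_round_trip(name: str, encoding: str) -> bool:
    """Return True if name survives str -> bytes -> str in the given encoding."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeError:
        # The encoding has no representation for some character.
        return False

ascii_ok = can_round_trip("host", "cp1252")
cyrillic_ok = can_round_trip("хост", "cp1252")  # Cyrillic is not in cp1252
print(ascii_ok, cyrillic_ok)
```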

History
Date	User	Action	Args
2010-10-18 20:37:08	loewis	set	recipients: + loewis, lemburg, vstinner, baikie, ezio.melotti, jesterKing
2010-10-18 20:37:06	loewis	link	issue9377 messages
2010-10-18 20:37:06	loewis	create