Message 111766 - Python tracker



Author	vstinner
Recipients	baikie, ezio.melotti, lemburg, loewis, vstinner
Date	2010-07-28.02:44:41
Message-id	<1280285084.45.0.881472146417.issue9377@psf.upfronthosting.co.za>
In-reply-to

Content

I like the idea of using PEP 383 for hostnames, but I don't understand the relation with IDNA (maybe because I don't know this encoding).

+this leaves IDNA ASCII-compatible encodings in ASCII
+form, but converts any non-ASCII bytes in the hostname to the Unicode
+lone surrogate codes U+DC80...U+DCFF.
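
For reference, here is a minimal Python sketch of the behaviour the quoted text describes, assuming it amounts to an ASCII decode with the PEP 383 "surrogateescape" error handler. The helper names decode_hostname() and encode_hostname() are hypothetical stand-ins, not the C functions from the patch.

```python
# Hypothetical helpers: the real patch implements this in C.
def decode_hostname(raw: bytes) -> str:
    # ASCII bytes decode normally; each byte 0x80..0xFF becomes
    # the lone surrogate U+DC80..U+DCFF (PEP 383).
    return raw.decode("ascii", "surrogateescape")


def encode_hostname(name: str) -> bytes:
    # Reverse operation: lone surrogates turn back into the original bytes.
    return name.encode("ascii", "surrogateescape")


plain = decode_hostname(b"www.python.org")
odd = decode_hostname(b"h\xf4te.example")

print(ascii(plain))
print(ascii(odd))
print(encode_hostname(odd))
```

The point of the error handler is that the round trip is lossless: encoding the decoded string gives back exactly the bytes that were received.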

What is an "IDNA ASCII-compatible encoding"?

--

ascii-surrogateescape.diff: 
 - I don't like the name unicode_from_hostname(): "decode_hostname()" would be better.
 - It doesn't patch the doc and so cannot be applied alone. That doesn't matter, since it's better to apply both patches at the same time. But thanks for splitting them: it makes them easier to review :-)

try-surrogateescape-first.diff:
 - hostname_to_bytes() should be called "encode_hostname()"
 - if (!PyErr_ExceptionMatches(PyExc_UnicodeError)): you should catch UnicodeEncodeError here
 - "if this is not possible, :exc:`UnicodeError` is raised.": is it a UnicodeEncodeError?
 - use PyUnicode_AsEncodedString() instead of PyUnicode_AsEncodedObject(): it's faster for ASCII and ensures that the result is a bytes object (so you don't need to re-check the type)
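
To make the exception question concrete, here is a rough Python sketch of a "try surrogateescape first" encoder. It is only an assumption about what the patch does, based on its file name: I assume it falls back to the IDNA codec when the ASCII/surrogateescape encoding fails. The function name is the hypothetical encode_hostname() suggested above.

```python
def encode_hostname(name: str) -> bytes:
    """Hypothetical sketch: ASCII/surrogateescape first, then IDNA."""
    try:
        # Succeeds for pure ASCII names and for names that only contain
        # lone surrogates U+DC80..U+DCFF produced by surrogateescape.
        return name.encode("ascii", "surrogateescape")
    except UnicodeEncodeError:
        # Catch the specific subclass, not the whole UnicodeError family.
        # Assumed fallback: let the IDNA codec handle real non-ASCII text.
        return name.encode("idna")


ascii_name = encode_hostname("www.python.org")
escaped_name = encode_hostname("h\udcf4te.example")
idna_name = encode_hostname("b\u00fccher.example")

print(ascii_name, escaped_name, idna_name)
```

Since UnicodeEncodeError is a subclass of UnicodeError, documenting UnicodeError is not wrong, but catching the subclass in the fallback test avoids swallowing unrelated decode or translate errors.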

History
Date	User	Action	Args
2010-07-28 02:44:44	vstinner	set	recipients: + vstinner, lemburg, loewis, baikie, ezio.melotti
2010-07-28 02:44:44	vstinner	set	messageid: <1280285084.45.0.881472146417.issue9377@psf.upfronthosting.co.za>
2010-07-28 02:44:42	vstinner	link	issue9377 messages
2010-07-28 02:44:41	vstinner	create