Message 259112 - Python tracker


Author	vstinner
Recipients	abarry, eryksun, ezio.melotti, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Date	2016-01-28.09:41:02
Message-id	<1453974063.09.0.258642734771.issue26227@psf.upfronthosting.co.za>
In-reply-to

Content
> Added comments on Rietveld.

Crap. It's easy to miss a compilation error on extensions :-/

I used "make && ./python -m test -v test_socket" to validate gethostbyaddr_encoding-2.patch and it succeeded.

Maybe we should change setup.py to *fail* if an extension fails to compile?

The new patch should have fewer typos :-) I also checked for reference leaks using ./python -m test -R 3:3 test_socket => no leak.


> Why not use PyUnicode_DecodeFSDefault on all platforms? It is used in gethostname() on Unix.

I don't know which encoding is the best choice on UNIX. I prefer to move step by step and fix an obvious bug on Windows that is blocking Émanuel (see his issue #26226). (Émanuel uses Émanuel-PC as his hostname, a non-ASCII hostname ;-))

I guess that UTF-8 works in most cases on UNIX, whereas using the locale encoding can introduce regressions if the hostname is non-ASCII. For example, decoding a non-ASCII hostname would fail with LANG=C, which forces an ASCII locale encoding.
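
To illustrate the point about LANG=C, here is a minimal sketch. It assumes the hostname bytes are UTF-8 encoded and uses the strict "ascii" codec as a stand-in for an ASCII locale encoding. The helper try_decode() is hypothetical and is not part of the patch.

```python
# Hypothetical hostname bytes: "Émanuel-PC" encoded as UTF-8 (an assumption
# about what the resolver would hand back on a UNIX system).
raw = "Émanuel-PC".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded string, or None if strict decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# UTF-8 decodes the name correctly.
utf8_result = try_decode(raw, "utf-8")

# Strict ASCII (standing in for the locale encoding under LANG=C)
# cannot decode the two bytes that encode "É".
ascii_result = try_decode(raw, "ascii")

print(utf8_result)
print(ascii_result)
```

The sketch only shows the strict-decoding case. Which error handler the socket module would actually use on UNIX is exactly the open question.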

Issue #9377 proposes more advanced code to choose the encoding used to decode hostnames. Sorry, I haven't followed this issue recently, so I don't know if it proposes to use surrogateescape and/or IDNA.

I prefer to discuss the encoding used on UNIX in a new issue (or better, continue the existing discussion on issue #9377?).
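
For readers unfamiliar with the two options mentioned, the following sketch shows what surrogateescape and the IDNA codec do to the same example name. It is only an illustration using Python's built-in codecs, under the assumption that the hostname bytes are UTF-8. It does not reflect what issue #9377 actually proposes.

```python
# Hypothetical hostname bytes, assumed to be UTF-8 encoded.
raw = "Émanuel-PC".encode("utf-8")

# surrogateescape: undecodable bytes are mapped to lone surrogates instead of
# raising, so the original bytes can be recovered later without loss.
escaped = raw.decode("ascii", errors="surrogateescape")
roundtrip = escaped.encode("ascii", errors="surrogateescape")

# IDNA: the Unicode name is converted to an ASCII-only form with an "xn--"
# prefix. Note that the codec also normalizes the label (case folding), so
# the round trip is not guaranteed to preserve the original capitalization.
idna_bytes = "Émanuel-PC".encode("idna")
idna_text = idna_bytes.decode("idna")

print(ascii(escaped))
print(idna_bytes)
print(idna_text)
```

The two approaches answer different questions: surrogateescape preserves arbitrary bytes, while IDNA defines an ASCII representation for Unicode names.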

History
Date	User	Action	Args
2016-01-28 09:41:03	vstinner	set	recipients: + vstinner, paul.moore, tim.golden, ezio.melotti, zach.ware, serhiy.storchaka, eryksun, steve.dower, abarry
2016-01-28 09:41:03	vstinner	set	messageid: <1453974063.09.0.258642734771.issue26227@psf.upfronthosting.co.za>
2016-01-28 09:41:03	vstinner	link	issue26227 messages
2016-01-28 09:41:02	vstinner	create