
Author vstinner
Recipients abarry, eryksun, ezio.melotti, paul.moore, python-dev, serhiy.storchaka, steve.dower, tim.golden, vstinner, williamdias, zach.ware, Владимир Мартьянов
Date 2020-03-13.17:31:04
Message-id <1584120664.43.0.733522038798.issue26227@roundup.psfhosted.org>
In-reply-to
Content
sock_decode_hostname() of socketmodule.c currently uses PyUnicode_DecodeFSDefault() on Windows. PyUnicode_DecodeFSDefault() uses UTF-8 by default (PEP 529).

I understand that the ANSI code page should be used instead of UTF-8.

Would it work to use PyUnicode_DecodeLocale(name, "surrogatepass")? It's implemented with mbstowcs(), but I don't recall which encoding it uses on Windows.

Or can we use PyUnicode_DecodeMBCS(name, strlen(name), "surrogatepass")?
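To illustrate the mismatch described above, here is a minimal Python sketch. It assumes a host name whose bytes are encoded in cp1251, used here only as a stand-in for a Windows ANSI code page; the host name and the try_decode() helper are hypothetical, and the sketch does not call sock_decode_hostname() itself.

```python
# Hypothetical host name bytes, encoded in cp1251. cp1251 is an assumption
# standing in for whatever ANSI code page the Windows machine uses.
ANSI_CODEC = "cp1251"
raw_name = b"\xf1\xe5\xf0\xe2\xe5\xf0"


def try_decode(data, encoding, errors="strict"):
    """Return the decoded string, or None if decoding fails."""
    try:
        return data.decode(encoding, errors)
    except UnicodeDecodeError:
        return None


# UTF-8 with "surrogatepass" only tolerates encoded surrogates; arbitrary
# ANSI bytes are still invalid UTF-8, so this decode attempt fails.
utf8_result = try_decode(raw_name, "utf-8", "surrogatepass")

# Decoding with the code page the bytes were written in recovers the name.
ansi_result = try_decode(raw_name, ANSI_CODEC)

print(repr(utf8_result))
print(repr(ansi_result))
```

On Windows, the "mbcs" codec could replace the hard-coded cp1251 to pick up the active ANSI code page, but that codec is not available on other platforms.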

--

I understand that setting the PYTHONLEGACYWINDOWSFSENCODING environment variable to 1 should work around the issue.
History
Date | User | Action | Args
2020-03-13 17:31:04 | vstinner | set | recipients: + vstinner, paul.moore, tim.golden, ezio.melotti, python-dev, zach.ware, serhiy.storchaka, eryksun, steve.dower, abarry, williamdias, Владимир Мартьянов
2020-03-13 17:31:04 | vstinner | set | messageid: <1584120664.43.0.733522038798.issue26227@roundup.psfhosted.org>
2020-03-13 17:31:04 | vstinner | link | issue26227 messages
2020-03-13 17:31:04 | vstinner | create |