Message 324225 - Python tracker

Author	vstinner
Recipients	Michael.Felt, michael-o, terry.reedy, vstinner
Date	2018-08-28.08:59:06
Message-id	<1535446746.19.0.56676864532.issue34403@psf.upfronthosting.co.za>

Content

...
> byte 0xA7 decoded to Unicode character U+00A7
...

Well, it confirms what I expected: nl_langinfo(CODESET) announces "roman8", but mbstowcs() uses Latin1 encoding in practice.
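To illustrate the mismatch, here is a minimal sketch that decodes the byte 0xA7 with two Python codecs. It assumes that Python's "hp_roman8" codec corresponds to what the platform calls "roman8"; the helper name decode_with is made up for this example. It only shows that the two encodings disagree on this byte, not what mbstowcs() does on any given system.

```python
# The byte from the report above.
RAW = b"\xa7"


def decode_with(encoding, data=RAW):
    """Decode `data` with the named Python codec (hypothetical helper)."""
    return data.decode(encoding)


# Latin1 maps every byte to the code point with the same value.
latin1_char = decode_with("latin-1")

# Assumption: Python's "hp_roman8" codec matches the platform's "roman8".
roman8_char = decode_with("hp_roman8")

print(f"latin-1:   U+{ord(latin1_char):04X}")
print(f"hp_roman8: U+{ord(roman8_char):04X}")
print("same character:", latin1_char == roman8_char)
```

If the decoded character is U+00A7, the byte was interpreted as Latin1, which is what the report shows.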

So I wrote PR 8969, which forces the ASCII encoding in that case. I'm not sure how test_utf8_mode is supposed to be fixed in that case.

Michael: you can try to apply PR 8969, and then manually apply the PR 8967 patch:
https://patch-diff.githubusercontent.com/raw/python/cpython/pull/8967.patch

But I expect that with both patches, test_utf8_mode will still fail on test_cmd_line(). You can try to modify test_cmd_line() to force the encoding to "ascii".

What are the values of sys.getfilesystemencoding() and locale.getpreferredencoding() in the C locale with PR 8969 applied? I expect "roman8", which can cause issues in os.fsencode()/os.fsdecode(). Maybe Python should also force ASCII here?
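A short script along these lines could collect the requested values. This is only a sketch: the function names report_encodings and fs_roundtrip are hypothetical, and it should be run with the C locale (for example with LC_ALL=C in the environment) to answer the question above.

```python
import locale
import os
import sys


def report_encodings():
    """Collect the encodings Python selected at startup (hypothetical helper)."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "fs_errors": sys.getfilesystemencodeerrors(),
        # False: report the locale encoding without calling setlocale().
        "preferred": locale.getpreferredencoding(False),
    }


def fs_roundtrip(raw):
    """Decode bytes with os.fsdecode() and encode them back with os.fsencode().

    Returns None if the filesystem encoding cannot decode the bytes.
    """
    try:
        return os.fsencode(os.fsdecode(raw))
    except UnicodeDecodeError:
        return None


info = report_encodings()
for name, value in info.items():
    print(f"{name}: {value}")

raw = b"\xa7"
result = fs_roundtrip(raw)
print("round-trip preserved:", result == raw)
```

If the round trip does not give back the original byte, os.fsencode() and os.fsdecode() disagree with the bytes the platform actually uses.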
