Message 267024 - Python tracker



Author	steve.dower
Recipients	davispuh, eryksun, ezio.melotti, martin.panter, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Date	2016-06-03.02:31:39
Message-id	<1464921099.76.0.248800066967.issue27179@psf.upfronthosting.co.za>

Content

> There is right encoding, it's encoding that's actually used.

This is true, but it puts the decision entirely in the hands of the developer(s) of the two processes involved.

All IPC on Windows uses bytes, and encodings _always_ need to be negotiated by the processes involved. You can't reliably assume or infer anything. The closest you get is to assume that both processes are using the same MSVCRT version and have not changed the defaults (except Python changes the defaults, from text to binary, so that assumption is easily broken).
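To illustrate why the encoding cannot be inferred from the bytes alone, here is a minimal sketch. It does not involve a real pipe; the byte string simply stands in for whatever a child process might have written, and the choice of UTF-8 and cp1252 is an assumption made for the example.

```python
# Bytes as they might arrive from a child process's pipe (UTF-8 encoded here).
raw = "café".encode("utf-8")

# Decoding with the encoding the writer actually used round-trips cleanly.
agreed = raw.decode("utf-8")

# Guessing a different code page does not fail; it silently yields mojibake.
guessed = raw.decode("cp1252")

# The reverse mismatch (cp1252 bytes read as strict UTF-8) raises instead,
# which at least makes the disagreement visible.
try:
    "café".encode("cp1252").decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(agreed), repr(guessed), strict_failed)
```

Nothing in the bytes themselves says which of these readings is the intended one, so the two processes have to agree on it by some other means.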

Using "cmd /u" is one way to negotiate with that process for the shell=True case, but all other cases basically just require an explicit encoding parameter. IMHO, if we make Python default to UTF-8 and subprocess use utf_8:errors (mojibake is not acceptable by default) and "cmd /u", we cover enough common cases to minimise the need to specify an encoding explicitly. (A close second best is to default to the console CP if available and the default locale otherwise.)
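As a rough sketch of what an explicit encoding parameter looks like in practice: the subprocess module accepts `encoding` and `errors` arguments from Python 3.6 onwards, whereas at the time of this message they were only a proposal. The helper name `run_text` and the child command are hypothetical, chosen for illustration; the child is just another Python process that writes UTF-8 bytes to its stdout.

```python
import subprocess
import sys


def run_text(args, encoding="utf-8", errors="strict"):
    """Run a child process and decode its stdout with an explicit encoding.

    Hypothetical helper: the caller, not the library, states which encoding
    the child is expected to use. errors="strict" raises on undecodable
    bytes rather than producing mojibake.
    """
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        encoding=encoding,
        errors=errors,
        check=True,
    )
    return completed.stdout


# The child writes raw UTF-8 bytes, bypassing its own text-layer defaults.
child_code = "import sys; sys.stdout.buffer.write('h\\u00e9llo'.encode('utf-8'))"
output = run_text([sys.executable, "-c", child_code])
print(repr(output))
```

If the child actually wrote in some other code page, the strict error handler would raise `UnicodeDecodeError` for byte sequences that are not valid UTF-8, which is the behaviour argued for above, rather than silently returning garbled text.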

History
Date	User	Action	Args
2016-06-03 02:31:39	steve.dower	set	recipients: + steve.dower, paul.moore, vstinner, tim.golden, ezio.melotti, martin.panter, zach.ware, eryksun, davispuh
2016-06-03 02:31:39	steve.dower	set	messageid: <1464921099.76.0.248800066967.issue27179@psf.upfronthosting.co.za>
2016-06-03 02:31:39	steve.dower	link	issue27179 messages
2016-06-03 02:31:39	steve.dower	create