Message 304118 - Python tracker



Author	andrewclegg
Recipients	Arfrever, amaury.forgeotdarc, andrewclegg, berwin22, chris.jerdonek, davispuh, eric.araujo, eryksun, mark, martin.panter, mightyiam, ncoghlan, pitrou, python-dev, r.david.murray, segfaulthunter, srid, steve.dower, vstinner
Date	2017-10-11.08:41:48
Message-id	<1507711308.79.0.213398074469.issue6135@psf.upfronthosting.co.za>
In-reply-to

Content

I meant the former; I'll look a bit more at the documentation and submit an issue/patch.

As regards the 'text' flag - universal_newlines is actually exactly that already. I've just checked the code of subprocess.py and the universal_newlines argument is read in only two places:
* as one of the deciding factors in whether to use the text mode
* in a backwards compatibility clause in check_output

So subprocess *already* has a text mode and a 'magic option' to trigger it. It works well, and is useful in most cases. When the default encoding guess is incorrect, it can easily be corrected by supplying the correct encoding. This is a good situation!
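A minimal sketch of the behaviour described above, assuming Python 3.6 or later (where subprocess.run accepts an encoding argument). The child command and the helper names run_bytes, run_text and run_text_explicit are illustrative, not part of subprocess:

```python
import subprocess
import sys

# Child command: a Python one-liner that prints one line of ASCII text.
CMD = [sys.executable, "-c", "print('hello')"]


def run_bytes():
    """Default mode: captured stdout is returned as bytes."""
    return subprocess.run(CMD, stdout=subprocess.PIPE).stdout


def run_text():
    """universal_newlines=True triggers text mode; the encoding is guessed."""
    return subprocess.run(
        CMD, stdout=subprocess.PIPE, universal_newlines=True
    ).stdout


def run_text_explicit(encoding="utf-8"):
    """Supplying an encoding also selects text mode and overrides the guess."""
    return subprocess.run(CMD, stdout=subprocess.PIPE, encoding=encoding).stdout


if __name__ == "__main__":
    print(type(run_bytes()).__name__)   # bytes
    print(repr(run_text()))             # decoded str
    print(repr(run_text_explicit()))    # decoded str, explicit encoding
```

Only the second and third calls return str; the first shows why the flag is needed at all.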

What is not so good is the API. I'm teaching a Python course for scientists at the moment, and retrieving text from external processes is an extremely common use case. I would rather not teach them to always pass encoding='utf-8', because teaching a user to supply an encoding without knowing whether it's correct is arguably worse than letting the system guess. Equally, 'universal_newlines=True' is an obscure thing to teach: the name says nothing about text decoding.

The way forward seems to be:
* Add a text=True/False argument that is essentially the same as universal_newlines, to improve the API.
* Clearly document that this implies the encoding will be guessed, and that an explicit encoding can be given if the guess is wrong.
* Optionally (I have no strong feelings either way on this), remove or deprecate the universal_newlines argument.
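To make the first two points concrete, here is a rough sketch of what the proposed argument would amount to. The wrapper run_with_text is hypothetical and exists only for illustration; it simply forwards text to the existing universal_newlines argument, and assumes Python 3.6 or later for the encoding parameter:

```python
import subprocess
import sys


def run_with_text(cmd, text=False, encoding=None):
    """Hypothetical wrapper: `text` is just a clearer name for universal_newlines.

    With text=True and no encoding, the encoding is guessed by subprocess.
    Passing an explicit encoding overrides the guess (and also implies text mode).
    """
    return subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        universal_newlines=text,
        encoding=encoding,
    )


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('ok')"]
    print(repr(run_with_text(cmd).stdout))             # bytes
    print(repr(run_with_text(cmd, text=True).stdout))  # str, guessed encoding
```

Nothing about the underlying behaviour changes in this sketch; only the spelling of the option does, which is the point of the proposal.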
