Issue 8765: Tests unwillingly writing unicode to raw streams

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/53011

classification

Title:	Tests unwillingly writing unicode to raw streams
Type:	behavior	Stage:	resolved
Components:	IO	Versions:	Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	amaury.forgeotdarc, benjamin.peterson, pakal, pitrou, serhiy.storchaka, ysj.ray
Priority:	normal	Keywords:	patch

Created on 2010-05-19 12:17 by pakal, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
test_fileio_errclosedonwrite.patch	pakal, 2010-05-19 12:17
issue_8765.diff	ysj.ray, 2010-05-22 08:51

Pull Requests
URL	Status	Linked
PR 11127	merged	serhiy.storchaka, 2018-12-12 08:03

Messages (13)
msg106053 - (view)	Author: Pascal Chambon (pakal) *	Date: 2010-05-19 12:17
In test_fileio, one of the tests wants to ensure that writing to closed raw streams fails, but it actually tries to write a unicode string, which should instead lead to an immediate TypeError. Here is a tiny patch to prevent the "double error cause" danger - this test is bugging me because my own I/O library can't pass the stdlib io tests in this case. The underlying problem is that we can't write unicode to a buffered binary stream (TypeError), but we can do it with an unbuffered raw stream - as the C implementation of the latter does string coercion instead of raising TypeError. Shouldn't we unify the behaviour of binary streams in such cases?
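The intent of the test described above can be sketched as follows. This is a minimal illustration rather than the actual test_fileio code: the helper name `write_to_closed_raw_stream` and the use of `os.devnull` are assumptions made for the example, and it is written for a Python 3 interpreter. Passing a bytes literal leaves "stream is closed" as the only possible cause of failure.

```python
import io
import os


def write_to_closed_raw_stream(payload):
    """Open a raw stream, close it, then try to write to it.

    Returns the name of the exception type raised, or None if the
    write unexpectedly succeeded. (Hypothetical helper, for illustration.)
    """
    raw = io.FileIO(os.devnull, "w")
    raw.close()
    try:
        raw.write(payload)
    except Exception as exc:
        return type(exc).__name__
    return None


# A bytes payload is valid for a binary stream, so the only remaining
# reason to fail is that the stream has been closed.
closed_write_error = write_to_closed_raw_stream(b"xxx")
print(closed_write_error)  # ValueError
```

With a text payload instead, a failure could come from either the argument type or the closed stream, which is the "double error cause" the patch avoids.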
msg106162 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-05-20 16:15
> The initial problem here is that we can't write unicode to a buffered
> binary stream (TypeError), but we can do it with an unbuffered raw
> stream - as the C implementation of the latter does string coercion
> instead of raising TypeError.
> Shouldn't we unify the behaviour of binary streams in such cases?
Yes, we certainly should. This is probably an oversight or a bug.
msg106174 - (view)	Author: Pascal Chambon (pakal) *	Date: 2010-05-20 17:33
All right, what's the expected behaviour then - implicitly converting unicode to bytes (like C RawFileIO), or raising a TypeError (like buffered streams do)?
msg106176 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-05-20 17:54
> All right, what's the expected behaviour then - implicitly converting
> unicode to bytes (like C RawFileIO), or raising a TypeError (like
> buffered streams do)?
Sorry, I should have been clearer. The expected behaviour is to raise a TypeError. The new io module was written for Python 3, where you shouldn't mix bytes and unicode strings and expect things to work.
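For reference, the unified behaviour described here is what a Python 3 interpreter exhibits. The sketch below assumes Python 3 and uses a hypothetical helper, `try_write`, with `os.devnull` as a throwaway target; it does not reproduce the Python 2.7 coercion discussed in this issue.

```python
import io
import os


def try_write(stream, payload):
    """Attempt a write and report whether the payload was accepted.

    Hypothetical helper for illustration; always closes the stream.
    """
    try:
        stream.write(payload)
    except TypeError:
        return "TypeError"
    finally:
        stream.close()
    return "accepted"


# Raw (unbuffered) binary stream given text.
raw_result = try_write(io.FileIO(os.devnull, "w"), "text")

# Buffered binary stream given text.
buffered_result = try_write(
    io.BufferedWriter(io.FileIO(os.devnull, "w")), "text"
)

# Raw binary stream given bytes, the only supported kind of payload.
bytes_result = try_write(io.FileIO(os.devnull, "w"), b"bytes")

print(raw_result, buffered_result, bytes_result)
```

On Python 3 both text writes are rejected with TypeError and the bytes write is accepted, so raw and buffered streams agree.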
msg106215 - (view)	Author: Pascal Chambon (pakal) *	Date: 2010-05-21 08:15
This would apparently require patching py2k and py3k separately... I'll have a look at it when I have time.
msg106219 - (view)	Author: ysj.ray (ysj.ray)	Date: 2010-05-21 09:08
pakal wrote:
"""
In test_fileio, one of the tests wants to ensure writing to closed raw streams fails, but it actually tries to write a unicode string
"""
I don't understand. Aren't b'xxx' and 'xxx' the same in py2.x? They are not unicode strings, but byte strings.
msg106226 - (view)	Author: Pascal Chambon (pakal) *	Date: 2010-05-21 10:59
Yes, but the same tests are used for py3k as well, where "xxx" is interpreted as unicode (the 2to3 tool doesn't try to guess whether a py2k string was intended to be a byte string or a unicode one).
msg106227 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-05-21 11:02
test_fileio and test_io both use "from __future__ import unicode_literals", which means classical string literals construct unicode strings rather than byte strings. So, yes, Pascal is right, this should be corrected (both the tests, and the implementation so that it refuses unicode arguments).
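The effect of unicode_literals on the two kinds of literal can be checked without a module-level `__future__` import by passing the feature's compiler flag to `compile()`. This is only an illustrative sketch; the variable names are made up for the example, and on Python 3 the flag changes nothing because plain literals are already text.

```python
import __future__

# Compiler flag equivalent to "from __future__ import unicode_literals".
flags = __future__.unicode_literals.compiler_flag

# The same two literals a test might contain, compiled with the flag set.
plain = eval(compile("'xxx'", "<demo>", "eval", flags))
prefixed = eval(compile("b'xxx'", "<demo>", "eval", flags))

# With the flag set, only the b-prefixed literal is a byte string.
plain_is_bytes = isinstance(plain, bytes)
prefixed_is_bytes = isinstance(prefixed, bytes)
print(plain_is_bytes, prefixed_is_bytes)  # False True
```

This is why a test that means to write bytes needs an explicit b prefix on its literals.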
msg106289 - (view)	Author: ysj.ray (ysj.ray)	Date: 2010-05-22 08:51
Yes, I saw that, thanks for the explanation! So I worked up a patch against the trunk, including modifications to fileio_write(), bufferedwriter_write() and test_fileio.py.
msg106295 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-05-22 11:22
Amaury, do you remember if we made this deliberately?
msg106428 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-05-25 11:27
The 3.1 version does it correctly since issue7785, but this was not backported to 2.x. Python 3.x uses the "y" format code to accept bytes and not unicode; this code does not exist in 2.x, and was replaced with "s", which accepts unicode. But since the io module is designed up front to forbid default conversion between bytes and unicode, I think it's safe to change the code as suggested.
msg331678 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-12-12 08:11
I agree that it would be right to accept only binary strings when writing to a binary stream. But I am afraid that it is too late to change this in the 16th bugfix release of 2.7. This could break existing code or tests. I suggest changing the behavior only in the Py3k compatibility mode. PR 11127 is based on issue_8765.diff, but emits a warning when Python is run with the -3 option.
$ ./python -3 -c "import io; io.FileIO('/dev/null', 'w').write(u'')"
-c:1: DeprecationWarning: write() argument must be string or buffer, not 'unicode'
$ ./python -3 -We -c "import io; io.FileIO('/dev/null', 'w').write(u'')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
DeprecationWarning: write() argument must be string or buffer, not 'unicode'
This will help with migrating to Python 3, but keeps the behavior unchanged in a normal run.
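The warn-instead-of-fail approach can be sketched in pure Python. The function `checked_write` below is a hypothetical stand-in and not the C code from PR 11127; the example assumes Python 3 and uses the warnings module to mimic a normal run and a -We run.

```python
import warnings


def checked_write(data):
    """Hypothetical stand-in for a binary stream's write().

    Text is still accepted (coerced to bytes), but a DeprecationWarning
    is emitted so callers can find and fix the call site.
    """
    if isinstance(data, str):
        warnings.warn(
            "write() argument must be bytes-like, not 'str'",
            DeprecationWarning,
            stacklevel=2,
        )
        data = data.encode("ascii")
    return len(data)


# Normal run: the warning is recorded, and the write still succeeds.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    written = checked_write("abc")

# Rough equivalent of -We: warnings are promoted to exceptions.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        checked_write("abc")
        strict_outcome = "accepted"
    except DeprecationWarning:
        strict_outcome = "DeprecationWarning"

print(written, len(caught), strict_outcome)
```

Existing code keeps working by default, while a strict filter turns each offending call into an error that is easy to locate.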
msg333696 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-01-15 12:34
New changeset 1462234baf7398a6b00c0f51905e26caa17d3c60 by Serhiy Storchaka in branch '2.7': [2.7] bpo-8765: Deprecate writing unicode to binary streams in Py3k mode. (GH-11127) https://github.com/python/cpython/commit/1462234baf7398a6b00c0f51905e26caa17d3c60

History
Date	User	Action	Args
2022-04-11 14:57:01	admin	set	github: 53011
2019-01-15 12:35:38	serhiy.storchaka	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2019-01-15 12:34:51	serhiy.storchaka	set	messages: + msg333696
2019-01-12 07:59:51	serhiy.storchaka	set	nosy: + benjamin.peterson
2018-12-12 08:11:49	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg331678
2018-12-12 08:03:36	serhiy.storchaka	set	pull_requests: + pull_request10357
2013-02-02 21:21:54	ezio.melotti	set	type: behavior stage: patch review
2010-05-25 11:27:45	amaury.forgeotdarc	set	messages: + msg106428
2010-05-22 11:22:21	pitrou	set	nosy: + amaury.forgeotdarc messages: + msg106295
2010-05-22 08:51:16	ysj.ray	set	files: + issue_8765.diff messages: + msg106289
2010-05-21 11:02:45	pitrou	set	messages: + msg106227
2010-05-21 10:59:19	pakal	set	messages: + msg106226
2010-05-21 09:08:10	ysj.ray	set	nosy: + ysj.ray messages: + msg106219
2010-05-21 08:15:34	pakal	set	messages: + msg106215
2010-05-20 17:54:00	pitrou	set	messages: + msg106176
2010-05-20 17:33:36	pakal	set	messages: + msg106174
2010-05-20 16:15:43	pitrou	set	messages: + msg106162
2010-05-19 12:28:51	r.david.murray	set	nosy: + pitrou
2010-05-19 12:17:54	pakal	create