classification
Title: Tests unwillingly writing unicode to raw streams
Type: Stage:
Components: IO Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, pakal, pitrou, ysj.ray
Priority: normal Keywords: patch

Created on 2010-05-19 12:17 by pakal, last changed 2010-05-25 11:27 by amaury.forgeotdarc.

Files
File name Uploaded Description
test_fileio_errclosedonwrite.patch pakal, 2010-05-19 12:17
issue_8765.diff ysj.ray, 2010-05-22 08:51
Messages (11)
msg106053 - (view) Author: Pascal Chambon (pakal) Date: 2010-05-19 12:17
In test_fileio, one of the tests wants to ensure that writing to closed raw streams fails, but it actually tries to write a unicode string, which should instead lead to an immediate TypeError.

Here is a tiny patch to prevent the "double error cause" danger - this test is bugging me because my own I/O library can't pass the stdlib io tests in this case.
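
A minimal sketch of what the corrected test is meant to check, assuming Python 3 semantics. The helper name `write_to_closed_raw_stream` is hypothetical and is not taken from test_fileio or from the attached patch. Passing a bytes payload leaves the closed stream as the only possible cause of failure:

```python
import io
import os
import tempfile


def write_to_closed_raw_stream(payload):
    """Open a temporary file as a raw stream, close it, then try to write.

    Returns the exception raised by write(). Hypothetical helper, for
    illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        raw = io.FileIO(path, "w")
        raw.close()
        try:
            raw.write(payload)
        except Exception as exc:
            return exc
        raise AssertionError("write() on a closed stream did not fail")
    finally:
        os.remove(path)


# A bytes payload, so the argument type cannot be the reason for the failure.
error = write_to_closed_raw_stream(b"xxx")
print(type(error).__name__)
```

With a bytes argument the failure should come from the closed-stream check (a ValueError), not from the argument type.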

The initial problem here is that we can't write unicode to a buffered binary stream (TypeError), but we can do it with an unbuffered raw stream - as the C implementation of the latter does string coercion instead of raising TypeError.
Shouldn't we unify the behaviour of binary streams in such cases?
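
The inconsistency described above can be probed with a short script. This is a sketch, not part of either attached patch, and the helper name `rejects_text` is hypothetical. On Python 3 both streams are expected to reject the text string; on an unpatched 2.7, per the report above, the raw stream would accept it:

```python
import io
import os
import tempfile


def rejects_text(stream):
    """Return True if writing a text (unicode) string raises TypeError."""
    try:
        stream.write(u"xxx")
    except TypeError:
        return True
    return False


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Unbuffered raw stream (io.FileIO).
    with io.FileIO(path, "w") as raw:
        raw_rejects = rejects_text(raw)
    # Buffered binary stream (io.BufferedWriter wrapping a FileIO).
    with io.open(path, "wb") as buffered:
        buffered_rejects = rejects_text(buffered)
finally:
    os.remove(path)

print("raw rejects text:", raw_rejects)
print("buffered rejects text:", buffered_rejects)
```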
msg106162 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-20 16:15
> The initial problem here is that we can't write unicode to a buffered
> binary stream (TypeError), but we can do it with an unbuffered raw
> stream - as the C implementation of the latter does string coercion
> instead of raising TypeError.
> Shouldn't we unify the behaviour of binary streams in such cases?

Yes, we certainly should. This is probably an oversight or a bug.
msg106174 - (view) Author: Pascal Chambon (pakal) Date: 2010-05-20 17:33
All right, what's the expected behaviour then - implicitly converting unicode to bytes (like C RawFileIO), or raising a TypeError (like buffered streams do)?
msg106176 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-20 17:54
> All right, what's the expected behaviour then - implicitly converting
> unicode to bytes (like C RawFileIO), or raising a TypeError (like
> buffered streams do)?

Sorry, I should have been clearer. The expected behaviour is to raise a
TypeError. The new io module was written for Python 3, where you
shouldn't mix bytes and unicode strings and expect things to work.
msg106215 - (view) Author: Pascal Chambon (pakal) Date: 2010-05-21 08:15
This would apparently require patching py2k and py3k separately...
I'll have a look at it when I have time.
msg106219 - (view) Author: ysj.ray (ysj.ray) Date: 2010-05-21 09:08
pakal wrote:
"""
In test_fileio, one of the tests wants to ensure that writing to closed raw streams fails, but it actually tries to write a unicode string
"""

I don't understand. Aren't b'xxx' and 'xxx' the same in py2.x? They are not unicode strings, but byte strings.
msg106226 - (view) Author: Pascal Chambon (pakal) Date: 2010-05-21 10:59
Yes, but the same tests are used for py3k as well, where "xxx" is interpreted as unicode (the 2to3 tools don't try to guess whether a py2k string was intended to be a byte string or a unicode one).
msg106227 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-21 11:02
test_fileio and test_io both use "from __future__ import unicode_literals", which means classical string literals construct unicode strings rather than byte strings.
So, yes, Pascal is right, this should be corrected (both the tests, and the implementation so that it refuses unicode arguments).
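
The effect of unicode_literals on plain literals can be sketched without touching the test files by compiling a snippet with the corresponding compiler flag. The names `SOURCE` and `literal_types` are made up for this illustration. On Python 3 the flag changes nothing, since plain literals are already text there:

```python
import __future__

# Two literals as they might appear in a test: one plain, one b-prefixed.
SOURCE = 'plain = "xxx"\nprefixed = b"xxx"\n'


def literal_types(flags=0):
    """Compile SOURCE with the given compiler flags; return both literal types."""
    namespace = {}
    # dont_inherit=True so that only the flags passed here apply.
    code = compile(SOURCE, "<sketch>", "exec", flags, True)
    exec(code, namespace)
    return type(namespace["plain"]), type(namespace["prefixed"])


plain_type, prefixed_type = literal_types(
    __future__.unicode_literals.compiler_flag
)
same_type = plain_type is prefixed_type
print(plain_type.__name__, prefixed_type.__name__, same_type)
```

With the flag active, only the b-prefixed literal remains a byte string, which is why the tests need an explicit b"xxx" when they mean bytes.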
msg106289 - (view) Author: ysj.ray (ysj.ray) Date: 2010-05-22 08:51
Yes, I saw that, thanks for the explanation!

So I worked up a patch against the trunk, including modifications to fileio_write(), bufferedwriter_write() and test_fileio.py.
msg106295 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-22 11:22
Amaury, do you remember if we did this deliberately?
msg106428 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-05-25 11:27
The 3.1 version does it correctly since issue7785, but this was not backported to 2.x.
Python 3.x uses the "y*" format code to accept bytes and not unicode; this code does not exist in 2.x, and was replaced with "s*", which accepts unicode.
But since the io module is designed up front to forbid default conversion between bytes and unicode, I think it's safe to change the code as suggested.
History
Date User Action Args
2010-05-25 11:27:45 amaury.forgeotdarc set messages: + msg106428
2010-05-22 11:22:21 pitrou set nosy: + amaury.forgeotdarc, messages: + msg106295
2010-05-22 08:51:16 ysj.ray set files: + issue_8765.diff, messages: + msg106289
2010-05-21 11:02:45 pitrou set messages: + msg106227
2010-05-21 10:59:19 pakal set messages: + msg106226
2010-05-21 09:08:10 ysj.ray set nosy: + ysj.ray, messages: + msg106219
2010-05-21 08:15:34 pakal set messages: + msg106215
2010-05-20 17:54:00 pitrou set messages: + msg106176
2010-05-20 17:33:36 pakal set messages: + msg106174
2010-05-20 16:15:43 pitrou set messages: + msg106162
2010-05-19 12:28:51 r.david.murray set nosy: + pitrou
2010-05-19 12:17:54 pakal create