This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Tests unwillingly writing unicode to raw streams
Type: behavior Stage: resolved
Components: IO Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, benjamin.peterson, pakal, pitrou, serhiy.storchaka, ysj.ray
Priority: normal Keywords: patch

Created on 2010-05-19 12:17 by pakal, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
test_fileio_errclosedonwrite.patch pakal, 2010-05-19 12:17
issue_8765.diff ysj.ray, 2010-05-22 08:51
Pull Requests
URL Status Linked
PR 11127 merged serhiy.storchaka, 2018-12-12 08:03
Messages (13)
msg106053 - Author: Pascal Chambon (pakal) * Date: 2010-05-19 12:17
In test_fileio, one of the tests wants to ensure that writing to closed raw streams fails, but it actually tries to write a unicode string, which should instead lead to an immediate TypeError.

Here is a tiny patch to prevent the "double error cause" danger - this test is bugging me because my own I/O library can't pass the stdlib io tests in this case.

The underlying problem here is that we can't write unicode to a buffered binary stream (TypeError), but we can do it with an unbuffered raw stream - as the C implementation of the latter does string coercion instead of raising TypeError.
Shouldn't we unify the behaviour of binary streams in such cases?
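
The "double error cause" can be avoided by writing a byte string, so that the closed-stream check is the only reason left for write() to fail. A minimal sketch, assuming Python 3; the class and function names are hypothetical and this is not the actual test_fileio code:

```python
import io
import os
import unittest


class ClosedRawWriteTest(unittest.TestCase):
    # Hypothetical test case, named for illustration only.

    def test_write_bytes_to_closed_raw_stream(self):
        raw = io.FileIO(os.devnull, "w")
        raw.close()
        # b"xxx" is a valid argument type, so the only reason left for
        # write() to fail is that the stream is closed.
        with self.assertRaises(ValueError):
            raw.write(b"xxx")


def run_closed_write_test():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClosedRawWriteTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With a text argument instead, an implementation may legitimately report either the wrong type or the closed stream first, which is why the test should not depend on it.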
msg106162 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-20 16:15
> The underlying problem here is that we can't write unicode to a buffered
> binary stream (TypeError), but we can do it with an unbuffered raw
> stream - as the C implementation of the latter does string coercion
> instead of raising TypeError.
> Shouldn't we unify the behaviour of binary streams in such cases?

Yes, we certainly should. This is probably an oversight or a bug.
msg106174 - Author: Pascal Chambon (pakal) * Date: 2010-05-20 17:33
All right, what's the expected behaviour then - implicitly converting unicode to bytes (like C RawFileIO), or raising a TypeError (like buffered streams do)?
msg106176 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-20 17:54
> All right, what's the expected behaviour then - implicitly converting
> unicode to bytes (like C RawFileIO), or raising a TypeError (like
> buffered streams do)?

Sorry, I should have been clearer. The expected behaviour is to raise a
TypeError. The new io module was written for Python 3, where you
shouldn't mix bytes and unicode strings and expect things to work.
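
A minimal sketch of that expected behaviour, assuming Python 3 (where the io module already behaves this way); the helper name write_outcome is hypothetical:

```python
import io
import os


def write_outcome(stream, data):
    """Return 'ok' if stream.write(data) succeeds, else the exception name."""
    try:
        stream.write(data)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


# One unbuffered raw stream and one buffered binary stream, both write-only.
with io.FileIO(os.devnull, "w") as raw, \
        io.BufferedWriter(io.FileIO(os.devnull, "w")) as buffered:
    outcomes = {
        "raw, bytes": write_outcome(raw, b"xxx"),
        "raw, text": write_outcome(raw, "xxx"),
        "buffered, bytes": write_outcome(buffered, b"xxx"),
        "buffered, text": write_outcome(buffered, "xxx"),
    }

for case, outcome in sorted(outcomes.items()):
    print(case, "->", outcome)
```

On Python 3 both text writes should be reported as TypeError and both bytes writes as ok, i.e. the raw and buffered streams agree.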
msg106215 - Author: Pascal Chambon (pakal) * Date: 2010-05-21 08:15
This would apparently require patching py2k and py3k separately...
I'll have a look at it when I have time.
msg106219 - Author: ysj.ray (ysj.ray) Date: 2010-05-21 09:08
pakal wrote:
"""
In test_fileio, one of the tests wants to ensure that writing to closed raw streams fails, but it actually tries to write a unicode string
"""

I don't understand. Aren't b'xxx' and 'xxx' the same in py2.x? They are not unicode strings, but byte strings.
msg106226 - Author: Pascal Chambon (pakal) * Date: 2010-05-21 10:59
Yes, but the same tests are used for py3k as well, where "xxx" is interpreted as unicode (the 2to3 tools don't try to guess whether a py2k string was intended to be a byte string or a unicode one).
msg106227 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-21 11:02
test_fileio and test_io both use "from __future__ import unicode_literals", which means plain string literals construct unicode strings rather than byte strings.
So, yes, Pascal is right, this should be corrected (both the tests, and the implementation so that it refuses unicode arguments).
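
The effect of unicode_literals on a plain literal can be checked without editing a test file by passing the corresponding compiler flag to compile(). A small sketch; the helper name literal_type is hypothetical:

```python
import __future__


def literal_type(source, unicode_literals=False):
    """Evaluate a literal and return the name of the resulting type."""
    flags = __future__.unicode_literals.compiler_flag if unicode_literals else 0
    # dont_inherit=True keeps the caller's own __future__ imports out of it.
    code = compile(source, "<literal>", "eval", flags, True)
    return type(eval(code)).__name__


plain = literal_type("'xxx'", unicode_literals=True)
prefixed = literal_type("b'xxx'", unicode_literals=True)
print("plain literal:", plain)
print("b-prefixed literal:", prefixed)
```

With the flag set, the plain literal should come out as the text type (unicode on 2.x, str on 3.x), while the b-prefixed literal stays a byte string on both, which is why the tests need the explicit b prefix.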
msg106289 - Author: ysj.ray (ysj.ray) Date: 2010-05-22 08:51
Yes, I saw that, thanks for the explanation!

So I worked up a patch against the trunk, including modifications to fileio_write(), bufferedwriter_write() and test_fileio.py.
msg106295 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-22 11:22
Amaury, do you remember if we did this deliberately?
msg106428 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-05-25 11:27
The 3.1 version does it correctly since issue7785, but this was not backported to 2.x.
Python 3.x uses the "y*" format code to accept bytes and not unicode; this code does not exist in 2.x, and was replaced with "s*", which accepts unicode.
But since the io module is designed up front to forbid default conversion between bytes and unicode, I think it's safe to change the code as suggested.
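
The difference between a strict "bytes only" parser and a lenient one can be emulated in pure Python. This is only an illustration of the intended acceptance rule, assuming Python 3; it is not the C argument-parsing code, and the name require_bytes_like is hypothetical:

```python
def require_bytes_like(data):
    """Return a memoryview of data, rejecting text outright.

    Emulates a strict bytes-only argument check; not the C implementation.
    """
    if isinstance(data, str):
        raise TypeError(
            "a bytes-like object is required, not %r" % type(data).__name__)
    # memoryview() itself raises TypeError for objects that do not
    # support the buffer protocol (for example, integers).
    return memoryview(data)


accepted = [require_bytes_like(b"abc").nbytes,
            require_bytes_like(bytearray(b"abcd")).nbytes]
print(accepted)
```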
msg331678 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-12 08:11
I agree that it would be right to accept only binary strings when writing to a binary stream. But I am afraid that it is too late to change this in the 16th bugfix release of 2.7. This could break existing code or tests.

I suggest changing the behavior only in the Py3k compatibility mode. PR 11127 is based on issue_8765.diff, but emits a warning when Python is run with the -3 option.

$ ./python -3 -c "import io; io.FileIO('/dev/null', 'w').write(u'')"
-c:1: DeprecationWarning: write() argument must be string or buffer, not 'unicode'
$ ./python -3 -We -c "import io; io.FileIO('/dev/null', 'w').write(u'')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
DeprecationWarning: write() argument must be string or buffer, not 'unicode'

This will help with migrating to Python 3, while keeping the behavior unchanged in a normal run.
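
The "warn only in compatibility mode" idea, and the way an error filter (as with -We) escalates the warning, can be sketched with the warnings module. This is a Python 3 illustration with hypothetical names (compat_write, py3k_warning, sink); it is not the code from PR 11127:

```python
import warnings


def compat_write(sink, data, py3k_warning=False):
    """Hypothetical stand-in for a lenient write(): text is still accepted,
    but a DeprecationWarning is emitted when the compatibility flag is set."""
    if isinstance(data, str):
        if py3k_warning:
            warnings.warn(
                "write() argument must be string or buffer, not 'unicode'",
                DeprecationWarning, stacklevel=2)
        data = data.encode("ascii")  # legacy implicit coercion
    sink.append(bytes(data))
    return len(data)


sink = []

# Normal run: behaviour unchanged, nothing is reported.
compat_write(sink, "abc")

# Compatibility mode: the write still succeeds, but a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compat_write(sink, "abc", py3k_warning=True)

# Compatibility mode with warnings turned into errors: the write is refused.
escalated = False
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        compat_write(sink, "abc", py3k_warning=True)
    except DeprecationWarning:
        escalated = True
```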
msg333696 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-15 12:34
New changeset 1462234baf7398a6b00c0f51905e26caa17d3c60 by Serhiy Storchaka in branch '2.7':
[2.7] bpo-8765: Deprecate writing unicode to binary streams in Py3k mode. (GH-11127)
https://github.com/python/cpython/commit/1462234baf7398a6b00c0f51905e26caa17d3c60
History
Date User Action Args
2022-04-11 14:57:01 admin set github: 53011
2019-01-15 12:35:38 serhiy.storchaka set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-01-15 12:34:51 serhiy.storchaka set messages: + msg333696
2019-01-12 07:59:51 serhiy.storchaka set nosy: + benjamin.peterson
2018-12-12 08:11:49 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg331678
2018-12-12 08:03:36 serhiy.storchaka set pull_requests: + pull_request10357
2013-02-02 21:21:54 ezio.melotti set type: behavior
stage: patch review
2010-05-25 11:27:45 amaury.forgeotdarc set messages: + msg106428
2010-05-22 11:22:21 pitrou set nosy: + amaury.forgeotdarc
messages: + msg106295
2010-05-22 08:51:16 ysj.ray set files: + issue_8765.diff
messages: + msg106289
2010-05-21 11:02:45 pitrou set messages: + msg106227
2010-05-21 10:59:19 pakal set messages: + msg106226
2010-05-21 09:08:10 ysj.ray set nosy: + ysj.ray
messages: + msg106219
2010-05-21 08:15:34 pakal set messages: + msg106215
2010-05-20 17:54:00 pitrou set messages: + msg106176
2010-05-20 17:33:36 pakal set messages: + msg106174
2010-05-20 16:15:43 pitrou set messages: + msg106162
2010-05-19 12:28:51 r.david.murray set nosy: + pitrou
2010-05-19 12:17:54 pakal create