Issue 23004: mock_open() should allow reading binary data

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/67193

classification

Title:	mock_open() should allow reading binary data
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.6, Python 3.4, Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:	17467	Superseder:
Assigned To:	berker.peksag	Nosy List:	Aaron1011, a.badger, berker.peksag, jcea, michael.foord, python-dev, r.david.murray, rbcollins
Priority:	normal	Keywords:	easy, patch

Created on 2014-12-07 15:24 by jcea, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
mock-open-allow-binary-data.patch	Aaron1011, 2014-12-13 03:07		review
mock-open-allow-binary-data-updated.patch	Aaron1011, 2014-12-13 14:13		review
mock-open-allow-binary-without-coerce.patch	Aaron1011, 2014-12-14 13:45		review
mock-open-allow-binary-without-coerce-fixup.patch	Aaron1011, 2014-12-16 01:06		review
mock-open-allow-binary-data-fix-formatting.patch	Aaron1011, 2014-12-20 02:04		review

Messages (18)
msg232271 - (view)	Author: Jesús Cea Avión (jcea) *	Date: 2014-12-07 15:24
mock_open(read_data=b'...') gives an error: """Traceback (most recent call last): File "z.py", line 6, in <module> print(f.read()) File "/usr/local/lib/python3.4/unittest/mock.py", line 896, in __call__ return _mock_self._mock_call(args, kwargs) File "/usr/local/lib/python3.4/unittest/mock.py", line 962, in _mock_call ret_val = effect(args, **kwargs) File "/usr/local/lib/python3.4/unittest/mock.py", line 2279, in _read_side_effect return ''.join(_data) File "/usr/local/lib/python3.4/unittest/mock.py", line 2244, in _iterate_read_data data_as_list = ['{}\n'.format(l) for l in read_data.split('\n')] """ Easy to reproduce: """ from unittest.mock import mock_open, patch m = mock_open(read_data= b'abc') with patch('__main__.open', m, create=True) : with open('abc', 'rb') as f : print(f.read()) """ Looks like this bug was introduced as result of issue #17467. I add those people to the nosy list.
msg232589 - (view)	Author: Aaron Hill (Aaron1011) *	Date: 2014-12-13 03:07
I've created a patch that fixes this, and added an accompanying unit test (which fails without the change).
msg232591 - (view)	Author: Demian Brecht (demian.brecht) *	Date: 2014-12-13 03:41
Thanks for the patch Aaron. Unfortunately this doesn't quite fix the issue. There are two problems with the patch: If a bytes object is passed into mock_open, I'd expect a bytes object in the output. In your patch, not only is this not the case (the output is a string object), but the bytes object is being coerced into its string representation in the resulting list. For example (simplified from your patch): >>> data = b'foo\nbar' >>> newline = b'\n' >>> >>> ['{}\n'.format(l) for l in data.split(newline)] ["b'foo'\n", "b'bar'\n"] What I would expect to see in this case is: [b'foo\n', b'bar\n']
msg232592 - (view)	Author: Demian Brecht (demian.brecht) *	Date: 2014-12-13 03:42
> There are two problems with the patch That was intended to be removed after I changed the wording :P
msg232611 - (view)	Author: Aaron Hill (Aaron1011) *	Date: 2014-12-13 14:13
I've created a new patch, which addresses the problem. Your example now currently returns [b'foo\n', b'bar\n']
msg232629 - (view)	Author: Demian Brecht (demian.brecht) *	Date: 2014-12-14 04:54
Thanks for the update, but this doesn't quite work either as you're assuming utf-8 (which is what .encode() and .decode() default to). For example, when using latin-1: >>> m = mock_open(read_data= b'\xc6') >>> with patch('__main__.open', m, create=True) : ... with open('abc', 'rb') as f : ... print(f.read()) ... Traceback (most recent call last): [snip] UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 0: unexpected end of data Additionally, a bytes object may simply be binary data that doesn't adhere to any specific encoding. My suggestion is to remove the use of format() altogether as it's really not doing anything complex and simply append either '\n' or b'\n' depending on the type of object passed in. That way, you can deal with the type of object passed in directly without coercion.
msg232637 - (view)	Author: Aaron Hill (Aaron1011) *	Date: 2014-12-14 13:45
Thanks, I've fixed that. Not sure why I thought decoding and re-encoding would work with any binary data. I've also updated one of the tests to use non-utf8-decodeable binary data, to prevent a future regression.
msg232645 - (view)	Author: Demian Brecht (demian.brecht) *	Date: 2014-12-15 01:04
Thanks again for the update Aaron, I've left a couple small comments in Rietveld. Other than those, the patch looks good to me. Thanks for the contribution!
msg232693 - (view)	Author: Aaron Hill (Aaron1011) *	Date: 2014-12-16 01:06
I've fixed the issues you pointed out. Is there a better way than uploading a new patch file to make changes?
msg232704 - (view)	Author: Demian Brecht (demian.brecht) *	Date: 2014-12-16 03:10
Thanks for the updated patch, looks good to me. If you haven't already read it, the patch workflow is here: https://docs.python.org/devguide/patch.html and is the only workflow currently available.
msg232954 - (view)	Author: Aaron Hill (Aaron1011) *	Date: 2014-12-20 02:04
I've fixed the formatting issues.
msg233231 - (view)	Author: Demian Brecht (demian.brecht) *	Date: 2014-12-31 09:23
A few more comments were left in Rietveld for you, likely hidden by spam filters.
msg237020 - (view)	Author: Berker Peksag (berker.peksag) *	Date: 2015-03-02 05:46
LGTM.
msg248119 - (view)	Author: Roundup Robot (python-dev)	Date: 2015-08-06 10:17
New changeset 3d7adf5b3fb3 by Berker Peksag in branch '3.4': Issue #23004: mock_open() now reads binary data correctly when the type of read_data is bytes. https://hg.python.org/cpython/rev/3d7adf5b3fb3 New changeset 526a186de32d by Berker Peksag in branch '3.5': Issue #23004: mock_open() now reads binary data correctly when the type of read_data is bytes. https://hg.python.org/cpython/rev/526a186de32d New changeset ed15f399a292 by Berker Peksag in branch 'default': Issue #23004: mock_open() now reads binary data correctly when the type of read_data is bytes. https://hg.python.org/cpython/rev/ed15f399a292
msg248120 - (view)	Author: Berker Peksag (berker.peksag) *	Date: 2015-08-06 10:21
Thanks for the patch, Aaron(also thanks to Demian for reviews). I've fixed the merge conflict and added more tests.
msg248153 - (view)	Author: Robert Collins (rbcollins) *	Date: 2015-08-06 21:57
Post merge review: looks like data_as_list = read_data.splitlines(True) would be a little cleaner.
msg248197 - (view)	Author: Berker Peksag (berker.peksag) *	Date: 2015-08-07 14:52
data_as_list = read_data.splitlines(True) is not actually the equivalent of data_as_list = [l + sep for l in read_data.split(sep)] It will change the behavior of the _iterate_read_data helper. See the comment at https://github.com/python/cpython/blob/78d05eb847c6b8fede08ca74bb59210c00e4c599/Lib/unittest/mock.py#L2278 However, in default branch, we can simplify it to yield from read_data.splitlines(keepends=True)
msg248201 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2015-08-07 15:50
splitlines(keepends=True) is not ever equivalent to splitting by just '\n'. I don't know the details here, but switching to that would certainly be a behavior change. (Especially if the code path also applies to non-binary data!): >>> b'abc\nde\r\nf\x1dg'.splitlines(True) [b'abc\n', b'de\r\n', b'f\x1dg'] >>> 'abc\nde\r\nf\x1dg'.splitlines(True) ['abc\n', 'de\r\n', 'f\x1d', 'g']

History
Date	User	Action	Args
2022-04-11 14:58:10	admin	set	github: 67193
2015-08-07 15:50:42	r.david.murray	set	nosy: + r.david.murray messages: + msg248201
2015-08-07 14:52:39	berker.peksag	set	messages: + msg248197
2015-08-06 21:57:48	rbcollins	set	messages: + msg248153
2015-08-06 10:21:50	berker.peksag	set	status: open -> closed resolution: fixed messages: + msg248120 stage: commit review -> resolved
2015-08-06 10:17:42	python-dev	set	nosy: + python-dev messages: + msg248119
2015-08-06 09:18:50	berker.peksag	set	assignee: berker.peksag
2015-07-15 01:14:07	rbcollins	set	nosy: + rbcollins versions: + Python 3.6
2015-03-02 05:46:01	berker.peksag	set	messages: + msg237020 components: + Library (Lib) stage: patch review -> commit review
2015-02-13 01:26:16	demian.brecht	set	nosy: - demian.brecht
2014-12-31 09:23:41	demian.brecht	set	messages: + msg233231
2014-12-20 02:04:43	Aaron1011	set	files: + mock-open-allow-binary-data-fix-formatting.patch messages: + msg232954
2014-12-16 03:26:34	berker.peksag	set	stage: needs patch -> patch review
2014-12-16 03:10:32	demian.brecht	set	messages: + msg232704
2014-12-16 01:06:12	Aaron1011	set	files: + mock-open-allow-binary-without-coerce-fixup.patch messages: + msg232693
2014-12-15 01:04:58	demian.brecht	set	messages: + msg232645
2014-12-14 13:45:04	Aaron1011	set	files: + mock-open-allow-binary-without-coerce.patch messages: + msg232637
2014-12-14 04:54:56	demian.brecht	set	messages: + msg232629
2014-12-13 14:13:42	Aaron1011	set	files: + mock-open-allow-binary-data-updated.patch messages: + msg232611
2014-12-13 03:42:30	demian.brecht	set	messages: + msg232592
2014-12-13 03:41:18	demian.brecht	set	nosy: + demian.brecht messages: + msg232591
2014-12-13 03:07:50	Aaron1011	set	files: + mock-open-allow-binary-data.patch nosy: + Aaron1011 messages: + msg232589 keywords: + patch
2014-12-07 16:09:22	berker.peksag	set	nosy: + berker.peksag stage: needs patch type: behavior versions: + Python 3.4
2014-12-07 15:25:59	jcea	set	dependencies: + Enhancement: give mock_open readline() and readlines() methods
2014-12-07 15:25:42	jcea	link	issue17467 superseder
2014-12-07 15:24:17	jcea	create