mock_open() should allow reading binary data #67193

jcea · 2014-12-07T15:24:17Z

BPO	23004
Nosy	@jcea, @rbtcollins, @abadger, @bitdancer, @voidspace, @berkerpeksag
Dependencies	bpo-17467: Enhancement: give mock_open readline() and readlines() methods
Files	mock-open-allow-binary-data.patch mock-open-allow-binary-data-updated.patch mock-open-allow-binary-without-coerce.patch mock-open-allow-binary-without-coerce-fixup.patch mock-open-allow-binary-data-fix-formatting.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/berkerpeksag'
closed_at = <Date 2015-08-06.10:21:50.534>
created_at = <Date 2014-12-07.15:24:17.185>
labels = ['easy', 'type-bug', 'library']
title = 'mock_open() should allow reading binary data'
updated_at = <Date 2015-08-07.15:50:42.900>
user = 'https://github.com/jcea'

bugs.python.org fields:

activity = <Date 2015-08-07.15:50:42.900>
actor = 'r.david.murray'
assignee = 'berker.peksag'
closed = True
closed_date = <Date 2015-08-06.10:21:50.534>
closer = 'berker.peksag'
components = ['Library (Lib)']
creation = <Date 2014-12-07.15:24:17.185>
creator = 'jcea'
dependencies = ['17467']
files = ['37435', '37440', '37446', '37458', '37510']
hgrepos = []
issue_num = 23004
keywords = ['patch', 'easy']
message_count = 18.0
messages = ['232271', '232589', '232591', '232592', '232611', '232629', '232637', '232645', '232693', '232704', '232954', '233231', '237020', '248119', '248120', '248153', '248197', '248201']
nosy_count = 8.0
nosy_names = ['jcea', 'rbcollins', 'a.badger', 'r.david.murray', 'michael.foord', 'python-dev', 'berker.peksag', 'Aaron1011']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23004'
versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']

jcea · 2014-12-07T15:24:17Z

mock_open(read_data=b'...') gives an error:

"""Traceback (most recent call last):
  File "z.py", line 6, in <module>
    print(f.read())
  File "/usr/local/lib/python3.4/unittest/mock.py", line 896, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/local/lib/python3.4/unittest/mock.py", line 962, in _mock_call
    ret_val = effect(*args, **kwargs)
  File "/usr/local/lib/python3.4/unittest/mock.py", line 2279, in _read_side_effect
    return ''.join(_data)
  File "/usr/local/lib/python3.4/unittest/mock.py", line 2244, in _iterate_read_data
    data_as_list = ['{}\n'.format(l) for l in read_data.split('\n')]
"""

Easy to reproduce:

"""
from unittest.mock import mock_open, patch

m = mock_open(read_data= b'abc')
with patch('__main__.open', m, create=True) :
    with open('abc', 'rb') as f :
        print(f.read())
"""

Looks like this bug was introduced as result of issue bpo-17467. I add those people to the nosy list.

Aaron1011 · 2014-12-13T03:07:50Z

I've created a patch that fixes this, and added an accompanying unit test (which fails without the change).

demianbrecht · 2014-12-13T03:41:18Z

Thanks for the patch Aaron. Unfortunately this doesn't quite fix the issue. There are two problems with the patch:

If a bytes object is passed into mock_open, I'd expect a bytes object in the output. In your patch, not only is this not the case (the output is a string object), but the bytes object is being coerced into its string representation in the resulting list. For example (simplified from your patch):

>>> data = b'foo\nbar'
>>> newline = b'\n'
>>>
>>> ['{}\n'.format(l) for l in data.split(newline)]
["b'foo'\n", "b'bar'\n"]

What I would expect to see in this case is:

[b'foo\n', b'bar\n']

demianbrecht · 2014-12-13T03:42:30Z

There are two problems with the patch

That was intended to be removed after I changed the wording :P

Aaron1011 · 2014-12-13T14:13:42Z

I've created a new patch, which addresses the problem. Your example now currently returns [b'foo\n', b'bar\n']

demianbrecht · 2014-12-14T04:54:55Z

Thanks for the update, but this doesn't quite work either as you're assuming utf-8 (which is what .encode() and .decode() default to). For example, when using latin-1:

>>> m = mock_open(read_data= b'\xc6')
>>> with patch('__main__.open', m, create=True) :
...     with open('abc', 'rb') as f :
...         print(f.read())
...
Traceback (most recent call last):
  [snip]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 0: unexpected end of data

Additionally, a bytes object may simply be binary data that doesn't adhere to any specific encoding.

My suggestion is to remove the use of format() altogether as it's really not doing anything complex and simply append either '\n' or b'\n' depending on the type of object passed in. That way, you can deal with the type of object passed in directly without coercion.

Aaron1011 · 2014-12-14T13:45:05Z

Thanks, I've fixed that. Not sure why I thought decoding and re-encoding would work with any binary data.

I've also updated one of the tests to use non-utf8-decodeable binary data, to prevent a future regression.

demianbrecht · 2014-12-15T01:04:58Z

Thanks again for the update Aaron, I've left a couple small comments in Rietveld. Other than those, the patch looks good to me. Thanks for the contribution!

Aaron1011 · 2014-12-16T01:06:12Z

I've fixed the issues you pointed out.

Is there a better way than uploading a new patch file to make changes?

demianbrecht · 2014-12-16T03:10:33Z

Thanks for the updated patch, looks good to me. If you haven't already read it, the patch workflow is here: https://docs.python.org/devguide/patch.html and is the only workflow currently available.

Aaron1011 · 2014-12-20T02:04:40Z

I've fixed the formatting issues.

demianbrecht · 2014-12-31T09:23:42Z

A few more comments were left in Rietveld for you, likely hidden by spam filters.

berkerpeksag · 2015-03-02T05:46:01Z

LGTM.

python-dev · 2015-08-06T10:17:42Z

New changeset 3d7adf5b3fb3 by Berker Peksag in branch '3.4':
Issue bpo-23004: mock_open() now reads binary data correctly when the type of read_data is bytes.
https://hg.python.org/cpython/rev/3d7adf5b3fb3

New changeset 526a186de32d by Berker Peksag in branch '3.5':
Issue bpo-23004: mock_open() now reads binary data correctly when the type of read_data is bytes.
https://hg.python.org/cpython/rev/526a186de32d

New changeset ed15f399a292 by Berker Peksag in branch 'default':
Issue bpo-23004: mock_open() now reads binary data correctly when the type of read_data is bytes.
https://hg.python.org/cpython/rev/ed15f399a292

berkerpeksag · 2015-08-06T10:21:50Z

Thanks for the patch, Aaron(also thanks to Demian for reviews). I've fixed the merge conflict and added more tests.

rbtcollins · 2015-08-06T21:57:48Z

Post merge review:

looks like

data_as_list = read_data.splitlines(True)

would be a little cleaner.

berkerpeksag · 2015-08-07T14:52:39Z

data_as_list = read_data.splitlines(True)

is not actually the equivalent of

    data_as_list = [l + sep for l in read_data.split(sep)]

It will change the behavior of the _iterate_read_data helper. See the comment at

cpython/Lib/unittest/mock.py

Line 2278 in 78d05eb

# If the last line ended in a newline, the list comprehension will have an

However, in default branch, we can simplify it to

yield from read_data.splitlines(keepends=True)

bitdancer · 2015-08-07T15:50:42Z

splitlines(keepends=True) is not ever equivalent to splitting by just '\n'. I don't know the details here, but switching to that would certainly be a behavior change. (Especially if the code path also applies to non-binary data!):

>>> b'abc\nde\r\nf\x1dg'.splitlines(True)
[b'abc\n', b'de\r\n', b'f\x1dg']
>>> 'abc\nde\r\nf\x1dg'.splitlines(True)
['abc\n', 'de\r\n', 'f\x1d', 'g']

jcea added the easy label Dec 7, 2014

berkerpeksag added the type-bug An unexpected behavior, bug, or error label Dec 7, 2014

berkerpeksag added the stdlib Python modules in the Lib dir label Mar 2, 2015

berkerpeksag self-assigned this Aug 6, 2015

berkerpeksag closed this as completed Aug 6, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mock_open() should allow reading binary data #67193

mock_open() should allow reading binary data #67193

jcea commented Dec 7, 2014

jcea commented Dec 7, 2014

Aaron1011 mannequin commented Dec 13, 2014

demianbrecht mannequin commented Dec 13, 2014

demianbrecht mannequin commented Dec 13, 2014

Aaron1011 mannequin commented Dec 13, 2014

demianbrecht mannequin commented Dec 14, 2014

Aaron1011 mannequin commented Dec 14, 2014

demianbrecht mannequin commented Dec 15, 2014

Aaron1011 mannequin commented Dec 16, 2014

demianbrecht mannequin commented Dec 16, 2014

Aaron1011 mannequin commented Dec 20, 2014

demianbrecht mannequin commented Dec 31, 2014

berkerpeksag commented Mar 2, 2015

python-dev mannequin commented Aug 6, 2015

berkerpeksag commented Aug 6, 2015

rbtcollins commented Aug 6, 2015

berkerpeksag commented Aug 7, 2015

bitdancer commented Aug 7, 2015

mock_open() should allow reading binary data #67193

mock_open() should allow reading binary data #67193

Comments

jcea commented Dec 7, 2014

jcea commented Dec 7, 2014

Aaron1011 mannequin commented Dec 13, 2014

demianbrecht mannequin commented Dec 13, 2014

demianbrecht mannequin commented Dec 13, 2014

Aaron1011 mannequin commented Dec 13, 2014

demianbrecht mannequin commented Dec 14, 2014

Aaron1011 mannequin commented Dec 14, 2014

demianbrecht mannequin commented Dec 15, 2014

Aaron1011 mannequin commented Dec 16, 2014

demianbrecht mannequin commented Dec 16, 2014

Aaron1011 mannequin commented Dec 20, 2014

demianbrecht mannequin commented Dec 31, 2014

berkerpeksag commented Mar 2, 2015

python-dev mannequin commented Aug 6, 2015

berkerpeksag commented Aug 6, 2015

rbtcollins commented Aug 6, 2015

berkerpeksag commented Aug 7, 2015

bitdancer commented Aug 7, 2015