New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TextIOWrapper does not support reading bytearrays or memoryviews #65256
Comments
In Python 3.4, TextIOWrapper can not read from streams that return bytearrays or memoryviews: from io import TextIOWrapper, BytesIO
class MyByteStream(BytesIO):
def read1(self, len_):
return memoryview(super().read(len_))
bs = MyByteStream(b'some data in ascii\n')
ss = TextIOWrapper(bs, encoding='ascii')
print(ss.read(200)) results in: TypeError: underlying read1() should have returned a bytes object, not 'memoryview' I don't think there's any good reason for TextIOWrapper not accepting any bytes-like object (especially now that b''.join finally accepts bytes-like objects as well). |
If someone is willing to review, I'd be happy to write a patch for this. |
I'm interested to review a patch, but I'm not sure that read1() can return a type different than bytes. You can use the Py_buffer API for your patch. |
read1() should return bytes. MyByteStream doesn't implement the io.BufferedIOBase interface. |
On 03/25/2014 01:39 PM, Serhiy Storchaka wrote:
Indeed, this is what this issue is about :-). The question is: is there a good reason to require io.BufferedIOBase I'd also argue that the current documentation of io.BufferedIOBase So if it turns out that there is a good reason for this requirement, the Best, --
|
bytes objects have two things going for them:
This is especially handy when writing C code. |
Yes, bytes objects have some advantages. But if a bytes object is desired, it can always be created from bytes-like object. If a BufferedIOBase instance is required to only provide bytes objects, this conversion is forced even when it may not be necessary. If someone is willing to do the work (and I am), is there a reason *not* to allow TextIOWrapper to accept bytes-like objects? |
Yes, there are. The code which works only with bytes is much simpler. Not only |
2014-03-26 0:35 GMT+01:00 Antoine Pitrou <report@bugs.python.org>:
Is it possible to request this feature using PyObject_GetBuffer()? I If it's not possible, a new flag is maybe needed? |
I guess that you are trying to implement a zero-copy I/O. The problem is that BytesIO does copy data. Example: >>> data=b'abcdef'
>>> x=io.BytesIO(data)
>>> x.read() is data
False Before trying to avoid copies in the "buffered" layer, something should be done for the "raw" layer (BytesIO in this case). |
On 03/26/2014 03:43 AM, STINNER Victor wrote:
Right on the first count, but wrong on the second. The class I'm |
I'm attaching a patch that enables TextIOWrapper to work with bytes-like objects from the underlying file descriptor. The code changes are pretty small, without introducing any significant additional complexity. For streams providing bytes objects, this patch does not change anything. For streams that are able to provide bytearrays or memoryviews, this patch removes the overhead that would be incurred by explicitly converting to bytes. |
Thanks for the feedback! I have attached an updated patch. I did not include any testcase because the patch did not create any new code paths, so I was assuming it'd be covered by the existing test case. But of course I was wrong. In the revised patch, I added a testcase based on your example of a more complex memoryview. (Note, however, that even with the previous implementation using nbytes = PySequence_Size(input_chunk) this test does not fail, because nbytes is used only to estimate the size of the text string). |
Nikolaus, can you please restate from scratch what the issue is? If the issue is still the one you posed in msg214776, I think the issue should be closed as "invalid" - it's *not* the case that there is no good reason for TextIOWrapper not accepting any bytes-like object. Or, to drop the double negation: as Serhiy states, read1() should return bytes, and it's perfectly fine to rely on that. Your MyByteStream ought to fail, so it is correct behaviour that it does fail. |
My usecase is that I have a binary stream class that internally uses memoryviews. I would like to read text data from this stream and thus encapsulate it in a TextIOWrapper. Currently, TextIOWrapper (correctly) expects read() to return bytes and fails if it receives any other bytes-like object. I can change my custom class to internally convert to bytes, but this means that the data is needlessly copied around and affects every other consumer of the class as well. Changing TextIOWrapper to work with any bytes-like object is (as far as I can see) rather simple. It does not introduce any new branches in the code, and it does not change the behavior for bytes objects at all. It does, however, eliminate unnecessary memcopies for classes that do not internally work with bytes objects. Therefore, I was hoping this patch could be considered for inclusion. The MyByteStream example that I gave in the first message is useless. I merely included it as the smallest possible code fragment that currently does not work, but would work after the patch in an attempt to illustrate what I meant - but apparently it had the opposite effect. Thanks for considering! |
Ok, I think it is a reasonable use case. I'm gonna look at your patch and give it a review. |
New changeset 2a1d63f09560 by Antoine Pitrou in branch 'default': |
Thank you, I've committed tha patch to 3.5. |
Note that this is not work with the punycode encoding (and may be some third-party encodings). |
On 05/13/2014 12:41 PM, Serhiy Storchaka wrote:
I do not understand. Could you elaborate? |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: