This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: mmap.mmap.__iter__ is broken (yields bytes instead of ints)
Type: enhancement
Stage: needs patch
Components: Extension Modules
Versions: Python 3.6
process
Status: open
Nosy List: berker.peksag, serhiy.storchaka, twouters, xiang.zhang, ztane
Priority: low
Keywords: patch

Created on 2016-02-14 03:37 by ztane, last changed 2022-04-11 14:58 by admin.

Files
mmap_bytearray_like.patch (uploaded by xiang.zhang, 2016-04-29 08:04)
Messages (8)
msg260261 - Author: Antti Haapala (ztane) * Date: 2016-02-14 03:37
I just noticed, while answering a question on StackOverflow (http://stackoverflow.com/q/35387843/918959), that on Python 3 iterating over an mmap object yields individual bytes as bytes objects, even though iterating over slices, indexing and so on give ints.

Example:

    import mmap

    with open('test.dat', 'rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        for b in mm:
            print(b)
            # prints for example b'A' instead of 65
        mm.close()

I believe this should be fixed for the sake of completeness. The documentation says that "Memory-mapped file objects behave like both bytearray and like file objects.", but the current behaviour is like neither a bytearray nor a file object, and it is quite confusing.

Similarly, the `in` operator seems to be broken: one can search for a space with `32 in bytesobj`, which works for slices but not for the whole mmap object.
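Until mmap itself supports this, bytes-like `in` semantics can be emulated on top of the existing `mmap.find()` method, which already does a subsequence search and returns -1 when nothing is found. A minimal sketch; the helper name `contains_like_bytes` is hypothetical and not part of the mmap module, and an anonymous map (fileno -1) is used so no file is needed:

```python
import mmap


def contains_like_bytes(mm, item):
    """Hypothetical helper: emulate ``item in bytes_obj`` for an mmap.

    An int is treated as a single byte value, anything else as a
    bytes-like subsequence.
    """
    if isinstance(item, int):
        # bytes([item]) raises ValueError outside range(256), the same
        # error bytes/bytearray raise for e.g. ``256 in b'abc'``.
        needle = bytes([item])
    else:
        needle = item
    # mmap.find() returns the lowest matching index, or -1 if absent.
    return mm.find(needle) != -1


mm = mmap.mmap(-1, 11)          # anonymous map, 11 bytes
mm[:] = b"hello world"
print(contains_like_bytes(mm, 32))       # True (space as an int)
print(contains_like_bytes(mm, b"wor"))   # True (subsequence)
print(contains_like_bytes(mm, b"xyz"))   # False
mm.close()
```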
msg263470 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-04-15 10:37
I don't think we can change this in 3.5 since it would break backward compatibility.

> Similarly the `in` operator seems to be broken; one could search for space using `32 in bytesobj`, which would work for slices but not for the whole mmap object.

Seems a reasonable request to me. Could you please open a separate issue?
msg263471 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-15 10:47
Iterating a slice produces ints too (a slice is just a bytes object).

    >>> import mmap, sys
    >>> with open(sys.executable, 'rb') as f:
    ...     mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    ...     print(next(iter(mm)))
    ...     print(next(iter(mm[:10])))
    ...
    b'\x7f'
    127

It seems this module received little love when it was migrated to 3.0.
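Since slices are plain bytes objects, a workaround that does not depend on how `mmap.__iter__` behaves is to iterate over the map in slices. A minimal sketch; `iter_byte_values` is a hypothetical helper, and the 64 KiB default chunk size is an arbitrary assumption that only bounds how much is copied per slice:

```python
import mmap


def iter_byte_values(mm, chunk_size=64 * 1024):
    """Hypothetical helper: yield every byte of *mm* as an int.

    Relies on slicing (which returns bytes) rather than on mmap's own
    iteration, so it gives ints like iterating a bytes object.
    """
    for start in range(0, len(mm), chunk_size):
        # Each slice is a bytes object; iterating bytes yields ints.
        # Slices past the end are clamped, as with bytes.
        yield from mm[start:start + chunk_size]


mm = mmap.mmap(-1, 5)           # anonymous map, 5 bytes
mm[:] = b"ABCDE"
print(list(iter_byte_values(mm, chunk_size=2)))  # [65, 66, 67, 68, 69]
mm.close()
```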
msg264461 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-04-29 05:52
I tried to write a patch to make mmap behave more like bytearray. Making iteration return ints is easy, but I am stuck on the contains operation. To support an operation like b'aa' in b'aabbcc', we have to do a str-in-str (substring) search. I can't find any portable way to do that except writing my own: bytes and bytearray use stringlib_find, but that is not reachable from a C extension module. Any advice?
msg264462 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-04-29 06:00
Thanks for taking a look at this, Xiang.

Like I said in msg263470, making the in operator work with mmap objects is out of scope for this issue and it should be handled in a separate issue (I already have a WIP patch, but please feel free to work on it).
msg264463 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-04-29 06:07
Oh, I'm really curious to see the resolution. ;-)
msg264464 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-29 06:08
Making iteration return ints is a backward-incompatible change. I am afraid it is too late to do this. We lost the chance at the time of Python 3.0.

We need a separate mmap class that behaves more like bytes/bytearray/memoryview, i.e. a sequence of 8-bit integers.
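Short of a new class, the built-in memoryview already exposes an mmap's buffer as a sequence of unsigned 8-bit integers (format 'B'), so iteration, indexing and single-value `in` on the view all work with ints. It does not provide bytes-style subsequence search, though. A minimal sketch, assuming an anonymous map; the view has to be released before the mmap is closed, since closing a map with exported buffers is refused with BufferError:

```python
import mmap

mm = mmap.mmap(-1, 5)           # anonymous map, 5 bytes
mm[:] = b"hello"

# The with-block releases the view on exit, which allows mm.close() below.
with memoryview(mm) as view:
    print(view.format)          # B (unsigned char)
    print(list(view))           # [104, 101, 108, 108, 111]
    print(ord("e") in view)     # True: compares int items to an int
    print(b"ll" in view)        # False: no subsequence search on memoryview

mm.close()
```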
msg264473 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-04-29 08:04
Although Serhiy thinks we need a separate class for this, I still want to upload my patch. Maybe some of it will be helpful later; otherwise it can be discarded.

I added an mmap_contains function to fix the behaviour of the in operator (I couldn't find the separate issue). It uses the simplest search method, which is O(m*n). At first I thought that was unacceptable, but then I found that mmap_gfind does the same.
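The brute-force O(m*n) search mentioned here can be illustrated in Python (the patch itself is C; this is only a sketch of the technique, and `naive_find`/`naive_contains` are hypothetical names). It tries every start position and compares byte by byte, using only len() and integer indexing, so it works on bytes, bytearray, memoryview, or an mmap, whose `mm[i]` is an int:

```python
def naive_find(haystack, needle):
    """Return the first index of *needle* in *haystack*, or -1.

    Brute force: for each of the n - m + 1 start positions, compare up
    to m bytes, hence O(m*n) in the worst case.
    """
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0  # like bytes.find(b""), an empty needle matches at 0
    for start in range(n - m + 1):
        for offset in range(m):
            if haystack[start + offset] != needle[offset]:
                break
        else:
            # The inner loop found no mismatch: full match at `start`.
            return start
    return -1


def naive_contains(haystack, needle):
    return naive_find(haystack, needle) != -1


print(naive_find(b"aabbcc", b"bb"))      # 2
print(naive_contains(b"aabbcc", b"aa"))  # True
print(naive_find(b"aabbcc", b"cd"))      # -1
```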

By the way, only operations that go through mmap_item are affected; as far as I can see, those are iteration and the in operator (searching does not need to iterate, since there is a find method). Indexing is not affected. So maybe this does not break backward compatibility that badly.

I hope this is not a bother.
History
Date                 User              Action  Args
2022-04-11 14:58:27  admin             set     github: 70546
2016-04-29 08:04:31  xiang.zhang       set     files: + mmap_bytearray_like.patch
                                               keywords: + patch
                                               messages: + msg264473
2016-04-29 06:08:29  serhiy.storchaka  set     messages: + msg264464
2016-04-29 06:07:20  xiang.zhang       set     messages: + msg264463
2016-04-29 06:00:17  berker.peksag     set     messages: + msg264462
2016-04-29 05:52:12  xiang.zhang       set     messages: + msg264461
2016-04-28 09:54:07  xiang.zhang       set     nosy: + xiang.zhang
2016-04-15 10:47:21  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg263471
2016-04-15 10:37:05  berker.peksag     set     priority: normal -> low
                                               type: behavior -> enhancement
                                               components: + Extension Modules
                                               versions: - Python 3.5
                                               nosy: + berker.peksag
                                               messages: + msg263470
                                               stage: needs patch
2016-02-20 00:03:24  terry.reedy       set     nosy: + twouters
                                               versions: - Python 3.2, Python 3.3, Python 3.4
2016-02-14 03:37:46  ztane             create