This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: mmap: read_byte/write_byte and object type
Type: behavior Stage:
Components: Extension Modules Versions: Python 3.1
process
Status: closed Resolution: fixed
Dependencies: 5499 5666 Superseder:
Assigned To: Nosy List: benjamin.peterson, benrg, loewis, ocean-city, vstinner
Priority: release blocker Keywords: patch

Created on 2009-02-28 11:02 by ocean-city, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
py3k_mmap_bytes_cleanup_with_getarg_c.patch ocean-city, 2009-04-02 09:12 with getarg('c')
py3k_mmap_bytes_cleanup_with_getarg_b.patch ocean-city, 2009-04-02 11:59 with getarg('b')
Messages (22)
msg82903 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-28 11:02
On Python3000, mmap.read_byte returns str not bytes, and mmap.write_byte
accepts str. Is this intended behavior?

>>> import mmap
>>> m = mmap.mmap(-1, 10)
>>> type(m.read_byte())
<class 'str'>
>>> m.write_byte("a")
>>> m.write_byte(b"a")

Maybe another possibility. read_byte() returns int which represents
byte, write_byte accepts int which represents byte. (Like b"abc"[0]
returns int not 1-length bytes)
msg82904 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-02-28 11:05
Indeed, I think it should use the "b" code, instead of the "c" code.
Please discuss this on python-dev, though.

It might not be ok to backport this to 3.0, since it may break existing
code.
msg82905 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-02-28 11:06
Furthermore, all other uses of the "c" code might need to be reconsidered.
msg82910 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-02-28 14:39
loewis> Furthermore, all other uses of the "c" code might 
loewis> need to be reconsidered.

$ grep 'BuildValue.*"c"' */*c
Modules/_cursesmodule.c:    return Py_BuildValue("c", rtn);
Modules/mmapmodule.c:           return Py_BuildValue("c", value);

$ grep '_Parse[^"]\+"[^":;]*c' */*c
Modules/mmapmodule.c:   if (!PyArg_ParseTuple(args, "c:write_byte", 
&value))
PC/msvcrtmodule.c:      if (!PyArg_ParseTuple(args, "c:putch", &ch))
PC/msvcrtmodule.c:      if (!PyArg_ParseTuple(args, "c:ungetch", &ch))

So we have:
* mmap.read_byte()->char, mmap.write_byte(char): should be fixed to 
use bytes
* <curses window>.getkey()->char: it looks correct because the 
function uses also "return PyUnicode_FromString(...);"
* msvcrt.putch(char), msvcrt.ungetch(char): msvcrt has also:
  - msvcrt.getch()->byte string of 1 byte
  - msvcrt.getwch()->unicode string of 1 character
  - msvcrt.putwch(unicode string of 1 character)
  - msvcrt_ungetwch(unicode string of 1 character)
  Hum, putch(), ungetch(), getch() use inconsistent types 
(unicode/bytes) and should be fixed. Another issue should be open for 
that.

Notes: msvcrt.putwch() accepts string of length > 1 and 
msvcrt.ungetwch() doesn't check string length (and so may crash with 
length=0 or length > 1?).
msg82912 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-28 14:48
I think more *bytes* cleanup is needed for mmap module documentation &
implementation. (and other modules?) Especially mmap.find() and its friends.

>>> import mmap
>>> m = mmap.mmap(-1, 10)
>>> m[:] = b"0123456789"
>>> m.find(b'2')
2
>>> m.find('2') # XXX: accepts unicode
2
msg82947 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-28 21:18
Patch attached. read_byte and write_byte use integer as byte, and other
bytes related cleanup.
msg83677 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-17 13:05
In the python-dev thread, most people agree to use bytes in mmap. Did 
anyone reviewed the patch? Can you commit it to py3k than then to the 
3.0 branch?
msg83678 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-03-17 13:39
>In the python-dev thread, most people agree to use bytes in mmap. Did 
>anyone reviewed the patch?

Well, there is no conclusion which of your choices (a or b) is preferred.
http://www.nabble.com/What-type-of-object-mmap.read_byte-should-return-on-py3k--tt22261245.html#a22261245
I opened similar issue for msvcrt in issue5410.

>Can you commit it to py3k than then to the 3.0 branch?
If the patch is acceptable, yes. This patch will change behavior of
functions, I don't think I can commit this without review.
msg83684 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-17 17:08
Le Tuesday 17 March 2009 14:39:59 Hirokazu Yamamoto, vous avez écrit :
> Well, there is no conclusion which of your choices (a or b) is preferred.

Guido just wrote in other mail thread (py3k: accept unicode for 'c' and byte 
for 'C' in getarg?):

   "Yeah, mmap should be defined exclusively in terms of bytes."

> I opened similar issue for msvcrt in issue5410.

Cool, thanks.

> >Can you commit it to py3k than then to the 3.0 branch?
>
> If the patch is acceptable, yes. This patch will change behavior of
> functions, I don't think I can commit this without review.

Sure, we need a review of the patch. Should the patch be ported to 3.0?
msg83688 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-03-17 17:39
Ah, no, this should not be backported to 3.0. Martin saids so in
msg82904, and I agreed.
msg83693 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-03-17 19:41
STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> Le Tuesday 17 March 2009 14:39:59 Hirokazu Yamamoto, vous avez écrit :
>> Well, there is no conclusion which of your choices (a or b) is preferred.
> 
> Guido just wrote in other mail thread (py3k: accept unicode for 'c' and byte 
> for 'C' in getarg?):
> 
>    "Yeah, mmap should be defined exclusively in terms of bytes."

How does that answer the question? We know what data type to use for
multiple bytes - but what data type should be used for a single byte?
msg83695 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-17 20:55
> How does that answer the question? We know what data type to 
> use for multiple bytes - but what data type should be used 
> for a single byte?

Hum, what was the question? :-) Quote of my email:

« About m.read_byte(), we have two choices:
 (a) Py_BuildValue("b", value) => 0
 (b) Py_BuildValue("y#", &value, 1) => b"\x00"

About m.write_byte(x), we have also two choices:
 (a) PyArg_ParseTuple(args, "b:write_byte", &value): write_byte(0)
 (b) PyArg_ParseTuple(args, "y#:write_byte", &value, &length) and
     check for length=1: write_byte(b"\x00")

(b) choices are close to Python 2.x API. But we can already use 
m.read(1)->b"\x00" and m.write(b"\x00") to use byte string of 1 byte. 
So it would be better to break the API and use integers, (a) choices 
(...) »

Oh, I though that the question was about bytes vs str :-/ Ocean-city 
and I prefer the solution (a). And you Martin?
msg83696 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-03-17 21:10
> Oh, I though that the question was about bytes vs str :-/ Ocean-city 
> and I prefer the solution (a). And you Martin?

I also think that ints should be used to represent individual bytes.

However, your list of alternatives is incomplete: we *could* also change
the "c" code to accept and produce int - then mmapmodule would not need
to change at all.
msg84229 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-26 22:35
martin> However, your list of alternatives is incomplete: we 
martin> *could* also change the "c" code to accept and produce 
martin> int - then mmapmodule would not need to change at all.

That's extactly the idea that I proposed in issue #5499 ;-) I prefer 
to have strict getarg('c') and getarg('C') than fixing each module.
msg85183 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-02 08:35
@ocean-city: Can you update your patch to leave 
Py_BuildValue("c", ...) and 
PyArg_ParseTuple(args, "c:write_byte", ...) unchanged, since #5499 is 
fixed?
msg85185 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-04-02 09:07
Yes, here is the patch. But I noticed Py_BuildValue('c') still returns
unicode. To pass the test, #5666 is needed.
msg85197 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-02 11:41
So <mmap object>.read_byte() gives a byte string of 1 byte, ok. Port 
from Python2 will be easier.

The patch looks correct, thanks for updating it. We know have to wait 
for #5666 :-)
msg85198 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-04-02 11:59
Yes, you can do
  m.write_byte(b"a")
but on the other hand you cannot do
  a = b"abc"
  m.write_byte(a[1])
instead you should do
  a = b"abc"
  m.write_byte(a[1:2])
This is trade off, though.

I'll update "with getarg('b') version" to compare.
msg85354 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-04-04 00:31
#5666 is closed. I finally prefers getarg('c') (byte string of lenght 1).
msg85412 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-04-04 17:10
Fixed in r71174.
msg118902 - (view) Author: (benrg) Date: 2010-10-16 22:10
With this patch, read_byte returns an integer in the range -128 to 127 instead of 0 to 255 if char is signed. Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32 is affected by this. I think it is a bug. The test code would fail if the test string contained any bytes outside the ASCII range.

(Did this really go unnoticed for a year and a half? I noticed it the moment I first tried to use read_byte (which was just now). I see that read_byte was broken in a different way in 3.0. Does anybody actually use it?)
msg120398 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2010-11-04 12:36
Thank for pointing this out. I've looked at bytearray and
bytes implementations, they actually return unsigned value.
I fixed this in r86159(py3k) and r86160(release31-maint).
History
Date User Action Args
2022-04-11 14:56:46adminsetgithub: 49641
2010-11-04 12:36:16ocean-citysetmessages: + msg120398
2010-10-16 22:10:24benrgsetnosy: + benrg
messages: + msg118902
2009-04-04 17:10:28benjamin.petersonsetstatus: open -> closed

nosy: + benjamin.peterson
messages: + msg85412

resolution: fixed
2009-04-04 00:31:45vstinnersetmessages: + msg85354
2009-04-02 11:59:18ocean-citysetfiles: - py3k_mmap_and_bytes.patch
2009-04-02 11:59:05ocean-citysetfiles: + py3k_mmap_bytes_cleanup_with_getarg_b.patch

messages: + msg85198
2009-04-02 11:41:48vstinnersetmessages: + msg85197
2009-04-02 09:12:05ocean-citysetfiles: + py3k_mmap_bytes_cleanup_with_getarg_c.patch
2009-04-02 09:11:43ocean-citysetfiles: - py3k_mmap_bytes_cleanup_with_getarg_c.patch
2009-04-02 09:07:06ocean-citysetfiles: + py3k_mmap_bytes_cleanup_with_getarg_c.patch

dependencies: + Py_BuildValue("c") should return bytes?
messages: + msg85185
2009-04-02 08:35:03vstinnersetmessages: + msg85183
2009-04-02 00:41:42ocean-citysetpriority: release blocker
2009-03-31 19:23:27ocean-citysetdependencies: + only accept byte for getarg('c') and unicode for getarg('C')
2009-03-26 22:35:19vstinnersetmessages: + msg84229
2009-03-17 21:10:07loewissetmessages: + msg83696
2009-03-17 20:55:57vstinnersetmessages: + msg83695
2009-03-17 19:41:47loewissetmessages: + msg83693
2009-03-17 17:39:11ocean-citysetmessages: + msg83688
2009-03-17 17:08:27vstinnersetmessages: + msg83684
2009-03-17 13:39:58ocean-citysetmessages: + msg83678
2009-03-17 13:05:25vstinnersetmessages: + msg83677
2009-02-28 21:18:10ocean-citysetfiles: + py3k_mmap_and_bytes.patch
keywords: + patch
messages: + msg82947
2009-02-28 14:48:58ocean-citysetmessages: + msg82912
2009-02-28 14:39:42vstinnersetnosy: + vstinner
messages: + msg82910
2009-02-28 14:07:07ocean-citysetversions: - Python 3.0
2009-02-28 11:06:51loewissetmessages: + msg82905
2009-02-28 11:05:53loewissetnosy: + loewis
messages: + msg82904
2009-02-28 11:02:27ocean-citycreate