msg82903 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2009-02-28 11:02 |
On Python3000, mmap.read_byte returns str not bytes, and mmap.write_byte
accepts str. Is this intended behavior?
>>> import mmap
>>> m = mmap.mmap(-1, 10)
>>> type(m.read_byte())
<class 'str'>
>>> m.write_byte("a")
>>> m.write_byte(b"a")
Maybe another possibility. read_byte() returns int which represents
byte, write_byte accepts int which represents byte. (Like b"abc"[0]
returns int not 1-length bytes)
|
msg82904 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-02-28 11:05 |
Indeed, I think it should use the "b" code, instead of the "c" code.
Please discuss this on python-dev, though.
It might not be ok to backport this to 3.0, since it may break existing
code.
|
msg82905 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-02-28 11:06 |
Furthermore, all other uses of the "c" code might need to be reconsidered.
|
msg82910 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-02-28 14:39 |
loewis> Furthermore, all other uses of the "c" code might
loewis> need to be reconsidered.
$ grep 'BuildValue.*"c"' */*c
Modules/_cursesmodule.c: return Py_BuildValue("c", rtn);
Modules/mmapmodule.c: return Py_BuildValue("c", value);
$ grep '_Parse[^"]\+"[^":;]*c' */*c
Modules/mmapmodule.c: if (!PyArg_ParseTuple(args, "c:write_byte",
&value))
PC/msvcrtmodule.c: if (!PyArg_ParseTuple(args, "c:putch", &ch))
PC/msvcrtmodule.c: if (!PyArg_ParseTuple(args, "c:ungetch", &ch))
So we have:
* mmap.read_byte()->char, mmap.write_byte(char): should be fixed to
use bytes
* <curses window>.getkey()->char: it looks correct because the
function uses also "return PyUnicode_FromString(...);"
* msvcrt.putch(char), msvcrt.ungetch(char): msvcrt has also:
- msvcrt.getch()->byte string of 1 byte
- msvcrt.getwch()->unicode string of 1 character
- msvcrt.putwch(unicode string of 1 character)
- msvcrt_ungetwch(unicode string of 1 character)
Hum, putch(), ungetch(), getch() use inconsistent types
(unicode/bytes) and should be fixed. Another issue should be open for
that.
Notes: msvcrt.putwch() accepts string of length > 1 and
msvcrt.ungetwch() doesn't check string length (and so may crash with
length=0 or length > 1?).
|
msg82912 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2009-02-28 14:48 |
I think more *bytes* cleanup is needed for mmap module documentation &
implementation. (and other modules?) Especially mmap.find() and its friends.
>>> import mmap
>>> m = mmap.mmap(-1, 10)
>>> m[:] = b"0123456789"
>>> m.find(b'2')
2
>>> m.find('2') # XXX: accepts unicode
2
|
msg82947 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2009-02-28 21:18 |
Patch attached. read_byte and write_byte use integer as byte, and other
bytes related cleanup.
|
msg83677 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-17 13:05 |
In the python-dev thread, most people agree to use bytes in mmap. Did
anyone reviewed the patch? Can you commit it to py3k than then to the
3.0 branch?
|
msg83678 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2009-03-17 13:39 |
>In the python-dev thread, most people agree to use bytes in mmap. Did
>anyone reviewed the patch?
Well, there is no conclusion which of your choices (a or b) is preferred.
http://www.nabble.com/What-type-of-object-mmap.read_byte-should-return-on-py3k--tt22261245.html#a22261245
I opened similar issue for msvcrt in issue5410.
>Can you commit it to py3k than then to the 3.0 branch?
If the patch is acceptable, yes. This patch will change behavior of
functions, I don't think I can commit this without review.
|
msg83684 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-17 17:08 |
Le Tuesday 17 March 2009 14:39:59 Hirokazu Yamamoto, vous avez écrit :
> Well, there is no conclusion which of your choices (a or b) is preferred.
Guido just wrote in other mail thread (py3k: accept unicode for 'c' and byte
for 'C' in getarg?):
"Yeah, mmap should be defined exclusively in terms of bytes."
> I opened similar issue for msvcrt in issue5410.
Cool, thanks.
> >Can you commit it to py3k than then to the 3.0 branch?
>
> If the patch is acceptable, yes. This patch will change behavior of
> functions, I don't think I can commit this without review.
Sure, we need a review of the patch. Should the patch be ported to 3.0?
|
msg83688 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2009-03-17 17:39 |
Ah, no, this should not be backported to 3.0. Martin saids so in
msg82904, and I agreed.
|
msg83693 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-03-17 19:41 |
STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> Le Tuesday 17 March 2009 14:39:59 Hirokazu Yamamoto, vous avez écrit :
>> Well, there is no conclusion which of your choices (a or b) is preferred.
>
> Guido just wrote in other mail thread (py3k: accept unicode for 'c' and byte
> for 'C' in getarg?):
>
> "Yeah, mmap should be defined exclusively in terms of bytes."
How does that answer the question? We know what data type to use for
multiple bytes - but what data type should be used for a single byte?
|
msg83695 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-17 20:55 |
> How does that answer the question? We know what data type to
> use for multiple bytes - but what data type should be used
> for a single byte?
Hum, what was the question? :-) Quote of my email:
« About m.read_byte(), we have two choices:
(a) Py_BuildValue("b", value) => 0
(b) Py_BuildValue("y#", &value, 1) => b"\x00"
About m.write_byte(x), we have also two choices:
(a) PyArg_ParseTuple(args, "b:write_byte", &value): write_byte(0)
(b) PyArg_ParseTuple(args, "y#:write_byte", &value, &length) and
check for length=1: write_byte(b"\x00")
(b) choices are close to Python 2.x API. But we can already use
m.read(1)->b"\x00" and m.write(b"\x00") to use byte string of 1 byte.
So it would be better to break the API and use integers, (a) choices
(...) »
Oh, I though that the question was about bytes vs str :-/ Ocean-city
and I prefer the solution (a). And you Martin?
|
msg83696 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-03-17 21:10 |
> Oh, I though that the question was about bytes vs str :-/ Ocean-city
> and I prefer the solution (a). And you Martin?
I also think that ints should be used to represent individual bytes.
However, your list of alternatives is incomplete: we *could* also change
the "c" code to accept and produce int - then mmapmodule would not need
to change at all.
|
msg84229 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-26 22:35 |
martin> However, your list of alternatives is incomplete: we
martin> *could* also change the "c" code to accept and produce
martin> int - then mmapmodule would not need to change at all.
That's extactly the idea that I proposed in issue #5499 ;-) I prefer
to have strict getarg('c') and getarg('C') than fixing each module.
|
msg85183 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-04-02 08:35 |
@ocean-city: Can you update your patch to leave
Py_BuildValue("c", ...) and
PyArg_ParseTuple(args, "c:write_byte", ...) unchanged, since #5499 is
fixed?
|
msg85185 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2009-04-02 09:07 |
Yes, here is the patch. But I noticed Py_BuildValue('c') still returns
unicode. To pass the test, #5666 is needed.
|
msg85197 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-04-02 11:41 |
So <mmap object>.read_byte() gives a byte string of 1 byte, ok. Port
from Python2 will be easier.
The patch looks correct, thanks for updating it. We know have to wait
for #5666 :-)
|
msg85198 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2009-04-02 11:59 |
Yes, you can do
m.write_byte(b"a")
but on the other hand you cannot do
a = b"abc"
m.write_byte(a[1])
instead you should do
a = b"abc"
m.write_byte(a[1:2])
This is trade off, though.
I'll update "with getarg('b') version" to compare.
|
msg85354 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-04-04 00:31 |
#5666 is closed. I finally prefers getarg('c') (byte string of lenght 1).
|
msg85412 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2009-04-04 17:10 |
Fixed in r71174.
|
msg118902 - (view) |
Author: (benrg) |
Date: 2010-10-16 22:10 |
With this patch, read_byte returns an integer in the range -128 to 127 instead of 0 to 255 if char is signed. Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32 is affected by this. I think it is a bug. The test code would fail if the test string contained any bytes outside the ASCII range.
(Did this really go unnoticed for a year and a half? I noticed it the moment I first tried to use read_byte (which was just now). I see that read_byte was broken in a different way in 3.0. Does anybody actually use it?)
|
msg120398 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2010-11-04 12:36 |
Thank for pointing this out. I've looked at bytearray and
bytes implementations, they actually return unsigned value.
I fixed this in r86159(py3k) and r86160(release31-maint).
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:46 | admin | set | github: 49641 |
2010-11-04 12:36:16 | ocean-city | set | messages:
+ msg120398 |
2010-10-16 22:10:24 | benrg | set | nosy:
+ benrg messages:
+ msg118902
|
2009-04-04 17:10:28 | benjamin.peterson | set | status: open -> closed
nosy:
+ benjamin.peterson messages:
+ msg85412
resolution: fixed |
2009-04-04 00:31:45 | vstinner | set | messages:
+ msg85354 |
2009-04-02 11:59:18 | ocean-city | set | files:
- py3k_mmap_and_bytes.patch |
2009-04-02 11:59:05 | ocean-city | set | files:
+ py3k_mmap_bytes_cleanup_with_getarg_b.patch
messages:
+ msg85198 |
2009-04-02 11:41:48 | vstinner | set | messages:
+ msg85197 |
2009-04-02 09:12:05 | ocean-city | set | files:
+ py3k_mmap_bytes_cleanup_with_getarg_c.patch |
2009-04-02 09:11:43 | ocean-city | set | files:
- py3k_mmap_bytes_cleanup_with_getarg_c.patch |
2009-04-02 09:07:06 | ocean-city | set | files:
+ py3k_mmap_bytes_cleanup_with_getarg_c.patch
dependencies:
+ Py_BuildValue("c") should return bytes? messages:
+ msg85185 |
2009-04-02 08:35:03 | vstinner | set | messages:
+ msg85183 |
2009-04-02 00:41:42 | ocean-city | set | priority: release blocker |
2009-03-31 19:23:27 | ocean-city | set | dependencies:
+ only accept byte for getarg('c') and unicode for getarg('C') |
2009-03-26 22:35:19 | vstinner | set | messages:
+ msg84229 |
2009-03-17 21:10:07 | loewis | set | messages:
+ msg83696 |
2009-03-17 20:55:57 | vstinner | set | messages:
+ msg83695 |
2009-03-17 19:41:47 | loewis | set | messages:
+ msg83693 |
2009-03-17 17:39:11 | ocean-city | set | messages:
+ msg83688 |
2009-03-17 17:08:27 | vstinner | set | messages:
+ msg83684 |
2009-03-17 13:39:58 | ocean-city | set | messages:
+ msg83678 |
2009-03-17 13:05:25 | vstinner | set | messages:
+ msg83677 |
2009-02-28 21:18:10 | ocean-city | set | files:
+ py3k_mmap_and_bytes.patch keywords:
+ patch messages:
+ msg82947 |
2009-02-28 14:48:58 | ocean-city | set | messages:
+ msg82912 |
2009-02-28 14:39:42 | vstinner | set | nosy:
+ vstinner messages:
+ msg82910 |
2009-02-28 14:07:07 | ocean-city | set | versions:
- Python 3.0 |
2009-02-28 11:06:51 | loewis | set | messages:
+ msg82905 |
2009-02-28 11:05:53 | loewis | set | nosy:
+ loewis messages:
+ msg82904 |
2009-02-28 11:02:27 | ocean-city | create | |