Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mmap: read_byte/write_byte and object type #49641

Closed
ocean-city mannequin opened this issue Feb 28, 2009 · 22 comments
Closed

mmap: read_byte/write_byte and object type #49641

ocean-city mannequin opened this issue Feb 28, 2009 · 22 comments
Labels
extension-modules C modules in the Modules dir release-blocker type-bug An unexpected behavior, bug, or error

Comments

@ocean-city
Copy link
Mannequin

ocean-city mannequin commented Feb 28, 2009

BPO 5391
Nosy @loewis, @vstinner, @benjaminp
Dependencies
  • bpo-5499: only accept byte for getarg('c') and unicode for getarg('C')
  • bpo-5666: Py_BuildValue("c") should return bytes?
  • Files
  • py3k_mmap_bytes_cleanup_with_getarg_c.patch: with getarg('c')
  • py3k_mmap_bytes_cleanup_with_getarg_b.patch: with getarg('b')
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-04-04.17:10:28.974>
    created_at = <Date 2009-02-28.11:02:27.231>
    labels = ['extension-modules', 'type-bug', 'release-blocker']
    title = 'mmap: read_byte/write_byte and object type'
    updated_at = <Date 2010-11-04.12:36:16.692>
    user = 'https://bugs.python.org/ocean-city'

    bugs.python.org fields:

    activity = <Date 2010-11-04.12:36:16.692>
    actor = 'ocean-city'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-04-04.17:10:28.974>
    closer = 'benjamin.peterson'
    components = ['Extension Modules']
    creation = <Date 2009-02-28.11:02:27.231>
    creator = 'ocean-city'
    dependencies = ['5499', '5666']
    files = ['13576', '13580']
    hgrepos = []
    issue_num = 5391
    keywords = ['patch']
    message_count = 22.0
    messages = ['82903', '82904', '82905', '82910', '82912', '82947', '83677', '83678', '83684', '83688', '83693', '83695', '83696', '84229', '85183', '85185', '85197', '85198', '85354', '85412', '118902', '120398']
    nosy_count = 5.0
    nosy_names = ['loewis', 'vstinner', 'ocean-city', 'benjamin.peterson', 'benrg']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue5391'
    versions = ['Python 3.1']

    @ocean-city
    Copy link
    Mannequin Author

    ocean-city mannequin commented Feb 28, 2009

    On Python3000, mmap.read_byte returns str not bytes, and mmap.write_byte
    accepts str. Is this intended behavior?

    >>> import mmap
    >>> m = mmap.mmap(-1, 10)
    >>> type(m.read_byte())
    <class 'str'>
    >>> m.write_byte("a")
    >>> m.write_byte(b"a")

    Maybe another possibility. read_byte() returns int which represents
    byte, write_byte accepts int which represents byte. (Like b"abc"[0]
    returns int not 1-length bytes)

    @ocean-city ocean-city mannequin added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Feb 28, 2009
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Feb 28, 2009

    Indeed, I think it should use the "b" code, instead of the "c" code.
    Please discuss this on python-dev, though.

    It might not be ok to backport this to 3.0, since it may break existing
    code.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Feb 28, 2009

    Furthermore, all other uses of the "c" code might need to be reconsidered.

    @vstinner
    Copy link
    Member

    loewis> Furthermore, all other uses of the "c" code might
    loewis> need to be reconsidered.

    $ grep 'BuildValue.*"c"' */*c
    Modules/_cursesmodule.c:    return Py_BuildValue("c", rtn);
    Modules/mmapmodule.c:           return Py_BuildValue("c", value);
    
    $ grep '_Parse[^"]\+"[^":;]*c' */*c
    Modules/mmapmodule.c:   if (!PyArg_ParseTuple(args, "c:write_byte", 
    &value))
    PC/msvcrtmodule.c:      if (!PyArg_ParseTuple(args, "c:putch", &ch))
    PC/msvcrtmodule.c:      if (!PyArg_ParseTuple(args, "c:ungetch", &ch))

    So we have:

    • mmap.read_byte()->char, mmap.write_byte(char): should be fixed to
      use bytes
    • <curses window>.getkey()->char: it looks correct because the
      function uses also "return PyUnicode_FromString(...);"
    • msvcrt.putch(char), msvcrt.ungetch(char): msvcrt has also:
      • msvcrt.getch()->byte string of 1 byte
      • msvcrt.getwch()->unicode string of 1 character
      • msvcrt.putwch(unicode string of 1 character)
      • msvcrt_ungetwch(unicode string of 1 character)
        Hum, putch(), ungetch(), getch() use inconsistent types
        (unicode/bytes) and should be fixed. Another issue should be open for
        that.

    Notes: msvcrt.putwch() accepts string of length > 1 and
    msvcrt.ungetwch() doesn't check string length (and so may crash with
    length=0 or length > 1?).

    @ocean-city
    Copy link
    Mannequin Author

    ocean-city mannequin commented Feb 28, 2009

    I think more *bytes* cleanup is needed for mmap module documentation &
    implementation. (and other modules?) Especially mmap.find() and its friends.

    >>> import mmap
    >>> m = mmap.mmap(-1, 10)
    >>> m[:] = b"0123456789"
    >>> m.find(b'2')
    2
    >>> m.find('2') # XXX: accepts unicode
    2

    @ocean-city
    Copy link
    Mannequin Author

    ocean-city mannequin commented Feb 28, 2009

    Patch attached. read_byte and write_byte use integer as byte, and other
    bytes related cleanup.

    @vstinner
    Copy link
    Member

    In the python-dev thread, most people agree to use bytes in mmap. Did
    anyone reviewed the patch? Can you commit it to py3k than then to the
    3.0 branch?

    @ocean-city
    Copy link
    Mannequin Author

    ocean-city mannequin commented Mar 17, 2009

    In the python-dev thread, most people agree to use bytes in mmap. Did
    anyone reviewed the patch?

    Well, there is no conclusion which of your choices (a or b) is preferred.
    http://www.nabble.com/What-type-of-object-mmap.read_byte-should-return-on-py3k--tt22261245.html#a22261245
    I opened similar issue for msvcrt in bpo-5410.

    Can you commit it to py3k than then to the 3.0 branch?
    If the patch is acceptable, yes. This patch will change behavior of
    functions, I don't think I can commit this without review.

    @vstinner
    Copy link
    Member

    Le Tuesday 17 March 2009 14:39:59 Hirokazu Yamamoto, vous avez écrit :

    Well, there is no conclusion which of your choices (a or b) is preferred.

    Guido just wrote in other mail thread (py3k: accept unicode for 'c' and byte
    for 'C' in getarg?):

    "Yeah, mmap should be defined exclusively in terms of bytes."

    I opened similar issue for msvcrt in bpo-5410.

    Cool, thanks.

    >Can you commit it to py3k than then to the 3.0 branch?

    If the patch is acceptable, yes. This patch will change behavior of
    functions, I don't think I can commit this without review.

    Sure, we need a review of the patch. Should the patch be ported to 3.0?

    @ocean-city
    Copy link
    Mannequin Author

    ocean-city mannequin commented Mar 17, 2009

    Ah, no, this should not be backported to 3.0. Martin saids so in
    msg82904, and I agreed.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 17, 2009

    STINNER Victor wrote:

    STINNER Victor <victor.stinner@haypocalc.com> added the comment:

    Le Tuesday 17 March 2009 14:39:59 Hirokazu Yamamoto, vous avez écrit :
    > Well, there is no conclusion which of your choices (a or b) is preferred.

    Guido just wrote in other mail thread (py3k: accept unicode for 'c' and byte
    for 'C' in getarg?):

    "Yeah, mmap should be defined exclusively in terms of bytes."

    How does that answer the question? We know what data type to use for
    multiple bytes - but what data type should be used for a single byte?

    @vstinner
    Copy link
    Member

    How does that answer the question? We know what data type to
    use for multiple bytes - but what data type should be used
    for a single byte?

    Hum, what was the question? :-) Quote of my email:

    « About m.read_byte(), we have two choices:
    (a) Py_BuildValue("b", value) => 0
    (b) Py_BuildValue("y#", &value, 1) => b"\x00"

    About m.write_byte(x), we have also two choices:
    (a) PyArg_ParseTuple(args, "b:write_byte", &value): write_byte(0)
    (b) PyArg_ParseTuple(args, "y#:write_byte", &value, &length) and
    check for length=1: write_byte(b"\x00")

    (b) choices are close to Python 2.x API. But we can already use
    m.read(1)->b"\x00" and m.write(b"\x00") to use byte string of 1 byte.
    So it would be better to break the API and use integers, (a) choices
    (...) »

    Oh, I though that the question was about bytes vs str :-/ Ocean-city
    and I prefer the solution (a). And you Martin?

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 17, 2009

    Oh, I though that the question was about bytes vs str :-/ Ocean-city
    and I prefer the solution (a). And you Martin?

    I also think that ints should be used to represent individual bytes.

    However, your list of alternatives is incomplete: we *could* also change
    the "c" code to accept and produce int - then mmapmodule would not need
    to change at all.

    @vstinner
    Copy link
    Member

    martin> However, your list of alternatives is incomplete: we
    martin> *could* also change the "c" code to accept and produce
    martin> int - then mmapmodule would not need to change at all.

    That's extactly the idea that I proposed in issue bpo-5499 ;-) I prefer
    to have strict getarg('c') and getarg('C') than fixing each module.

    @ocean-city ocean-city mannequin added the release-blocker label Apr 2, 2009
    @vstinner
    Copy link
    Member

    vstinner commented Apr 2, 2009

    @Ocean-City: Can you update your patch to leave
    Py_BuildValue("c", ...) and
    PyArg_ParseTuple(args, "c:write_byte", ...) unchanged, since bpo-5499 is
    fixed?

    @ocean-city
    Copy link
    Mannequin Author

    ocean-city mannequin commented Apr 2, 2009

    Yes, here is the patch. But I noticed Py_BuildValue('c') still returns
    unicode. To pass the test, bpo-5666 is needed.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 2, 2009

    So <mmap object>.read_byte() gives a byte string of 1 byte, ok. Port
    from Python2 will be easier.

    The patch looks correct, thanks for updating it. We know have to wait
    for bpo-5666 :-)

    @ocean-city
    Copy link
    Mannequin Author

    ocean-city mannequin commented Apr 2, 2009

    Yes, you can do
    m.write_byte(b"a")
    but on the other hand you cannot do
    a = b"abc"
    m.write_byte(a[1])
    instead you should do
    a = b"abc"
    m.write_byte(a[1:2])
    This is trade off, though.

    I'll update "with getarg('b') version" to compare.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 4, 2009

    bpo-5666 is closed. I finally prefers getarg('c') (byte string of lenght 1).

    @benjaminp
    Copy link
    Contributor

    Fixed in r71174.

    @benrg
    Copy link
    Mannequin

    benrg mannequin commented Oct 16, 2010

    With this patch, read_byte returns an integer in the range -128 to 127 instead of 0 to 255 if char is signed. Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32 is affected by this. I think it is a bug. The test code would fail if the test string contained any bytes outside the ASCII range.

    (Did this really go unnoticed for a year and a half? I noticed it the moment I first tried to use read_byte (which was just now). I see that read_byte was broken in a different way in 3.0. Does anybody actually use it?)

    @ocean-city
    Copy link
    Mannequin Author

    ocean-city mannequin commented Nov 4, 2010

    Thank for pointing this out. I've looked at bytearray and
    bytes implementations, they actually return unsigned value.
    I fixed this in r86159(py3k) and r86160(release31-maint).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir release-blocker type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants