open().write() and .read() fails on 2 GB+ data (OS X) #68846

Closed
lebigot mannequin opened this issue Jul 18, 2015 · 32 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes extension-modules C modules in the Modules dir topic-IO type-bug An unexpected behavior, bug, or error

Comments

@lebigot
Mannequin

lebigot mannequin commented Jul 18, 2015

BPO 24658
Nosy @warsaw, @ronaldoussoren, @vstinner, @lebigot, @ned-deily, @zware, @matrixise, @miss-islington
PRs
  • bpo-24658: Fix read/write on file with a size greater than 2GB on OSX #1705
  • [3.7] bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) #9936
  • [3.6] bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) #9937
  • WIP: [2.7] bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) #9938
  • bpo-24658: os.read() reuses _PY_READ_MAX #10657
  • [3.7] bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657) #10658
  • [3.6] bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657) #10659
Files
  • issue24658.txt
  • issue24658-3.6.diff
  • issue24658-3.5.diff
  • issue24658-2-3.6.diff
  • issue24658-3-3.6.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-01-19.18:17:44.745>
    created_at = <Date 2015-07-18.02:59:28.391>
    labels = ['extension-modules', '3.8', 'type-bug', '3.7', 'expert-IO']
    title = 'open().write() and .read() fails on 2 GB+ data (OS X)'
    updated_at = <Date 2020-01-19.18:17:57.633>
    user = 'https://github.com/lebigot'

    bugs.python.org fields:

    activity = <Date 2020-01-19.18:17:57.633>
    actor = 'zach.ware'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-01-19.18:17:44.745>
    closer = 'zach.ware'
    components = ['Extension Modules', 'IO']
    creation = <Date 2015-07-18.02:59:28.391>
    creator = 'lebigot'
    dependencies = []
    files = ['39960', '44021', '44024', '45177', '45178']
    hgrepos = []
    issue_num = 24658
    keywords = ['patch']
    message_count = 32.0
    messages = ['246878', '246879', '246979', '246983', '246985', '246987', '246993', '246994', '246999', '247007', '247122', '256882', '272030', '272044', '278672', '278724', '279132', '279159', '294113', '294122', '294160', '294195', '327912', '327916', '327918', '327940', '330259', '330260', '330262', '335566', '335569', '360264']
    nosy_count = 11.0
    nosy_names = ['barry', 'ronaldoussoren', 'vstinner', 'lebigot', 'ned.deily', 'zach.ware', 'matrixise', 'Mali Akmanalp', 'Ian Carroll', 'Harry Li', 'miss-islington']
    pr_nums = ['1705', '9936', '9937', '9938', '10657', '10658', '10659']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue24658'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @lebigot
    Mannequin Author

    lebigot mannequin commented Jul 18, 2015

    On OS X, the Homebrew and MacPorts versions of Python 3.4.3 raise an exception when writing a 4 GB bytearray:

    >>> open('/dev/null', 'wb').write(bytearray(2**31-1))
    2147483647
    
    >>> open('/dev/null', 'wb').write(bytearray(2**31))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 22] Invalid argument

    This has an impact on pickle, in particular (http://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb).
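
Until a fixed interpreter is available, one possible workaround is to split a large write into smaller pieces. The following is a minimal sketch, not the fix that was eventually applied to CPython: the helper name `write_in_chunks` and the 1 GiB default chunk size are assumptions made for illustration, and the helper assumes a buffered binary file object such as the one returned by `open(path, 'wb')`, where each `write()` call consumes all the bytes it is given.

```python
import io

# Hypothetical default: any chunk size below 2**31 avoids the failing call.
MAX_CHUNK = 2**30  # 1 GiB


def write_in_chunks(fileobj, data, chunk_size=MAX_CHUNK):
    """Write bytes-like `data` to a buffered binary file in bounded pieces.

    Returns the total number of bytes handed to fileobj.write().
    """
    # A memoryview lets us slice the data without copying it.
    view = memoryview(data).cast("B")
    total = 0
    for start in range(0, len(view), chunk_size):
        chunk = view[start:start + chunk_size]
        fileobj.write(chunk)
        total += len(chunk)
    return total


# Small demonstration with an in-memory file and a tiny chunk size.
demo = io.BytesIO()
write_in_chunks(demo, b"abcdefghij", chunk_size=4)
```

For the pickle case, the same helper could be fed the result of `pickle.dumps(obj, protocol=4)` instead of calling `pickle.dump()` directly on the file, at the cost of holding the serialized bytes in memory.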

    @lebigot lebigot mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Jul 18, 2015
    @lebigot
    Mannequin Author

    lebigot mannequin commented Jul 18, 2015

    PS: I should have written "2 GB" bytearray (so this looks like a signed 32 bit integer issue).

    @lebigot lebigot mannequin changed the title open().write() fails on 4 GB+ data (OS X) open().write() fails on 2 GB+ data (OS X) Jul 18, 2015
    @serhiy-storchaka serhiy-storchaka added extension-modules C modules in the Modules dir topic-IO and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jul 18, 2015
    @ronaldoussoren
    Contributor

    This is likely a platform bug: it fails with os.write as well. Interestingly enough, file.write works fine on Python 2.7 (which uses stdio), which apparently works around this kernel misfeature.

    A possible partial workaround is to recognise this error in the implementation of os.write and then perform a partial write. The problem is that, while write(2) is documented as possibly writing less data than expected, most users writing to normal files (as opposed to sockets) probably don’t expect that behavior. On the other hand, os.write already limits writes to INT_MAX on Windows (see _Py_write in Python/fileutils.c).

    Because of this I’m in favour of adding a similar workaround on OSX (and can provide a patch).

    BTW, the manpage for write says that writev(2) might fail with EINVAL:

     [EINVAL]           The sum of the iov_len values in the iov array
                        overflows a 32-bit integer.
    

    I wouldn’t be surprised if write(2) is implemented using writev(2) and that this explains the problem.
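
Code that calls os.write directly can cope with partial writes by looping until everything has been accepted. The sketch below illustrates that idea under stated assumptions: `write_all` is a hypothetical helper (not part of the os module), the `INT_MAX` constant mirrors the signed 32-bit limit discussed above, and the file descriptor is assumed to be blocking.

```python
import os

INT_MAX = 2**31 - 1  # largest value of a signed 32-bit C int


def write_all(fd, data, max_chunk=INT_MAX):
    """Write all of `data` to the blocking descriptor `fd`.

    Each os.write() call is limited to max_chunk bytes, and the loop
    advances by the number of bytes the OS reports as written.
    """
    view = memoryview(data).cast("B")  # slice without copying
    offset = 0
    while offset < len(view):
        written = os.write(fd, view[offset:offset + max_chunk])
        offset += written
    return offset
```

The caller sees a single "write everything" operation, while each underlying call stays within the size the kernel accepts and short writes are retried from the right offset.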

    On 18 Jul 2015, at 06:05, Serhiy Storchaka <report@bugs.python.org> wrote:

    Changes by Serhiy Storchaka <storchaka@gmail.com>:

    ----------
    components: +Extension Modules, IO -Interpreter Core
    nosy: +haypo, ned.deily, ronaldoussoren


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue24658>


    @ronaldoussoren
    Contributor

    The attached patch is a first stab at a workaround. It will unconditionally limit the write size in os.write to INT_MAX on OSX.

    I haven't tested yet if this actually fixes the problem mentioned on stack overflow.

    @lebigot
    Mannequin Author

    lebigot mannequin commented Jul 20, 2015

    Thank you for looking into this, Ronald.

    What does your patch do, exactly? Does it only limit the returned byte count, or does it really limit the size of the data written by truncating it?

    In any case, it would be very useful to have a warning from the Python interpreter. If the data is truncated, I would even prefer an explicit exception (e.g. "data too big for this platform (>= 2 GB)"), along with an explicit mention of it in the documentation. What do you think?

    @ronaldoussoren
    Contributor

    The patch limits os.write to writing at most INT_MAX bytes on OSX. Buffered I/O using open("/some/file", "wb") should still write all data (at least according to the limited tests I've done so far).

    The same limitation is already present on Windows.

    And as I wrote before: according to the manpage for write(2), os.write may already write fewer bytes than requested.

    I'm -1 on using an explicit exception or printing a warning about this.

    @lebigot
    Mannequin Author

    lebigot mannequin commented Jul 20, 2015

    I see, thanks.

    This sounds good to me too: no need for a warning or exception, indeed, since file.write() should work and the behavior of os.write() is documented.

    @vstinner
    Member

    The Windows limit to INT_MAX is applied in many functions:

    • os.write()
    • io.FileIO.write()
    • hmm, maybe others, I don't remember

    In the default branch, there is now _Py_write(), so only one place should be fixed.

    See the issue bpo-11395 which fixed the bug on Windows.

    If it's a bug, it should be fixed on Python 2.7, 3.4, 3.5 and default branches.

    @ronaldoussoren
    Contributor

    The patch I attached earlier is for the default branch. More work is needed for the other active branches.

    @MaliAkmanalp
    Mannequin

    MaliAkmanalp mannequin commented Jul 20, 2015

    I don't know how helpful it is at this point, but the issue happens while reading also.

    Here's some related discussion in the numpy tracker:

    numpy/numpy#3858 (the claim was that OS X Mavericks fixed this issue, but it didn't; there is an Apple bug ID in there somewhere, plus a link to a patch the torch folks used)

    and also in pandas: pandas-dev/pandas#10641

    I'd be happy to try to test patches out.

    @ronaldoussoren
    Contributor

    Indeed, read(2) has the same problem. I just tested this with a small C program.

    I'll rework the patch for this, and will work on patches for 3.4/3.5 and 2.7 as well.
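
On the read side, a comparable workaround is to request the data in bounded pieces and join them. This is only a sketch: `read_in_chunks` and its 1 GiB default are hypothetical names and values chosen for illustration, and the joined result briefly needs extra memory because the pieces are copied into one bytes object.

```python
import io


def read_in_chunks(fileobj, size, max_chunk=2**30):
    """Read up to `size` bytes from binary `fileobj`.

    No single read() call asks for more than max_chunk bytes. The result
    is shorter than `size` if end of file is reached first.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = fileobj.read(min(remaining, max_chunk))
        if not chunk:  # empty result: end of file
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

With a real file, `size` would typically come from `os.path.getsize(path)` or `os.fstat(f.fileno()).st_size`.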

    @IanCarroll
    Mannequin

    IanCarroll mannequin commented Dec 22, 2015

    Write still fails on 3.5.1 and OS X 10.11.2. I'm no dev, so can someone explain how to use the patch while it's under review?

    @matrixise
    Member

    Here is my patch for 3.6; I am going to provide the patch for 3.5.

    @matrixise
    Member

    Sorry, I was busy with a task, but here is my patch for 3.5. In fact, it's just the same as for 3.6.

    @matrixise
    Member

    ping

    @matrixise
    Member

    Ned Deily, I added you because you are the expert for the OS X platform.

    @matrixise
    Member

    Victor, could you check the new patch?

    @matrixise
    Member

    Uploaded a new version.

    @matrixise
    Member

    Hello....

    I just updated this ticket with a PR on Github.

    @vstinner
    Member

    I see that we have other clamps on Windows using INT_MAX:

    • sock_setsockopt()
    • sock_sendto_impl()

    Are these functions ok on macOS? If not, a new issue should be opened ;-)

    @matrixise
    Member

    1. In the case of Windows, maybe we could open a new issue, because this fix is only for macOS.

    2. The issue was only about files, not sockets.

    What do you suggest?

    @vstinner
    Member

    I'm not saying that something is broken, just that it would be nice if someone could test the socket methods.

    On Windows, the bug was obvious: the function takes a C int...

    @matrixise
    Member

    Hi all,

    Could you test the PR with Windows? I don't have a Windows computer.

    Thank you,

    Stéphane

    @vstinner
    Member

    New changeset 74a8b6e by Victor Stinner (Stéphane Wirtel) in branch 'master':
    bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)
    74a8b6e

    @vstinner vstinner added 3.7 (EOL) end of life 3.8 only security fixes labels Oct 17, 2018
    @vstinner
    Member

    New changeset a5ebc20 by Victor Stinner (Stéphane Wirtel) in branch '3.6':
    [3.6] bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) (GH-9937)
    a5ebc20

    @miss-islington
    Contributor

    New changeset 178d1c0 by Miss Islington (bot) in branch '3.7':
    bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)
    178d1c0

    @vstinner
    Member

    New changeset 9a0d7a7 by Victor Stinner in branch 'master':
    bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
    9a0d7a7

    @miss-islington
    Contributor

    New changeset 18f3327 by Miss Islington (bot) in branch '3.7':
    bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
    18f3327

    @miss-islington
    Contributor

    New changeset 0c15e50 by Miss Islington (bot) in branch '3.6':
    bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
    0c15e50

    @warsaw
    Member

    warsaw commented Feb 14, 2019

    Nosying myself since I just landed here based on an internal $work bug report. We're seeing it with reads. I'll try to set aside some work time to review the PRs.

    @warsaw warsaw changed the title open().write() fails on 2 GB+ data (OS X) open().write() and .read() fails on 2 GB+ data (OS X) Feb 14, 2019
    @matrixise
    Member

    Hi @barry

    Normally this issue is fixed for 3.x, but I need to finish my PR for 2.7.

    I plan to fix 2.7 in the coming weeks.

    @ned-deily ned-deily removed their assignment Feb 16, 2019
    @zware
    Member

    zware commented Jan 19, 2020

    Since 3.x is fixed and 2.7 has reached EOL, I'm closing the issue. Thanks for getting it fixed in 3.x, Stephane and Victor!

    @zware zware closed this as completed Jan 19, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022