classification
Title: open().write() and .read() fails on 2 GB+ data (OS X)
Type: behavior Stage: resolved
Components: Extension Modules, IO Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Harry Li, Ian Carroll, Mali Akmanalp, barry, lebigot, matrixise, miss-islington, ned.deily, ronaldoussoren, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2015-07-18 02:59 by lebigot, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
issue24658.txt ronaldoussoren, 2015-07-20 12:50 review
issue24658-3.6.diff matrixise, 2016-08-05 13:57 review
issue24658-3.5.diff matrixise, 2016-08-05 17:28 review
issue24658-2-3.6.diff matrixise, 2016-10-21 14:25 review
issue24658-3-3.6.diff matrixise, 2016-10-21 21:38 review
Pull Requests
URL Status Linked
PR 1705 merged matrixise, 2017-05-21 21:15
PR 9936 merged miss-islington, 2018-10-17 23:06
PR 9937 merged matrixise, 2018-10-17 23:27
PR 9938 closed matrixise, 2018-10-18 00:25
PR 10657 merged vstinner, 2018-11-22 12:58
PR 10658 merged miss-islington, 2018-11-22 14:03
PR 10659 merged miss-islington, 2018-11-22 14:04
Messages (32)
msg246878 - (view) Author: Eric O. LEBIGOT (lebigot) Date: 2015-07-18 02:59
On OS X, the Homebrew and MacPorts versions of Python 3.4.3 raise an exception when writing a 4 GB bytearray:

>>> open('/dev/null', 'wb').write(bytearray(2**31-1))
2147483647

>>> open('/dev/null', 'wb').write(bytearray(2**31))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

This has an impact on pickle, in particular (http://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb).
msg246879 - (view) Author: Eric O. LEBIGOT (lebigot) Date: 2015-07-18 03:02
PS: I should have written "2 GB" bytearray (so this looks like a signed 32 bit integer issue).
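The boundary in the report (2**31 - 1 bytes works, 2**31 bytes fails) coincides with the largest value of a signed 32-bit integer. A minimal sketch of that arithmetic follows; the helper name fits_in_int32 is hypothetical, and the snippet only illustrates the numeric limit, not the kernel's actual check:

```python
import struct

INT_MAX = 2**31 - 1  # largest value of a signed 32-bit C int


def fits_in_int32(n):
    """Return True if n can be stored in a signed 32-bit integer."""
    try:
        # "=i" is the standard-size (4-byte) signed int format.
        struct.pack("=i", n)
    except struct.error:
        return False
    return True
```

With this helper, the size of the buffer that was written successfully fits in a signed 32-bit integer, while the size of the buffer that failed does not.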
msg246979 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-20 12:18
This is likely a platform bug: it fails with os.write as well.  Interestingly enough, file.write works fine on Python 2.7 (which uses stdio), which apparently works around this kernel misfeature.

A possible partial workaround is to recognise this error in the implementation of os.write and then perform a partial write. The problem is that, while write(2) is documented as possibly writing less data than requested, most users writing to normal files (as opposed to sockets) probably don’t expect that behavior. On the other hand, os.write already limits writes to INT_MAX on Windows (see _Py_write in Python/fileutils.c).

Because of this I’m in favour of adding a similar workaround on OSX (and can provide a patch).

BTW, the manpage for write says that writev(2) might fail with EINVAL:

     [EINVAL]           The sum of the iov_len values in the iov array over-
                        flows a 32-bit integer.

I wouldn’t be surprised if write(2) is implemented using writev(2) and that this explains the problem.

msg246983 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-20 12:50
The attached patch is a first stab at a workaround. It will unconditionally limit the write size in os.write to INT_MAX on OSX.

I haven't tested yet if this actually fixes the problem mentioned on stack overflow.
msg246985 - (view) Author: Eric O. LEBIGOT (lebigot) Date: 2015-07-20 12:57
Thank you for looking into this, Ronald.

What does your patch do, exactly? Does it only limit the returned byte count, or does it really limit the size of the data written by truncating it?

In any case, it would be very useful to have a warning from the Python interpreter. If the data is truncated, I would even prefer an explicit exception (e.g. "data too big for this platform (>= 2 GB)"), along with an explicit mention of it in the documentation. What do you think?
msg246987 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-20 13:05
The patch limits os.write to writing at most INT_MAX bytes on OSX. Buffered I/O using open("/some/file", "wb") should still write all data (at least according to the limited tests I've done so far).

The same limitation is already present on Windows.

And as I wrote before: according to the manpage for write(2), os.write may already write fewer bytes than requested.
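A minimal sketch of how a caller of os.write() can cope with short writes, looping until everything is written. The helper name write_all and the max_chunk parameter are hypothetical and are not part of the patch; the sketch assumes a blocking file descriptor and a bytes-like object with one-byte items:

```python
import os


def write_all(fd, data, max_chunk=2**31 - 1):
    """Write all of data to fd, continuing after short writes."""
    view = memoryview(data)  # slicing a memoryview does not copy the data
    total = 0
    while total < len(view):
        # os.write() returns the number of bytes actually written,
        # which may be smaller than the size of the slice passed in.
        total += os.write(fd, view[total:total + max_chunk])
    return total
```

Capping each call at max_chunk keeps every individual request at or below INT_MAX bytes, so the loop does not depend on the platform accepting larger requests.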

I'm -1 on using an explicit exception or printing a warning about this.
msg246993 - (view) Author: Eric O. LEBIGOT (lebigot) Date: 2015-07-20 13:33
I see, thanks.

This sounds good to me too: no need for a warning or exception, indeed, since file.write() should work and the behavior of os.write() is documented.
msg246994 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-07-20 13:40
The Windows limit to INT_MAX applies in many functions:

* os.write()
* io.FileIO.write()
* possibly others, I don't remember

In the default branch, there is now _Py_write(), so only one place should be fixed.

See issue #11395, which fixed the bug on Windows.

If it's a bug, it should be fixed on Python 2.7, 3.4, 3.5 and default branches.
msg246999 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-20 16:25
The patch I attached earlier is for the default branch. More work is needed for the other active branches.
msg247007 - (view) Author: Mali Akmanalp (Mali Akmanalp) Date: 2015-07-20 22:41
I don't know how helpful it is at this point, but the issue also happens when reading.

Here's some related discussion in the numpy tracker:

https://github.com/numpy/numpy/issues/3858 (The claim was that OSX Mavericks fixed this issue, but it didn't. There is an Apple bug ID in there somewhere, plus a link to a patch the torch folks used.)

and also in pandas: https://github.com/pydata/pandas/issues/10641

I'd be happy to try to test patches out.
msg247122 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-22 14:45
Indeed, read(2) has the same problem. I just tested this with a small C program. 

I'll rework the patch for this, and will work on patches for 3.4/3.5 and 2.7 as well.
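Until a fixed interpreter is available, a caller can keep each individual read below the limit. The following is a minimal sketch under stated assumptions: the helper name read_up_to and the max_chunk parameter are hypothetical, f is assumed to be a binary file object that supports readinto(), and this is not the approach taken by the patch itself:

```python
def read_up_to(f, size, max_chunk=2**31 - 1):
    """Read up to size bytes from binary file f using bounded reads."""
    buf = bytearray(size)
    view = memoryview(buf)
    pos = 0
    while pos < size:
        # Each readinto() call is asked for at most max_chunk bytes.
        n = f.readinto(view[pos:pos + max_chunk])
        if not n:  # 0 or None: no more data available
            break
        pos += n
    # Note: this makes a copy of the bytes that were actually read.
    return bytes(view[:pos])
```

Reading into a preallocated buffer avoids joining many intermediate bytes objects, at the cost of allocating size bytes up front.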
msg256882 - (view) Author: Ian Carroll (Ian Carroll) Date: 2015-12-22 23:42
Write still fails on 3.5.1 and OS X 10.11.2. I'm no dev, so can someone explain how to use the patch while it's under review?
msg272030 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2016-08-05 13:57
Here is my patch for 3.6. I am going to provide the patch for 3.5 as well.
msg272044 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2016-08-05 17:25
Sorry, I was busy with a task, but here is my patch for 3.5. In fact, it's the same as the one for 3.6.
msg278672 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2016-10-14 22:28
ping
msg278724 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2016-10-15 12:54
Ned Deily, I added you because you are listed as the expert for the OSX platform.
msg279132 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2016-10-21 14:25
Victor, could you check the new patch?
msg279159 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2016-10-21 21:38
Uploaded a new version.
msg294113 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2017-05-21 21:16
Hello....

I just updated this ticket with a PR on GitHub.
msg294122 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-22 05:35
I see that we have other clamps on Windows using INT_MAX:

* sock_setsockopt()
* sock_sendto_impl()

Are these functions ok on macOS? If not, a new issue should be opened ;-)
msg294160 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2017-05-22 16:24
1. For Windows, maybe we could open a new issue, because this fix is only for macOS.

2. The issue was only about files, not sockets.

What do you suggest?
msg294195 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-22 22:16
I'm not saying that something is broken, just that it would be nice if someone
could test the socket methods.

On Windows, the bug was obvious: the function takes a C int...
msg327912 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2018-10-17 19:30
Hi all,

Could you test the PR on Windows? I don't have a Windows computer.

Thank you,

Stéphane
msg327916 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-17 23:05
New changeset 74a8b6ea7e0a8508b13a1c75ec9b91febd8b5557 by Victor Stinner (Stéphane Wirtel) in branch 'master':
bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)
https://github.com/python/cpython/commit/74a8b6ea7e0a8508b13a1c75ec9b91febd8b5557
msg327918 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-17 23:52
New changeset a5ebc205beea2bf1501e4ac33ed6e81732dd0604 by Victor Stinner (Stéphane Wirtel) in branch '3.6':
[3.6] bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) (GH-9937)
https://github.com/python/cpython/commit/a5ebc205beea2bf1501e4ac33ed6e81732dd0604
msg327940 - (view) Author: miss-islington (miss-islington) Date: 2018-10-18 06:58
New changeset 178d1c07778553bf66e09fe0bb13796be3fb9abf by Miss Islington (bot) in branch '3.7':
bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)
https://github.com/python/cpython/commit/178d1c07778553bf66e09fe0bb13796be3fb9abf
msg330259 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-22 14:03
New changeset 9a0d7a7648547ffb77144bf2480155f6d7940dea by Victor Stinner in branch 'master':
bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
https://github.com/python/cpython/commit/9a0d7a7648547ffb77144bf2480155f6d7940dea
msg330260 - (view) Author: miss-islington (miss-islington) Date: 2018-11-22 14:17
New changeset 18f3327d9a99163a658697465eb00c31f86535eb by Miss Islington (bot) in branch '3.7':
bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
https://github.com/python/cpython/commit/18f3327d9a99163a658697465eb00c31f86535eb
msg330262 - (view) Author: miss-islington (miss-islington) Date: 2018-11-22 14:25
New changeset 0c15e508baec7e542933db2b31ea950a646cd968 by Miss Islington (bot) in branch '3.6':
bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
https://github.com/python/cpython/commit/0c15e508baec7e542933db2b31ea950a646cd968
msg335566 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-02-14 21:18
Nosying myself since I just landed here based on an internal $work bug report.  We're seeing it with reads.  I'll try to set aside some work time to review the PRs.
msg335569 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2019-02-14 21:40
Hi @barry

Normally this issue is fixed for 3.x, but I still need to finish my PR for 2.7.

I plan to fix 2.7 in the coming weeks.
msg360264 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2020-01-19 18:17
Since 3.x is fixed and 2.7 has reached EOL, I'm closing the issue.  Thanks for getting it fixed in 3.x, Stephane and Victor!
History
Date  User  Action  Args
2022-04-11 14:58:19  admin  set  github: 68846
2020-01-19 18:17:57  zach.ware  set  keywords: - needs review
2020-01-19 18:17:44  zach.ware  set  status: open -> closed
    versions: - Python 2.7
    messages: + msg360264
    resolution: fixed
    stage: patch review -> resolved
2019-02-16 07:33:22  ned.deily  set  assignee: ned.deily ->
2019-02-14 21:40:24  matrixise  set  messages: + msg335569
2019-02-14 21:18:55  barry  set  title: open().write() fails on 2 GB+ data (OS X) -> open().write() and .read() fails on 2 GB+ data (OS X)
2019-02-14 21:18:18  barry  set  nosy: + barry
    messages: + msg335566
2018-11-22 14:25:28  miss-islington  set  messages: + msg330262
2018-11-22 14:17:37  miss-islington  set  messages: + msg330260
2018-11-22 14:04:07  miss-islington  set  pull_requests: + pull_request9912
2018-11-22 14:03:57  miss-islington  set  pull_requests: + pull_request9911
2018-11-22 14:03:45  vstinner  set  messages: + msg330259
2018-11-22 12:58:56  vstinner  set  pull_requests: + pull_request9910
2018-10-18 06:58:44  miss-islington  set  nosy: + miss-islington
    messages: + msg327940
2018-10-18 00:25:45  matrixise  set  pull_requests: + pull_request9289
2018-10-17 23:52:27  vstinner  set  messages: + msg327918
2018-10-17 23:27:19  matrixise  set  pull_requests: + pull_request9287
2018-10-17 23:09:15  vstinner  set  versions: + Python 2.7, Python 3.7, Python 3.8, - Python 3.5
2018-10-17 23:06:01  miss-islington  set  pull_requests: + pull_request9286
2018-10-17 23:05:09  vstinner  set  messages: + msg327916
2018-10-17 19:30:39  matrixise  set  messages: + msg327912
2017-05-22 22:16:22  vstinner  set  messages: + msg294195
2017-05-22 16:24:26  matrixise  set  messages: + msg294160
2017-05-22 05:39:39  zach.ware  set  nosy: + zach.ware
2017-05-22 05:35:38  vstinner  set  messages: + msg294122
2017-05-21 21:16:49  matrixise  set  messages: + msg294113
2017-05-21 21:15:26  matrixise  set  pull_requests: + pull_request1798
2016-11-06 23:27:24  Harry Li  set  nosy: + Harry Li
2016-10-21 21:38:27  matrixise  set  files: + issue24658-3-3.6.diff
    messages: + msg279159
2016-10-21 14:25:03  matrixise  set  files: + issue24658-2-3.6.diff
    messages: + msg279132
2016-10-15 12:54:08  matrixise  set  assignee: ned.deily
    messages: + msg278724
2016-10-14 22:28:06  matrixise  set  messages: + msg278672
2016-08-05 17:28:43  matrixise  set  files: + issue24658-3.5.diff
2016-08-05 17:28:33  matrixise  set  files: - issue24658-3.5.diff
2016-08-05 17:25:07  matrixise  set  files: + issue24658-3.5.diff
    messages: + msg272044
2016-08-05 13:57:08  matrixise  set  files: + issue24658-3.6.diff
    nosy: + matrixise
    messages: + msg272030
2016-08-04 20:48:40  zach.ware  set  versions: + Python 3.6, - Python 3.4
2015-12-22 23:42:26  Ian Carroll  set  nosy: + Ian Carroll
    messages: + msg256882
2015-07-22 14:45:16  ronaldoussoren  set  messages: + msg247122
2015-07-20 22:41:41  Mali Akmanalp  set  nosy: + Mali Akmanalp
    messages: + msg247007
2015-07-20 16:25:52  ronaldoussoren  set  messages: + msg246999
2015-07-20 13:40:42  vstinner  set  messages: + msg246994
2015-07-20 13:33:35  lebigot  set  messages: + msg246993
2015-07-20 13:05:20  ronaldoussoren  set  messages: + msg246987
2015-07-20 12:57:44  lebigot  set  messages: + msg246985
2015-07-20 12:50:28  ronaldoussoren  set  keywords: + patch, needs review
    files: + issue24658.txt
    messages: + msg246983
    stage: patch review
2015-07-20 12:19:00  ronaldoussoren  set  messages: + msg246979
2015-07-18 04:05:12  serhiy.storchaka  set  nosy: + ronaldoussoren, vstinner, ned.deily
    components: + Extension Modules, IO, - Interpreter Core
2015-07-18 03:02:19  lebigot  set  messages: + msg246879
    title: open().write() fails on 4 GB+ data (OS X) -> open().write() fails on 2 GB+ data (OS X)
2015-07-18 02:59:28  lebigot  create