classification
Title: `OverflowError: signed integer is greater than maximum` in ssl.py for files larger than 2GB
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: methane Nosy List: amacd31, christian.heimes, jakirkham, jan-xyz, matan1008, methane, ronaldoussoren
Priority: normal Keywords:

Created on 2021-01-07 08:03 by amacd31, last changed 2021-06-24 20:39 by christian.heimes.

Messages (14)
msg384565 - (view) Author: Andrew MacDonald (amacd31) Date: 2021-01-07 08:03
When attempting to read a large file (> 2GB) over HTTPS the read fails with "OverflowError: signed integer is greater than maximum".

This occurs with Python >= 3.8, and I've been able to reproduce the problem with the snippet below on Linux, Mac OS X, and Windows. The remote file can be any HTTPS-hosted file larger than 2GB; for example, an empty file generated with `dd if=/dev/zero of=2g.img bs=1 count=0 seek=2G` will also do the job.

```
import http.client
connection = http.client.HTTPSConnection("mirror.aarnet.edu.au")
connection.request("GET", "/pub/centos/8/isos/x86_64/CentOS-8.3.2011-x86_64-dvd1.iso")
response = connection.getresponse()
data = response.read()
```

A git bisect suggests this is the result of a change in commit d6bf6f2d0c83f0c64ce86e7b9340278627798090 (https://github.com/python/cpython/commit/d6bf6f2d0c83f0c64ce86e7b9340278627798090). Looking over the associated issue and commit message, it seems this was not an intended outcome of the change.
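
One way to sidestep the failure is to avoid a single oversized read() and pull the body in bounded chunks instead. The following is a minimal sketch, assuming the caller can store or process the data incrementally; `copy_in_chunks` and the 1 MiB `CHUNK_SIZE` are hypothetical names and values, not part of http.client:

```python
import io

CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size (1 MiB), far below 2 GiB


def copy_in_chunks(response, sink, chunk_size=CHUNK_SIZE):
    """Copy response into sink using bounded reads; return bytes copied.

    `response` is any object with read(n), such as the HTTPResponse
    returned by connection.getresponse(); `sink` is any object with write().
    """
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # empty bytes signals the end of the body
            break
        sink.write(chunk)
        total += len(chunk)
    return total
```

With the reproduction above, `response.read()` would be replaced by opening a local file in binary write mode and passing it as `sink`. Each underlying read then requests far fewer bytes than the signed 32-bit limit, so the overflow check should not trigger.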
msg384566 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-01-07 08:15
I cannot lift the overflow restriction until we drop support for OpenSSL 1.0.2. The functions SSL_write() and SSL_read() are limited to a signed 32-bit int. OpenSSL 1.1.1 has new SSL_write_ex() and SSL_read_ex() functions that support size_t. Even size_t limits the maximum value to unsigned 32-bit (~4GB) on 32-bit systems.
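
To make the limits concrete, the arithmetic can be checked directly. This sketch only restates the sizes named above; the constant and function names are illustrative, not identifiers from ssl.py or OpenSSL:

```python
INT_MAX = 2**31 - 1         # largest signed 32-bit int: the SSL_read()/SSL_write() length limit
SIZE_MAX_32BIT = 2**32 - 1  # largest size_t value on a 32-bit system

TWO_GIB = 2 * 1024**3       # size of the 2 GiB test file from the report


def fits_in_int(length):
    """Return True if length can be passed as a C signed 32-bit int."""
    return 0 <= length <= INT_MAX
```

A 2 GiB length is exactly one byte past `INT_MAX`, which is why a file of that size is enough to trigger the error.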
msg384649 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2021-01-08 09:52
The API documentation already implies that write might not write the entire buffer because it returns the number of bytes actually written (just like os.write).  

A possible workaround on the SSL layer is hence to clamp the number of bytes to write to MAX_INT (or later MAX_SSIZE_T) bytes.

That said, this does require checking that users of the SSL layer's write method in the stdlib actually check the number of bytes written; otherwise we'd exchange the exception for a silent error.
msg384658 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-01-08 13:05
That's a good idea, Ronald! socket.c:sock_send_impl() already clamps the input length on Windows:

#ifdef MS_WINDOWS
    if (ctx->len > INT_MAX)
        ctx->len = INT_MAX;
    ctx->result = send(s->sock_fd, ctx->buf, (int)ctx->len, ctx->flags);
#else
    ctx->result = send(s->sock_fd, ctx->buf, ctx->len, ctx->flags);
#endif

I could implement similar logic for SSLSocket. Applications have to check the return value of send() anyway or use sendall(). The socket.send() method / send(2) libc function may also write fewer bytes.
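
On the caller's side, the clamp-and-check pattern could look roughly like this in Python. This is a sketch, not the ssl module's implementation: `send_all_clamped` and `MAX_SEND` are hypothetical names, and `sock` is assumed to be any object whose send() returns the number of bytes it accepted, as socket.send() does:

```python
INT_MAX = 2**31 - 1
MAX_SEND = INT_MAX  # hypothetical per-call cap, mirroring the C clamp


def send_all_clamped(sock, data, max_send=MAX_SEND):
    """Send all of data (a bytes-like object of single bytes),
    never passing more than max_send bytes to one send() call."""
    view = memoryview(data)  # slicing a memoryview avoids copying
    sent_total = 0
    while sent_total < len(view):
        # Clamp the slice, then trust only the count send() reports.
        chunk = view[sent_total:sent_total + max_send]
        sent_total += sock.send(chunk)
    return sent_total
```

The loop is what turns a clamped, possibly partial write into a complete one; a caller that ignored the return value would silently drop the tail of the buffer.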
msg391359 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-04-19 04:04
Python 3.10 will use SSL_write_ex() and SSL_read_ex(), which support > 2 GB data.
msg394947 - (view) Author: (jakirkham) Date: 2021-06-02 21:35
Would it be possible to check for these newer OpenSSL symbols during the builds of Python 3.8 & 3.9 (using them when available and falling back to the older API otherwise)? This would allow people to build Python 3.8 & 3.9 with the newer OpenSSL and benefit from the fix.

That said, not sure if there are other obstacles to using OpenSSL 1.1.1 with Python 3.8 & 3.9
msg396454 - (view) Author: Matan Perelman (matan1008) * Date: 2021-06-24 06:51
A bit of extra context to save clicking through: 
the PR which introduced the regression: https://github.com/python/cpython/pull/12698

and the bug: https://bugs.python.org/issue36050

Maybe some people have context about why we couldn't just roll back that PR and increase `MAXAMOUNT` to something > 1 MiB and < 2 GiB?
msg396458 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-06-24 08:12
Because this is an SSL issue. HTTPS is not the only user of SSL.
So we should try to fix SSL issue before reverting GH-12698.
msg396459 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-06-24 08:28
The ssl module supports sending or receiving buffers of more than 2GB in Python 3.10. It cannot be reliably fixed in Python 3.9 and earlier.
msg396460 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-06-24 08:33
I see. But Python 3.8 is now in security-fix mode.
Let's revert the optimization in the 3.9 branch.
msg396461 - (view) Author: (jakirkham) Date: 2021-06-24 08:34
Right with this change ( https://github.com/python/cpython/pull/25468 ). Thanks for adding that Christian :)

I guess what I'm wondering is if in older Python versions we could do an `#ifdef` check to try and use `SSL_read_ex` & `SSL_write_ex` if the symbols are found at build time? This would allow package maintainers the option to build with a newer OpenSSL to fix this issue (even on older Pythons)
msg396463 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-06-24 08:54
No, it would be a backwards incompatible change and introduce inconsistent behavior. SSL_read_ex() is not available in LibreSSL, OpenSSL 1.0.2, and OpenSSL 1.1.0. Your code would work on CentOS 8, Debian 10, and Ubuntu 20.04, but break on CentOS 7, Debian 9, Ubuntu 18.04, and OpenBSD.
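
Which case applies to a given interpreter can be estimated at runtime from the ssl module's version metadata. A rough sketch, assuming the availability rule stated above (OpenSSL 1.1.1 or newer, and not LibreSSL); `has_ssl_read_ex` is a hypothetical helper, and it inspects version information only rather than probing for the symbol:

```python
import ssl


def has_ssl_read_ex(version_info=None, version_string=None):
    """Guess whether the linked TLS library provides SSL_read_ex().

    Defaults to the library this interpreter is linked against.
    Assumption: the rule from the message above, not a symbol check.
    """
    if version_info is None:
        version_info = ssl.OPENSSL_VERSION_INFO
    if version_string is None:
        version_string = ssl.OPENSSL_VERSION
    if version_string.startswith("LibreSSL"):
        return False
    return tuple(version_info[:3]) >= (1, 1, 1)
```

Calling `has_ssl_read_ex()` with no arguments reports on the running interpreter; the explicit parameters exist so that other version tuples can be evaluated without that library installed.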
msg396509 - (view) Author: (jakirkham) Date: 2021-06-24 20:15
Not following. Why would it break? Presumably once one builds Python for a particular OS they keep it there (like system package managers, for example).

Or alternatively they build on a much older OS and then deploy to newer ones. The latter case is what we do in conda-forge (Anaconda does the same thing). We also build our own OpenSSL so have control of that knob too. Though I've seen application developers do similar things.

Do you have an example where this wouldn't work? Maybe that would help as we can define a better solution there.
msg396510 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-06-24 20:39
Right now sending and receiving buffers >2 GB over TLS consistently fails on all platforms with Python 3.9. A backport of bf624032c12c763b72594e5f41ff8af309b85264 to Python 3.9 would make the behavior inconsistent. Your code would work on your laptop with a recent Fedora or Ubuntu version that has OpenSSL 1.1.1. But it would suddenly fail on your production system with Debian 9, because it has OpenSSL 1.1.0 and Python would have to fall back to SSL_read().

In my experience these kinds of inconsistencies cause headaches and frustrations. A consistent error gives you a chance to notice a problem early and to implement a workaround.
History
Date User Action Args
2021-06-24 20:39:31 christian.heimes set messages: + msg396510; versions: - Python 3.8
2021-06-24 20:15:36 jakirkham set messages: + msg396509
2021-06-24 08:54:07 christian.heimes set messages: + msg396463
2021-06-24 08:34:13 jakirkham set messages: + msg396461
2021-06-24 08:33:00 methane set messages: + msg396460
2021-06-24 08:28:40 christian.heimes set messages: + msg396459
2021-06-24 08:12:08 methane set messages: + msg396458
2021-06-24 06:51:09 matan1008 set nosy: + matan1008; messages: + msg396454
2021-06-02 21:35:28 jakirkham set nosy: + jakirkham; messages: + msg394947
2021-04-19 04:04:53 christian.heimes set messages: + msg391359; versions: - Python 3.10
2021-02-05 11:01:02 jan-xyz set nosy: + jan-xyz
2021-01-08 13:05:25 christian.heimes set messages: + msg384658
2021-01-08 09:52:17 ronaldoussoren set nosy: + ronaldoussoren; messages: + msg384649
2021-01-07 08:15:02 christian.heimes set messages: + msg384566
2021-01-07 08:07:32 christian.heimes set assignee: christian.heimes -> methane; components: + Library (Lib), - SSL; nosy: + methane
2021-01-07 08:03:09 amacd31 create