This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: StreamReader does not return reamaing and ready data buffer before raise the Exeption
Type: Stage: patch review
Components: asyncio Versions: Python 3.7, Python 3.6, Python 3.4, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Dima.Tisnek, gvanrossum, pfreixes, yselivanov
Priority: normal Keywords: patch

Created on 2017-07-05 20:55 by pfreixes, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL Status Linked Edit
PR 2593 closed pfreixes, 2017-07-05 20:59
PR 7473 open pfreixes, 2018-06-07 09:15
Messages (20)
msg297781 - (view) Author: pfreixes (pfreixes) * Date: 2017-07-05 20:55
Current implementation of StreamReader does not take care of the status of the buffer, once an exception has been set via `set_exception` any call to the read methods won't be able to get the missing data still pending to be processed.

From the point of view of the developer, if there is no scheduled task for waiting data into the Streamreader between a network data gets into the buffer socket and a closing connection by the other peer arrives, the developer won't be able to gather the previous data.
msg297783 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-07-05 21:19
Do you have a specific use case where it's important to access the remaining data? ISTM that this is happening when the connection is lost and then it shouldn't matter how much of the data you read or not -- the connection was broken before the remote side closed it, and you should not trust the data.
msg297784 - (view) Author: pfreixes (pfreixes) * Date: 2017-07-05 21:27
Yeps,

I have an example related to Redis [1], the server warns that the
connection will be closed because you reached out the maximum number
of connections.

But IMHO the example it's just that an example, if the connection at
some point was not broken/closed/... and some data came into the
buffer, this data should be put available for the user.

[1] https://github.com/antirez/redis/blob/09dd7b5ff02de4a311032939c27fd6fc62fbd4a3/src/networking.c#L615

On Wed, Jul 5, 2017 at 11:19 PM, Guido van Rossum
<report@bugs.python.org> wrote:
>
> Guido van Rossum added the comment:
>
> Do you have a specific use case where it's important to access the remaining data? ISTM that this is happening when the connection is lost and then it shouldn't matter how much of the data you read or not -- the connection was broken before the remote side closed it, and you should not trust the data.
>
> ----------
> nosy: +gvanrossum
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue30861>
> _______________________________________
msg297785 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-07-05 21:29
> But IMHO the example it's just that an example, if the connection at
some point was not broken/closed/... and some data came into the
buffer, this data should be put available for the user.

In my experience, you never want to recover any data from broken connections.  In cases where it's safe, StreamReader attaches the buffered data to the exception (see IncompleteReadError).
msg297787 - (view) Author: pfreixes (pfreixes) * Date: 2017-07-05 22:00
One of the disadvantages with the Exception is that you are relying on
how the events are being succeeded, and this is something that you
can't predict. With just a different delay between the data and the
RST packet and you might get the data in two different ways, either
via a regular read or via one adhoc Exception.

This might be confusing for the developer and will end up with the
second thing that IMHO is also important, the code that has to be
produced by the developer gets complicated.

On Wed, Jul 5, 2017 at 11:29 PM, Yury Selivanov <report@bugs.python.org> wrote:
>
> Yury Selivanov added the comment:
>
>> But IMHO the example it's just that an example, if the connection at
> some point was not broken/closed/... and some data came into the
> buffer, this data should be put available for the user.
>
> In my experience, you never want to recover any data from broken connections.  In cases where it's safe, StreamReader attaches the buffered data to the exception (see IncompleteReadError).
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue30861>
> _______________________________________
msg297789 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-07-05 23:05
The Redis example doesn't sound valid to me, as (IIUC) it closes the connection immediately?
msg297808 - (view) Author: Dima Tisnek (Dima.Tisnek) * Date: 2017-07-06 06:40
My 2c:

Pau's concern seems valid, in a sense that stream should work like TCP.
That's what most users would assume -- read out data until the end, only then you can see what the actual error was (socket closed, or timeout or hard error)

However, I suspect the devil may be in details -- the patch appears to assume that exception is only set by the underlying source of data (in which case user wants to read out everything up to the exception).

I can imagine a couple of use-case where an exception is set "out of band":
* to terminate bad reader (e.g. against Slowris attack)
* there's a matching StreamWriter (e.g. shared socket)

P.S.
If I'm wrong here, then bug report and patch must be clearer :)
msg297842 - (view) Author: pfreixes (pfreixes) * Date: 2017-07-06 21:04
As was said, the assumption here is the data that came to the buffer must be available.

For example, the next snippet shows a Redis client that expects the data message plus the RST packet, where the redis-server was configured to accept max N connections, the connection used by this snippet was the N+1

import socket
import time

s = socket.socket(
    socket.AF_INET, socket.SOCK_STREAM)
s.connect(("localhost", 6379))

# give enought time to get the
# data plus the RST Package
time.sleep(1)

# read the data
print(s.recv(100))

It works like a charm.

IMHO this is not a matter of RST vs close graceful, its a matter of giving always the chance to the developer to read the data that is still remaining and ready into the buffer.

The current implementation is in somehow not predictable. Different network latencies with the same scenario can produce different outputs and the same with how the read of the Streamer is handled by the developer.
msg297846 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-07-06 21:52
We seem to have a failure to communicate. I'm sure your example code "works", but you're not showing what's in the data it receives that is important for the app to read (your example just prints it).

And surely your app should be robust even if the connection is closed *without* ever receiving that data (since it may be lost in transit).

If you need an application-level error your design should include it in the protocol, not use the connection closing.
msg297866 - (view) Author: pfreixes (pfreixes) * Date: 2017-07-07 07:10
I think that we are not in the same page, the following snippet is a nodejs client that emulates the same:

var net = require('net');
var sleep = require('sleep');

var client = new net.Socket();
client.connect(6379, '127.0.0.1', function() {
    console.log('Connected');
    sleep.sleep(10); // wait for data and RST package
});

client.on('data', function(data) {
    console.log('Received: ' + data);
});

client.on('close', function() {
    console.log('Connection closed');
});

The client will be able to get the data, even taking into account that the RST package has been processed by the Operating System. Meanwhile, the same behavior with asyncio is unpredictable.
msg297873 - (view) Author: Dima Tisnek (Dima.Tisnek) * Date: 2017-07-07 09:37
It seems Guido sets a higher bar on the proposed change.

@pfreixes, if you can show that this change is needed to implement "TCP half-closed" state (i.e. when remote calls shutdown(SHUT_WR) after it's done sending data but continues to recv(), then local is expected to read out the data, and then confirm reception and issue it's own shutdown(SHUT_WR) so that remote gets the acknowledgement), then there would be no question.
msg297890 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-07-07 15:06
But half closed state is already supported.

On Jul 7, 2017 2:37 AM, "Dima Tisnek" <report@bugs.python.org> wrote:

>
> Dima Tisnek added the comment:
>
> It seems Guido sets a higher bar on the proposed change.
>
> @pfreixes, if you can show that this change is needed to implement "TCP
> half-closed" state (i.e. when remote calls shutdown(SHUT_WR) after it's
> done sending data but continues to recv(), then local is expected to read
> out the data, and then confirm reception and issue it's own
> shutdown(SHUT_WR) so that remote gets the acknowledgement), then there
> would be no question.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue30861>
> _______________________________________
>
msg297894 - (view) Author: pfreixes (pfreixes) * Date: 2017-07-07 15:59
I would like to focus the issue as was initially described, Im still
convinced that this is a buggy behaviour. As has been seen other systems
such as python sync or nodejs behaves as is expected.

This last one is IMHO something that can be skipped.

Im wondering how curio or trio will handle this scenario. My bet is on as
nodejs.

El 07/07/2017 17:06, "Guido van Rossum" <report@bugs.python.org> escribió:

>
> Guido van Rossum added the comment:
>
> But half closed state is already supported.
>
> On Jul 7, 2017 2:37 AM, "Dima Tisnek" <report@bugs.python.org> wrote:
>
> >
> > Dima Tisnek added the comment:
> >
> > It seems Guido sets a higher bar on the proposed change.
> >
> > @pfreixes, if you can show that this change is needed to implement "TCP
> > half-closed" state (i.e. when remote calls shutdown(SHUT_WR) after it's
> > done sending data but continues to recv(), then local is expected to read
> > out the data, and then confirm reception and issue it's own
> > shutdown(SHUT_WR) so that remote gets the acknowledgement), then there
> > would be no question.
> >
> > ----------
> >
> > _______________________________________
> > Python tracker <report@bugs.python.org>
> > <http://bugs.python.org/issue30861>
> > _______________________________________
> >
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue30861>
> _______________________________________
>
msg297898 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-07-07 17:01
IMO you haven't demonstrated a need, you are just complaining that you
don't like it.
msg297900 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-07-07 18:14
I looked at the PR and it looks good. The intent is also clear. But the idea of delaying exceptions in StreamReader read methods still feels finicky.

To go forward with this we need more examples than just nodejs. And it doesn't really matter what trio and curio do, as they aren't mainstream. We need a (set of) *concrete* examples where this is needed.

An analysis of how this is approached in another mainstream libraries/ecosystems like C#, Twisted, etc would also help.
msg297907 - (view) Author: pfreixes (pfreixes) * Date: 2017-07-07 19:33
fair enough. I'm out with few chances to grab a  computer. I'll be back in
a week with more information.

El 07/07/2017 20:14, "Yury Selivanov" <report@bugs.python.org> escribió:

>
> Yury Selivanov added the comment:
>
> I looked at the PR and it looks good. The intent is also clear. But the
> idea of delaying exceptions in StreamReader read methods still feels
> finicky.
>
> To go forward with this we need more examples than just nodejs. And it
> doesn't really matter what trio and curio do, as they aren't mainstream. We
> need a (set of) *concrete* examples where this is needed.
>
> An analysis of how this is approached in another mainstream
> libraries/ecosystems like C#, Twisted, etc would also help.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue30861>
> _______________________________________
>
msg298532 - (view) Author: pfreixes (pfreixes) * Date: 2017-07-17 14:35
The following links point to three different implementations of the same scenario using a Twisted, Node and Python blocking clients that try to reproduce the same scenario.

- Twisted https://gist.github.com/pfreixes/0d8b24b98567e557d6059b3308aa07ca
- Node https://gist.github.com/pfreixes/b62c199e62ae09d1a5b9e13652c7a273
- Python blocking https://gist.github.com/pfreixes/7a85e43642782eb7e7d0669eadd0216a

All of them are able to give you the buffer when the RST package was already processed by the OS.

I'm still trying to get a Java client, but TBH I won't expect a different behavior.
msg298541 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-07-17 15:52
There's no need. You seem to have accidentally shown a use case for getting the buffered data -- it can contain an error message explaining why the connection was closed.
msg298580 - (view) Author: pfreixes (pfreixes) * Date: 2017-07-18 08:53
The Java client https://gist.github.com/pfreixes/13fedf2a589c260e6c7c64ae73653bb1

Works as is expected the buffer can be read till it gets empty, no matter when the RST was sent.
msg309261 - (view) Author: pfreixes (pfreixes) * Date: 2017-12-30 21:37
Yeps, any update on this bug and the fix proposed?
History
Date User Action Args
2022-04-11 14:58:48adminsetgithub: 75044
2018-06-07 09:15:04pfreixessetkeywords: + patch
stage: patch review
pull_requests: + pull_request7095
2017-12-30 21:37:51pfreixessetmessages: + msg309261
2017-07-18 08:53:55pfreixessetmessages: + msg298580
2017-07-17 15:52:23gvanrossumsetmessages: + msg298541
2017-07-17 14:35:55pfreixessetmessages: + msg298532
2017-07-07 19:33:45pfreixessetmessages: + msg297907
2017-07-07 18:14:58yselivanovsetmessages: + msg297900
2017-07-07 17:01:05gvanrossumsetmessages: + msg297898
2017-07-07 15:59:13pfreixessetmessages: + msg297894
2017-07-07 15:06:24gvanrossumsetmessages: + msg297890
2017-07-07 09:37:19Dima.Tisneksetmessages: + msg297873
2017-07-07 07:10:33pfreixessetmessages: + msg297866
2017-07-06 21:52:59gvanrossumsetmessages: + msg297846
2017-07-06 21:04:22pfreixessetmessages: + msg297842
2017-07-06 06:40:28Dima.Tisneksetnosy: + Dima.Tisnek
messages: + msg297808
2017-07-05 23:05:22gvanrossumsetmessages: + msg297789
2017-07-05 22:00:47pfreixessetmessages: + msg297787
2017-07-05 21:29:16yselivanovsetmessages: + msg297785
2017-07-05 21:27:10pfreixessetmessages: + msg297784
2017-07-05 21:19:21gvanrossumsetnosy: + gvanrossum
messages: + msg297783
2017-07-05 20:59:19pfreixessetpull_requests: + pull_request2664
2017-07-05 20:55:08pfreixescreate