classification
Title: nntplib file write failure causes exception from QUIT command
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: jwilk, martin.panter, pitrou, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-09-07 03:40 by martin.panter, last changed 2015-05-25 07:28 by martin.panter.

Files
File name Uploaded Description Edit
fail-close.patch martin.panter, 2014-12-19 04:22 review
getlongresp-loop.patch martin.panter, 2015-05-25 07:28 _getlongresp() loop only review
Messages (9)
msg226526 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-09-07 03:40
The following code triggers an NNTPProtocolError, as long as the body is large enough to cause an intermediate flush of the output file. The reason I think is that the body() method aborts in the middle of reading the BODY response, and when the context manager exits, a QUIT command is attempted, which continues to read the BODY response.

>>> with NNTP("localhost") as nntp:
...     nntp.body("<example>", file="/dev/full")
... 
Traceback (most recent call last):
  File "/usr/lib/python3.4/nntplib.py", line 491, in _getlongresp
    file.write(line)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python3.4/nntplib.py", line 757, in body
    return self._artcmd(cmd, file)
  File "/usr/lib/python3.4/nntplib.py", line 727, in _artcmd
    resp, lines = self._longcmd(line, file)
  File "/usr/lib/python3.4/nntplib.py", line 518, in _longcmd
    return self._getlongresp(file)
  File "/usr/lib/python3.4/nntplib.py", line 504, in _getlongresp
    openedFile.close()
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python3.4/nntplib.py", line 367, in __exit__
    self.quit()
  File "/usr/lib/python3.4/nntplib.py", line 936, in quit
    resp = self._shortcmd('QUIT')
  File "/usr/lib/python3.4/nntplib.py", line 512, in _shortcmd
    return self._getresp()
  File "/usr/lib/python3.4/nntplib.py", line 459, in _getresp
    raise NNTPProtocolError(resp)
nntplib.NNTPProtocolError: <line of data from BODY command>

It is hard to work around because the context manager and quit() methods seem to be the only public interfaces for closing the connection, and they both try to do a QUIT command first. I am thinking of something equivalent to this for a workaround, however it is bit hacky and may not be reliable in all cases:

nntp = NNTP("localhost")
abort = False
try:
    ...
    try:
        nntp.body("<example>", file="/dev/full")
    except (NNTPTemporaryError, NNTPPermanentError):
        raise  # NNTP connection still intact
    except:
        abort = True
        raise
    ...
finally:
    try:
        nntp.quit()
    except NNTPError:
        # Connection cleaned up despite exception
        if not abort:
            raise

Perhaps the “nntplib” module could abort the connection itself if any command does not complete according to the protocol. Or at the very least, provide an API to manually abort the connection without poking at the internals.
msg232924 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-12-19 04:22
Here is a patch with a fix and a test
msg238882 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-22 07:59
But the same issue exists with regular files.

>>> with open('/dev/full', 'wb') as f:
...     f.write(b'x'*10000)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
OSError: [Errno 28] No space left on device

And using Python implementation for detailed traceback:

>>> import _pyio
>>> with _pyio.open('/dev/full', 'wb') as f:
...     f.write(b'x'*10000)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/serhiy/py/cpython/Lib/_pyio.py", line 1184, in write
    self._flush_unlocked()
  File "/home/serhiy/py/cpython/Lib/_pyio.py", line 1211, in _flush_unlocked
    n = self.raw.write(self._write_buf)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/serhiy/py/cpython/Lib/_pyio.py", line 451, in __exit__
    self.close()
  File "/home/serhiy/py/cpython/Lib/_pyio.py", line 767, in close
    self.flush()
  File "/home/serhiy/py/cpython/Lib/_pyio.py", line 1204, in flush
    self._flush_unlocked()
  File "/home/serhiy/py/cpython/Lib/_pyio.py", line 1211, in _flush_unlocked
    n = self.raw.write(self._write_buf)
OSError: [Errno 28] No space left on device


What behavior you expect?
msg238896 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-22 13:29
Well with a regular BufferedWriter, the chained exception is the same type as the original exception, so it does not really matter. I was just using the out-of-space exception as an easy way to get the body() method to abort in the middle of a command.

The problem with the NNTP library is that the NNTP class assumes that it can send a QUIT command when it shouldn’t be. The result can be a giant error message full of apparent garbage data, rather like in Issue 21468.

The minimum behaviour that I expect is that when the context manager exits, it should not cause a double exception with a different error type. My patch closes the connection as soon as an unrecoverable error happens, so __exit__() will not send a QUIT command and receive an invalid response.
msg238910 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-22 14:47
I doubt that it is good idea to close connection on *any* error. For example in _getresp() resp.decode() can raise an exception, but this failure isn't related to the ability of sending QUIT and other commands.

Sample error handling in your first message looks more reliable. Set special flags after failing command (but only if error is related to the connection), and ignore QUIT errors if it is set. After successful command this flag should be reset.
msg238921 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-03-22 17:11
It looks to be clearly the case that there needs to be some error handling code wrapped around the file write in _getlongresp.  Your patch doesn't do just that, however, so it seems like there may be other errors you are also worried about?  Can you explain their nature?  They should be dealt with separately from the "errors writing to file" issue, though, which is a special case.  And that case should be dealt with by draining the response before returning.
msg239874 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-04-02 04:30
Serhiy, regarding _getresp() calling resp.decode(): You are probably right that a failure decoding the response line is recoverable in some circumstances, such a GROUP command. But in other cases, say an ARTICLE command, the response won’t have been fully drained, so the connection is no longer usable. I guess I could shift the error handling to a higher level if you think it is necessary.

Regarding the workaround in my original post: It is not a good solution, because it is still trying to send a QUIT request and wait for a response in the error handler. If you are trying to interrupt a hanging network connection for example, you have to hit Ctrl+C twice, or wait a long time for the QUIT command to get some invalid data and raise its confusing protocol error.

David: The error handling inside _getlongresp() is the main place that concerned me. But as I made the changes there I identified the other places I thought could raise unexpected exceptions (network errors) or block (therefore a KeyboardInterrupt is likely).

Draining the response might be sensible for handling an error writing to the file in some cases. My /dev/full example was actually meant to be a simple example of more complicated exceptions, such as KeyboardInterrupt, network timeouts, or if the programmer wants to peek at the start of a large message and then abort the download. So it seems my simplified test case was misleading, because in these other cases, draining the response isn’t so appropriate.
msg239914 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-04-02 13:46
That's fine, but you need to look at each case individually, and not try to deal with them as if they were all the same.  This is because you want the correct error to percolate up.  For example, in smtplib we have a case where the server may respond to a command by closing the connection (which is technically a violation of the RFC).  It used to be that smtplib would raise a connection error at that point because it would try to send an RST command to the server over the closed connection.  The original error message reported by the server was lost.  The solution was to rewrite the error handling so that we had an internal send_RST that was different from the one that would be used by the application, and that internal RST send was wrapped in a try/except ignoring the connection error.   That way the command the application called returned the received response, and then the application got the connection closed error the *next* time it tried to use the connection.

nntplib presumably requires as a similar strategy if the network connection terminates unexpectedly: just ignoring the connection closed error when the quit is sent.  The code already closes the connection after the QUIT attempt, so that should leave things in the correct state for all other errors.

Other network errors may benefit from additional handling, but I don't know since I don't have examples here to think about :)

As for the keyboard interrupt problem, it looks like what needs to happen is a custom response handler for the internal QUIT, that will recognize when it is not getting "its" response back and just return.

This is more similar to the smptlib problem than I originally thought: it seems like the basic problem is that that internal QUIT needs custom handling that is different from the QUIT issued at the application level.
msg244016 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-25 07:28
For the record, the SMTP scenario was Issue 17498, where code that is about to raise an exception attempts an RSET command that could also fail.

I do think each change in my patch is essentially the same case: restoring the invariant expected by the __exit__() method, that either the protocol state allows QUIT, or there is no connection. But I agree it may help divide this into smaller pieces. I am uploading getlongresp-loop.patch, which fixes just the receiving loop in _getlongresp().

I have also added some logic to avoid errors raised when flushing and closing the socket do not mask the original exception, which I understand David was concerned about. I guess it is possible (though unlikely) that an EPIPE or ECONNRESET flushing the send buffer could mask a KeyboardInterrupt raised inside the loop.

Sorry I took a while to follow up on this, but please have a look and let me know if this simplified patch makes any sense.
History
Date User Action Args
2015-05-25 07:28:45martin.pantersetfiles: + getlongresp-loop.patch

messages: + msg244016
2015-04-02 13:46:50r.david.murraysetmessages: + msg239914
2015-04-02 04:30:01martin.pantersetmessages: + msg239874
2015-03-22 17:11:53r.david.murraysetnosy: + r.david.murray
messages: + msg238921
2015-03-22 14:47:50serhiy.storchakasetmessages: + msg238910
2015-03-22 13:29:09martin.pantersetmessages: + msg238896
2015-03-22 07:59:32serhiy.storchakasetmessages: + msg238882
2015-02-28 13:31:34serhiy.storchakasetassignee: serhiy.storchaka

nosy: + serhiy.storchaka
2014-12-23 20:24:27jwilksetnosy: + jwilk
2014-12-20 16:48:24serhiy.storchakasetnosy: + pitrou
stage: patch review
type: behavior

versions: + Python 2.7, Python 3.5
2014-12-19 04:23:02martin.pantersetfiles: + fail-close.patch
keywords: + patch
messages: + msg232924
2014-09-07 03:40:19martin.pantercreate