classification
Title: Exception raised when an NNTP overview field is absent
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: jelie, pitrou, r.david.murray, rhettinger
Priority: normal Keywords: patch

Created on 2010-11-01 19:58 by jelie, last changed 2010-11-03 18:19 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
nntpover.patch pitrou, 2010-11-02 22:52
nntpover2.patch pitrou, 2010-11-02 23:19
Messages (12)
msg120158 - (view) Author: Julien ÉLIE (jelie) Date: 2010-11-01 19:58
Following the first example of the documentation:

import nntplib

s = nntplib.NNTP('news.trigofacile.com')
resp, count, first, last, name = s.group('fr.comp.lang.python')
print('Group', name, 'has', count, 'articles, range', first, 'to', last)
resp, overviews = s.over((last - 9, last))
for id, over in overviews:
    print(id, nntplib.decode_header(over['subject']))
s.quit()


An exception is raised:
"OVER/XOVER response doesn't include names of additional headers"


I believe the issue comes from the fact that the source code does not
handle the case described in Section 8.3.2 of RFC 3977:

   For all fields, the value is processed by first removing all CRLF
   pairs (that is, undoing any folding and removing the terminating
   CRLF) and then replacing each TAB with a single space.  If there is
   no such header in the article, no such metadata item, or no header or
   item stored in the database for that article, the corresponding field
   MUST be empty.

   Example of a successful retrieval of overview information for a range
   of articles:

      [C] GROUP misc.test
      [S] 211 1234 3000234 3002322 misc.test
      [C] OVER 3000234-3000240
      [S] 224 Overview information follows
      [S] 3000234|I am just a test article|"Demo User"
          <nobody@example.com>|6 Oct 1998 04:38:40 -0500|
          <45223423@example.com>|<45454@example.net>|1234|
          17|Xref: news.example.com misc.test:3000363
      [S] 3000235|Another test article|nobody@nowhere.to
          (Demo User)|6 Oct 1998 04:38:45 -0500|<45223425@to.to>||
          4818|37||Distribution: fi
      [S] 3000238|Re: I am just a test article|somebody@elsewhere.to|
          7 Oct 1998 11:38:40 +1200|<kfwer3v@elsewhere.to>|
          <45223423@to.to>|9234|51
      [S] .

   Note the missing "References" and Xref headers in the second line,
   the missing trailing fields in the first and last lines, and that
   there are only results for those articles that still exist.






Also please note that nntplib should also work in case the database is not consistent.  Some news servers might be broken and do not follow the MUST NOT...

   The LIST OVERVIEW.FMT command SHOULD list all the fields for which
   the database is consistent at that moment.  It MAY omit such fields
   (for example, if it is not known whether the database is consistent
   or inconsistent).  It MUST NOT include fields for which the database
   is inconsistent or that are not stored in the database.  Therefore,
   if a header appears in the LIST OVERVIEW.FMT output but not in the
   OVER output for a given article, that header does not appear in the
   article (similarly for metadata items).
msg120272 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-02 22:41
I am wondering how to return the corresponding information. Should the field be totally absent from the returned dictionary, should it map to the empty string, or should it map to None?
I'm leaning towards the latter (map to None), but perhaps the empty string is better?
msg120275 - (view) Author: Julien ÉLIE (jelie) Date: 2010-11-02 22:45
The empty string would mean the header exists, and is empty (though not RFC-compliant).
For instance:

"User-Agent: \r\n"

I believe None is better.
msg120278 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-02 22:52
Here is a patch for returning None on absent fields.
(works with trigofacile.com)
msg120282 - (view) Author: Julien ÉLIE (jelie) Date: 2010-11-02 23:05
OK, thanks.
By the way, why is the token stripped?
  token = token[len(h):].lstrip(" ")

"X-Header:   test  \r\n" in an header is kept in the overview as-is.
I do not see why "  test  " should not be the value returned.

Also, with:
  token = token or None

"X-Header: \r\n" becomes None if I understand how the source code works...  Yet, it is a real '', not None.
msg120283 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-02 23:08
> OK, thanks.
> By the way, why is the token stripped?
>   token = token[len(h):].lstrip(" ")
> 
> "X-Header:   test  \r\n" in an header is kept in the overview as-is.
> I do not see why "  test  " should not be the value returned.

It's a simple way of handling "Xref: foo" and returning "foo" rather
than " foo". If spaces are supposed to be significant I can just strip
the first one, though.

> Also, with:
>   token = token or None
> 
> "X-Header: \r\n" becomes None if I understand how the source code
> works...  Yet, it is a real '', not None.

Er, so you're disagreeing with your previous message? Or am I missing
something? :)
msg120286 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-02 23:18
Here is a patch trying to better handle whitespace. Would it be ok for you?
msg120287 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-02 23:19
Oops, sorry.
msg120299 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-11-03 02:00
My conclusion in working on the email package is that only the first space after the ':', if it exists, should be stripped.  That is, even though the RFC (for email) reads as if the space after the colon is part of the value, in practice it is part of the delimiter, but is optional (and almost always present, in email).  Whether additional leading spaces are significant depends on why they are there.  Since they are an unusual case, I would choose to preserve them on the theory that someone might care, and that someone who doesn't care can strip them.
msg120332 - (view) Author: Julien ÉLIE (jelie) Date: 2010-11-03 17:55
> Er, so you're disagreeing with your previous message?
> Or am I missing something? :)

I was saying that if an empty string is returned, then it means that the header exists and is empty.  An example was "User-Agent: \r\n".
And my remark "I believe None is better." concerned your initial question "Should the field be totally absent [...]" regarding how to deal with a header that does not exist.

Therefore, "User-Agent: \r\n" becomes a real '', not None.  None is only when the User-Agent: header field is absent from the headers.



> Here is a patch trying to better handle whitespace.
> Would it be ok for you?

Yes Antoine, thanks!
msg120333 - (view) Author: Julien ÉLIE (jelie) Date: 2010-11-03 18:01
> My conclusion in working on the email package is that only
> the first space after the ':', if it exists, should be stripped.
> That is, even though the RFC (for email) reads as if the space
> after the colon is part of the value, in practice it is part
> of the delimiter, but is optional (and almost always present,
> in email).

That is why the RFC (for netnews) explicitly mentions that the space after the colon is not part of the value.  See the grammar for OVER in RFC 3977:

  hdr-n-content = [(header-name ":" / metadata-name) SP hdr-content]

So yes, only the first space should be stripped.
msg120335 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-03 18:19
Ok, committed in r86139.
History
Date User Action Args
2010-11-03 18:19:27pitrousetstatus: open -> closed
resolution: fixed
messages: + msg120335

stage: resolved
2010-11-03 18:01:08jeliesetmessages: + msg120333
2010-11-03 17:55:54jeliesetmessages: + msg120332
2010-11-03 02:30:10rhettingersetmessages: - msg120300
2010-11-03 02:13:20rhettingersetnosy: + rhettinger
messages: + msg120300
2010-11-03 02:00:49r.david.murraysetnosy: + r.david.murray
messages: + msg120299
2010-11-02 23:19:31pitrousetfiles: + nntpover2.patch

messages: + msg120287
2010-11-02 23:19:20pitrousetfiles: - nntpover2.patch
2010-11-02 23:18:46pitrousetfiles: + nntpover2.patch

messages: + msg120286
2010-11-02 23:08:37pitrousetmessages: + msg120283
2010-11-02 23:05:08jeliesetmessages: + msg120282
2010-11-02 22:52:33pitrousetfiles: + nntpover.patch
keywords: + patch
messages: + msg120278
2010-11-02 22:45:10jeliesetmessages: + msg120275
2010-11-02 22:41:23pitrousetmessages: + msg120272
2010-11-01 20:02:43pitrousetnosy: + pitrou
2010-11-01 19:58:22jeliecreate