Author rhettinger
Recipients rhettinger
Date 2014-05-11.18:21:44
Message-id <1399832505.17.0.149910731712.issue21469@psf.upfronthosting.co.za>
Content
Attaching a draft patch:

* Repair the broken link to norobots-rfc.txt.

* HTTP response codes >= 500 are treated as a failed read rather than as "not found". A "not found" response means we can assume the entire site is allowed; a 5xx server error tells us nothing.

* A successful read() updates the mtime (which is defined to be "the time the robots.txt file was last fetched").

* The can_fetch() method returns False unless we've had a read() with a 2xx or 4xx response.  This avoids false positives in the case where a user calls can_fetch() before calling read().
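To make the intended read()/can_fetch() behavior concrete, here is a minimal sketch of the policy described in the bullets above. It is not the patch and not the urllib.robotparser API. The class RobotsPolicySketch, its record_response() method, and the use of plain path prefixes in place of parsed robots.txt rules are hypothetical simplifications, and the network fetch is replaced by passing in the HTTP status code directly.

```python
import time
from typing import List, Sequence


class RobotsPolicySketch:
    """Hypothetical model of the read()/can_fetch() rules in the patch notes.

    Not urllib.robotparser: fetching is replaced by passing in an HTTP
    status code, and parsed rules by a list of disallowed path prefixes.
    """

    def __init__(self) -> None:
        self.allow_all = False
        self.disallowed_prefixes: List[str] = []
        self.last_checked = 0.0  # 0.0 means no successful read yet

    def mtime(self) -> float:
        # "The time the robots.txt file was last fetched."
        return self.last_checked

    def record_response(self, status: int, disallowed: Sequence[str] = ()) -> None:
        """Apply the outcome of a (hypothetical) fetch of robots.txt."""
        if 200 <= status < 300:
            # Successful read: store the rules and update the mtime.
            self.disallowed_prefixes = list(disallowed)
            self.last_checked = time.time()
        elif 400 <= status < 500:
            # Not found: assume the entire site is allowed.
            self.allow_all = True
        # Anything else, notably a 5xx server error, tells us nothing:
        # treat it as a failed read and leave the state unchanged.

    def can_fetch(self, path: str) -> bool:
        if self.allow_all:
            return True
        # No 2xx or 4xx response yet: refuse rather than risk a false positive.
        if not self.last_checked:
            return False
        return not any(path.startswith(p) for p in self.disallowed_prefixes)


if __name__ == "__main__":
    robots = RobotsPolicySketch()
    print(robots.can_fetch("/index.html"))  # False: nothing read yet
    robots.record_response(503)
    print(robots.can_fetch("/index.html"))  # False: a 5xx is a failed read
    robots.record_response(200, ["/private/"])
    print(robots.can_fetch("/index.html"))  # True
    print(robots.can_fetch("/private/a"))   # False
```

The key design choice is that "unknown" defaults to "disallowed": only a 2xx (rules known) or a 4xx (no rules, so everything is allowed) moves the object out of its refuse-everything starting state, while a 5xx leaves it there.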