Author rhettinger
Recipients rhettinger
Date 2014-05-11.18:21:44
Attaching a draft patch:

* Repair the broken link to norobots-rfc.txt.

* HTTP response codes >= 500 are treated as a failed read rather than as "not found".  A not-found response means we can assume the entire site is allowed; a 5xx server error tells us nothing.

* A successful read() updates the mtime (which is defined to be "the time the robots.txt file was last fetched").

* The can_fetch() method returns False unless we've had a read() with a 2xx or 4xx response.  This avoids false positives in the case where a user calls can_fetch() before calling read().
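
The behaviour described in the last three points can be sketched roughly as follows. This is only an illustration and not the attached patch: the class RobotsState, its method record_response(), and the prefix-only Disallow matching are hypothetical names and simplifications made up for this example. It also assumes that every 4xx is handled like "not found" and that any status outside 2xx/4xx counts as a failed read.

```python
import time


class RobotsState:
    """Hypothetical stand-in for the parser's bookkeeping (not the real patch)."""

    def __init__(self):
        self.last_checked = 0          # mtime: when robots.txt was last fetched
        self.allow_all = False
        self.disallowed_prefixes = []  # simplified: one global list of prefixes

    def record_response(self, status, body=""):
        """Update state from an HTTP status code and, for 2xx, the file body."""
        if 200 <= status < 300:
            # Successful read: parse the Disallow lines (simplified, no user agents).
            self.allow_all = False
            self.disallowed_prefixes = []
            for line in body.splitlines():
                name, _, value = line.partition(":")
                if name.strip().lower() == "disallow" and value.strip():
                    self.disallowed_prefixes.append(value.strip())
        elif 400 <= status < 500:
            # Assumption: treat every 4xx as "not found", so the site is allowed.
            self.allow_all = True
        else:
            # 5xx (and, by assumption, anything else): failed read, learn nothing.
            return
        # Only a usable response counts as a fetch and updates the mtime.
        self.last_checked = time.time()

    def can_fetch(self, path):
        """Return False until a usable response has been recorded."""
        if not self.last_checked:
            return False
        if self.allow_all:
            return True
        return not any(path.startswith(p) for p in self.disallowed_prefixes)
```

With this sketch, can_fetch() stays False both before any response is recorded and after a 503, and only starts answering from the rules once a 2xx or 4xx has been seen.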
Date                 User        Action  Args
2014-05-11 18:21:45  rhettinger  set     recipients: + rhettinger
2014-05-11 18:21:45  rhettinger  set     messageid: <>
2014-05-11 18:21:45  rhettinger  link    issue21469 messages
2014-05-11 18:21:44  rhettinger  create