Author rhettinger
Recipients rhettinger
Date 2014-05-10.16:55:09
* The can_fetch() method does not check whether read() has been called, so it returns false positives when it is called before read().

* When read() is called, it fails to call modified(), so mtime() returns an incorrect result.  The user has to call modified() manually to update mtime().

>>> from urllib.robotparser import RobotFileParser
>>> rp = RobotFileParser('')
>>> rp.can_fetch('UbiCrawler', '')
>>> rp.can_fetch('UbiCrawler', '')
>>> rp.mtime()
>>> rp.modified()
>>> rp.mtime()

Suggested improvements:

1) Trigger internal calls to modified() every time the parser is modified using read() or add_entry().  That would ensure that mtime() actually reflects the modification time.

2) Raise an exception or return False whenever can_fetch() is called and mtime() is zero (meaning that the parser has not been initialized with any rules).
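
The two suggestions can be sketched as a subclass, without touching the library itself. This is only an illustration under stated assumptions, not the actual patch: the class name CheckedRobotFileParser, the example.com URLs, and the sample rules are hypothetical. It stamps the modification time in parse(), which read() calls after a successful fetch, and it takes the "return False" option for an uninitialized parser. It does not cover entries added by other paths.

```python
from urllib.robotparser import RobotFileParser


class CheckedRobotFileParser(RobotFileParser):
    """Hypothetical subclass illustrating the two suggested improvements."""

    def parse(self, lines):
        # Suggestion 1: record the modification time whenever rules are
        # loaded. read() hands the fetched lines to parse() on success,
        # so this also covers the usual read() path.
        super().parse(lines)
        self.modified()

    def can_fetch(self, useragent, url):
        # Suggestion 2: mtime() == 0 means no rules were ever loaded,
        # so refuse instead of answering from an empty rule set.
        if not self.mtime():
            return False
        return super().can_fetch(useragent, url)


# Offline usage: feed the rules directly instead of fetching them.
rp = CheckedRobotFileParser()
before = rp.can_fetch('UbiCrawler', 'http://example.com/index.html')

rp.parse(['User-agent: *', 'Disallow: /private/'])
allowed = rp.can_fetch('UbiCrawler', 'http://example.com/index.html')
blocked = rp.can_fetch('UbiCrawler', 'http://example.com/private/page.html')
print(before, allowed, blocked, rp.mtime() > 0)
```

Returning False rather than raising keeps can_fetch() usable as a plain predicate; raising an exception would make the uninitialized state harder to miss, at the cost of breaking callers that never expected can_fetch() to fail.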
Date                 User        Action  Args
2014-05-10 16:55:10  rhettinger  set     recipients: + rhettinger
2014-05-10 16:55:10  rhettinger  set     messageid: <>
2014-05-10 16:55:10  rhettinger  link    issue21469 messages
2014-05-10 16:55:09  rhettinger  create