Message 218226 - Python tracker

Author	rhettinger
Recipients	rhettinger
Date	2014-05-10.16:55:09
Message-id	<1399740910.18.0.482286720777.issue21469@psf.upfronthosting.co.za>
In-reply-to

Content
* The can_fetch() method does not check whether read() has been called, so it returns false positives when it is called before read().

* When read() is called, it fails to call modified(), so mtime() returns an incorrect result.  The user has to call modified() manually to update the mtime().

>>> from urllib.robotparser import RobotFileParser
>>> rp = RobotFileParser('http://en.wikipedia.org/robots.txt')
>>> rp.can_fetch('UbiCrawler', 'http://en.wikipedia.org/index.html')
True
>>> rp.read()
>>> rp.can_fetch('UbiCrawler', 'http://en.wikipedia.org/index.html')
False
>>> rp.mtime()
0
>>> rp.modified()
>>> rp.mtime()
1399740268.628497

Suggested improvements:

1) Trigger internal calls to modified() every time the parser is modified using read() or add_entry().  That would ensure that mtime() actually reflects the modification time.

2) Raise an exception or return False whenever can_fetch() is called and the mtime() is zero (meaning that the parser has not been initialized with any rules).
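
The two suggestions could be sketched roughly as the subclass below. This is only an illustration, not the patch that was applied to the standard library: the class name CheckedRobotFileParser is hypothetical, and it assumes that read() hands the downloaded text to parse(), so hooking parse() also covers read(). It picks the "return False" option from suggestion 2 and does not cover add_entry().

```python
from urllib.robotparser import RobotFileParser


class CheckedRobotFileParser(RobotFileParser):
    """Hypothetical subclass illustrating the two suggested improvements."""

    def parse(self, lines):
        # Suggestion 1: record the modification time whenever rules are
        # loaded, so that mtime() reflects the last update.
        super().parse(lines)
        self.modified()

    def can_fetch(self, useragent, url):
        # Suggestion 2: an mtime() of zero means no rules have been loaded
        # yet, so refuse instead of reporting a false positive.
        if not self.mtime():
            return False
        return super().can_fetch(useragent, url)
```

With this sketch, calling can_fetch() on a fresh instance returns False, and mtime() is nonzero as soon as parse() has been given some rules, without a manual call to modified().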

History
Date	User	Action	Args
2014-05-10 16:55:10	rhettinger	set	recipients: + rhettinger
2014-05-10 16:55:10	rhettinger	set	messageid: <1399740910.18.0.482286720777.issue21469@psf.upfronthosting.co.za>
2014-05-10 16:55:10	rhettinger	link	issue21469 messages
2014-05-10 16:55:09	rhettinger	create