Author XapaJIaMnu
Recipients XapaJIaMnu
Date 2012-10-01.12:58:24
Message-id <1349096305.17.0.983395980337.issue16099@psf.upfronthosting.co.za>
Content
Robotparser doesn't support two important optional parameters from the robots.txt file: crawl delay and request rate. I have implemented them as follows.
(Robotparser should be initialized in the usual way:
rp = robotparser.RobotFileParser()
rp.set_url(..)
rp.read()
)

crawl_delay(useragent) - Returns the time in seconds that you need to wait between requests when crawling.
If none is specified, or it doesn't apply to this user agent, returns -1.
request_rate(useragent) - Returns a list in the form [requests, seconds].
If none is specified, or it doesn't apply to this user agent, returns -1.
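
The two methods described above can be sketched against the version of this API that later shipped in Python 3's urllib.robotparser (3.6 and newer). This is an illustration only, not the patch attached to this issue. The robots.txt content and the agent names "slowbot" and "otherbot" are made up. Note that the shipped methods return None rather than -1 when no value applies, and request_rate returns a named tuple with fields (requests, seconds) rather than a list.

```python
from urllib import robotparser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: slowbot
Crawl-delay: 10
Request-rate: 3/20
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

rp = robotparser.RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed here.
rp.parse(ROBOTS_TXT.splitlines())

# Values for an agent that has both parameters set.
delay = rp.crawl_delay("slowbot")    # seconds to wait between requests
rate = rp.request_rate("slowbot")    # named tuple: (requests, seconds)

# An agent that only matches the "*" entry, which sets neither parameter.
other_delay = rp.crawl_delay("otherbot")
other_rate = rp.request_rate("otherbot")

print(delay, rate.requests, rate.seconds)
print(other_delay, other_rate)
```

A caller that prefers the -1 convention proposed in this message can map None to -1 at the call site.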
History
Date User Action Args
2012-10-01 12:58:25 XapaJIaMnu set recipients: + XapaJIaMnu
2012-10-01 12:58:25 XapaJIaMnu set messageid: <1349096305.17.0.983395980337.issue16099@psf.upfronthosting.co.za>
2012-10-01 12:58:25 XapaJIaMnu link issue16099 messages
2012-10-01 12:58:25 XapaJIaMnu create