
Author larsfuse
Recipients larsfuse
Date 2018-12-11.09:30:47
Content
The standard (http://www.robotstxt.org/robotstxt.html) says:

> To allow all robots complete access:
> User-agent: *
> Disallow:
> (or just create an empty "/robots.txt" file, or don't use one at all)

Here I give Python an empty robots.txt file:
$ curl http://10.223.68.186/robots.txt
$

Code:

import robotparser

# The server above returns an empty robots.txt for this URL.
robotsurl = "http://10.223.68.186/robots.txt"

rp = robotparser.RobotFileParser()
print(robotsurl)
rp.set_url(robotsurl)
rp.read()
print("fetch /", rp.can_fetch(useragent="*", url="/"))
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))

Result:

$ ./test.py
http://10.223.68.186/robots.txt
('fetch /', False)
('fetch /admin', False)

So, although an empty robots.txt should allow complete access, robotparser reports the whole site as blocked.
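
As a point of comparison, here is a minimal sketch (assuming the same Python 2 robotparser module as above) that feeds the standard's explicit allow-all ruleset to the parser via parse() instead of fetching a file; in that case can_fetch should return True for both URLs:

import robotparser

rp = robotparser.RobotFileParser()
# "User-agent: *" followed by an empty "Disallow:" is the explicit
# allow-all ruleset quoted from the standard above.
rp.parse(["User-agent: *", "Disallow:"])
print("fetch /", rp.can_fetch(useragent="*", url="/"))            # expected: True
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))  # expected: True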