Message 331595
The standard (http://www.robotstxt.org/robotstxt.html) says:
> To allow all robots complete access:
> User-agent: *
> Disallow:
> (or just create an empty "/robots.txt" file, or don't use one at all)
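For comparison, feeding robotparser that explicit allow-all rule directly (instead of fetching an empty file) gives the expected answer. A minimal sketch, assuming Python 3 where the module lives at urllib.robotparser:

from urllib import robotparser  # in Python 2: import robotparser

rp = robotparser.RobotFileParser()
# Parse the standard's allow-all example directly, no HTTP fetch involved.
rp.parse(["User-agent: *", "Disallow:"])
print("fetch /", rp.can_fetch("*", "/"))            # True
print("fetch /admin", rp.can_fetch("*", "/admin"))  # True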
Here I serve Python an empty robots.txt file:
$ curl http://10.223.68.186/robots.txt
$
Code:
import robotparser  # Python 2 module; in Python 3 it is urllib.robotparser

robotsurl = "http://10.223.68.186/robots.txt"

rp = robotparser.RobotFileParser()
print(robotsurl)
rp.set_url(robotsurl)
rp.read()
print("fetch /", rp.can_fetch(useragent="*", url="/"))
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))
Result:
$ ./test.py
http://10.223.68.186/robots.txt
('fetch /', False)
('fetch /admin', False)
The result: robotparser reports the whole site as blocked, even though an empty robots.txt should grant complete access.
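A possible workaround until this is fixed is to fetch robots.txt yourself and only hand non-empty content to the parser, treating an empty (or missing) file as allow-all. A sketch under that assumption; the helper make_parser and the use of Python 3's urllib are my own illustration, not anything from the stdlib:

from urllib import error, request, robotparser

def make_parser(robots_url):
    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url)
    try:
        body = request.urlopen(robots_url).read().decode("utf-8")
    except error.HTTPError:
        body = ""  # no robots.txt at all: same as allowing everything
    if body.strip():
        rp.parse(body.splitlines())
    else:
        # Empty file: per the standard, all robots have complete access.
        rp.parse(["User-agent: *", "Disallow:"])
    return rp

rp = make_parser("http://10.223.68.186/robots.txt")
print(rp.can_fetch("*", "/"))       # True for the empty robots.txt above
print(rp.can_fetch("*", "/admin"))  # True for the empty robots.txt above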
History:
Date | User | Action | Args
2018-12-11 09:30:47 | larsfuse | set | recipients: + larsfuse
2018-12-11 09:30:47 | larsfuse | set | messageid: <1544520647.95.0.788709270274.issue35457@psf.upfronthosting.co.za>
2018-12-11 09:30:47 | larsfuse | link | issue35457 messages
2018-12-11 09:30:47 | larsfuse | create |