
Author larsfuse
Recipients larsfuse
Date 2018-12-11.09:30:47
Content
The standard (http://www.robotstxt.org/robotstxt.html) says:

> To allow all robots complete access:
> User-agent: *
> Disallow:
> (or just create an empty "/robots.txt" file, or don't use one at all)

Here I give Python an empty robots.txt file:
$ curl http://10.223.68.186/robots.txt
$

Code:

import robotparser

# The server above returns an empty robots.txt for this URL.
robotsurl = "http://10.223.68.186/robots.txt"

rp = robotparser.RobotFileParser()
print(robotsurl)
rp.set_url(robotsurl)
rp.read()
print("fetch /", rp.can_fetch(useragent="*", url="/"))
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))

Result:

$ ./test.py
http://10.223.68.186/robots.txt
('fetch /', False)
('fetch /admin', False)

So, although an empty robots.txt should allow complete access, robotparser reports the whole site as blocked.
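
As a point of comparison, here is a minimal sketch (assuming the same Python 2 robotparser module as above) that feeds the standard's explicit allow-all ruleset to the parser via parse() instead of fetching a file; in that case can_fetch should return True for both URLs:

import robotparser

rp = robotparser.RobotFileParser()
# "User-agent: *" followed by an empty "Disallow:" is the explicit
# allow-all ruleset quoted from the standard above.
rp.parse(["User-agent: *", "Disallow:"])
print("fetch /", rp.can_fetch(useragent="*", url="/"))            # expected: True
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))  # expected: True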