Author gallicrooster
Recipients gallicrooster, larsfuse, terry.reedy
Date 2020-01-02.03:41:12
Message-id <1577936473.45.0.756125026518.issue35457@roundup.psfhosted.org>
Content
Hi,

Is this ticket still relevant for Python 3.8?

While running some tests with an empty robots.txt file, I noticed that the parser returns "ALLOWED" for any path (as per the current draft of the Robots Exclusion Protocol: https://tools.ietf.org/html/draft-koster-rep-00#section-2.2.1).

Code:

from urllib import robotparser

robots_url = "file:///tmp/empty.txt"

rp = robotparser.RobotFileParser()
print(robots_url)
rp.set_url(robots_url)
rp.read()
print("fetch /", rp.can_fetch(useragent="*", url="/"))
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))

Output:

$ cat /tmp/empty.txt
$ python -V
Python 3.8.1
$ python test_robot3.py
file:///tmp/empty.txt
fetch / True
fetch /admin True
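
The same behaviour can probably be reproduced without a file on disk by passing the rules straight to RobotFileParser.parse(). This is only a minimal sketch: the helper name can_fetch_with and the "Disallow: /admin" rule are made up for illustration and are not part of the report above.

```python
from urllib import robotparser


def can_fetch_with(lines, useragent, url):
    """Parse robots.txt content given as a list of lines and query it.

    Hypothetical helper, used here only to compare two inputs.
    """
    rp = robotparser.RobotFileParser()
    rp.parse(lines)  # parse() accepts an iterable of lines, no URL needed
    return rp.can_fetch(useragent, url)


# An empty robots.txt has no rules, so nothing is disallowed.
empty_result = can_fetch_with([], "*", "/admin")

# For contrast, an illustrative rule set that blocks /admin for every agent.
blocked_result = can_fetch_with(["User-agent: *", "Disallow: /admin"], "*", "/admin")

print("empty robots.txt, fetch /admin:", empty_result)
print("with Disallow rule, fetch /admin:", blocked_result)
```

If the empty case printed False instead, that would point to the file not having been read or parsed at all, since can_fetch() refuses everything until the parser has been fed some input.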
History
Date                 User           Action  Args
2020-01-02 03:41:13  gallicrooster  set     recipients: + gallicrooster, terry.reedy, larsfuse
2020-01-02 03:41:13  gallicrooster  set     messageid: <1577936473.45.0.756125026518.issue35457@roundup.psfhosted.org>
2020-01-02 03:41:13  gallicrooster  link    issue35457 messages
2020-01-02 03:41:12  gallicrooster  create