classification
Title: Robotparser does not handle empty paths
Type: Stage:
Components: Library (Lib) Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: cmalamas, loewis
Priority: normal Keywords:

Created on 2002-02-26 10:40 by cmalamas, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (2)
msg9423 - (view) Author: Costas Malamas (cmalamas) Date: 2002-02-26 10:40
The robotparser module incorrectly handles empty paths 
in the allow/disallow directives.

According to http://www.robotstxt.org/wc/norobots-rfc.html,
the following rule should be a global *allow*:
User-agent: *
Disallow: 
      
My reading of the RFC is that an empty path is always 
a global allow (for both Allow and Disallow 
directives), so that the syntax stays backwards 
compatible: there was no Allow directive in the 
original syntax.
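
A quick way to check this behaviour is sketched below. It assumes a current Python 3 interpreter, where the module lives at urllib.robotparser rather than robotparser; the user-agent name and URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule set from this report: an empty Disallow path.
rules = [
    "User-agent: *",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(rules)  # parse() accepts an iterable of robots.txt lines

# "ExampleBot" and the URL are hypothetical; any agent and path would do.
allowed = parser.can_fetch("ExampleBot", "http://example.com/some/page.html")
print(allowed)
```

If the empty path is treated as a global allow, as argued above, can_fetch() should report that the URL may be fetched.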

Suggested fix:
robotparser.RuleLine.applies_to() becomes:
    def applies_to(self, filename):
        if not self.path:
            self.allowance = 1
        return self.path == "*" or re.match(self.path, filename)
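
The suggested change can be tried in isolation with the minimal sketch below. The RuleLine class here is a hypothetical stand-in written for this example, not the actual class from robotparser.py, and its constructor signature is an assumption.

```python
import re


class RuleLine:
    """Hypothetical stand-in for robotparser.RuleLine (illustration only)."""

    def __init__(self, path, allowance):
        self.path = path            # path from an Allow/Disallow line
        self.allowance = allowance  # 1 for Allow, 0 for Disallow

    def applies_to(self, filename):
        # Suggested fix: an empty path turns the rule into an allow.
        if not self.path:
            self.allowance = 1
        # re.match returns a match object or None; normalise to a bool.
        return self.path == "*" or re.match(self.path, filename) is not None


# "Disallow:" with an empty path, checked against an arbitrary file name.
rule = RuleLine("", 0)
applies = rule.applies_to("/any/page.html")
print(applies, rule.allowance)
```

Note that this version changes self.allowance as a side effect of applies_to(); setting the allowance once in the constructor would be an alternative design. Which approach revision 1.11 takes is not stated in this report.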
msg9424 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-02-28 15:32

This is fixed in robotparser.py 1.11.
History
Date                 User      Action  Args
2022-04-10 16:05:02  admin     set     github: 36161
2002-02-26 10:40:51  cmalamas  create