classification
Title: robotparser.py missing one line
Type: behavior
Components: Library (Lib)
Versions: Python 3.0, Python 2.6, Python 2.5
process
Status: closed
Resolution: fixed
Assigned To: skip.montanaro
Nosy List: mbloore, skip.montanaro
Priority: normal

Created on 2008-07-24 15:50 by mbloore, last changed 2008-07-27 00:49 by skip.montanaro. This issue is now closed.

Messages (6)
msg70209 - Author: mARK (mbloore) Date: 2008-07-24 15:50
RobotFileParser.parse() contains the lines

                elif line[0] == "disallow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], False))
                        state = 2
                elif line[0] == "allow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], True))

with no 'state = 2' in the 'allow' part.
This causes different behaviour depending on whether the file ends with
'allow' or 'disallow', or an empty line.

Those lines were taken from revision 65118.  My Python 2.5 sources are
similar.  I have not checked others.
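
For readers following along, here is a minimal sketch of the state machine under discussion. It is a simplified stand-in, not the stdlib code: the function name parse_rules, the fix_allow flag, and the symbolic state names START, SAW_AGENT and SAW_RULE are made up for illustration, and the names reflect one reading of the parser (0: no record open, 1: saw a User-agent line, 2: saw at least one rule line). Comment handling and URL quoting are left out.

```python
# Assumed symbolic names for states 0, 1 and 2 (not from the stdlib source).
START, SAW_AGENT, SAW_RULE = 0, 1, 2


def parse_rules(lines, fix_allow=True):
    """Return a list of (useragents, rulelines) records.

    Each rule line is a (path, allowed) tuple. With fix_allow=False the
    'allow' branch leaves the state unchanged, as in the reported code.
    """
    entries = []
    agents, rules = [], []
    state = START
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current record. The record is only
            # kept if the parser believes it has seen a rule line.
            if state == SAW_RULE:
                entries.append((agents, rules))
            if state != START:
                agents, rules = [], []
                state = START
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if state == SAW_RULE:
                entries.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
            state = SAW_AGENT
        elif key == "disallow":
            if state != START:
                rules.append((value, False))
                state = SAW_RULE
        elif key == "allow":
            if state != START:
                rules.append((value, True))
                if fix_allow:
                    state = SAW_RULE  # the line the report says is missing
    # End of input: keep the open record only if a rule line was seen.
    if state == SAW_RULE:
        entries.append((agents, rules))
    return entries


allow_last = ["User-agent: *", "Allow: /public"]
print(parse_rules(allow_last, fix_allow=False))
print(parse_rules(allow_last, fix_allow=True))
```

In this sketch, a record whose only rules are 'allow' lines is silently dropped when fix_allow is False, whether the input ends right there or with an empty line. A record that ends with 'disallow' is kept either way, which matches the difference in behaviour described above.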
msg70234 - Author: Skip Montanaro (skip.montanaro) (Python triager) Date: 2008-07-25 02:13
Do you have a concrete robots.txt file I can use in a test case?
msg70235 - Author: Skip Montanaro (skip.montanaro) (Python triager) Date: 2008-07-25 02:19
Perhaps more important than a test case, can you explain what states 0, 1 
and 2 are (maybe give them some symbolic names I can at least put in a 
comment)?  This is not my code.  Though I wrote the first version of the 
robotparser module and I served as the person who checked this version 
into the repo, as I recall this is a complete rewrite.
msg70237 - Author: Skip Montanaro (skip.montanaro) (Python triager) Date: 2008-07-25 02:29
*sigh* The current test_robotparser.py has no test cases with Allow: 
lines.
msg70298 - Author: Skip Montanaro (skip.montanaro) (Python triager) Date: 2008-07-26 11:48
Attached is a patch against 2.6 which adds the missing line (state = 2) and a 
comment describing the three states the parser can be in, and expands the 
test cases to cover this change (fail without it, pass with it).  In the 
process I snagged some broken example robots.txt files from Google's 
Googlebot help pages and turned them into test cases, both before and 
after fixing the examples.  I think this can probably go into the 
repository as a bug fix and get merged to the py3k branch.  If nobody 
complains in the next day or two I'll apply it.
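
A sketch of the kind of regression check described here, written against today's urllib.robotparser (the Python 3 name of the module). The robots.txt content, the bot names and the helper allowed() are made up for illustration and are not taken from the attached patch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the first record contains only an Allow line
# and is followed by a blank line, the case the missing 'state = 2' broke.
ROBOTS_TXT = """\
User-agent: goodbot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Parse robots_txt from memory and ask whether agent may fetch url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


print(allowed("goodbot", "/index.html"))
print(allowed("otherbot", "/index.html"))
```

With the fix in place, goodbot should be allowed and otherbot refused. In the code as reported, the goodbot record would be discarded at the blank line, so goodbot would fall through to the catch-all Disallow.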
msg70307 - Author: Skip Montanaro (skip.montanaro) (Python triager) Date: 2008-07-27 00:49
Committed as r65255.
History
Date                 User               Action  Args
2008-07-27 00:49:48  skip.montanaro     set     status: open -> closed; resolution: fixed; messages: + msg70307
2008-07-26 11:48:48  skip.montanaro     set     messages: + msg70298
2008-07-25 02:30:02  skip.montanaro     set     messages: - msg70236
2008-07-25 02:29:52  skip.montanaro     set     priority: normal; messages: + msg70237; versions: + Python 2.6
2008-07-25 02:26:55  skip.montanaro     set     messages: + msg70236
2008-07-25 02:19:10  skip.montanaro     set     messages: + msg70235
2008-07-25 02:13:04  skip.montanaro     set     messages: + msg70234
2008-07-24 15:52:49  benjamin.peterson  set     assignee: skip.montanaro; nosy: + skip.montanaro
2008-07-24 15:50:23  mbloore            create