Rietveld Code Review Tool

#17403: Robotparser fails to parse some robots.txt

Created: 6 years, 8 months ago by mezger.benjamin
Modified: 6 years, 8 months ago
Reviewers: ezio.melotti
CC: orsenthil, ezio.melotti, acooke, r.david.murray, mher, lukasz.langa, devnull_psf.upfronthosting.co.za, berkerpeksag, Martin Panter, benmezger
Visibility: Public.

Patch Set 1

Total comments: 3

Patch Set 2

Lib/test/test_robotparser.py: 1 chunk, +11 lines, -0 lines, 0 comments
Lib/urllib/robotparser.py: 1 chunk, +1 line, -0 lines, 0 comments

Messages

Total messages: 1
ezio.melotti
6 years, 8 months ago #1
http://bugs.python.org/review/17403/diff/7622/Lib/test/test_robotparser.py
File Lib/test/test_robotparser.py (right):

http://bugs.python.org/review/17403/diff/7622/Lib/test/test_robotparser.py#ne...
Lib/test/test_robotparser.py:245: good = ['/catalogs/test?','/catalogs/sub-catalogs']
Is '/catalogs/test' valid too?
'/catalogs/test?foo=bar'?
'/catalogs/test/foo'?

http://bugs.python.org/review/17403/diff/7622/Lib/urllib/robotparser.py
File Lib/urllib/robotparser.py (right):

http://bugs.python.org/review/17403/diff/7622/Lib/urllib/robotparser.py#newco...
Lib/urllib/robotparser.py:202: lines = list(filter(lambda x: x.applies_to(filename), self.rulelines))
Suggestion (a list comprehension instead of filter() with a lambda):
lines = [line for line in self.rulelines if line.applies_to(filename)]
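A minimal sketch of the suggested rewrite, assuming a hypothetical stand-in `RuleLine` class that only mimics the `path`, `allowance` and `applies_to()` members the patch relies on (with plain prefix matching; this is not the stdlib class). It shows that the list comprehension builds the same list as `list(filter(lambda ...))`, without the lambda:

```python
from dataclasses import dataclass


@dataclass
class RuleLine:
    """Hypothetical stand-in for a robots.txt rule line (not the stdlib class)."""
    path: str
    allowance: bool

    def applies_to(self, filename: str) -> bool:
        # Simplified prefix match; "*" matches every path.
        return self.path == "*" or filename.startswith(self.path)


rulelines = [
    RuleLine("/catalogs", False),
    RuleLine("/catalogs/test", True),
    RuleLine("/images", False),
]
filename = "/catalogs/test/foo"

# Form used in the patch: filter() with a lambda, wrapped in list().
lines_filter = list(filter(lambda x: x.applies_to(filename), rulelines))
# Suggested form: an equivalent list comprehension.
lines_comp = [line for line in rulelines if line.applies_to(filename)]

print(lines_filter == lines_comp)          # True
print([line.path for line in lines_comp])  # ['/catalogs', '/catalogs/test']
```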

http://bugs.python.org/review/17403/diff/7622/Lib/urllib/robotparser.py#newco...
Lib/urllib/robotparser.py:204: return max(lines, key=lambda x: len(x.path)).allowance
This should probably have a comment that explains why it's taking the max()
(possibly with a link to this issue).
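One way the requested comment could read, sketched as a standalone function under stated assumptions: `RuleLine` is a hypothetical stand-in with `path`, `allowance` and prefix-based `applies_to()`, and the rule set below is made up for illustration (it is not the test's robots.txt). Taking max() by `len(path)` picks the most specific matching rule rather than the first one in file order, which is the "most octets wins" precedence RFC 9309 specifies. Wildcards are not handled, and on a tie max() simply keeps the earliest matching line:

```python
from dataclasses import dataclass


@dataclass
class RuleLine:
    """Hypothetical stand-in for a robots.txt Allow/Disallow line."""
    path: str
    allowance: bool

    def applies_to(self, filename: str) -> bool:
        # Simplified prefix match; "*" matches every path.
        return self.path == "*" or filename.startswith(self.path)


def allowance(rulelines, filename):
    """Return whether `filename` may be fetched under `rulelines`."""
    lines = [line for line in rulelines if line.applies_to(filename)]
    if not lines:
        # No rule matches: fetching is allowed by default.
        return True
    # Several rules can match the same path (e.g. "Disallow: /catalogs"
    # and "Allow: /catalogs/sub-catalogs"). Use the most specific one,
    # i.e. the longest path, instead of the first one in file order
    # (see issue #17403). On a tie, max() keeps the earliest line.
    return max(lines, key=lambda x: len(x.path)).allowance


rules = [
    RuleLine("/catalogs", False),
    RuleLine("/catalogs/sub-catalogs", True),
]
print(allowance(rules, "/catalogs/sub-catalogs"))  # True
print(allowance(rules, "/catalogs/other"))         # False
print(allowance(rules, "/images"))                 # True
```

With first-match semantics the first call would return False, because "Disallow: /catalogs" comes first in the file; the longest-match rule is what lets the more specific Allow line win.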

This is Rietveld 894c83f36cb7+