classification
Title: robotparser crawl_delay and request_rate do not work with no matching entry
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: gvanrossum, joseph_myers, miss-islington, orsenthil, remi.lapeyre, taleinat
Priority: normal
Keywords: patch

Created on 2019-02-06 20:33 by joseph_myers, last changed 2019-06-16 07:14 by taleinat. This issue is now closed.

Pull Requests
URL Status Linked
PR 11791 merged remi.lapeyre, 2019-02-08 16:05
PR 14121 merged miss-islington, 2019-06-16 06:49
PR 14122 merged miss-islington, 2019-06-16 06:49
Messages (8)
msg334982 - Author: Joseph Myers (joseph_myers) Date: 2019-02-06 20:33
RobotFileParser.crawl_delay and RobotFileParser.request_rate raise AttributeError when robots.txt has no entry matching the given user-agent and no default entry. According to the documentation, they should return None in that case. E.g.:

>>> from urllib.robotparser import RobotFileParser
>>> parser = RobotFileParser()
>>> parser.parse([])
>>> parser.crawl_delay('example')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/urllib/robotparser.py", line 182, in crawl_delay
    return self.default_entry.delay
AttributeError: 'NoneType' object has no attribute 'delay'
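
For code that must also run on interpreters without the fix, the failure above can be guarded against. The following is only a minimal sketch: `safe_crawl_delay` and `safe_request_rate` are hypothetical helper names that are not part of `urllib.robotparser`, and the sketch assumes that treating the AttributeError as "no value" is acceptable for the caller.

```python
from urllib.robotparser import RobotFileParser


def safe_crawl_delay(parser, useragent):
    """Hypothetical helper: return the crawl delay, or None if unavailable.

    Unpatched interpreters raise AttributeError when there is neither a
    matching entry nor a default entry; patched ones return None directly.
    """
    try:
        return parser.crawl_delay(useragent)
    except AttributeError:
        return None


def safe_request_rate(parser, useragent):
    """Hypothetical helper: same guard for request_rate."""
    try:
        return parser.request_rate(useragent)
    except AttributeError:
        return None


# A robots.txt with no entries at all: the case from the report.
empty = RobotFileParser()
empty.parse([])

# A robots.txt with only a default ("*") entry that sets a crawl delay.
with_default = RobotFileParser()
with_default.parse(["User-agent: *", "Crawl-delay: 5"])

print(safe_crawl_delay(empty, "example"))
print(safe_request_rate(empty, "example"))
print(safe_crawl_delay(with_default, "example"))
```

On an interpreter that includes the fix, the helpers simply pass through the value returned by the parser, so they are safe to leave in place.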
msg335093 - Author: Rémi Lapeyre (remi.lapeyre) * Date: 2019-02-08 16:07
Thanks for your report, Joseph. I opened a new PR to fix this.
msg345251 - Author: Tal Einat (taleinat) * (Python committer) Date: 2019-06-11 16:21
The PR is looking good; I'll likely merge it soon.

I'm quite sure this should go into 3.8, but should it be backported to 3.7?  This is certainly a bugfix, but still a slight change of behavior, so perhaps we should avoid changing this in 3.7?
msg345296 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-06-12 01:36
Yes, this looks like a bugfix. Who wants an AttributeError? :-)
msg345730 - Author: Tal Einat (taleinat) * (Python committer) Date: 2019-06-16 06:49
New changeset 8047e0e1c620f69cc21f9ca48b24bf2cdd5c3668 by Tal Einat (Rémi Lapeyre) in branch 'master':
bpo-35922: Fix RobotFileParser when robots.txt has no relevant crawl delay or request rate (GH-11791)
https://github.com/python/cpython/commit/8047e0e1c620f69cc21f9ca48b24bf2cdd5c3668
msg345732 - Author: miss-islington (miss-islington) Date: 2019-06-16 07:07
New changeset 58a1a76baefc92d9e2392a5dbf65e39e44fb8f55 by Miss Islington (bot) in branch '3.8':
bpo-35922: Fix RobotFileParser when robots.txt has no relevant crawl delay or request rate (GH-11791)
https://github.com/python/cpython/commit/58a1a76baefc92d9e2392a5dbf65e39e44fb8f55
msg345733 - Author: miss-islington (miss-islington) Date: 2019-06-16 07:10
New changeset 45d6547acfb9ae1639adbe03dd14f38cd0642ca2 by Miss Islington (bot) in branch '3.7':
bpo-35922: Fix RobotFileParser when robots.txt has no relevant crawl delay or request rate (GH-11791)
https://github.com/python/cpython/commit/45d6547acfb9ae1639adbe03dd14f38cd0642ca2
msg345734 - Author: Tal Einat (taleinat) * (Python committer) Date: 2019-06-16 07:14
Rémi, thanks for the great work writing the PR and quickly going through several iterations of reviews and revisions!
History
Date User Action Args
2019-06-16 07:14:52 taleinat set status: open -> closed; resolution: fixed; messages: + msg345734; stage: patch review -> resolved
2019-06-16 07:10:09 miss-islington set messages: + msg345733
2019-06-16 07:07:58 miss-islington set nosy: + miss-islington; messages: + msg345732
2019-06-16 06:49:31 miss-islington set pull_requests: + pull_request13971
2019-06-16 06:49:24 miss-islington set pull_requests: + pull_request13970
2019-06-16 06:49:02 taleinat set messages: + msg345730
2019-06-12 01:36:42 gvanrossum set nosy: + gvanrossum; messages: + msg345296
2019-06-11 16:21:14 taleinat set nosy: + taleinat; messages: + msg345251
2019-06-09 19:09:10 taleinat set versions: + Python 3.9, - Python 3.6
2019-02-08 16:07:45 remi.lapeyre set nosy: + orsenthil; messages: + msg335093
2019-02-08 16:05:33 remi.lapeyre set keywords: + patch; stage: patch review; pull_requests: + pull_request11796
2019-02-06 20:37:38 remi.lapeyre set nosy: + remi.lapeyre
2019-02-06 20:33:03 joseph_myers create