Side by Side Diff: Doc/library/urllib.robotparser.rst

Issue 16099: robotparser doesn't support request rate and crawl delay parameters
Patch Set: Created 6 years, 3 months ago
OLD | NEW
1 :mod:`urllib.robotparser` --- Parser for robots.txt 1 :mod:`urllib.robotparser` --- Parser for robots.txt
2 ==================================================== 2 ====================================================
3 3
4 .. module:: urllib.robotparser 4 .. module:: urllib.robotparser
5 :synopsis: Load a robots.txt file and answer questions about 5 :synopsis: Load a robots.txt file and answer questions about
6 fetchability of other URLs. 6 fetchability of other URLs.
7 .. sectionauthor:: Skip Montanaro <skip@pobox.com> 7 .. sectionauthor:: Skip Montanaro <skip@pobox.com>
8 8
9 9
10 .. index:: 10 .. index::
(...skipping 34 matching lines...)
45 .. method:: mtime() 45 .. method:: mtime()
46 46
47 Returns the time the ``robots.txt`` file was last fetched. This is 47 Returns the time the ``robots.txt`` file was last fetched. This is
48 useful for long-running web spiders that need to check for new 48 useful for long-running web spiders that need to check for new
49 ``robots.txt`` files periodically. 49 ``robots.txt`` files periodically.
50 50
51 .. method:: modified() 51 .. method:: modified()
52 52
53 Sets the time the ``robots.txt`` file was last fetched to the current 53 Sets the time the ``robots.txt`` file was last fetched to the current
54 time. 54 time.
55
56 .. method:: crawl_delay(useragent)
berkerpeksag 2013/12/09 03:30:54 Is crawl_delay used for search engines? Google rec
57
58 Returns the value of the Crawl-delay: parameter from ``robots.txt`` for
berkerpeksag 2013/12/09 03:30:54 ``Crawl-delay:``
59 the *useragent* in question. If there is no Crawl-delay parameter or
60 it doesn't apply to this user agent, it returns ``-1``
berkerpeksag 2013/12/09 03:30:54 Returning -1 is not Pythonic IMO. I would prefer `
61
berkerpeksag 2013/12/09 03:30:54 Could you add a versionadded directive? e.g.
62 .. method:: request_rate(useragent)
63
64 Returns the contents of the Request-rate: parameter from ``robots.txt``
berkerpeksag 2013/12/09 03:30:54 ``Request-rate:``
65 in the form of a list ``[requests, seconds]``. If there is no such
berkerpeksag 2013/12/09 03:30:54 collections.namedtuple can be used instead of [req
66 parameter or it doesn't apply to the *useragent* specified, return ``-1``
berkerpeksag 2013/12/09 03:30:54 Returning -1 is not Pythonic IMO. I would prefer `
55 67
berkerpeksag 2013/12/09 03:30:54 Could you add a versionadded directive? e.g.
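For context on the return-value comments above: in the version of these methods that was eventually released (Python 3.6), a missing parameter yields ``None`` rather than the ``-1`` documented in this patch set. A minimal offline sketch of that behaviour, assuming Python 3.6 or later and a made-up ``robots.txt`` body passed to ``parse()`` instead of being fetched over the network:

```python
import urllib.robotparser

# Hypothetical robots.txt content with neither Crawl-delay nor Request-rate.
lines = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(lines)  # parse in-memory lines; no HTTP request is made

# Both lookups fall through to the "*" entry, which sets neither value.
missing_delay = rp.crawl_delay("*")
missing_rate = rp.request_rate("*")
print(missing_delay, missing_rate)
```

Callers can then test ``is None`` instead of comparing against a sentinel number.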
56 68
57 The following example demonstrates basic use of the RobotFileParser class. 69 The following example demonstrates basic use of the RobotFileParser class.
58 70
59 >>> import urllib.robotparser 71 >>> import urllib.robotparser
60 >>> rp = urllib.robotparser.RobotFileParser() 72 >>> rp = urllib.robotparser.RobotFileParser()
61 >>> rp.set_url("http://www.musi-cal.com/robots.txt") 73 >>> rp.set_url("http://www.musi-cal.com/robots.txt")
62 >>> rp.read() 74 >>> rp.read()
75 >>> rp.request_rate("*")
76 [3/20]
77 >>> rp.crawl_delay("*")
78 6
63 >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco") 79 >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
64 False 80 False
65 >>> rp.can_fetch("*", "http://www.musi-cal.com/") 81 >>> rp.can_fetch("*", "http://www.musi-cal.com/")
66 True 82 True
67 83
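The doctest above depends on a live site. As a rough sketch of the same calls without network access, the snippet below feeds a hand-written ``robots.txt`` to ``parse()`` and derives a polite wait time between requests. It assumes the API as released in Python 3.6 (``request_rate()`` returns a ``RequestRate`` named tuple with ``requests`` and ``seconds`` fields), not the list form shown in this patch set; the ``robots.txt`` lines and the ``min_interval`` calculation are illustrative assumptions, not part of the patch.

```python
import urllib.robotparser

# Hypothetical robots.txt mirroring the values used in the doctest above.
lines = [
    "User-agent: *",
    "Crawl-delay: 6",
    "Request-rate: 3/20",
    "Disallow: /cgi-bin/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(lines)  # no HTTP request; the rules come from the list above

delay = rp.crawl_delay("*")   # integer number of seconds, or None
rate = rp.request_rate("*")   # RequestRate(requests, seconds), or None

# One possible policy (an assumption): honour whichever limit is stricter.
candidates = []
if delay is not None:
    candidates.append(float(delay))
if rate is not None:
    candidates.append(rate.seconds / rate.requests)
min_interval = max(candidates) if candidates else 0.0

blocked = rp.can_fetch(
    "*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
allowed = rp.can_fetch("*", "http://www.musi-cal.com/")
print(delay, rate.requests, rate.seconds, min_interval, blocked, allowed)
```

Taking the maximum of the two limits is only one way to combine them; a crawler could equally treat ``Request-rate`` as a budget over a sliding window.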