classification
Title: Support the Sitemap extension in robotparser
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: 25497 Superseder:
Assigned To: Nosy List: berker.peksag, matrixise, mcscope@gmail.com, ned.deily, pwirtz, rhettinger, stevensalbert
Priority: normal Keywords: easy, patch

Created on 2014-05-12 01:35 by rhettinger, last changed 2018-05-16 14:54 by ned.deily. This issue is now closed.

Files
File name Uploaded Description
robotparser_site_maps_v1.patch pwirtz, 2015-10-15 19:51
Pull Requests
URL Status Linked
PR 6883 merged python-dev, 2018-05-15 21:55
Messages (14)
msg218308 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-05-12 01:35
Resources:

* http://en.wikipedia.org/wiki/Robots_exclusion_standard#Nonstandard_extensions

* https://support.google.com/webmasters/answer/183669?hl=en

* https://github.com/seomoz/reppy
msg218318 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2014-05-12 09:26
There is a patch for Crawl-delay in issue 16099.
msg252528 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-10-08 10:09
The Crawl-delay part (issue 16099) is now committed.
msg253027 - (view) Author: Peter Wirtz (pwirtz) * Date: 2015-10-15 03:10
I would like to tackle this issue. Should I wait for issue 25400 to be resolved first?
msg253035 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-10-15 08:23
issue 25400 is not a blocker of this, so feel free to write a patch.
msg253063 - (view) Author: Peter Wirtz (pwirtz) * Date: 2015-10-15 19:51
Here is a patch that provides support for the Sitemap extension.
msg255225 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2015-11-23 20:50
Add a test with your patch.

Thank you
msg255228 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-11-23 21:17
Peter didn't write a test because issue 25497 (test_robotparser rewrite) needs to be committed first. See msg253016 in issue 25400 for more information.
msg311416 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2018-02-01 10:10
Hi @berker and @pwirtz,

Could you write a test for this issue?

Thanks
msg315362 - (view) Author: Steven Steven (stevensalbert) Date: 2018-04-16 18:29
Kindly add a test for this issue.
msg316740 - (view) Author: Lady Red (mcscope@gmail.com) * Date: 2018-05-15 22:06
I wrote a test for this, as it seems to have been abandoned, and opened a PR.
https://github.com/python/cpython/pull/6878
msg316743 - (view) Author: Lady Red (mcscope@gmail.com) * Date: 2018-05-15 22:09
Sorry, wrong PR number. It is 6883, and it is attached to this ticket.
msg316811 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-16 14:52
New changeset 5db5c0669e624767375593cc1a01f32092c91c58 by Ned Deily (Christopher Beacham) in branch 'master':
bpo-21475: Support the Sitemap extension in robotparser (GH-6883)
https://github.com/python/cpython/commit/5db5c0669e624767375593cc1a01f32092c91c58
msg316813 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-16 14:54
Thanks for the patch, Peter, and thanks for the PR and test, Lady Red!  Merged for release in 3.8.0.
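
For readers landing on this ticket, here is a minimal sketch of how the merged feature can be used on Python 3.8 or later. It relies on `RobotFileParser.site_maps()`, which returns the `Sitemap` entries as a list, or `None` when the file has none. The robots.txt text, the example.com URLs, and the helper name `sitemaps_from_text` are assumptions made up for illustration; they do not come from the patch or its tests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-news.xml
"""


def sitemaps_from_text(text):
    """Return the Sitemap URLs declared in a robots.txt string.

    Illustrative helper, not part of the standard library.
    site_maps() returns None when no Sitemap line is present,
    so that case is normalized to an empty list here.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(text.splitlines())
    return parser.site_maps() or []


sitemaps = sitemaps_from_text(ROBOTS_TXT)
for url in sitemaps:
    print(url)
```

Feeding the text through `parse()` keeps the example offline. In real use, `set_url()` followed by `read()` would fetch the file before `site_maps()` is called.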
History
Date User Action Args
2018-05-16 14:54:05 ned.deily set status: open -> closed
resolution: fixed
messages: + msg316813
stage: patch review -> resolved
2018-05-16 14:52:15 ned.deily set nosy: + ned.deily
messages: + msg316811
2018-05-15 22:09:13 mcscope@gmail.com set messages: + msg316743
2018-05-15 22:06:35 mcscope@gmail.com set nosy: + mcscope@gmail.com
messages: + msg316740
2018-05-15 21:55:17 python-dev set pull_requests: + pull_request6556
2018-04-16 18:29:21 stevensalbert set nosy: + stevensalbert
messages: + msg315362
2018-02-01 10:10:27 matrixise set messages: + msg311416
2018-01-29 20:55:19 rhettinger set versions: + Python 3.8, - Python 3.6
2015-11-25 13:32:45 vstinner set dependencies: + Rewrite test_robotparser
2015-11-23 21:17:14 berker.peksag set messages: + msg255228
stage: needs patch -> patch review
2015-11-23 20:50:32 matrixise set nosy: + matrixise
messages: + msg255225
2015-10-15 19:51:18 pwirtz set files: + robotparser_site_maps_v1.patch
keywords: + patch
messages: + msg253063
2015-10-15 08:23:49 berker.peksag set messages: + msg253035
2015-10-15 03:10:23 pwirtz set nosy: + pwirtz
messages: + msg253027
2015-10-08 10:09:09 berker.peksag set title: Support the Sitemap and Crawl-delay extensions in robotparser -> Support the Sitemap extension in robotparser
stage: needs patch
messages: + msg252528
versions: + Python 3.6, - Python 3.5
2014-05-12 09:26:56 berker.peksag set nosy: + berker.peksag
messages: + msg218318
2014-05-12 01:35:57 rhettinger create