This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Mercurial robots.txt should let robots crawl landing pages
Type: enhancement
Stage: needs patch
Components: None
Versions:

process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: Ivaylo.Popov, barry, benjamin.peterson, emily.zhao, ezio.melotti, georg.brandl, pitrou
Priority: normal
Keywords: easy

Created on 2012-02-01 22:29 by Ivaylo.Popov, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg152446 - (view) Author: Ivaylo Popov (Ivaylo.Popov) Date: 2012-02-01 22:29
http://hg.python.org/robots.txt currently disallows all robots from all paths. This means the site doesn't show up in Google results for searches seeking, for instance, browsing access to the Python source:
https://www.google.com/search?ie=UTF-8&q=python+source+browse
https://www.google.com/search?ie=UTF-8&q=python+repo+browse
https://www.google.com/search?ie=UTF-8&q=hg+python+browse
etc...

Instead, robots.txt should allow access to the landing page, http://hg.python.org/, and the landing pages for hosted projects, e.g. http://hg.python.org/cpython/, while prohibiting access to the */rev/*, */shortlog/*, ..., directories.

This change would be very easy, cost virtually nothing, and let users find the Mercurial repository viewer from search engines. Note that http://svn.python.org/ does show up in search results, illustrating how convenient this is.
msg152457 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-02 13:26
Can you propose a robots.txt file?
msg219976 - (view) Author: Emily Zhao (emily.zhao) * Date: 2014-06-07 21:12
I don't know too much about robots.txt but how about

Disallow: */rev/*
Disallow: */shortlog/*
Allow:

Are there any other directories we'd like to exclude?
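[Editor's note: a completed version of this fragment, assuming crawlers honor wildcard paths in Disallow rules (a Google extension, not part of the original robots.txt standard, and an assumption the next message questions), might look like this. The blocked paths are the ones named in the thread:]

```text
User-agent: *
Disallow: */rev/*
Disallow: */shortlog/*
Allow: /
```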
msg220003 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-06-07 23:54
Unfortunately, I don't think it will be that easy because I don't think robots.txt supports wildcard paths like that. Possibly, we should just whitelist a few important repositories.
msg220109 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-06-09 19:06
Yes, I think we should whitelist rather than blacklist. The problem with letting engines index the repositories is the sheer resource cost when they fetch many heavy pages (such as annotate, etc.).
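[Editor's note: a whitelist-style file of the kind suggested here can be checked with the standard library's urllib.robotparser, which does plain prefix matching with no wildcard support. The rules and repository names below are an illustrative sketch, not the file that was deployed:]

```python
import urllib.robotparser

# Hypothetical whitelist-style robots.txt: allow a few repository
# trees, disallow everything else. Rules are matched in order,
# first matching prefix wins.
RULES = """\
User-agent: *
Allow: /cpython/
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

print(rp.can_fetch("*", "http://hg.python.org/cpython/"))         # True
# Prefix matching means the whole repository is whitelisted,
# including heavy pages like rev/ and annotate/:
print(rp.can_fetch("*", "http://hg.python.org/cpython/rev/abc"))  # True
print(rp.can_fetch("*", "http://hg.python.org/"))                 # False
```

This coarseness is the trade-off Benjamin and Antoine discuss: a plain whitelist admits crawlers to every page under an allowed repository, expensive ones included.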
msg275898 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2016-09-12 00:17
Two things: is it worth fixing this bug given the impending move to github?  Also, why is this reported here and not the pydotorg tracker?  https://github.com/python/pythondotorg/issues

Given that the last comment was 2014, I'm going to go ahead and close this issue.
History
Date User Action Args
2022-04-11 14:57:26  admin              set     github: 58132
2016-09-12 00:17:26  barry              set     status: open -> closed
                                                nosy: + barry
                                                messages: + msg275898
                                                resolution: wont fix
2014-06-09 19:06:07  pitrou             set     messages: + msg220109
2014-06-07 23:54:14  benjamin.peterson  set     nosy: + benjamin.peterson
                                                messages: + msg220003
2014-06-07 21:12:46  emily.zhao         set     nosy: + emily.zhao
                                                messages: + msg219976
2013-08-17 14:53:04  ezio.melotti       set     keywords: + easy
                                                stage: needs patch
2012-02-02 14:42:52  ezio.melotti       set     nosy: + ezio.melotti
2012-02-02 13:26:24  pitrou             set     nosy: + georg.brandl, pitrou
                                                messages: + msg152457
2012-02-01 22:29:55  Ivaylo.Popov       create