classification
Title: robotparser deny all with some rules
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.5

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Jmgray47, Patrick Valibus 410 Gone, arnaud
Priority: normal
Keywords:

Created on 2019-03-06 09:42 by quentin-maire, last changed 2020-07-31 13:24 by arnaud.

Messages (10)
msg337285 - (view) Author: wats0ns (quentin-maire) Date: 2019-03-06 09:42
RobotFileParser parses a "Disallow: ?" rule as deny-all, but this is a valid rule that should instead be interpreted as "Disallow: /?*" or "Disallow: /*?*".
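
A minimal sketch of the reported behavior, feeding the rule to urllib.robotparser directly instead of fetching a live robots.txt (the example URL is illustrative):

from urllib.robotparser import RobotFileParser

# Feed the parser a robots.txt containing only the rule in question.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: ?",
])

# The parser normalizes the bare "?" to an empty path, and an empty
# path prefix matches every URL, so even an ordinary page with no
# query string is reported as disallowed.
print(rp.can_fetch("*", "https://example.com/index.html"))  # False (deny-all)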
msg338293 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-03-18 22:13
Can you provide a link to documentation showing that "Disallow: ?" shouldn't be the same as deny all?  Thanks!
msg338298 - (view) Author: wats0ns (quentin-maire) Date: 2019-03-18 23:20
I can't find documentation about it, but all of the robots.txt checkers I have found behave this way. You can test against this site: http://www.eskimoz.fr/robots.txt. I believe this is how it's implemented in most parsers now.
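
For comparison, a hypothetical sketch of the interpretation described above, treating "Disallow: ?" like "Disallow: /*?*", i.e. blocking only URLs that carry a query string (the helper blocks_query_urls is made up for illustration):

from urllib.parse import urlparse

def blocks_query_urls(url):
    # Hypothetical reading of "Disallow: ?": block only URLs that
    # contain a query string, as "Disallow: /*?*" would.
    return urlparse(url).query != ""

print(blocks_query_urls("http://www.eskimoz.fr/"))         # False: no query, crawlable
print(blocks_query_urls("http://www.eskimoz.fr/?page=2"))  # True: query present, blocked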
msg365770 - (view) Author: Rodriguez (lagustais) Date: 2020-04-04 16:46
I can't display my robots.txt. I want to ban robots:
 https://melwynn-rodriguez.fr/robots.txt
msg366509 - (view) Author: asca (artasca) Date: 2020-04-15 12:57
I thought it was going to work, but apparently when I try https://www.actusite.fr/robots.txt, it doesn't.
msg367546 - (view) Author: Fred AYERS (Fred AYERS) Date: 2020-04-28 17:20
I tried this one, http://gtxgamer.fr/robots.txt, and it seems to work.
msg370275 - (view) Author: mathias44 (mathias44) Date: 2020-05-28 23:54
I can't display my robots.txt. I want to ban robots: https://ereputation-dereferencement.fr/
msg372112 - (view) Author: Patrick Valibus 410 Gone (Patrick Valibus 410 Gone) Date: 2020-06-22 20:35
Hello, we couldn't get it to work. We used it as part of an SEO test, since we are trying to reproduce alternatives to Scrapy. For example, the robot should crawl our SEO agency's page https://www.410-gone.fr/seo.html but should not accept pages ending in /*.php$; yet it does, even though they are blocked in our robots.txt. Thanks.
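
For context: urllib.robotparser implements the original robots.txt convention, which has no wildcard (*) or end-anchor ($) syntax; those are later extensions, so a rule like "Disallow: /*.php$" is compared as a literal path prefix and never matches. A short sketch (URLs taken from the message above, used only for illustration):

from urllib.robotparser import RobotFileParser

# "/*.php$" is stored as a literal, percent-quoted prefix rather
# than a wildcard pattern, so it matches no real path.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /*.php$",
])

print(rp.can_fetch("*", "https://www.410-gone.fr/index.php"))  # True: the rule has no effect
print(rp.can_fetch("*", "https://www.410-gone.fr/seo.html"))   # True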
msg374629 - (view) Author: James Gray (Jmgray47) Date: 2020-07-31 04:34
Hello, I see we are not the only ones in this situation. We need robots to index our HTML pages but not the ones matching /*.php$, nor the PDF resources. We tried several solutions via the robots.txt at the root of our domain https://demolinux.org/, but without success. RobotFileParser does not take these rules into account. Thanks.
msg374642 - (view) Author: arnaud (arnaud) Date: 2020-07-31 13:24
Do you have documentation about robotparser? The robots.txt of this website works fine: https://vauros.com/
History
Date                 User                      Action  Args
2020-07-31 13:24:17  arnaud                    set     nosy: + arnaud; messages: + msg374642
2020-07-31 04:34:49  Jmgray47                  set     nosy: + Jmgray47; messages: + msg374629
2020-06-22 20:35:43  Patrick Valibus 410 Gone  set     nosy: + Patrick Valibus 410 Gone, - cheryl.sabella, quentin-maire, lagustais, artasca, Fred AYERS, mathias44; messages: + msg372112
2020-05-28 23:54:56  mathias44                 set     nosy: + mathias44; messages: + msg370275
2020-04-28 17:20:52  Fred AYERS                set     nosy: + Fred AYERS; messages: + msg367546
2020-04-15 12:57:20  artasca                   set     nosy: + artasca; messages: + msg366509
2020-04-04 16:46:51  lagustais                 set     nosy: + lagustais; messages: + msg365770
2019-03-18 23:20:00  quentin-maire             set     messages: + msg338298
2019-03-18 22:13:37  cheryl.sabella            set     nosy: + cheryl.sabella; messages: + msg338293
2019-03-06 09:42:01  quentin-maire             create