classification
Title: RobotFileParser.parse() should raise an exception when the robots.txt file is invalid
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: Guinness, iritkatriel
Priority: normal
Keywords:

Created on 2018-02-24 10:53 by Guinness, last changed 2021-11-26 16:53 by iritkatriel. This issue is now closed.

Messages (3)
msg312711 - Author: Oudin (Guinness) Date: 2018-02-24 10:53
When processing an ill-formed robots.txt file (like https://tiny.tobast.fr/robots-file ), the RobotFileParser.parse method does not populate the entries or the default_entry attributes.

In my opinion, the method should raise an exception when the robots.txt file contains no valid User-agent entry, or when it contains an invalid one.
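As a rough illustration of the proposed behavior (not a patch from this issue), the check could be sketched as a subclass. The names `StrictRobotFileParser` and `RobotsParseError` are hypothetical and do not exist in the standard library, and the sketch relies on the undocumented `entries` and `default_entry` attributes. It only covers the "no valid User-agent entry" case; detecting an individual invalid entry would need changes inside the parser itself.

```python
from urllib.robotparser import RobotFileParser


class RobotsParseError(ValueError):
    """Hypothetical exception type; not part of urllib.robotparser."""


class StrictRobotFileParser(RobotFileParser):
    """Sketch: raise when parsing yields no usable User-agent entry."""

    def parse(self, lines):
        # Let the stock parser do its normal work first.
        super().parse(lines)
        # Assumption: both attributes are undocumented implementation
        # details and stay empty when no User-agent group was recognized.
        if self.default_entry is None and not self.entries:
            raise RobotsParseError("no valid User-agent entry found")


# Usage: rules without a preceding User-agent line are ignored by the
# stock parser, so the strict variant raises here.
try:
    StrictRobotFileParser().parse(["Disallow: /private"])
    strict_raised = False
except RobotsParseError:
    strict_raised = True

# A well-formed file parses as usual.
strict_ok = StrictRobotFileParser()
strict_ok.parse(["User-agent: *", "Disallow: /private"])
```

Note that under this sketch an empty robots.txt would also raise, which may not be desirable since an empty file is commonly taken to mean "everything is allowed".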

Otherwise, the only way to detect the problem is to check whether default_entry is None, and that attribute is not covered in the documentation (https://docs.python.org/dev/library/urllib.robotparser.html).
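For reference, the workaround described here might look like the following minimal sketch. The original robots.txt file is not available, so the ill-formed input below is an assumed stand-in (a rule with no preceding User-agent line), and `has_rules` is a hypothetical helper name. Both `default_entry` and `entries` are undocumented attributes, so this may break in other Python versions.

```python
from urllib.robotparser import RobotFileParser


def has_rules(parser):
    """Return True if the parser recorded at least one User-agent group.

    Relies on the undocumented default_entry and entries attributes.
    """
    return parser.default_entry is not None or bool(parser.entries)


# Assumed example of an ill-formed file: no User-agent line at all.
bad = RobotFileParser()
bad.parse(["Disallow: /private", "this is not a directive"])

# A well-formed file for comparison.
good = RobotFileParser()
good.parse(["User-agent: *", "Disallow: /private"])

bad_has_rules = has_rules(bad)
good_has_rules = has_rules(good)

# With no rules recorded, can_fetch() falls through to "access granted",
# so the ill-formed file silently behaves like an allow-all file.
bad_allows_private = bad.can_fetch("anybot", "/private")
good_allows_private = good.can_fetch("anybot", "/private")
```

The point of the comparison is that nothing in the public API distinguishes the ill-formed file from one that deliberately allows everything.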

Depending on your opinion on this, I can implement the change and open a PR on GitHub.
msg406722 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-11-21 15:14
The link to the robots.txt file no longer works, so it's not clear how to reproduce the problem you are seeing. Can you post the complete information on this issue?
msg407072 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-11-26 16:53
Please reopen this or create a new issue if this is still a problem and you can provide the missing information.
History
Date                 User         Action  Args
2021-11-26 16:53:03  iritkatriel  set     status: pending -> closed; resolution: rejected; messages: + msg407072; stage: resolved
2021-11-21 15:14:16  iritkatriel  set     status: open -> pending; nosy: + iritkatriel; messages: + msg406722
2018-02-24 10:53:13  Guinness     create