
Author Guinness
Recipients Guinness
Date 2018-02-24.10:53:13
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1519469593.78.0.467229070634.issue32936@psf.upfronthosting.co.za>
In-reply-to
Content
When processing an ill-formed robots.txt file (such as https://tiny.tobast.fr/robots-file ), the RobotFileParser.parse method does not populate the entries or default_entry attributes.

In my opinion, the method should raise an exception when no valid User-agent entry is found in the robots.txt file, or when an invalid User-agent entry is present.
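
A rough sketch of what such a check could look like, written here as a wrapper rather than an actual patch to parse(); the helper name parse_strict and the choice of ValueError are assumptions for illustration only:

    from urllib.robotparser import RobotFileParser

    def parse_strict(parser: RobotFileParser, lines):
        # Run the normal parser, then verify that at least one User-agent
        # entry was actually produced; the exception type is illustrative.
        parser.parse(lines)
        if parser.default_entry is None and not parser.entries:
            raise ValueError("no valid User-agent entry found in robots.txt")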

Otherwise, the only available workaround is to check whether default_entry is None, which is not mentioned in the documentation (https://docs.python.org/dev/library/urllib.robotparser.html).
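
For illustration, a minimal sketch of that workaround; the ill-formed content below (a Disallow line with no preceding User-agent line) is a made-up stand-in for the linked file:

    from urllib.robotparser import RobotFileParser

    ill_formed = "Disallow: /private"  # no User-agent line, so no entry is built
    parser = RobotFileParser()
    parser.parse(ill_formed.splitlines())

    # The only observable signal is that the (undocumented) default_entry and
    # entries attributes were never populated.
    if parser.default_entry is None and not parser.entries:
        print("robots.txt is ill-formed: no valid User-agent entry was parsed")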

Depending on your opinion on this, I can implement whatever is necessary and open a PR on GitHub.
History
Date User Action Args
2018-02-24 10:53:13  Guinness  set     recipients: + Guinness
2018-02-24 10:53:13  Guinness  set     messageid: <1519469593.78.0.467229070634.issue32936@psf.upfronthosting.co.za>
2018-02-24 10:53:13  Guinness  link    issue32936 messages
2018-02-24 10:53:13  Guinness  create