Message 170262 - Python tracker



Author	orsenthil
Recipients	dualbus, ezio.melotti, orsenthil, terry.reedy
Date	2012-09-11.04:44:56
Message-id	<CAPOVWOQYL783JFzF4e2SWf0BgTU-R=OAGfOmQMirffWw+cwWow@mail.gmail.com>
In-reply-to	<20120909214501.GC5726@claret.lan>

Content

Hello Eduardo,

I fail to see the bug here. The robotparser module is for reading and
parsing the robots.txt file; the module responsible for fetching it
could be urllib. robots.txt is always available from the web server, and
you can download it by any means, even by using robotparser.read with
the full URL of robots.txt. You do not need to set a User-Agent to
read/fetch the robots.txt file. Once it is fetched, when you crawl the
site with your custom-written crawler or with urllib, you can honor the
User-Agent requirement by sending the proper headers with your request.
That can be done with the urllib module itself, and I believe there is
documentation on adding headers.
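
A minimal sketch of the workflow described above, written against the
Python 3 module names (urllib.robotparser, urllib.request) rather than
the Python 2 robotparser module. The agent string, the robots.txt rules
and the build_request helper are made up for illustration. The rules
are parsed from a string so the sketch runs without network access.

```python
import urllib.request
import urllib.robotparser

# Hypothetical crawler name; use whatever identifies your own crawler.
USER_AGENT = "ExampleCrawler/1.0"

# Stand-in for a robots.txt body that was downloaded by any means.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Parsing is separate from fetching: parse() takes an iterable of lines.
parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def build_request(url):
    """Return a Request carrying our User-Agent, or None if disallowed."""
    if not parser.can_fetch(USER_AGENT, url):
        return None
    # The User-Agent header is set on the request made by the crawler.
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


allowed = build_request("http://example.com/index.html")
blocked = build_request("http://example.com/private/data.html")
```

Passing a non-None result to urllib.request.urlopen() would then send
the request with that User-Agent header.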

I think this is the way most folks would be (or, I believe, are) using
it. Am I missing something? If my explanation above is okay, then we can
close this bug as invalid.

Thanks,
Senthil

History
Date	User	Action	Args
2012-09-11 04:44:57	orsenthil	set	recipients: + orsenthil, terry.reedy, ezio.melotti, dualbus
2012-09-11 04:44:57	orsenthil	link	issue15851 messages
2012-09-11 04:44:56	orsenthil	create