Message 184017 - Python tracker

Author	benmezger
Recipients	benmezger
Date	2013-03-12.10:58:24
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1363085904.54.0.398684856881.issue17403@psf.upfronthosting.co.za>
In-reply-to

Content
I am trying to parse Google's robots.txt (http://google.com/robots.txt), and it fails when checking whether I can crawl the URL /catalogs/p?. That URL is allowed, but the parser returns False. See my question on Stack Overflow -> http://stackoverflow.com/questions/15344253/robotparser-doesnt-seem-to-parse-correctly

Someone answered that it has to do with the line "urllib.quote(urlparse.urlparse(urllib.unquote(url))[2])" in the robotparser module, since it removes the "?" from the end of the URL.

Here is the answer I received -> http://stackoverflow.com/a/15350039/1649067
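
To illustrate the behaviour described above, here is a minimal sketch of that expression on its own. It uses the Python 3 names from urllib.parse in place of the Python 2 urllib/urlparse calls quoted in the report. The helper name normalize_like_robotparser and the rule path used in the prefix check are assumptions made for illustration, and the prefix check is a simplification, not the module's actual matching code.

```python
from urllib.parse import quote, unquote, urlparse


def normalize_like_robotparser(url):
    """Hypothetical helper: Python 3 spelling of the expression quoted in the report.

    urlparse(...)[2] is the path component only, so an empty query string
    (a bare trailing "?") is dropped before the path is re-quoted.
    """
    return quote(urlparse(unquote(url))[2])


url = "/catalogs/p?"
normalized = normalize_like_robotparser(url)
print(repr(url), "->", repr(normalized))

# Assumed rule path, taken from the URL in the report. A plain prefix
# comparison against the normalized URL can no longer match it, because
# the "?" is gone from the URL being tested.
rule_path = "/catalogs/p?"
print("prefix match:", normalized.startswith(rule_path))
```

Since the trailing "?" never survives this step, a rule that depends on it cannot match the URL being tested, which is consistent with the result reported above.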

History
Date	User	Action	Args
2013-03-12 10:58:24	benmezger	set	recipients: + benmezger
2013-03-12 10:58:24	benmezger	set	messageid: <1363085904.54.0.398684856881.issue17403@psf.upfronthosting.co.za>
2013-03-12 10:58:24	benmezger	link	issue17403 messages
2013-03-12 10:58:24	benmezger	create