This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author benmezger
Recipients benmezger
Date 2013-03-12.10:58:24
Message-id <1363085904.54.0.398684856881.issue17403@psf.upfronthosting.co.za>
Content
I am trying to parse Google's robots.txt (http://google.com/robots.txt), but checking whether I may crawl the URL /catalogs/p? returns False, even though robots.txt allows that path. See my question on Stack Overflow: http://stackoverflow.com/questions/15344253/robotparser-doesnt-seem-to-parse-correctly

Someone answered that the problem lies in the line "urllib.quote(urlparse.urlparse(urllib.unquote(url))[2])" in the robotparser module: it keeps only the path component of the URL (index 2 of the parse result), so the trailing "?" is removed.
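The behaviour described in that answer can be checked with the standard URL helpers. The sketch below is an illustration, not the robotparser module's own code. It uses the Python 3 names urllib.parse.quote, unquote and urlparse, which correspond to the Python 2 calls urllib.quote, urllib.unquote and urlparse.urlparse in the line above. normalize_like_robotparser is a hypothetical helper name, and the rule path "/catalogs/p?" compared by simple prefix matching is an assumption made only for illustration.

```python
from urllib.parse import quote, unquote, urlparse


def normalize_like_robotparser(url: str) -> str:
    """Hypothetical helper: Python 3 spelling of the expression
    urllib.quote(urlparse.urlparse(urllib.unquote(url))[2])."""
    # Index 2 of the parse result is the path; query and fragment are dropped.
    return quote(urlparse(unquote(url))[2])


url = "/catalogs/p?"
parts = urlparse(url)
print(repr(parts.path))   # '/catalogs/p'  (the "?" is not part of the path)
print(repr(parts.query))  # ''             (the empty query is discarded too)
print(normalize_like_robotparser(url))                  # /catalogs/p
print(normalize_like_robotparser("/catalogs/p?hl=en"))  # /catalogs/p

# Assumption for illustration: an "Allow: /catalogs/p?" rule whose path is
# compared against the normalized URL by simple prefix matching.
rule_path = "/catalogs/p?"
print(normalize_like_robotparser(url).startswith(rule_path))  # False
```

Because only the path survives the normalization, the "?" and anything after it never reach the rule comparison, which is consistent with the answer's explanation of why the allowed URL is reported as blocked.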

Here is the answer I received: http://stackoverflow.com/a/15350039/1649067
History
Date                 User       Action  Args
2013-03-12 10:58:24  benmezger  set     recipients: + benmezger
2013-03-12 10:58:24  benmezger  set     messageid: <1363085904.54.0.398684856881.issue17403@psf.upfronthosting.co.za>
2013-03-12 10:58:24  benmezger  link    issue17403 messages
2013-03-12 10:58:24  benmezger  create