Message31944
The code in urllib.quote fails on Unicode input when
called by robotparser with a Unicode URL.
Traceback (most recent call last):
  File "./sitetruth/InfoSitePage.py", line 415, in run
    pagetree = self.httpfetch() # fetch page
  File "./sitetruth/InfoSitePage.py", line 368, in httpfetch
    if not self.owner().checkrobotaccess(self.requestedurl) : # if access disallowed by robots.txt file
  File "./sitetruth/InfoSiteContent.py", line 446, in checkrobotaccess
    return(self.robotcheck.can_fetch(config.kuseragent, url)) # return can fetch
  File "/usr/local/lib/python2.5/robotparser.py", line 159, in can_fetch
    url = urllib.quote(urlparse.urlparse(urllib.unquote(url))[2]) or "/"
  File "/usr/local/lib/python2.5/urllib.py", line 1197, in quote
    res = map(safe_map.__getitem__, s)
KeyError: u'\xe2'
That bit of code needs some attention.
- It still assumes character codes stop at 255, which hasn't been true in Python for a while now.
- The initialization may not be thread-safe; the lookup table is built on first use.
"robotparser" was trying to check if a URL with a Unicode character in it was allowed. Note the "KeyError: u'\xe2'" |
Date | User | Action | Args
2007-08-23 14:53:34 | admin | link | issue1712522 messages
2007-08-23 14:53:34 | admin | create |