Message183579
Setting a user agent string should be possible.
My guess is that the default library has been used by an abusive client (by mistake or intent) and wikimedia project has decided to blacklist the client based on the user-agent string sniffing.
The match is on anything which matches
"Python-urllib" in UserAgentString
See below:
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urllib')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 479, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 591, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 517, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 451, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 599, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Pythonurllib/3.3')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>> fobj
<http.client.HTTPResponse object at 0x101275850>
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Pyt-honurllib/3.3')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urllib')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 479, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 591, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 517, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 451, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 599, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urlli')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>>
Being able to change the header might indeed be a good thing. |
|
Date |
User |
Action |
Args |
2013-03-06 04:26:19 | karlcow | set | recipients:
+ karlcow, terry.reedy, orsenthil, ezio.melotti, dualbus |
2013-03-06 04:26:19 | karlcow | set | messageid: <1362543979.28.0.787877448273.issue15851@psf.upfronthosting.co.za> |
2013-03-06 04:26:19 | karlcow | link | issue15851 messages |
2013-03-06 04:26:18 | karlcow | create | |
|