Author karlcow
Recipients dualbus, ezio.melotti, karlcow, orsenthil, terry.reedy
Date 2013-03-06.04:26:18
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1362543979.28.0.787877448273.issue15851@psf.upfronthosting.co.za>
In-reply-to
Content
Setting a user agent string should be possible.
My guess is that the default library has been used by an abusive client (by mistake or intent) and wikimedia project has decided to blacklist the client based on the user-agent string sniffing.

The match is on anything which matches

"Python-urllib" in UserAgentString

See below:

>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urllib')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 479, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 591, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 517, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 451, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 599, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Pythonurllib/3.3')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>> fobj
<http.client.HTTPResponse object at 0x101275850>
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Pyt-honurllib/3.3')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urllib')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 479, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 591, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 517, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 451, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 599, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Python-urlli')]
>>> fobj = opener.open('http://en.wikipedia.org/robots.txt')
>>> 

Being able to change the header might indeed be a good thing.
History
Date User Action Args
2013-03-06 04:26:19karlcowsetrecipients: + karlcow, terry.reedy, orsenthil, ezio.melotti, dualbus
2013-03-06 04:26:19karlcowsetmessageid: <1362543979.28.0.787877448273.issue15851@psf.upfronthosting.co.za>
2013-03-06 04:26:19karlcowlinkissue15851 messages
2013-03-06 04:26:18karlcowcreate