Title: robotfileparser always uses default Python user-agent
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.7
Status: closed Resolution: duplicate
Dependencies: Superseder: Lib/ doesn't accept setting a user agent string, instead it uses the default.
Assigned To: Nosy List: nagle, xiang.zhang
Created on 2016-11-21 01:23 by nagle, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg281314 - Author: John Nagle (nagle) Date: 2016-11-21 01:23
urllib.robotparser.RobotFileParser always sends the default Python user agent. Many sites now blacklist this agent, so their robots.txt files cannot be read at all.
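For illustration, here is a minimal sketch of where that default comes from, plus one possible workaround that needs no change to RobotFileParser. It assumes read() fetches through urllib.request.urlopen(), so a globally installed opener applies to it. install_user_agent is a hypothetical helper name and "ExampleBot/1.0" is a made-up agent string.

```python
import urllib.request

# A freshly built opener carries the library's default User-Agent header,
# which has the form "Python-urllib/<major>.<minor>".
DEFAULT_USER_AGENT = dict(urllib.request.build_opener().addheaders)["User-agent"]


def install_user_agent(user_agent):
    """Install a process-wide opener that sends the given User-Agent.

    Hypothetical helper, not part of the standard library. It replaces
    the global opener, so every later urllib.request.urlopen() call in
    the process is affected, not only robots.txt fetches.
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener


if __name__ == "__main__":
    print(DEFAULT_USER_AGENT)
    install_user_agent("ExampleBot/1.0")  # made-up agent string
```

The side effect is global, which is the main drawback compared with a per-parser user_agent setting.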
msg281315 - Author: John Nagle (nagle) Date: 2016-11-21 01:26
I suggest adding an optional user_agent parameter, as shown here:

    import urllib.error
    import urllib.request
    import urllib.robotparser

    class UserAgentRobotFileParser(urllib.robotparser.RobotFileParser):  # class name is illustrative
        def __init__(self, url='', user_agent=None):
            urllib.robotparser.RobotFileParser.__init__(self, url)  # init parent
            self.user_agent = user_agent                    # save user agent

        def read(self):
            """
            Reads the robots.txt URL and feeds it to the parser.
            Overrides parent read function.
            """
            try:
                req = urllib.request.Request(self.url)      # request with user agent specified (URL argument assumed)
                if self.user_agent is not None:             # if overriding user agent
                    req.add_header("User-Agent", self.user_agent)
                f = urllib.request.urlopen(req)             # open connection
            except urllib.error.HTTPError as err:
                if err.code in (401, 403):
                    self.disallow_all = True
                elif err.code >= 400 and err.code < 500:
                    self.allow_all = True
            else:
                raw = f.read()                              # remainder assumed to mirror the parent read()
                self.parse(raw.decode("utf-8").splitlines())
msg281316 - Author: John Nagle (nagle) Date: 2016-11-21 01:29
(That's from a subclass I wrote.  As a change to RobotFileParser, __init__ should start like this.)

    def __init__(self, url='', user_agent=None):
        self.user_agent = user_agent                    # save user agent
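
The same idea can also be tried without subclassing or changing the library: fetch robots.txt yourself with an explicit User-Agent header and hand the text to parse(). This is only a sketch. load_robots is a hypothetical helper, "ExampleBot/1.0" is a made-up agent string, and the error handling is assumed to follow the 401/403 and other-4xx policy shown in the subclass above.

```python
import urllib.error
import urllib.request
import urllib.robotparser


def load_robots(url, user_agent=None):
    """Fetch robots.txt with an optional User-Agent and return a parser.

    Hypothetical helper, not part of the standard library.
    """
    parser = urllib.robotparser.RobotFileParser(url)
    req = urllib.request.Request(url)
    if user_agent is not None:
        # Override the default "Python-urllib/x.y" header for this request only.
        req.add_header("User-Agent", user_agent)
    try:
        with urllib.request.urlopen(req) as f:
            raw = f.read()
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            parser.disallow_all = True   # access denied: treat everything as disallowed
        elif 400 <= err.code < 500:
            parser.allow_all = True      # no usable robots.txt: treat everything as allowed
    else:
        parser.parse(raw.decode("utf-8").splitlines())
    return parser


if __name__ == "__main__":
    # example.com is a placeholder host; this call needs network access.
    rp = load_robots("https://example.com/robots.txt", user_agent="ExampleBot/1.0")
    print(rp.can_fetch("ExampleBot/1.0", "https://example.com/"))
```

Unlike a globally installed opener, this keeps the custom header local to the one request.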
msg281323 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-21 05:40
Hi, John. This robotparser issue has already been reported in #15851. I'll close this as a duplicate, and you can continue the discussion in that thread.
