Title: robotparser should support specifying SSL context
Pull Requests
URL Status Linked
PR 24984 closed Tchinmai7, 2021-03-23 00:03
PR 24986 open Tchinmai7, 2021-03-23 00:20
Messages (3)
msg389352 - (view) Author: Tarun Chinmai Sekar (Tchinmai7) * Date: 2021-03-22 23:25
IMO this could be enhanced by adding an sslcontext parameter to the read method.

A sample change could look like this:
def read(self, sslcontext=None):
    """Reads the robots.txt URL and feeds it to the parser."""
    try:
        if sslcontext:
            # Use the caller-supplied SSLContext for the HTTPS request.
            f = urllib.request.urlopen(self.url, context=sslcontext)
        else:
            f = urllib.request.urlopen(self.url)
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            self.disallow_all = True
        elif err.code >= 400 and err.code < 500:
            self.allow_all = True
    else:
        raw = f.read()
        self.parse(raw.decode("utf-8").splitlines())


Happy to send a PR if this proposal makes sense.
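
For anyone who wants to try the idea without patching the standard library, here is a minimal sketch written as a subclass. The class name ContextRobotFileParser is hypothetical, and the sslcontext parameter is the one proposed above, not an existing part of urllib.robotparser. The body mirrors the error handling of the current read() method as I understand it.

```python
import urllib.error
import urllib.request
import urllib.robotparser


class ContextRobotFileParser(urllib.robotparser.RobotFileParser):
    """Hypothetical subclass: read() accepts an optional ssl.SSLContext."""

    def read(self, sslcontext=None):
        """Fetch the robots.txt URL and feed it to the parser."""
        try:
            # context=None is urlopen's default, so one call covers both cases.
            f = urllib.request.urlopen(self.url, context=sslcontext)
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif 400 <= err.code < 500:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())
```

A caller would then write something like rp = ContextRobotFileParser(url) followed by rp.read(sslcontext=ctx), and use rp.can_fetch() as usual.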
msg390395 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2021-04-07 01:21
I'm not opposed to the idea, but what's the practical use case here? I haven't seen a case where you need to pass a custom SSLContext in order to fetch the robots.txt file.
msg390396 - (view) Author: Tarun Chinmai Sekar (Tchinmai7) * Date: 2021-04-07 01:56
I am writing a web scraper that runs in a container where the CA certificates are stored in a non-standard location. The CA certificates are managed by certifi. Allowing the sslcontext to be overridden would let the user construct an SSLContext that points at that bundle and pass it in.
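
To illustrate the use case, a context that trusts a CA bundle at a custom path can be built roughly as follows. This is only a sketch: make_ssl_context is a hypothetical helper name, and the bundle path is assumed to come from certifi.where() or from wherever the container keeps its certificates.

```python
import ssl
from typing import Optional


def make_ssl_context(ca_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying client context.

    ca_bundle is a path to a PEM file of trusted CA certificates.
    When it is None, the system default trust store is loaded instead.
    """
    return ssl.create_default_context(cafile=ca_bundle)


# Assumed usage with certifi (third-party, not imported here):
#     context = make_ssl_context(certifi.where())
#     urllib.request.urlopen(robots_url, context=context)
```

The resulting context keeps certificate verification and hostname checking enabled; only the set of trusted CAs changes.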
