
Author eryksun
Recipients barneygale, eryksun, ikelos
Date 2022-02-06.09:17:10
Message-id <1644139031.07.0.151543014344.issue46654@roundup.psfhosted.org>
> The value of req.selector never starts with "//", for which file_open() 
> checks, but rather a single slash, such as "/Z:/test.py" or 
> "/share/test.py".

To correct myself: req.selector does start with "//" for a "file:////" URI, such as "file:////host/share/test.py". In this example req.host is an empty string, so file_open() still calls open_local_file(), which opens "//host/share/test.py". In Linux, "//host/share" is the same as "/host/share"; in Cygwin and MSYS2 it's a UNC path. I think this case should be allowed, even though POSIX leaves the meaning of a path that begins with exactly two slashes implementation-defined.
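Request only splits the URL, so it's an easy way to confirm what file_open() receives without touching the file system. A minimal check, using the URIs from this message plus a "file://host/..." form for comparison:

```python
import posixpath
from urllib.request import Request

# Request.__init__ only parses the URL; nothing is opened or resolved.
parsed = {}
for url in ('file:////host/share/test.py',   # host '', selector '//host/share/test.py'
            'file:///Z:/test.py',            # host '', selector '/Z:/test.py'
            'file://host/share/test.py'):    # host 'host', selector '/share/test.py'
    req = Request(url)
    parsed[url] = (req.host, req.selector)

# POSIX leaves exactly two leading slashes implementation-defined, so
# posixpath.normpath() keeps "//" but collapses three or more to "/".
two = posixpath.normpath('//host/share')     # '//host/share'
three = posixpath.normpath('///host/share')  # '/host/share'
```

Only the "file:////" form reaches open_local_file() with a selector that starts with "//".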

Unless I'm overlooking something, file_open() only has to check the value of req.host. In POSIX it should require a local path: if req.host isn't None, empty, or a name for the local host, raise URLError.
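A minimal sketch of that POSIX rule as standalone functions rather than FileHandler methods. The names is_local_host(), local_addresses() and check_posix_file_host() are hypothetical, and the resolve/local_addrs parameters exist only so the check can be exercised without DNS; local_addresses() collects the same addresses as FileHandler.get_names().

```python
import socket
from urllib.error import URLError


def local_addresses():
    """IPv4 addresses of 'localhost' and of this machine's host name."""
    addrs = set()
    for name in ('localhost', socket.gethostname()):
        try:
            addrs.update(socket.gethostbyname_ex(name)[2])
        except OSError:  # gaierror/herror: the name doesn't resolve
            pass
    return addrs


def is_local_host(host, resolve=socket.gethostbyname, local_addrs=None):
    """Return True if a file: URI host refers to this machine (sketch)."""
    if not host or host.lower() == 'localhost':
        return True  # None, '' and 'localhost' always mean this machine
    try:
        addr = resolve(host)
    except OSError:
        return False  # an unresolvable name can't be the local host
    if local_addrs is None:
        local_addrs = local_addresses()
    return addr in local_addrs


def check_posix_file_host(host, **kwargs):
    """Raise URLError unless the host is None, empty, or local."""
    if not is_local_host(host, **kwargs):
        raise URLError(f'file: URI host is not local: {host!r}')
```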

In Windows, my tests show that the shell API special-cases "localhost" (case-insensitive) in "file:" URIs. For example, the following are all equivalent: "file:/C:/Temp", "file:///C:/Temp", and "file://localhost/C:/Temp". The shell API does not special-case the real local host name or any of its IP addresses, such as 127.0.0.1. They're all handled as UNC paths.
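That observed rule reduces to a small decision on req.host. A sketch with a hypothetical windows_target() that returns the string a Windows file_open() would hand to url2pathname(); it models the shell behaviour described above rather than calling the shell API, and it ignores ports.

```python
from urllib.request import Request


def windows_target(url):
    """Map a file: URI to the path string given to url2pathname() (sketch).

    No host, an empty host, or "localhost" in any case means a local path;
    every other host, including the machine's own name or 127.0.0.1,
    becomes a UNC path "//host/...".
    """
    req = Request(url)  # "file:/C:/Temp" parses with host None
    if not req.host or req.host.lower() == 'localhost':
        return req.selector
    return f'//{req.host}{req.selector}'
```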

Here's what I've experimented with thus far, which passes the existing urllib tests in Linux and Windows:

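    # Draft for Lib/urllib/request.py: sys, os, socket, email, url2pathname,
    # _splitport, _safe_gethostbyname, addinfourl and URLError are that
    # module's globals.
    class FileHandler(BaseHandler):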
        def file_open(self, req):
            if not self._is_local_path(req):
                if sys.platform == 'win32':
                    path = url2pathname(f'//{req.host}{req.selector}')
                else:
                    raise URLError("In POSIX, the file:// scheme is only "
                                   "supported for local file paths.")
            else:
                path = url2pathname(req.selector)
            return self._common_open_file(req, path)


        def _is_local_path(self, req):
            if req.host:
                host, port = _splitport(req.host)
                if port:
                    raise URLError(f"the host cannot have a port: {req.host}")
                if host.lower() != 'localhost':
                    # In Windows, all other host names are UNC.
                    if sys.platform == 'win32':
                        return False
                    # In POSIX, support all names for the local host.
                    if _safe_gethostbyname(host) not in self.get_names():
                        return False
            return True


        # names for the localhost
        names = None
        def get_names(self):
            if FileHandler.names is None:
                try:
                    FileHandler.names = tuple(
                        socket.gethostbyname_ex('localhost')[2] +
                        socket.gethostbyname_ex(socket.gethostname())[2])
                except socket.gaierror:
                    FileHandler.names = (socket.gethostbyname('localhost'),)
            return FileHandler.names


        def open_local_file(self, req):
            if not self._is_local_path(req):
                raise URLError('file not on local host')
            return self._common_open_file(req, url2pathname(req.selector))


        def _common_open_file(self, req, path):
            import email.utils
            import mimetypes
            host = req.host
            filename = req.selector
            try:
                if host:
                    origurl = f'file://{host}{filename}'
                else:
                    origurl = f'file://{filename}'
                stats = os.stat(path)
                size = stats.st_size
                modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
                mtype = mimetypes.guess_type(filename)[0] or 'text/plain'
                headers = email.message_from_string(
                            f'Content-type: {mtype}\n'
                            f'Content-length: {size}\n'
                            f'Last-modified: {modified}\n')
                return addinfourl(open(path, 'rb'), headers, origurl)
            except OSError as exp:
                raise URLError(exp)
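Whatever form the handler takes, it has to keep the current observable behaviour: the file body plus the Content-type, Content-length and Last-modified headers that _common_open_file() builds. A quick round trip through urlopen() on a temporary file checks that with either the stock or a patched urllib.request; the file name spam.txt is arbitrary.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, 'spam.txt')
    path.write_bytes(b'eggs\n')
    # as_uri() gives "file:///..." (empty host), so file_open() takes the
    # local-path branch. Close the response before the directory is removed.
    with urlopen(path.as_uri()) as resp:
        body = resp.read()
        content_type = resp.headers['Content-type']
        content_length = resp.headers['Content-length']
        last_modified = resp.headers['Last-modified']
```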


Unfortunately, nturl2path.url2pathname() parses some UNC paths incorrectly. For example, the following should remain an invalid UNC path, since "C:" is not a valid share name, but instead it gets converted into an unrelated local path.

    >>> nturl2path.url2pathname('//host/C:/Temp/spam.txt')
    'C:\\Temp\\spam.txt'

This goof depends on finding ":" or "|" in the path. It's arguably worse if the last component has a named data stream (allowed by RFC 8089):

    >>> nturl2path.url2pathname('//host/share/spam.txt:eggs')
    'T:\\eggs'

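Drive "T:" comes from the "t" just before the colon in "spam.txt:eggs": after mapping ":" to "|", url2pathname() takes whatever character precedes the "|" as the drive letter.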
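One way to avoid both goofs is to decide between UNC and drive paths before looking for a colon at all: a "//" prefix means UNC, and a drive letter is recognized only as the entire first component. A sketch of that idea as a hypothetical strict_url2pathname() (not nturl2path's API). It only covers the selector shapes discussed here ("//host/share/...", "/C:/..." or "/C|/...", and rooted paths such as "/Temp/...") and leaves validation of names like "C:" in a UNC path to the OS.

```python
import re
from urllib.parse import unquote

# A drive is only the whole first component: "/C:", "/C:/...", "/C|/...".
_DRIVE_RE = re.compile(r'/?([A-Za-z])[:|](/.*)?', re.DOTALL)


def _join(parts):
    # Split on "/" first, then percent-decode each component.
    return '\\'.join(unquote(p) for p in parts)


def strict_url2pathname(url):
    """Hypothetical stricter url2pathname(): no drive letters in UNC paths."""
    if url.startswith('//'):
        # UNC path: a ":" here belongs to a name or a stream ("spam.txt:eggs"),
        # so it is kept as-is for the OS to accept or reject.
        return '\\\\' + _join(url[2:].split('/'))
    m = _DRIVE_RE.fullmatch(url)
    if m:
        drive = m.group(1).upper() + ':'
        return drive + _join((m.group(2) or '/').split('/'))
    # No drive: a rooted path on the current drive.
    return _join(url.split('/'))
```

With this split, '//host/C:/Temp/spam.txt' stays a UNC path (r'\\host\C:\Temp\spam.txt') and '//host/share/spam.txt:eggs' keeps its stream name.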