
Author maxking
Recipients Yaroslav.Halchenko, kyleam, lukasz.langa, martin.panter, maxking, ned.deily, vstinner
Date 2019-10-11.21:26:30
Message-id <1570829190.29.0.583026963073.issue38449@roundup.psfhosted.org>
Content
The bug is interesting due to some of the implementation details of "guess_type". The documentation says that it can parse either a URL or a filename.

Switching from urllib.parse._splittype to urllib.parse.urlparse changed what a valid "path" is. _splittype doesn't care about the rest of the URL except the scheme, but urlparse does. Previously, we used to split things like:

    >>> print(urllib.parse._splittype(';1.tar.gz'))
    (None, ';1.tar.gz')

Then we'd just treat the second part as a filesystem path, which would correctly guess the extension as .tar.gz.

However, after switching to urllib.parse.urlparse for the parsing, we get:

    >>> print(urllib.parse.urlparse(';1.tar.gz'))
    ParseResult(scheme='', netloc='', path='', params='1.tar.gz', query='', fragment='')

We then take the ".path" attribute for further processing, and since it is empty, guess_type returns (None, None).

The format of all these parts is:

    scheme://netloc/path;parameters?query#fragment
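
To make the split concrete, here is a small sketch comparing urlparse with urlsplit, which does not split off the ';' parameters. The variable names are only for illustration.

```python
from urllib.parse import urlparse, urlsplit

name = ';1.tar.gz'

# urlparse splits ';parameters' off the last path segment, so the
# whole filename lands in .params and .path is left empty.
parsed = urlparse(name)
print(repr(parsed.path), repr(parsed.params))

# urlsplit does not separate parameters, so the name stays in .path.
split = urlsplit(name)
print(repr(split.path))
```

With no scheme and no '/', everything after the leading ';' is treated as parameters, which is why nothing is left in the path for the extension lookup.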

A simple fix would be to merge path, parameters, query and fragment back together (with the appropriate delimiters) and then proceed with further processing. That would fix parsing of filesystem paths but would (again) break parsing of URLs like:

    >>> mimetypes.guess_type('http://example.com/index.html;1.tar.gz')
    ('application/x-tar', 'gzip')

It should return 'text/html' as the type, since this is a URL and everything after the ';' should not be used to determine the mimetype. But, if there is no scheme provided, we should treat it as a filesystem path and in that case 'application/x-tar' is the right type.

I hope I am not confusing everyone here. 

The right fix IMO would be to make "guess_type" not treat URLs and filesystem paths alike.
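
A rough sketch of what I mean, not a proposed patch: the helper name "guess_type_sketch" is made up, and the rule "a scheme longer than one character means URL" is my assumption (single letters are likely Windows drive letters). For URLs only the last segment of ".path" is used; anything else is passed through unchanged as a filename.

```python
import mimetypes
import posixpath
from urllib.parse import urlparse


def guess_type_sketch(name):
    """Hypothetical helper: handle URLs and filesystem paths separately."""
    parsed = urlparse(name)
    # Assumption: a one-letter "scheme" is probably a drive letter (C:\...),
    # so only longer schemes are treated as real URLs.
    if len(parsed.scheme) > 1:
        # URL: ignore ;parameters, ?query and #fragment, and look only
        # at the last segment of the path component.
        target = posixpath.basename(parsed.path)
    else:
        # No scheme: treat the whole string as a filename, ';' included.
        target = name
    return mimetypes.guess_type(target)


print(guess_type_sketch('http://example.com/index.html;1.tar.gz'))
print(guess_type_sketch(';1.tar.gz'))
```

The filename branch still goes through guess_type itself, so it only behaves as intended on versions where guess_type does not split ';' parameters out of scheme-less names.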