classification
Title: mimetypes.guess_type("//example.com") misinterprets host name as file name
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: 35939 Superseder:
Assigned To: Nosy List: corona10, martin.panter
Priority: normal Keywords: patch

Created on 2014-09-06 02:52 by martin.panter, last changed 2019-02-09 02:19 by corona10.

Files
mimetypes-host.patch (uploaded by martin.panter, 2015-02-24 05:53)
Messages (3)
msg226467 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-09-06 02:52
The documentation says that guess_type() takes a URL, but:

>>> mimetypes.guess_type("http://example.com")
('application/x-msdownload', None)
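The ".com" suffix most likely comes from the host name itself. In 3.7/3.8, guess_type() removes only the scheme (via the private helper urllib.parse._splittype, the subject of bpo-35939) and then runs posixpath.splitext() on what is left. The sketch below reproduces that with public functions. The manual split on ":" only stands in for the private helper and is an illustrative assumption, not the library's code.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://example.com"

# A real URL parser puts the host name in netloc and leaves the path empty...
parts = urlsplit(url)
print(parts.netloc, repr(parts.path))  # example.com ''

# ...but an extension lookup on the scheme-less remainder still sees ".com"
remainder = url.split(":", 1)[1]  # "//example.com"
print(posixpath.splitext(remainder))  # ('//example', '.com')
```

The extension lookup then maps ".com" to whatever type the active MIME tables assign to it, here 'application/x-msdownload'.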

I suspect the MS download is a reference to *.com files (like DOS's command.com). My current workaround is to strip out the host name from the URL, since I cannot imagine it would be useful for determining the content type. I am also stripping the fragment part. An argument could probably be made for stripping the “;parameters” and “?query” parts as well.

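>>> import mimetypes
>>> from urllib.parse import urlparse
>>> # Workaround for mimetypes.guess_type("//example.com")
... # interpreting host name as file name
... url = urlparse("http://example.com")
>>> # net.url_replace is not part of the standard library
... url = net.url_replace(url, netloc="", fragment="")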
>>> url
'http://'
>>> mimetypes.guess_type(url, strict=False)
(None, None)
msg236479 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-24 05:53
Posting a patch to fix this. It passes the URL through a urlsplit() → urlunsplit() stage, while removing the scheme://netloc parts.
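A minimal sketch of the urlsplit() → urlunsplit() idea, written as a standalone wrapper rather than a change to mimetypes itself. It is not the attached patch. The function name guess_type_without_host is hypothetical, and unlike the description above it also drops the query and fragment (the extra stripping suggested in msg226467). data: URLs are passed through unchanged on the assumption that their inline media type should still be honoured.

```python
import mimetypes
from urllib.parse import urlsplit, urlunsplit


def guess_type_without_host(url, strict=True):
    """Guess a MIME type from the path of *url* only (hypothetical helper).

    The scheme and netloc are dropped with a urlsplit() -> urlunsplit()
    round trip, so a host name such as "example.com" is never mistaken
    for a file name ending in ".com".
    """
    parts = urlsplit(url)
    if parts.scheme == "data":
        # data: URLs carry their media type inline; leave them to mimetypes
        return mimetypes.guess_type(url, strict)
    # Rebuild the URL with empty scheme, netloc, query and fragment, so
    # "page.html?x=1" is not read as having the extension ".html?x=1"
    path_only = urlunsplit(("", "", parts.path, "", ""))
    return mimetypes.guess_type(path_only, strict)
```

With this wrapper, "http://example.com" and "//example.com" both yield (None, None), while "http://example.com/index.html?x=1#top" is still recognised as text/html from its path.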
msg335123 - Author: Dong-hee Na (corona10) * Date: 2019-02-09 02:15
The patch I proposed on bpo-35939 also solves the situation above.

Python 3.8.0a1+ (heads/bpo-12317:96d37dbcd2, Feb  8 2019, 12:03:40)
[Clang 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mimetypes
>>> mimetypes.guess_type("http://example.com")
(None, None)
>>> mimetypes.guess_type("example.com")
('application/x-msdownload', None)
>>>

I've also added the unit tests from mimetypes-host.patch, and they pass.
I think we can close this issue once bpo-35939 is closed.

Thanks a lot!
History
Date  User  Action  Args
2019-02-09 02:19:48  corona10  set  versions: + Python 3.7, Python 3.8, - Python 3.4
2019-02-09 02:15:26  corona10  set  nosy: + corona10
    messages: + msg335123
2019-02-08 23:25:29  martin.panter  set  dependencies: + Remove urllib.parse._splittype from mimetypes.guess_type
2015-02-24 05:53:55  martin.panter  set  files: + mimetypes-host.patch
    keywords: + patch
    messages: + msg236479
2014-09-06 02:52:37  martin.panter  create