classification
Title: mimetypes.guess_type("//example.com") misinterprets host name as file name
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: open Resolution: fixed
Dependencies: 35939 Superseder:
Assigned To: Nosy List: corona10, martin.panter, maxking, miss-islington, ned.deily
Priority: normal Keywords: patch

Created on 2014-09-06 02:52 by martin.panter, last changed 2019-10-15 07:30 by ned.deily.

Files
File name Uploaded Description
mimetypes-host.patch martin.panter, 2015-02-24 05:53 review
Pull Requests
URL Status Linked
PR 15522 merged corona10, 2019-08-26 16:04
PR 15685 merged miss-islington, 2019-09-05 00:34
PR 15687 merged corona10, 2019-09-05 00:49
PR 16724 merged maxking, 2019-10-12 00:42
PR 16725 closed miss-islington, 2019-10-12 05:41
PR 16727 merged maxking, 2019-10-12 15:52
PR 16728 merged maxking, 2019-10-12 16:04
Messages (15)
msg226467 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-09-06 02:52
The documentation says that guess_type() takes a URL, but:

>>> mimetypes.guess_type("http://example.com")
('application/x-msdownload', None)

I suspect the MS download is a reference to *.com files (like DOS's command.com). My current workaround is to strip out the host name from the URL, since I cannot imagine it would be useful for determining the content type. I am also stripping the fragment part. An argument could probably be made for stripping the “;parameters” and “?query” parts as well.

>>> # Workaround for mimetypes.guess_type("//example.com")
... # interpreting host name as file name
... url = urlparse("http://example.com")
>>> url = net.url_replace(url, netloc="", fragment="")
>>> url
'http://'
>>> mimetypes.guess_type(url, strict=False)
(None, None)
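
For readers without the net.url_replace helper used above (it is not part of the standard library), the same workaround can be approximated with urllib.parse alone. This is only a sketch: strip_host_and_fragment is a hypothetical name, and the exact string returned by geturl() may differ between Python versions.

```python
import mimetypes
from urllib.parse import urlparse


def strip_host_and_fragment(url):
    """Hypothetical helper: blank the netloc and fragment of a URL."""
    parts = urlparse(url)
    # ParseResult is a namedtuple, so _replace() returns a modified copy.
    return parts._replace(netloc="", fragment="").geturl()


stripped = strip_host_and_fragment("http://example.com")
# With the host name gone there is no ".com" suffix left to misread.
print(stripped, mimetypes.guess_type(stripped, strict=False))

with_path = strip_host_and_fragment("http://example.com/archive.tar.gz#top")
# The path suffix is still available for guessing.
print(with_path, mimetypes.guess_type(with_path, strict=False))
```

The query and parameters are left in place here, as in the workaround above.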
msg236479 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-24 05:53
Posting a patch to fix this. It passes the URL through a urlsplit() → urlunsplit() stage, while removing the scheme://netloc parts.
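
A rough sketch of the idea described above, written as a wrapper rather than as a change to mimetypes itself. It is not the contents of mimetypes-host.patch; guess_type_ignoring_host is a hypothetical name used only for illustration.

```python
import mimetypes
from urllib.parse import urlsplit, urlunsplit


def guess_type_ignoring_host(url, strict=True):
    """Hypothetical wrapper: drop scheme://netloc before guessing."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Reassemble the URL with empty scheme and netloc, so that only the
    # path (plus any query and fragment) reaches guess_type().
    remainder = urlunsplit(("", "", path, query, fragment))
    return mimetypes.guess_type(remainder, strict=strict)


# Host-only URLs no longer look like "*.com" files.
print(guess_type_ignoring_host("http://example.com"))
print(guess_type_ignoring_host("//example.com"))
# A real path is still examined.
print(guess_type_ignoring_host("http://example.com/data.tar.gz"))
```

A bare "example.com" has no netloc, so it is still treated as a file name. Because the query and fragment are kept, a URL such as ".../index.html?x=1" may still not be recognised by this sketch.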
msg335123 - (view) Author: Dong-hee Na (corona10) * (Python triager) Date: 2019-02-09 02:15
The proposed patch I mentioned on bpo-35939 also solves the above situation.

Python 3.8.0a1+ (heads/bpo-12317:96d37dbcd2, Feb  8 2019, 12:03:40)
[Clang 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mimetypes
>>> mimetypes.guess_type("http://example.com")
(None, None)
>>> mimetypes.guess_type("example.com")
('application/x-msdownload', None)
>>>

I've also added the unit tests from mimetypes-host.patch. It works well.
I think we can also close this issue once bpo-35939 is closed.

Thanks a lot!
msg351156 - (view) Author: miss-islington (miss-islington) Date: 2019-09-05 00:34
New changeset 87bd2071c756188b6cd577889fb1682831142ceb by Miss Islington (bot) (Dong-hee Na) in branch 'master':
bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15522)
https://github.com/python/cpython/commit/87bd2071c756188b6cd577889fb1682831142ceb
msg351157 - (view) Author: miss-islington (miss-islington) Date: 2019-09-05 00:55
New changeset 6d7a786d2e4b48a6b50614e042ace9ff996f0238 by Miss Islington (bot) in branch '3.8':
bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15522)
https://github.com/python/cpython/commit/6d7a786d2e4b48a6b50614e042ace9ff996f0238
msg351158 - (view) Author: miss-islington (miss-islington) Date: 2019-09-05 01:16
New changeset 8873bff2871078e9f23e6c7d942d3a8edbd0921f by Miss Islington (bot) (Dong-hee Na) in branch '3.7':
[3.7] bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15522) (GH-15687)
https://github.com/python/cpython/commit/8873bff2871078e9f23e6c7d942d3a8edbd0921f
msg351162 - (view) Author: Dong-hee Na (corona10) * (Python triager) Date: 2019-09-05 01:26
@vstinner (my mentor) @maxking
Now this issue is solved.
I'd like to close this issue. Is it okay?
msg351164 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-09-05 01:29
I think so, yes.

Also, while you are at it, can you also close bpo-35939 with a comment that points to this issue and the right PR for the fix?
msg351167 - (view) Author: Dong-hee Na (corona10) * (Python triager) Date: 2019-09-05 01:34
Great! I will close bpo-35939 also.
msg354471 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-10-11 17:18
This change introduces a potential 3.7 regression; see Issue38449.
msg354521 - (view) Author: miss-islington (miss-islington) Date: 2019-10-12 05:41
New changeset 19a3d873005e5730eeabdc394c961e93f2ec02f0 by Miss Islington (bot) (Abhilash Raj) in branch 'master':
bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15522)" (GH-16724)
https://github.com/python/cpython/commit/19a3d873005e5730eeabdc394c961e93f2ec02f0
msg354535 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-10-12 16:30
I am going to re-open this since the fixes were reverted in all the branches.
msg354538 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-10-12 16:58
New changeset 5a638a805503131f4a9cc2bbc5944611295c1500 by Abhilash Raj in branch '3.8':
[3.8] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs" (GH-16724) (GH-16728)
https://github.com/python/cpython/commit/5a638a805503131f4a9cc2bbc5944611295c1500
msg354543 - (view) Author: miss-islington (miss-islington) Date: 2019-10-12 18:50
New changeset 164bee296ab1f87cc05566b39ee8fb9fb64b3e5a by Miss Islington (bot) (Abhilash Raj) in branch '3.7':
[3.7] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15685)" (GH-16724) (GH-16727)
https://github.com/python/cpython/commit/164bee296ab1f87cc05566b39ee8fb9fb64b3e5a
msg354697 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-10-15 07:30
New changeset 2a405598bbccbc42710dc5ecf3d44c8de4c16582 by Ned Deily (Abhilash Raj) in branch '3.7':
[3.7] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15685)" (GH-16724) (GH-16727)
https://github.com/python/cpython/commit/2a405598bbccbc42710dc5ecf3d44c8de4c16582
History
Date User Action Args
2019-10-15 07:30:25 ned.deily set messages: + msg354697
2019-10-14 12:44:20 vstinner set nosy: - vstinner
2019-10-12 18:50:06 miss-islington set messages: + msg354543
2019-10-12 16:58:15 maxking set messages: + msg354538
2019-10-12 16:30:21 maxking set status: closed -> open; messages: + msg354535
2019-10-12 16:04:59 maxking set pull_requests: + pull_request16310
2019-10-12 15:52:49 maxking set pull_requests: + pull_request16307
2019-10-12 05:41:53 miss-islington set pull_requests: + pull_request16305
2019-10-12 05:41:50 miss-islington set messages: + msg354521
2019-10-12 00:42:58 maxking set pull_requests: + pull_request16303
2019-10-11 17:18:38 ned.deily set nosy: + ned.deily; messages: + msg354471
2019-09-05 12:19:43 corona10 link issue35939 superseder
2019-09-05 01:44:00 corona10 set status: open -> closed; resolution: fixed
2019-09-05 01:34:56 corona10 set stage: patch review -> resolved
2019-09-05 01:34:34 corona10 set messages: + msg351167
2019-09-05 01:29:06 maxking set messages: + msg351164
2019-09-05 01:26:52 corona10 set nosy: + vstinner, maxking; messages: + msg351162
2019-09-05 01:16:41 miss-islington set messages: + msg351158
2019-09-05 00:55:01 miss-islington set messages: + msg351157
2019-09-05 00:49:06 corona10 set pull_requests: + pull_request15345
2019-09-05 00:34:48 miss-islington set pull_requests: + pull_request15343
2019-09-05 00:34:39 miss-islington set nosy: + miss-islington; messages: + msg351156
2019-08-26 16:04:37 corona10 set stage: patch review; pull_requests: + pull_request15206
2019-02-09 02:19:48 corona10 set versions: + Python 3.7, Python 3.8, - Python 3.4
2019-02-09 02:15:26 corona10 set nosy: + corona10; messages: + msg335123
2019-02-08 23:25:29 martin.panter set dependencies: + Remove urllib.parse._splittype from mimetypes.guess_type
2015-02-24 05:53:55 martin.panter set files: + mimetypes-host.patch; keywords: + patch; messages: + msg236479
2014-09-06 02:52:37 martin.panter create