classification
Title: mimetypes.guess_type returns deprecated mimetype application/x-javascript
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder: validate mime types loaded from system files. Document that system files take precedence.
View: 32462
Assigned To: Nosy List: iritkatriel, milahu
Priority: normal Keywords:

Created on 2021-12-10 12:10 by milahu, last changed 2022-04-11 14:59 by admin.

Messages (7)
msg408197 - (view) Author: milahu (milahu) Date: 2021-12-10 12:10
deprecated mimetype?
per rfc4329, the technical term is "unregistered media type"

https://datatracker.ietf.org/doc/html/rfc4329#section-3

related

https://stackoverflow.com/a/9664327/10440128

https://github.com/danny0838/PyWebScrapBook/issues/53

quick fix

```py
# python/Lib/mimetypes.py

class MimeTypes:
    # ...
    def guess_type(self, url, strict=True):
        # ...

        if ext in _types_map_default:
            # prefer the python-internal values over /etc/mime.types
            return _types_map_default[ext], encoding

        if ext in types_map:
            return types_map[ext], encoding
```

why is `application/x-javascript` returned?

on linux, mimetypes.init() loads /etc/mime.types
source:
https://mirrors.kernel.org/gentoo/distfiles/mime-types-9.tar.bz2

/etc/mime.types is sorted by alphabet, so

```
cat /etc/mime.types | grep javascript
application/javascript										js
application/x-javascript									js
```

apparently, the last entry application/x-javascript
will overwrite the previous entry application/javascript
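
the overwrite can be reproduced without touching the system file. a minimal sketch, assuming the two lines above are the only `js` entries; the in-memory file and the names `fake_mime_types` and `db` are made up for illustration:

```python
import io
import mimetypes

# Two entries for the same extension, in the same order as /etc/mime.types.
fake_mime_types = io.StringIO(
    "application/javascript\t\tjs\n"
    "application/x-javascript\t\tjs\n"
)

# A private MimeTypes instance, so the global database is left alone.
db = mimetypes.MimeTypes()
db.readfp(fake_mime_types)

# Each parsed line is stored under its extension, so the later line wins.
guessed_type, encoding = db.guess_type("script.js")
print(guessed_type)
```

swapping the two lines in `fake_mime_types` should flip the result, which suggests the outcome depends only on file order.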
msg408198 - (view) Author: milahu (milahu) Date: 2021-12-10 12:20
patch

https://github.com/milahu/cpython/commit/8a50633bb1b0c3e39fbe2cd467bb34a839ad068f
msg410863 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-18 13:34
As noted in Issue32462, the fact that system files take precedence over the definitions in the stdlib is a feature, so the proposed patch to reverse this behaviour cannot be applied unless it is decided to change the API in this way. That would require a discussion on python-ideas. If you want to bring it up there, and there is a decision to change the behaviour, please create a new issue.
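
Since the precedence of system files is intentional, callers who need a specific type today can override it on their side. The following is only a sketch of two possible workarounds, not part of the proposed patch. It assumes that, in current CPython, a bare `MimeTypes()` instance starts from the built-in table without reading system files; the variable names are made up for illustration.

```python
import mimetypes

# Option 1: an isolated instance built from the stdlib defaults only.
# No filenames are passed, so no system file is read into it.
stdlib_only = mimetypes.MimeTypes()
isolated_type, _ = stdlib_only.guess_type("script.js")

# Option 2: override a single extension in the global database.
# add_type() replaces whatever was stored for ".js" before.
mimetypes.add_type("application/javascript", ".js")
global_type, _ = mimetypes.guess_type("script.js")

print(isolated_type, global_type)
```

Option 1 leaves the module-level functions untouched; option 2 changes them for the whole process.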
msg410868 - (view) Author: milahu (milahu) Date: 2022-01-18 14:41
this issue is different than Issue32462
because here, both entries are valid

```
cat /etc/mime.types | grep javascript
application/javascript        js
application/x-javascript      js
```

but the alphabetical ordering of the file
makes the last entry take precedence

python could be smarter at parsing the /etc/mime.types file
in that it could give lower precedence to the deprecated types

pseudocode:

deprecated_mimetypes = set(...) # values from rfc4329
mimetype_of_ext = dict()
# parser loop
for ...
  ext = "..."
  mimetype = "..."
  if ext in mimetype_of_ext:
    old_mimetype = mimetype_of_ext[ext]
    if old_mimetype in deprecated_mimetypes:
      mimetype_of_ext[ext] = mimetype # replace old with new
      # assume that mimetype is not deprecated
  mimetype_of_ext[ext] = mimetype
msg410869 - (view) Author: milahu (milahu) Date: 2022-01-18 14:43
edit:

-  mimetype_of_ext[ext] = mimetype
+  else:
+    # add new entry
+    mimetype_of_ext[ext] = mimetype
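
for reference, the pseudocode with this edit applied can be written as a runnable sketch. the function name `parse_mime_types` and the one-element `DEPRECATED_MIMETYPES` set are assumptions for illustration, the real set would have to be taken from rfc4329:

```python
# Illustrative subset only; the real set would come from RFC 4329.
DEPRECATED_MIMETYPES = frozenset({"application/x-javascript"})


def parse_mime_types(lines, deprecated=DEPRECATED_MIMETYPES):
    """Map each extension to a type; a new entry replaces only a deprecated one."""
    mimetype_of_ext = {}
    for line in lines:
        words = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(words) < 2:
            continue  # blank line, or a type without extensions
        mimetype, extensions = words[0], words[1:]
        for ext in extensions:
            if ext in mimetype_of_ext:
                if mimetype_of_ext[ext] in deprecated:
                    # replace old with new, assuming the new one is not deprecated
                    mimetype_of_ext[ext] = mimetype
            else:
                # add new entry
                mimetype_of_ext[ext] = mimetype
    return mimetype_of_ext


sample = ["application/javascript  js", "application/x-javascript  js"]
result = parse_mime_types(sample)
reversed_result = parse_mime_types(reversed(sample))
print(result, reversed_result)
```

note that with this logic the first non-deprecated entry for an extension is kept, so two non-deprecated types for the same extension would no longer resolve to the last one in the file.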
msg410871 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-18 14:53
Ok, I reopened this as an enhancement request for mimetypes to know about the 'deprecated' types. If you want to push it forward it might be a good idea to bring this up on python-ideas as well.
msg410877 - (view) Author: milahu (milahu) Date: 2022-01-18 16:31
python-ideas thread
https://mail.python.org/archives/list/python-ideas@python.org/thread/V53XGQPIY7ZAISMTQHPHKGWZNSN5EXQG/
History
Date                 User         Action  Args
2022-04-11 14:59:53  admin        set     github: 90193
2022-01-18 16:31:32  milahu       set     messages: + msg410877
2022-01-18 14:53:40  iritkatriel  set     messages: + msg410871
2022-01-18 14:51:58  iritkatriel  set     stage: resolved ->
2022-01-18 14:51:45  iritkatriel  set     status: closed -> open
                                          type: behavior -> enhancement
                                          resolution: duplicate ->
2022-01-18 14:43:22  milahu       set     messages: + msg410869
2022-01-18 14:41:53  milahu       set     messages: + msg410868
2022-01-18 13:34:51  iritkatriel  set     status: open -> closed
                                          superseder: validate mime types loaded from system files. Document that system files take precedence.
                                          nosy: + iritkatriel
                                          messages: + msg410863
                                          resolution: duplicate
                                          stage: resolved
2021-12-10 12:20:41  milahu       set     messages: + msg408198
2021-12-10 12:10:59  milahu       create