classification
Title: mimetypes read from the registry should not overwrite standard mime mappings
Type: behavior Stage: committed/rejected
Components: Library (Lib), Windows Versions: Python 3.2, Python 2.7
process
Status: open Resolution: invalid
Dependencies: Superseder:
Assigned To: Nosy List: brian.curtin, gagenellina, kovid, loewis, ned.deily, ocean-city, pitrou, r.david.murray, tercero12, tim.golden
Priority: normal Keywords:

Created on 2010-11-27 19:15 by kovid, last changed 2010-11-30 18:56 by kovid.

Messages (9)
msg122542 - Author: Kovid Goyal (kovid) Date: 2010-11-27 19:15
Hi,

I am the primary developer of calibre (http://calibre-ebook.com) and yesterday I released an upgrade of calibre based on Python 2.7. Here is a small sampling of the diverse errors that my users experienced, all related to reading mimetypes from the registry:

1. Permission denied if running from a non-privileged account

Traceback (most recent call last):
  File "site.py", line 103, in main
  File "site.py", line 84, in run_entry_point
  File "site-packages\calibre\__init__.py", line 31, in <module>
  File "mimetypes.py", line 344, in add_type
  File "mimetypes.py", line 355, in init
  File "mimetypes.py", line 261, in read_windows_registry
WindowsError: [Error 5] Acceso denegado (Access not allowed)

The fix for this is to trap WindowsError and ignore it in mimetypes.py

2. Mishandling of encoding of registry entries

Traceback (most recent call last):
  File "site.py", line 103, in main
  File "site.py", line 84, in run_entry_point
  File "site-packages\calibre\__init__.py", line 31, in <module>
  File "mimetypes.py", line 344, in add_type
  File "mimetypes.py", line 355, in init
  File "mimetypes.py", line 260, in read_windows_registry
  File "mimetypes.py", line 250, in enum_types
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: invalid continuation byte

The fix for this is to change

except UnicodeEncodeError

to

except ValueError

3. python -c "import mimetypes; print mimetypes.guess_type('img.jpg')"
('image/pjpeg', None)

Where the output should have been

('image/jpeg', None)

The fix for this is to load the registry entries before the default entries defined in mimetypes.py
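
On the second item above: the traceback shows a UnicodeDecodeError, which an except UnicodeEncodeError clause does not catch, whereas except ValueError does, since both Unicode errors derive from ValueError. A minimal sketch in Python 3 syntax; the sample bytes and the helper name decode_or_none are illustrative assumptions, not code taken from mimetypes.py:

```python
# Hypothetical sample: 0xe0 starts a multi-byte UTF-8 sequence, but the
# bytes that follow are not continuation bytes, so decoding fails.
raw = b"\xe0abc"


def decode_or_none(data):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except ValueError:  # UnicodeDecodeError is a subclass of ValueError
        return None


result = decode_or_none(raw)

# The two Unicode errors are siblings, so catching one misses the other.
decode_is_value_error = issubclass(UnicodeDecodeError, ValueError)
decode_is_encode_error = issubclass(UnicodeDecodeError, UnicodeEncodeError)
```

Catching ValueError here is broader than strictly needed; catching UnicodeError would also cover both cases.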


Of course, IMHO, the best possible fix is to simply remove the reading of mimetypes from the registry. But that is up to whoever maintains this module. 

Duplicate (less comprehensive) tickets on this issue in your tracker already are: 9291, 10490, 104314

If the maintainer of this module is unable to fix these issues, let me know and I will submit a patch, either removing _winreg or fixing the issues individually.
msg122543 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2010-11-27 19:34
The first issue you note appears to be a duplicate of Issue10162, a fix for which should be available in the 2.7.1 maintenance release.

The second issue appears to be a duplicate of Issue9291.  Since that issue is still open, I suggest any further discussion be pursued there.  You may want to add yourself to the nosy list of that issue.
msg122583 - Author: Kovid Goyal (kovid) Date: 2010-11-27 22:54
And what about the third issue?

Allow me to elaborate:

mimetypes are a relatively standard set of mappings from well known file extensions to MIME descriptors. 

Reading mimetype mappings from the registry, a location that is writable by random programs the user may have installed on his machine, let alone malware, is a BAD idea.

It leads to situations like asking for the mimetype of file.jpg and getting image/pjpeg back. Or asking for the mimetype of file.png and getting image/x-png back.

If you still consider it good to read mimetypes from the registry, at the very least, they should be read before the standard mimetype mappings defined in mimetypes.py are applied. That way at least for that set of mappings, users of python can be assured of sane query results. 
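
The ordering argument comes down to which mapping is applied last. A small sketch in Python 3 syntax, assuming a private mimetypes.MimeTypes instance and using image/pjpeg as a stand-in for a value that, on Windows, would come from the registry (nothing in this sketch reads the registry entries itself):

```python
import mimetypes

# A private instance, so the module-level tables are not modified here.
db = mimetypes.MimeTypes()

# Stand-in for a non-standard entry such as one read from the registry.
db.add_type("image/pjpeg", ".jpg")
before = db.guess_type("img.jpg")

# Applying the standard mapping afterwards replaces the earlier entry,
# because the most recently added type for an extension wins.
db.add_type("image/jpeg", ".jpg")
after = db.guess_type("img.jpg")
```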

As it stands now, mimetypes.py is useless and to work around the problem I essentially had to define the mimetype mappings for all the mimetypes my program knows about by hand.
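
Defining mappings by hand could look roughly like the following. This is only a sketch in Python 3 syntax: the PINNED_TYPES table and the pin_standard_types helper are hypothetical names, and the three entries are examples rather than calibre's actual list.

```python
import mimetypes

# Hypothetical table of extensions whose types should never vary.
PINNED_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def pin_standard_types():
    """Load the platform mappings, then re-apply the pinned ones on top."""
    mimetypes.init()
    for ext, mime in PINNED_TYPES.items():
        mimetypes.add_type(mime, ext)


pin_standard_types()
jpg_type = mimetypes.guess_type("img.jpg")
png_type = mimetypes.guess_type("pic.png")
```

A later call to mimetypes.init() rebuilds the tables, so the pinning would need to be repeated after it.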
msg122587 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2010-11-27 23:12
(Sorry, I skipped over the third: this is one reason why one should not include multiple problems in one tracker issue.)

As to your third point, a quick search of "mimetypes" in the bugtracker shows that looking in the Windows registry for mimetypes was a new feature in 2.7 and the upcoming 3.2 added by Issue4969.

Adding the Windows maintainers and the Nosy List from that issue.
msg122589 - Author: Kovid Goyal (kovid) Date: 2010-11-27 23:20
I apologize for the multiple issues in the ticket. To my mind they were all basically one issue, stemming from the decision to read mimetypes from the registry.

Since there are other tickets for the first two issues, I'll change the summary for this issue to reflect only the third.
msg122924 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-11-30 17:33
Kovid: so essentially what you are saying is that the windows platform is broken with respect to MIME types and with respect to its security model.  Why am I not surprised? :)

You would have the same problem if software installation altered the /etc/mimetypes file on a unix box and created weird entries.  Perhaps unix programmers are just better disciplined?

Reading the registry first and having the built-in settings override it would IMO defeat the purpose of reading the values from the registry: those are (theoretically!!) the settings the user chose to change.

However, working around it in your program should be simple: just call mimetypes.init with an empty file list.  The windows registry is only read if the files parameter is None.  This will also give you consistent behavior on windows and unix: only the default mime types in the mimetypes module will be used.  If, on the other hand, you want to retain the Unix behavior, you can pass init mimetypes.knownfiles instead of the empty list.
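
A minimal sketch of that call, in Python 3 syntax. It assumes the 2.7 behavior described in this message (the registry is consulted only when the files parameter is None); later Python versions may treat the first call differently, so this should be checked against the version in use.

```python
import mimetypes

# Pass an explicit, empty file list instead of relying on the default
# (files=None), as suggested above.
mimetypes.init(files=[])

# To keep the Unix-style lookup of system mime.types files instead,
# the suggestion above is: mimetypes.init(files=mimetypes.knownfiles)

jpg_type = mimetypes.guess_type("img.jpg")
```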

(By the way, thanks very much for calibre, I have used the CLI tools to great benefit, and love the fact that the CLI is the basis of the program.)
msg122925 - Author: Kovid Goyal (kovid) Date: 2010-11-30 18:07
It is, of course, your decision, but IMO, since the mimetypes database in windows appears to be always broken, the default behavior of the mimetypes module in python 2.7 on windows is broken for most (all?) windows installs. For me personally, it doesn't matter anymore, as I have already fixed calibre, but it would be surprising/unexpected behavior for someone new to using mimetypes.py on windows. Certainly, my expectation (perhaps naively) was that guess_type('image.jpg') would always return 'image/jpeg'. 

Users on windows rarely (ever?) modify the registry to change mimetypes. The only thing that does change mimetypes is installed software, without the users' knowledge/consent. So treating the registry as a reliable store of mime information is not a good idea.

On unix, the knownfiles are system files. I don't know about OS X, but on linux, since most software is installed by package managers, the package managers usually have policies that prevent application installs from clobbering system files. And of course, running userland applications don't have the necessary privileges to modify the files.

Out of curiosity, what is the upside of reading mimetypes from the registry, given that its information cannot be trusted?

And you're most welcome, for calibre :)
msg122926 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-11-30 18:42
I would expect that it would not be people new to mimetypes that would have the issues, but people like you for whom the behavior on Windows has changed.  And this is indeed a concern.

The people involved in making the windows mimetypes enhancement are nosy on this ticket, perhaps they will have thoughts on the issue of the (in)validity of the windows mime data.
msg122928 - Author: Kovid Goyal (kovid) Date: 2010-11-30 18:56
I actually had in mind people that (like me) develop primarily on unix and assume that mimetypes works the same way on both windows and unix. Of course, the changed behavior is also a concern.

At the very least, I would encourage the addition of a warning to the documentation of the mimetypes module.
History
Date User Action Args
2010-11-30 18:56:41  kovid  set
    status: pending -> open
    messages: + msg122928
2010-11-30 18:42:37  r.david.murray  set
    status: closed -> pending
    messages: + msg122926
2010-11-30 18:07:50  kovid  set
    messages: + msg122925
2010-11-30 17:33:56  r.david.murray  set
    status: open -> closed
    type: behavior
    nosy: + r.david.murray
    messages: + msg122924
    resolution: invalid
    stage: committed/rejected
2010-11-27 23:20:21  kovid  set
    messages: + msg122589
    title: mimetypes reading from registry in windows completely broken -> mimetypes read from the registry should not overwrite standard mime mappings
2010-11-27 23:12:41  ned.deily  set
    versions: + Python 3.2
    nosy: + gagenellina, pitrou, tim.golden, ocean-city, tercero12, loewis
    messages: + msg122587
    superseder: mimetypes initialization fails on Windows because of non-Latin characters in registry ->
2010-11-27 22:54:31  kovid  set
    status: closed -> open
    resolution: duplicate -> (no value)
    messages: + msg122583
2010-11-27 19:34:18  ned.deily  set
    status: open -> closed
    superseder: mimetypes initialization fails on Windows because of non-Latin characters in registry
    components: + Windows
    nosy: + brian.curtin, ned.deily
    messages: + msg122543
    resolution: duplicate
2010-11-27 19:15:19  kovid  create