msg122542 - (view) |
Author: Kovid Goyal (kovid) |
Date: 2010-11-27 19:15 |
Hi,
I am the primary developer of calibre (http://calibre-ebook.com) and yesterday I released an upgrade of calibre based on python 2.7. Here is a small sampling of the diverse errors that my users experienced, related to reading mimetypes from the registry:
1. Permission denied if running from non privileged account
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 84, in run_entry_point
File "site-packages\calibre\__init__.py", line 31, in <module>
File "mimetypes.py", line 344, in add_type
File "mimetypes.py", line 355, in init
File "mimetypes.py", line 261, in read_windows_registry
WindowsError: [Error 5] Acceso denegado (Access not allowed)
The fix for this is to trap WindowsError and ignore it in mimetypes.py
2. Mishandling of encoding of registry entries
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 84, in run_entry_point
File "site-packages\calibre\__init__.py", line 31, in <module>
File "mimetypes.py", line 344, in add_type
File "mimetypes.py", line 355, in init
File "mimetypes.py", line 260, in read_windows_registry
File "mimetypes.py", line 250, in enum_types
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: invalid continuation byte
The fix for this is to change
except UnicodeEncodeError
to
except ValueError
3. python -c "import mimetypes; print mimetypes.guess_type('img.jpg')"
('image/pjpeg', None)
Where the output should have been
('image/jpeg', None)
The fix for this is to load the registry entries before the default entries defined in mimetypes.py
Of course, IMHO, the best possible fix is to simply remove the reading of mimetypes from the registry. But that is up to whoever maintains this module.
Duplicate (less comprehensive) tickets on this issue already in your tracker are: 9291, 10490, 104314
If the maintainer of this module is unable to fix these issues, let me know and I will submit a patch, either removing _winreg or fixing the issues individually.
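A minimal sketch of fixes 1 and 2 above, written for Python 3 rather than the 2.7 code shown in the tracebacks. `decode_entry`, `read_entries` and `denied` are hypothetical helpers, not functions from mimetypes.py. The point is only that `UnicodeDecodeError` derives from `ValueError` (so `except UnicodeEncodeError` never catches it) and that a permission failure surfaces as an `OSError`, which is what `WindowsError` names on Windows in Python 3.

```python
def decode_entry(raw):
    """Decode one raw entry name, skipping entries that are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except ValueError:  # UnicodeDecodeError is a subclass of ValueError
        return None


def read_entries(reader):
    """Call a registry-reading function, treating access errors as 'no entries'."""
    try:
        return list(reader())
    except OSError:  # PermissionError ("access denied") derives from OSError
        return []


def denied():
    # Stand-in for a registry read attempted from a non-privileged account.
    raise PermissionError(5, "Access is denied")


print(decode_entry(b"\xe0abc"))  # undecodable bytes are skipped, not fatal
print(read_entries(denied))      # an access error yields an empty result
```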
|
msg122543 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2010-11-27 19:34 |
The first issue you note appears to be a duplicate of Issue10162, a fix for which should be available in the 2.7.1 maintenance release.
The second issue appears to be a duplicate of Issue9291. Since that issue is still open, I suggest any further discussion be pursued there. You may want to add yourself to the nosy list of that issue.
|
msg122583 - (view) |
Author: Kovid Goyal (kovid) |
Date: 2010-11-27 22:54 |
And what about the third issue?
Allow me to elaborate:
mimetypes are a relatively standard set of mappings from well known file extensions to MIME descriptors.
Reading mimetype mappings from the registry, a location that is writable by any program the user may have installed on his machine, let alone malware, is a BAD idea.
It leads to situations like asking for the mimetype of file.jpg and getting image/pjpeg back. Or asking for the mimetype of file.png and getting image/x-png back.
If you still consider it good to read mimetypes from the registry, at the very least, they should be read before the standard mimetype mappings defined in mimetypes.py are applied. That way at least for that set of mappings, users of python can be assured of sane query results.
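The ordering argument can be illustrated with plain dictionaries. `registry_types` and `builtin_types` below are made-up stand-ins for illustration, not the module's real tables.

```python
# Hypothetical data: what a registry might report vs. a built-in table.
registry_types = {
    ".jpg": "image/pjpeg",
    ".png": "image/x-png",
    ".xyz": "application/x-xyz",
}
builtin_types = {".jpg": "image/jpeg", ".png": "image/png"}

# Registry applied last: its entries replace the standard ones.
registry_last = {**builtin_types, **registry_types}

# Registry applied first: the standard entries win, extra entries are kept.
registry_first = {**registry_types, **builtin_types}

print(registry_last[".jpg"])
print(registry_first[".jpg"], registry_first[".xyz"])
```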
As it stands now, mimetypes.py is useless and to work around the problem I essentially had to define the mimetype mappings for all the mimetypes my program knows about by hand.
|
msg122587 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2010-11-27 23:12 |
(Sorry, I skipped over the third: this is one reason why one should not include multiple problems in one tracker issue.)
As to your third point, a quick search of "mimetypes" in the bugtracker shows that looking in the Windows registry for mimetypes was a new feature in 2.7 and the upcoming 3.2 added by Issue4969.
Adding the Windows maintainers and the Nosy List from that issue.
|
msg122589 - (view) |
Author: Kovid Goyal (kovid) |
Date: 2010-11-27 23:20 |
I apologize for the multiple issues in the ticket. To my mind they were all basically one issue, stemming from the decision to read mimetypes from the registry.
Since there are other tickets for the first two issues, I'll change the summary for this issue to reflect only the third.
|
msg122924 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-11-30 17:33 |
Kovid: so essentially what you are saying is that the windows platform is broken with respect to MIME types and with respect to its security model. Why am I not surprised? :)
You would have the same problem if software installation altered the /etc/mimetypes file on a unix box and created weird entries. Perhaps unix programmers are just better disciplined?
Reading the registry first and having the built in settings override would IMO defeat the purpose of reading the values from the registry: those are (theoretically!!) the settings the user chose to change.
However, working around it in your program should be simple: just call mimetypes.init with an empty file list. The windows registry is only read if the files parameter is None. This will also give you consistent behavior on windows and unix: only the default mime types in the mimetypes module will be used. If, on the other hand, you want to retain the Unix behavior, you can pass init mimetypes.knownfiles instead of the empty list.
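A minimal sketch of that workaround. It follows the Python 2.7 behavior described in this message; whether `init` treats an explicit list the same way in other Python versions is an assumption worth checking against the `mimetypes` source for the version in use.

```python
import mimetypes

# Pass an explicit (empty) file list instead of the default files=None.
mimetypes.init(files=[])
print(mimetypes.guess_type("img.jpg"))

# Alternative from the message: keep the Unix-style system files.
# mimetypes.init(files=mimetypes.knownfiles)
```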
(By the way, thanks very much for calibre, I have used the CLI tools to great benefit, and love the fact that the CLI is the basis of the program.)
|
msg122925 - (view) |
Author: Kovid Goyal (kovid) |
Date: 2010-11-30 18:07 |
It is, of course, your decision, but IMO, since the mimetypes database in windows appears to be always broken, the default behavior of the mimetypes module in python 2.7 on windows is broken for most (all?) windows installs. For me personally, it doesn't matter anymore, as I have already fixed calibre, but it would be surprising/unexpected behavior for someone new to using mimetypes.py on windows. Certainly, my expectation (perhaps naively) was that guess_type('image.jpg') would always return 'image/jpeg'.
Users on windows rarely (if ever) modify the registry to change mimetypes. The only thing that does change mimetypes is installed software, without the users' knowledge/consent. So treating the registry as a reliable store of mime information is not a good idea.
On unix, the knownfiles are system files. I don't know about OS X, but on linux, since most software is installed by package managers, the package managers usually have policies that prevent application installs from clobbering system files. And of course, running userland applications don't have the necessary privileges to modify the files.
Out of curiosity, what is the upside of reading mimetypes from the registry, given that its information cannot be trusted?
And you're most welcome, for calibre :)
|
msg122926 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-11-30 18:42 |
I would expect that it would not be people new to mimetypes that would have the issues, but people like you for whom the behavior on Windows has changed. And this is indeed a concern.
The people involved in making the windows mimetypes enhancement are nosy on this ticket, perhaps they will have thoughts on the issue of the (in)validity of the windows mime data.
|
msg122928 - (view) |
Author: Kovid Goyal (kovid) |
Date: 2010-11-30 18:56 |
I actually had in mind people that (like me) develop primarily on unix and assume that mimetypes works the same way on both windows and unix. Of course, the changed behavior is also a concern.
At the very least, I would encourage the addition of a warning to the documentation of the mimetypes module.
|
msg172531 - (view) |
Author: Ben Hoyt (benhoyt) * |
Date: 2012-10-09 21:21 |
This is definitely a real issue, and makes mimetypes.guess_type() useless out of the box on Windows.
However, I believe the reason it's broken is that the fix for Issue4969 doesn't actually work, and I'm not sure this is possible with the Windows registry.
You see, "MIME\Database\Content Type" in the Windows registry is a mime type -> file extension mapping, *not the other way around*. But read_windows_registry() tries to use it as a file extension -> mime type mapping, and bad things happen, because there are multiple mime types for certain file extensions.
As far as I can tell, there's nothing in the Windows registry that says which is the "canonical" mime type for a given extension. Again, this is because Microsoft intends it (and uses it) as a mime type -> extension mapping. See more here: http://msdn.microsoft.com/en-us/library/ms775148(v=vs.85).aspx
For example, in my "MIME\Database\Content Type" we have:
image/jpeg -> .jpg
image/jpg -> .jpg
image/pjpeg -> .jpg
And read_windows_registry() picks the last one for .jpg, which in this case is image/pjpeg -- NOT what users expect.
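The "last one wins" effect can be reproduced with a simplified inversion of those three entries. This is a sketch of the idea only, not the actual read_windows_registry() code, and it assumes the entries are enumerated in the order listed.

```python
# Pairs as listed above: (mime type, extension).
content_types = [
    ("image/jpeg", ".jpg"),
    ("image/jpg", ".jpg"),
    ("image/pjpeg", ".jpg"),
]

by_extension = {}
for mime_type, extension in content_types:
    # Each later entry for the same extension overwrites the earlier one.
    by_extension[extension] = mime_type

print(by_extension)
```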
In short, I think the fix for Issue4969 is broken as is, and that you can't actually use the mime types database in the Windows registry in this way. I suggest reverting the fix for Issue4969.
Or, we could get clever and only use the Windows registry value if there's a single mime type -> extension mapping for a given extension, and if there's more than one (meaning it'd be ambiguous), use the mimetypes default from types_map / common_types.
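A rough sketch of that second option, under one reading of the proposal: an unambiguous registry entry is used, and an ambiguous extension keeps whatever the defaults say. `merge_unambiguous` and the `.example` entry are hypothetical, and `defaults` stands in for types_map / common_types.

```python
from collections import defaultdict


def merge_unambiguous(content_types, defaults):
    """Return defaults plus registry entries whose extension has one mime type."""
    candidates = defaultdict(set)
    for mime_type, extension in content_types:
        candidates[extension.lower()].add(mime_type)

    merged = dict(defaults)
    for extension, mime_types in candidates.items():
        if len(mime_types) == 1:
            merged[extension] = next(iter(mime_types))
        # Ambiguous extensions keep the default value (or stay absent).
    return merged


content_types = [
    ("image/jpeg", ".jpg"),
    ("image/jpg", ".jpg"),
    ("image/pjpeg", ".jpg"),
    ("application/x-example", ".example"),  # made-up unambiguous entry
]
defaults = {".jpg": "image/jpeg"}

print(merge_unambiguous(content_types, defaults))
```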
|
msg224275 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2014-07-30 00:02 |
msg185039 from #4969 also complains about this issue. I agree with the solution put forward in the last sentence of msg172531. If we think this is the best idea I'll work on a patch unless anybody else wants to pick this up.
|
msg224281 - (view) |
Author: Ben Hoyt (benhoyt) * |
Date: 2014-07-30 00:54 |
Mark, are you referring to part 3 of this issue, the image/pjpeg type of problem? This was fixed in Python 2.7.6 -- see changeset http://hg.python.org/cpython/rev/e8cead08c556 and http://bugs.python.org/issue15207
|
msg224317 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2014-07-30 15:43 |
Ben, you're correct. The other issues have been addressed in #10162 and #9291 so I believe this can be closed. One 2.7 regression regarding mixed str and unicode objects is addressed in #21652.
|
msg380470 - (view) |
Author: Irit Katriel (iritkatriel) * |
Date: 2020-11-06 19:37 |
From the discussion I conclude that all three issues reported here have been resolved. If nobody objects I will close this issue.
|
msg397665 - (view) |
Author: Shane Harvey (ShaneHarvey) * |
Date: 2021-07-16 20:40 |
This issue says "mimetypes read from the registry should not overwrite standard mime mappings". Was this change ever made? The following issue claims that the "HKEY_CLASSES_ROOT\.js\Content Type" registry value can still override the type for ".js" files: https://bugs.python.org/issue43975
|
Date | User | Action | Args |
2022-04-11 14:57:09 | admin | set | github: 54760 |
2021-07-16 20:40:42 | ShaneHarvey | set | nosy:
+ ShaneHarvey messages:
+ msg397665
|
2020-11-22 18:20:04 | iritkatriel | set | status: pending -> closed |
2020-11-06 19:37:38 | iritkatriel | set | status: open -> pending
nosy:
+ iritkatriel messages:
+ msg380470
resolution: not a bug -> out of date |
2019-04-26 20:13:25 | BreamoreBoy | set | nosy:
- BreamoreBoy
|
2014-07-30 15:43:15 | BreamoreBoy | set | messages:
+ msg224317 |
2014-07-30 00:54:05 | benhoyt | set | messages:
+ msg224281 |
2014-07-30 00:02:30 | BreamoreBoy | set | nosy:
+ fhamand, BreamoreBoy, - brian.curtin messages:
+ msg224275
|
2012-10-10 18:39:26 | ned.deily | set | nosy:
- ned.deily
|
2012-10-09 21:21:43 | benhoyt | set | nosy:
+ benhoyt messages:
+ msg172531
|
2010-11-30 18:56:41 | kovid | set | status: pending -> open
messages:
+ msg122928 |
2010-11-30 18:42:37 | r.david.murray | set | status: closed -> pending
messages:
+ msg122926 |
2010-11-30 18:07:50 | kovid | set | messages:
+ msg122925 |
2010-11-30 17:33:56 | r.david.murray | set | status: open -> closed
type: behavior
nosy:
+ r.david.murray messages:
+ msg122924 resolution: not a bug stage: resolved |
2010-11-27 23:20:21 | kovid | set | messages:
+ msg122589 title: mimetypes reading from registry in windows completely broken -> mimetypes read from the registry should not overwrite standard mime mappings |
2010-11-27 23:12:41 | ned.deily | set | versions:
+ Python 3.2 nosy:
+ ggenellina, pitrou, tim.golden, ocean-city, tercero12, loewis
messages:
+ msg122587
superseder: mimetypes initialization fails on Windows because of non-Latin characters in registry -> |
2010-11-27 22:54:31 | kovid | set | status: closed -> open resolution: duplicate -> (no value) messages:
+ msg122583
|
2010-11-27 19:34:18 | ned.deily | set | status: open -> closed
superseder: mimetypes initialization fails on Windows because of non-Latin characters in registry components:
+ Windows
nosy:
+ brian.curtin, ned.deily messages:
+ msg122543 resolution: duplicate |
2010-11-27 19:15:19 | kovid | create | |