classification
Title: UnicodeDecodeError in mimetypes.guess_type on Windows
Type: behavior
Stage: resolved
Components: Library (Lib), Windows
Versions: Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: mimetypes initialization fails on Windows because of non-Latin characters in registry (issue 9291)
Assigned To:
Nosy List: r.david.murray, vldmit
Priority: normal
Keywords:

Created on 2010-10-15 10:57 by vldmit, last changed 2010-10-15 12:52 by r.david.murray. This issue is now closed.

Files
File name: mime content types.reg
Uploaded: vldmit, 2010-10-15 10:57
Description: non-ASCII content-type keys are at the bottom of the reg file
Messages (2)
msg118758 - (view) Author: Vladimir Dmitriev (vldmit) Date: 2010-10-15 11:05
Windows 7, Python 2.7

Some Windows applications (QuickTime, for example) add content types with non-ASCII names to the Windows registry. mimetypes is unaware of that and fails with UnicodeDecodeError:

>>> mimetypes.guess_type('test.js')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\mimetypes.py", line 294, in guess_type
    init()
  File "c:\Python27\lib\mimetypes.py", line 355, in init
    db.read_windows_registry()
  File "c:\Python27\lib\mimetypes.py", line 260, in read_windows_registry
    for ctype in enum_types(mimedb):
  File "c:\Python27\lib\mimetypes.py", line 250, in enum_types
    ctype = ctype.encode(default_encoding) # omit in 3.x!
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
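
The failing line calls .encode(), yet the error is a decode error: in Python 2, calling .encode() on a byte string first decodes it implicitly with the default ASCII codec. Below is a minimal sketch of that hidden step, written for Python 3, where the decode has to be spelled out. Only the leading 0xe0 byte comes from the traceback; the rest of the key name is made up for illustration.

```python
# Hypothetical registry key name; only the leading 0xe0 byte is from the traceback.
raw_name = b"\xe0udio/example"

try:
    # This is the step Python 2 performs implicitly before encoding a byte string.
    raw_name.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```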

An example registry leaf is attached to this issue.

I believe the correct behavior would be either to catch the UnicodeDecodeError and skip those content types, or to call .decode() on the registry keys with the encoding obtained from locale.getdefaultlocale().
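
The two suggestions above can be combined. The following is a rough sketch, not the fix that was applied in mimetypes: the helper name decode_content_types and its parameters are hypothetical, it takes the raw key names as byte strings rather than reading the registry, and it uses locale.getpreferredencoding() in place of locale.getdefaultlocale(), which is deprecated in recent Python versions.

```python
import locale


def decode_content_types(raw_names, encoding=None):
    """Decode raw content-type names, skipping any that cannot be decoded.

    Hypothetical helper for illustration; not part of the mimetypes module.
    """
    if encoding is None:
        # Assumption: the locale's preferred encoding matches the registry data.
        encoding = locale.getpreferredencoding(False) or "ascii"
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Skip names that are not valid in the chosen encoding.
            continue
    return decoded


# With ASCII, the non-ASCII name is skipped instead of aborting the whole scan.
ascii_only = decode_content_types([b"text/plain", b"\xe0bc"], encoding="ascii")

# With a matching code page (cp1251 here, as an assumption), the name is kept.
cyrillic = decode_content_types([b"\xe0"], encoding="cp1251")
```

Skipping undecodable names keeps guess_type() usable even when the guessed encoding is wrong, at the cost of silently ignoring those registry entries.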
msg118764 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-15 12:52
This is a duplicate of #9291.
History
Date User Action Args
2010-10-15 12:52:38 r.david.murray set status: open -> closed; superseder: mimetypes initialization fails on Windows because of non-Latin characters in registry; nosy: + r.david.murray; messages: + msg118764; resolution: duplicate; stage: resolved
2010-10-15 11:05:04 vldmit set messages: + msg118758
2010-10-15 10:57:52 vldmit create