
Add preferred extensions for MIME types #40993

Closed
kxroberto mannequin opened this issue Oct 8, 2004 · 21 comments
Labels
3.8 (only security fixes) · stdlib (Python modules in the Lib dir) · type-feature (A feature request or enhancement)

Comments

@kxroberto
Mannequin

kxroberto mannequin commented Oct 8, 2004

BPO 1043134
Nosy @devdanzin, @ezio-melotti, @merwok, @evanj, @sandrotosi, @vadmium, @The-Compiler, @iritkatriel
Files
  • issue1043134.patch
  • mimetypes.patch
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2022-01-18.13:02:58.535>
    created_at = <Date 2004-10-08.15:44:17.000>
    labels = ['3.8', 'type-feature', 'library']
    title = 'Add preferred extensions for MIME types'
    updated_at = <Date 2022-01-18.13:02:58.531>
    user = 'https://bugs.python.org/kxroberto'

    bugs.python.org fields:

    activity = <Date 2022-01-18.13:02:58.531>
    actor = 'iritkatriel'
    assignee = 'none'
    closed = True
    closed_date = <Date 2022-01-18.13:02:58.535>
    closer = 'iritkatriel'
    components = ['Library (Lib)']
    creation = <Date 2004-10-08.15:44:17.000>
    creator = 'kxroberto'
    dependencies = []
    files = ['19752', '34262']
    hgrepos = []
    issue_num = 1043134
    keywords = ['patch']
    message_count = 21.0
    messages = ['54278', '54279', '54280', '54281', '54282', '82101', '114379', '114461', '121798', '121867', '121953', '121966', '140264', '143665', '212518', '214951', '215571', '226466', '277024', '384346', '410859']
    nosy_count = 19.0
    nosy_names = ['jlgijsbers', 'kxroberto', 'ajaksu2', 'wichert', 'ezio.melotti', 'eric.araujo', 'lambacck', 'cvrebert', 'sascha_silbe', 'evanj', 'ptarjan', 'sandro.tosi', 'leos', 'elesbom', 'martin.panter', 'The Compiler', 'david.lindquist', 'Tom.Christie', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1043134'
    versions = ['Python 3.8']

    @kxroberto
    Mannequin Author

    kxroberto mannequin commented Oct 8, 2004

    Instead of returning the first extension in the list,
    guess_extension should return the most reasonable one. Here:
    shouldn't saving a text/plain attachment leave a *.txt on disk?

    @kxroberto kxroberto mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Oct 8, 2004
    @jlgijsbers
    Mannequin

    jlgijsbers mannequin commented Oct 9, 2004

    How would you suggest finding out what the most reasonable
    extension for a mime type is?

    @kxroberto
    Mannequin Author

    kxroberto mannequin commented Oct 10, 2004

    in mimetypes.py there is already a

    common_types = {
        '.jpg' : 'image/jpg',
    ...

    .txt could be added there. Maybe guess_extension should
    first do a reverse lookup in that dict instead of picking
    an arbitrary entry?

    Background: my intent was to save a MIME attachment as a
    (startable) temporary file, yet I got wonderful .ksh files
    for text files and had to fumble around...

    @jlgijsbers
    Mannequin

    jlgijsbers mannequin commented Oct 11, 2004

    common_types is for adding some non-standard types, not for
    determining which extension is most reasonable. I'll be
    happy to look at a decent patch, but I'm moving this to
    feature request until then.

    @josiahcarlson
    Mannequin

    josiahcarlson mannequin commented Dec 19, 2004

    While I agree with the original poster that returning '.txt'
    is preferable to the others in the list returned by
    mimetypes.guess_all_extensions() at least 9 times out of 10,
    being able to prioritize all of the types is not necessarily
    the easiest thing to do for all of the possible returned lists.

    Is using a custom comparison function along with the list
    returned by guess_all_extensions() sufficient?
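A minimal sketch of that suggestion, assuming a hand-picked preference list. `PREFERRED_ORDER`, `preference_key()`, `rank_extensions()` and `guess_preferred_extension()` are hypothetical names, not part of `mimetypes`. Python 3's `sorted()` takes a key function rather than a comparison function; `functools.cmp_to_key` can adapt an old-style comparator if one is preferred.

```python
import mimetypes

# Hypothetical preference list: these extensions win, in this order.
PREFERRED_ORDER = [".txt", ".html", ".jpg", ".mp3"]


def preference_key(ext):
    """Sort key: listed extensions first, everything else after them."""
    try:
        return PREFERRED_ORDER.index(ext)
    except ValueError:
        return len(PREFERRED_ORDER)


def rank_extensions(extensions):
    """Return extensions ordered by preference (sorted() is stable)."""
    return sorted(extensions, key=preference_key)


def guess_preferred_extension(mime_type, strict=True):
    """Like guess_extension(), but returns the best-ranked candidate."""
    candidates = mimetypes.guess_all_extensions(mime_type, strict=strict)
    ranked = rank_extensions(candidates)
    return ranked[0] if ranked else None


print(rank_extensions([".ksh", ".pl", ".bat", ".txt", ".c"]))
# → ['.txt', '.ksh', '.pl', '.bat', '.c']
```

Because `sorted()` is stable, extensions missing from the preference list keep the order `guess_all_extensions()` returned them in; the sketch only changes which candidate comes first.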

    @devdanzin
    Mannequin

    devdanzin mannequin commented Feb 14, 2009

    Confirmed on trunk.

    @devdanzin devdanzin mannequin added easy labels Apr 22, 2009
    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Aug 19, 2010

    I'll close this in a couple of weeks unless someone wants it kept open.

    @skrah
    Mannequin

    skrah mannequin commented Aug 20, 2010

    I think you are closing too aggressively.

    Python 3.2a0 (py3k:81783, Jun  6 2010, 16:07:26) 
    [GCC 4.1.3 20080623 (prerelease) (Ubuntu 4.1.2-23ubuntu3)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import mimetypes
    >>> mimetypes.guess_extension('text/plain')
    '.ksh'
    >>>

    @lambacck
    Mannequin

    lambacck mannequin commented Nov 20, 2010

    While I agree that getting .ksh is an unfortunate guess, I am not sure how you can guess in the face of many options (especially when those options are parsed out of a mimetypes file or the Windows registry).

    Perhaps there should be a "reasonable_defaults" map that is checked first for very basic types that have multiple extensions?

    @ptarjan
    Mannequin

    ptarjan mannequin commented Nov 21, 2010

    6 years old and still not fixed?

    http://www.stdicon.com/mimetype/text/plain

    Please return txt

    @elesbom
    Mannequin

    elesbom mannequin commented Nov 21, 2010

    .ksh is text/plain too; all of these extensions are text/plain:
    '.ksh', '.pl', '.bat', '.h', '.c', '.txt', '.asc', '.text', '.pot', '.brf'.
    The problem is that the code returns the first one in the list:
    return extensions[0]
    So I added a boolean parameter called all_exts to guess_extension. Passing True makes the method return a tuple with all possible extensions.

    I hope this helps.

    @lambacck
    Mannequin

    lambacck mannequin commented Nov 21, 2010

    Rafael,

    There is already a method which returns all the extensions. What is required is a flag (or separate dict) which provides a canonical extension. The question is whether it is sufficient to rely on the built-in default types to choose the default when more types are read from the mimetypes files or the Windows registry.

    I don't see a way to fix the bug, without also providing an API to "pick the winner" for those cases that are not provided in the default list.

    @merwok
    Member

    merwok commented Jul 13, 2011

    The proposed patch does not solve the issue. In the current API, there is no way to do it, so this bug requires a new feature. I think it would involve a new dict, like preferred_extensions, which would be seeded with default values, like .jpg for image/jpeg and .txt for text/plain, and a few functions/methods to query the dict or add items.
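A rough sketch of the proposed dict plus query and add helpers. `preferred_extensions`, `add_preferred_extension()` and `get_preferred_extension()` are hypothetical names chosen for illustration, not an existing `mimetypes` API; only the fallback, `mimetypes.guess_extension()`, is real.

```python
import mimetypes

# Hypothetical registry (not part of the mimetypes module): MIME type ->
# canonical extension, seeded with a couple of obvious defaults.
preferred_extensions = {
    "image/jpeg": ".jpg",
    "text/plain": ".txt",
}


def add_preferred_extension(mime_type, ext):
    """Register or replace the canonical extension for a MIME type."""
    if not ext.startswith("."):
        raise ValueError(f"extension must start with '.': {ext!r}")
    preferred_extensions[mime_type.lower()] = ext


def get_preferred_extension(mime_type, strict=True):
    """Return the registered extension, else defer to guess_extension()."""
    ext = preferred_extensions.get(mime_type.lower())
    if ext is not None:
        return ext
    return mimetypes.guess_extension(mime_type, strict=strict)


add_preferred_extension("text/html", ".html")
print(get_preferred_extension("text/plain"))  # → .txt
print(get_preferred_extension("TEXT/HTML"))   # → .html
```

Types without a registered preference still go through the existing lookup, so behaviour only changes for the seeded or explicitly added entries.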

    @merwok merwok removed the easy label Jul 13, 2011
    @merwok merwok changed the title mimetypes.guess_extension('text/plain') == '.ksh' ??? Add preferred extensions for MIME types Jul 13, 2011
    @leos
    Mannequin

    leos mannequin commented Sep 7, 2011

    I'm running into a similar issue with this function. My bug is that guess_type('foo.png') returns image/x-png. This occurs on Windows because there are mappings to both image/png and image/x-png in the registry (as there should be, since that key is actually a reverse mapping) and the code simply picks the first key that it enumerates over. This issue strikes in both directions.

    Chris and others bring up a valid issue: how to decide what the winning result is?

    I think the answer is pretty clear - you use the common_types mapping already in the file and expand it as appropriate. If the mimetype can't be found, only then do you go to the windows registry. The behavior on Linux is even stranger to me (now we'll dig through an arbitrary list of files that might contain MIME info or may have completely irrelevant data) but it's a pragmatic solution.

    If someone needs to customize what guess_type returns, they can simply wrap the guess_type function in their own code or monkey patch if they don't have access to the source they're running. Changing such a mime type is a really advanced and unusual operation. If that's unacceptable, the code can provide a hook for an 'apache MIME config' file on windows in a standard place (either pythonpath, or %system% or wherever) that it will check before going to common_types or to the registry.
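Wrapping `guess_type()` as described might look like the sketch below. `TYPE_OVERRIDES` is a hypothetical table, and the wrapper keeps a reference to the original function so that it could also be installed as a monkey patch without recursing into itself. It only inspects the final extension of a plain path; compressed names such as `foo.png.gz` fall through to the stdlib.

```python
import mimetypes
import os

# Hypothetical override table: extension -> MIME type that should win over
# whatever the platform database (e.g. the Windows registry) reports.
TYPE_OVERRIDES = {
    ".png": "image/png",
}

# Keep the original so a monkey-patched mimetypes.guess_type can still call it.
_original_guess_type = mimetypes.guess_type


def guess_type(url, strict=True):
    """Return (type, encoding), consulting TYPE_OVERRIDES before mimetypes."""
    _root, ext = os.path.splitext(url)
    override = TYPE_OVERRIDES.get(ext.lower())
    if override is not None:
        return override, None
    return _original_guess_type(url, strict=strict)


print(guess_type("foo.png"))  # → ('image/png', None)

# Optional monkey patch for code you cannot edit:
# mimetypes.guess_type = guess_type
```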

    Making this change doesn't require changing the API at all, just the implementation changes.

    @davidlindquist
    Mannequin

    davidlindquist mannequin commented Mar 1, 2014

    I don't think it is unreasonable to return a well-known extension for certain mime types, text/plain being the most obvious (and most in need of repair; .ksh??).

    I've attached a patch based on the previous discussion.

    @wichert
    Mannequin

    wichert mannequin commented Mar 27, 2014

    @davidlindquist
    Mannequin

    davidlindquist mannequin commented Apr 4, 2014

    Anyone interested in picking this up, or at least commenting on the approach I suggested in the patch? Seems like an easy fix for a long-standing bug.

    @vadmium
    Member

    vadmium commented Sep 6, 2014

    See also <https://bugs.python.org/issue6626#msg91205>, which mentions using a list of tuples instead of a dictionary; that sounds like it might help with this issue. Doing it that way, you might be able to avoid some duplication in the lists.
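One way to read that suggestion, as a sketch: keep a single ordered list of `(extension, type)` pairs, derive both lookup directions from it, and treat the first extension listed for a type as the preferred one. The table contents and helper names below are illustrative only, not the layout `mimetypes` actually uses.

```python
# Hypothetical ordered table; the first extension listed for a MIME type
# is treated as its preferred extension.
TYPES = [
    (".txt", "text/plain"),
    (".ksh", "text/plain"),
    (".bat", "text/plain"),
    (".jpg", "image/jpeg"),
    (".jpeg", "image/jpeg"),
    (".jpe", "image/jpeg"),
]


def build_maps(pairs):
    """Derive extension->type and type->[extensions] maps from one list."""
    ext_to_type = {}
    type_to_exts = {}
    for ext, mime_type in pairs:
        # The first mapping seen for an extension wins.
        ext_to_type.setdefault(ext, mime_type)
        exts = type_to_exts.setdefault(mime_type, [])
        if ext not in exts:
            exts.append(ext)
    return ext_to_type, type_to_exts


EXT_TO_TYPE, TYPE_TO_EXTS = build_maps(TYPES)


def preferred_extension(mime_type):
    """Return the first-listed extension for mime_type, or None."""
    exts = TYPE_TO_EXTS.get(mime_type)
    return exts[0] if exts else None


print(preferred_extension("text/plain"))  # → .txt
print(TYPE_TO_EXTS["image/jpeg"])         # → ['.jpg', '.jpeg', '.jpe']
```

Since each pair is written once, the forward and reverse maps cannot drift apart, and reordering the list is the only step needed to change a preference.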

    @tomchristie
    Mannequin

    tomchristie mannequin commented Sep 20, 2016

    Confirming that I've also bumped into this for Python 3.5.

    A docs update would seem to be the lowest-cost option to start with.

    Right now mimetypes.guess_extension() isn't terribly useful, and it'd be better to at least know that upfront.

    @The-Compiler
    Mannequin

    The-Compiler mannequin commented Jan 4, 2021

    I think this has been fixed in Python 3.7+ via #14375 - at least for a couple of types.

    Comparing Python 3.6 with the current state, the following changed (which can be used as an "override" dict before calling mimetypes.guess_extension):

    {
        "application/manifest+json": ".webmanifest",  # not None
        "application/octet-stream": ".bin",  # not .a
        "application/postscript": ".ps",  # not .ai
        "application/vnd.ms-excel": ".xls",  # not .xlb
        "application/vnd.ms-powerpoint": ".ppt",  # not .pot
        "application/wasm": ".wasm",  # not None
        "application/x-hdf5": ".h5",  # not None
        "application/xml": ".xsl",  # not .rdf
        "audio/mpeg": ".mp3",  # not .mp2
        "image/jpeg": ".jpg",  # not .jpe
        "image/tiff": ".tiff",  # not .tif
        "text/html": ".html",  # not .htm
        "text/plain": ".txt",  # not .bat
        "video/mpeg": ".mpeg",  # not .m1v
    }
    
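A small sketch for checking which of these overrides a given interpreter still needs: it compares the table against `mimetypes.guess_extension()` and reports disagreements. Only a subset of the entries is copied here, `find_mismatches()` is a hypothetical helper, and the result depends on the Python version and on the MIME data the platform provides, so no particular output is implied.

```python
import mimetypes

# A subset of the table above: MIME type -> preferred extension.
PREFERRED = {
    "application/octet-stream": ".bin",
    "image/jpeg": ".jpg",
    "text/html": ".html",
    "text/plain": ".txt",
}


def find_mismatches(preferred, guess=mimetypes.guess_extension):
    """Return {mime_type: (expected, actual)} for every entry guess() gets wrong."""
    mismatches = {}
    for mime_type, expected in preferred.items():
        actual = guess(mime_type)
        if actual != expected:
            mismatches[mime_type] = (expected, actual)
    return mismatches


# An empty dict means this interpreter already returns the preferred
# extension for every listed type; output varies by version and platform.
print(find_mismatches(PREFERRED))
```

Entries that show up in the result are the ones worth keeping in an override dict consulted before calling `mimetypes.guess_extension()`.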

    @iritkatriel
    Member

    PR14375 indeed adds a test for this as well (test_preferred_extension).

    @iritkatriel iritkatriel added the 3.8 only security fixes label Jan 18, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022