Title: imghdr doesn't recognize variant jpeg formats
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.5
Assigned To: Nosy List: Claudiu.Popa, ezio.melotti, intgr, jcea, joril, kovid, mvignali, r.david.murray, vstinner
Priority: normal Keywords: patch

Created on 2012-11-20 10:21 by joril, last changed 2022-04-11 14:57 by admin.

peanuts15.jpg joril, 2012-11-20 10:21 JPEG including an ICC profile
imghdr_icc_jpeg.patch joril, 2012-11-20 17:17 patch against hg head review
PR 8322 open gov_vj, 2018-07-18 12:18
PR 14862 open pchopin, 2019-07-19 14:24
msg175984 - (view) Author: Joril (joril) Date: 2012-11-20 10:21
imghdr doesn't support jpegs that include an ICC Profile.
This is because imghdr looks for "JFIF" somewhere at the beginning of the file, but the ICC_PROFILE shifts that further.
(The ICC spec is here, annex B)
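
A minimal sketch of the failure described above. It does not import imghdr: `fixed_offset_check` is a hypothetical stand-in that assumes the detection looks for the tag at bytes 6 to 10, and both headers are synthetic. The second one assumes a file whose first segment after the start-of-image marker is an APP2 segment carrying an ICC profile.

```python
# Hypothetical stand-in for a fixed-offset check (an assumption, not imghdr itself).
def fixed_offset_check(h):
    if h[6:10] in (b'JFIF', b'Exif'):
        return 'jpeg'
    return None

SOI = b'\xff\xd8'  # JPEG start-of-image marker

# Plain JFIF header: SOI, APP0 marker, 2-byte segment length, then the "JFIF\0" tag.
jfif_header = SOI + b'\xff\xe0' + b'\x00\x10' + b'JFIF\x00'

# Synthetic header where an APP2 segment tagged "ICC_PROFILE\0" comes first,
# so bytes 6 to 10 hold "ICC_" instead of "JFIF".
icc_header = (
    SOI
    + b'\xff\xe2' + b'\x00\x20'      # APP2 marker and segment length (32)
    + b'ICC_PROFILE\x00'             # ICC tag
    + b'\x01\x01'                    # chunk number, chunk count
    + b'\x00' * 16                   # placeholder profile bytes
    + b'\xff\xe0\x00\x10JFIF\x00'    # the JFIF segment, pushed further into the file
)

print(fixed_offset_check(jfif_header))
print(fixed_offset_check(icc_header))
```

With these inputs the first header is recognized and the second is not, even though both begin with the same two marker bytes.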
msg175985 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-20 11:15
Can you provide a patch?
msg175986 - (view) Author: Joril (joril) Date: 2012-11-20 11:16
I can try, yes. I'll add one ASAP
msg176009 - (view) Author: Joril (joril) Date: 2012-11-20 17:17
Here it is... It is against the latest hg version, should I write one for 2.7 too?
msg176010 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-20 17:31
Thanks for the patch.

> should I write one for 2.7 too?

Not necessary, 2.7 only gets bug fixes.
OTOH it would be nice to have some tests for this new feature (and for the module in general), but there doesn't seem to be any Lib/test/ file for it.  The module itself seems to contain some kind of tests at the end, though.
msg176045 - (view) Author: Joril (joril) Date: 2012-11-21 08:19
It looks like the test just walks a directory recursively while trying to identify its files; there's no "classic" test of the "this is a JPEG, is it detected correctly" type.
msg184742 - (view) Author: Kovid Goyal (kovid) Date: 2013-03-20 06:23
The attached patch is insufficient, for example, it fails on

Note that the Linux file utility identifies a file as "JPEG Image data" if the first two bytes of the file are \xff\xd8.

A slightly stricter test that catches more jpeg files:

def test_jpeg(h, f):
    if (h[6:10] in (b'JFIF', b'Exif')) or (h[:2] == b'\xff\xd8' and b'JFIF' in h[:32]):
        return 'jpeg'
msg198034 - (view) Author: (intgr) * Date: 2013-09-18 20:30
I vote we forget about JFIF/Exif headers and only use \xff\xd8 to identify the file. They are optional and there are tons of files out in the wild without such headers, for example:

Proposed patch at
msg220345 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-06-12 12:54
Using \xff\xd8 sounds good to me.
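
The two-byte check being discussed could look roughly like the sketch below. The name `detect_jpeg_soi` is hypothetical and the sample byte strings are synthetic; this is an illustration of the idea, not the proposed patch.

```python
def detect_jpeg_soi(h):
    """Hypothetical helper: report 'jpeg' when the data starts with the SOI marker."""
    if h[:2] == b'\xff\xd8':
        return 'jpeg'
    return None

# Synthetic header with neither a JFIF nor an Exif tag: SOI followed by a
# quantization-table (DQT) segment.
bare_header = b'\xff\xd8\xff\xdb\x00\x43\x00'

# PNG signature, used here as a non-JPEG input.
png_header = b'\x89PNG\r\n\x1a\n'

print(detect_jpeg_soi(bare_header))
print(detect_jpeg_soi(png_header))
print(detect_jpeg_soi(b''))
```

Slicing with `h[:2]` keeps the check safe for empty or one-byte inputs, which simply fail the comparison.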
msg220346 - (view) Author: Kovid Goyal (kovid) Date: 2014-06-12 13:09
FYI, the test I currently use in calibre, which has not failed so far for millions of users:

def test_jpeg(h, f):
    if (h[6:10] in (b'JFIF', b'Exif')) or (h[:2] == b'\xff\xd8' and (b'JFIF' in h[:32] or b'8BIM' in h[:32])):
        return 'jpeg'
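
To see what this test accepts, the sketch below copies its condition under the hypothetical name `calibre_style_check` and runs it on synthetic headers. The byte strings are made up for illustration, and the segment lengths in them are placeholders.

```python
def calibre_style_check(h):
    # Same condition as the test above, under a hypothetical name.
    if (h[6:10] in (b'JFIF', b'Exif')) or (
            h[:2] == b'\xff\xd8' and (b'JFIF' in h[:32] or b'8BIM' in h[:32])):
        return 'jpeg'
    return None

SOI = b'\xff\xd8'

# Synthetic headers (placeholder segment lengths).
samples = {
    'jfif': SOI + b'\xff\xe0\x00\x10JFIF\x00',
    'photoshop': SOI + b'\xff\xed\x00\x20Photoshop 3.0\x008BIM',
    'no_tag': SOI + b'\xff\xdb\x00\x43\x00',  # SOI followed directly by a DQT segment
}

for name, header in samples.items():
    print(name, calibre_style_check(header))
```

The last sample starts with \xff\xd8 but carries none of the tags in its first 32 bytes, so this test does not report it as a JPEG, while a check on the first two bytes alone would.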
msg220409 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-13 01:11
Issue 21230 reports a parallel problem with recognizing photoshop images.

We need a patch with tests covering the variant types we know about.  I don't have a strong opinion on the simple two byte test versus the more complex test in msg220346, but following 'file' makes sense to me.
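
A rough sketch of what tests covering the known variants might look like. Everything here is an assumption for illustration: `detect_jpeg` is a placeholder using the two-byte check that 'file' follows, to be swapped for whatever check the patch ends up with, and the headers are synthetic byte strings with placeholder segment lengths rather than real image files.

```python
import unittest

def detect_jpeg(h):
    # Placeholder check (first two bytes only); replace with the check under review.
    return 'jpeg' if h[:2] == b'\xff\xd8' else None

SOI = b'\xff\xd8'

# Synthetic headers for the variants mentioned in this issue.
VARIANTS = {
    'jfif': SOI + b'\xff\xe0\x00\x10JFIF\x00',
    'exif': SOI + b'\xff\xe1\x00\x10Exif\x00\x00',
    'icc_first': SOI + b'\xff\xe2\x00\x20ICC_PROFILE\x00',
    'photoshop': SOI + b'\xff\xed\x00\x20Photoshop 3.0\x008BIM',
    'no_app_segment': SOI + b'\xff\xdb\x00\x43\x00',
}

NOT_JPEG = {
    'png': b'\x89PNG\r\n\x1a\n',
    'empty': b'',
}

class JpegVariantTests(unittest.TestCase):
    def test_variants_detected(self):
        for name, header in VARIANTS.items():
            with self.subTest(variant=name):
                self.assertEqual(detect_jpeg(header), 'jpeg')

    def test_other_data_rejected(self):
        for name, header in NOT_JPEG.items():
            with self.subTest(data=name):
                self.assertIsNone(detect_jpeg(header))
```

Keeping the headers in a table makes it cheap to add a new variant when another unrecognized file turns up.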
msg221185 - (view) Author: Martin Vignali (mvignali) Date: 2014-06-21 18:44
I'm okay with just testing the first two bytes; it's the method we currently use for our
internal tools.

But it might be useful to add another test to detect incomplete files
(created when a camera makes a recording error, for example). These are very useful to detect, because an incomplete JPEG file crashes a lot of applications.

We use this patch to imghdr:

def test_jpeg(h, f):
    """JPEG data in JFIF or Exif format"""
    # Reject empty files and files that do not start with the SOI marker.
    if not h.startswith(b'\xff\xd8'):
        return None
    if f:
        # A file was given: also check that it ends with the EOI marker.
        f.seek(-2, 2)
        if f.read(2) == b'\xff\xd9':
            return 'jpeg'
        return None
    # Only the header was given: assume a valid JPEG without checking the end.
    return 'jpeg'
msg221214 - (view) Author: Kovid Goyal (kovid) Date: 2014-06-22 03:39
You cannot assume the file-like object passed to imghdr is seekable. And IMO it is not the job of imghdr to check file validity, especially since it does not do that for all formats.
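
To illustrate the seekability concern, here is a minimal sketch of an end-of-file check that a caller could run outside imghdr. `ends_with_eoi` and `NonSeekable` are hypothetical names, and the sketch assumes the object follows the `io` interface (`seekable`, `tell`, `seek`, `read`). When the stream cannot seek, the helper answers None instead of raising.

```python
import io

EOI = b'\xff\xd9'  # JPEG end-of-image marker

def ends_with_eoi(f):
    """Hypothetical helper: True/False when the end can be inspected, None otherwise."""
    seekable = getattr(f, 'seekable', None)
    if seekable is None or not seekable():
        return None                      # cannot look at the end of this stream
    pos = f.tell()
    try:
        size = f.seek(0, io.SEEK_END)    # seek() returns the new absolute position
        if size < 2:
            return False
        f.seek(-2, io.SEEK_END)
        return f.read(2) == EOI
    finally:
        f.seek(pos)                      # restore the caller's position

class NonSeekable(io.BytesIO):
    """Stand-in for a pipe or socket: same data, but reports itself as not seekable."""
    def seekable(self):
        return False

complete = io.BytesIO(b'\xff\xd8' + b'\x00' * 4 + EOI)
truncated = io.BytesIO(b'\xff\xd8' + b'\x00' * 4)
stream = NonSeekable(b'\xff\xd8' + b'\x00' * 4 + EOI)

print(ends_with_eoi(complete), ends_with_eoi(truncated), ends_with_eoi(stream))
```

Returning None for the non-seekable case keeps "could not check" distinct from "checked and incomplete", which matters if the caller treats the latter as an error.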
